Document Management

Overview

Starnion’s document management feature lets you upload documents in a variety of formats — PDF, Word, Excel, PowerPoint, plain text, and more — and use them together with AI. Once a document is uploaded, the AI automatically reads and analyzes its contents, so you can ask natural-language questions in chat such as “Find the delivery deadline in the contract.”

Uploaded documents go through a pipeline of text extraction → chunking → vector embedding before being stored in a searchable state. The AI then retrieves relevant sections to provide accurate answers.


Supported Formats

Format Extension Text Extraction AI Search AI Chat
PDF .pdf Yes Yes Yes
Word .doc, .docx Yes Yes Yes
Excel .xls, .xlsx Yes Yes Yes
PowerPoint .ppt, .pptx Yes Yes Yes
Plain Text .txt Yes Yes Yes
Markdown .md Yes Yes Yes
CSV .csv Yes Yes Yes

HWP (Hangul Word Processor) files are not currently supported directly. Export HWP files to PDF before uploading.


Uploading Documents

Attaching a File in Chat

Click the file attachment button at the bottom of the chat input box and select your document, or drag and drop a file directly into the chat window.

User: [Attaches contract.pdf]
AI:   contract.pdf has been uploaded. Analyzing the contents... Done!
      You can now ask questions about this document.

User: What is the delivery deadline in this contract?
AI:   According to Article 4 of the contract, the delivery deadline is June 30, 2025.
      Specifically, it states: "Party B shall deliver the products by June 30, 2025."

Uploading from the Document Menu

Go to the top menu > Documents tab, then click the Upload Document button or drop a file into the drag-and-drop area.


Document Processing Pipeline

When a document is uploaded, the following steps run automatically in the background.

1. Upload       → File saved to MinIO storage
2. Parsing      → Text extracted according to file format
3. Chunking     → Long document split into appropriately sized pieces
4. Embedding    → Each chunk converted into a vector the AI can understand
5. Storage      → Saved to the database in a searchable state
6. Done         → Ready for immediate use in AI chat

Processing time varies by file size. Documents under 1 MB are typically completed within 10 seconds.


Searching and Q&A with Documents

You can ask the AI natural-language questions about the content of any uploaded document.

Contract-related:

User: Find the penalty clause in the contract.
AI:   Per Article 8 (Penalty): If Party B is unable to meet the delivery deadline,
      Party B must pay a penalty of 0.1% of the contract amount for each day of delay.

Meeting minutes summary:

User: Summarize only the decisions made in the last meeting minutes.
AI:   Decisions from the March 5 meeting:
      1. New feature launch date: April 15
      2. Marketing budget: KRW 5,000,000 approved
      3. Owners: Team Lead Kim (Development), Director Lee (Marketing)

Excel data analysis:

User: Tell me the top 5 best-selling products in the January sales data.
AI:   Top 5 products by January sales:
      1st: Product A — 1,250 units (KRW 3,750,000)
      2nd: Product B — 980 units (KRW 2,940,000)
      ...

Using Documents in AI Chat

Attaching and Asking Questions

Attaching a document directly in chat lets you have a conversation based on that document.

User: [Attaches employee-manual.pdf] Summarize the annual leave policy.
AI:   Summary of Chapter 3 (Leave Policy) from the manual:
      - Annual leave accrual: 15 days after 1 year of service, +1 day every 2 years thereafter (max 25 days)
      - Usage period: Must be used within 1 year of accrual date
      - Application method: Team leader approval required 3 days in advance
      - Half-day leave: Can be split into morning (09:00–13:00) or afternoon (14:00–18:00)

Comparing Multiple Documents

User: Compare the two quotes and tell me which one is more favorable.
AI:   Comparison of quotes from Vendor A and Vendor B:
      Vendor A: KRW 4,500,000 total, 30-day delivery, 1-year warranty
      Vendor B: KRW 4,800,000 total, 25-day delivery, 2-year warranty
      → Vendor B is KRW 300,000 more expensive, but delivers 5 days sooner and offers double the warranty period.

AI Document Generation

You can also ask the AI to draft new documents or reports. Generated documents are automatically saved to your document list.

User: Draft a March work report. This month I completed 3 new feature developments and 12 bug fixes.
AI:   Your March work report has been created.
      [report.docx generated]
      You can download it from the Documents tab.

Managing Documents

Viewing the Document List

In the Documents tab, you can view all uploaded documents in chronological order. Up to 100 documents are displayed.

Deleting Documents

Click the delete button next to a document in the list, or request deletion in chat.

User: Delete "2024-annual-report.pdf"
AI:   2024-annual-report.pdf has been deleted.

Limitations

Item Limit
Maximum file size 50 MB
List display Most recent 100
Supported formats PDF, DOC, DOCX, XLS, XLSX, PPT, PPTX, TXT, MD, CSV
Unsupported formats HWP, image files (use the Images tab for images)

Tips & FAQ

Q. I uploaded a document but the AI says it doesn’t know the content. A. File processing may take a moment. Please wait and try your question again. Large files can take 1–2 minutes to process.

Q. Can scanned PDFs be read? A. Scanned PDFs consist of images, making text extraction difficult. Please use OCR-processed PDFs or PDFs that contain selectable text.

Q. Can I ask questions about only a specific sheet in an Excel file? A. Yes — simply mention the sheet name in your question. For example: “Tell me the March total from the Sales sheet.”

Q. Are my uploaded documents stored securely? A. All documents are stored in encrypted MinIO storage and are accessible only by you.


Document Parsing Engine

Starting with Starnion v1.2.0, Docling (IBM open-source, MIT license) is used as the document parsing engine. Compared to the previous simple text extraction approach, the following improvements have been made:

Improvement Before After
PDF tables Text extracted only Table structure preserved
Heading structure Ignored Section-level chunking
PPTX layout Text listed sequentially Slide structure recognized
Search quality Simple character splitting Semantic chunking

Image files (PNG, JPG, GIF, BMP, TIFF) can also have text extracted via OCR.


Copyright © 2025 StarNion. All rights reserved.  |  v0.1.1