Modalities
Live today (Visual Document Intelligence + Audio), ingesting now:
- Charts
- Graphs & plots
- Infographics
- Complex, multi-page tables
- Slides & decks
- Reports & filings
- Scanned & photographed pages
- Invoices & forms
- Handwriting & annotations
- Diagrams & flowcharts
- Photos & images
- Audio (calls, meetings, recordings)
- Healthcare scans / EHR
- Chemical & molecular data
- CAD & technical drawings
- Video
- Heatmaps
Overview
- Documents: PDF, DOCX, PPTX
- Text: TXT, Markdown
- Images: PNG, JPG, screenshots, scans
- Audio: WAV, MP3, M4A, and more
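As a rough illustration of how these four families line up with the file extensions listed in the sections below, the following sketch routes a file name to a family by its extension. The names `MODALITY_BY_EXTENSION` and `modality_for` are hypothetical and are not part of any Polyvia SDK; only the extension lists are taken from this page.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from this page. The mapping and the family labels are
# illustrative only, not an official Polyvia API.
MODALITY_BY_EXTENSION = {
    ".pdf": "documents", ".docx": "documents", ".doc": "documents",
    ".pptx": "documents", ".ppt": "documents",
    ".txt": "text", ".md": "text",
    ".png": "images", ".jpg": "images", ".jpeg": "images",
    ".gif": "images", ".webp": "images", ".bmp": "images",
    ".wav": "audio", ".mp3": "audio", ".m4a": "audio",
}


def modality_for(filename: str) -> Optional[str]:
    """Return the format family for a file name, or None if it is not listed."""
    return MODALITY_BY_EXTENSION.get(Path(filename).suffix.lower())
```

A check like this can be useful before uploading a mixed folder, so that unlisted formats (for example video) are set aside rather than sent.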
Documents
PDF
Reports, filings, contracts, research papers, scanned books, invoices. Polyvia reads everything on the page — not just the text layer:
- Text — body text, headings, footnotes, and multi-column layouts
- Tables — including complex, multi-page, and nested tables
- Charts & graphs — bar, line, pie, scatter, and combo charts read back as data
- Infographics & diagrams — flowcharts, org charts, schematics, maps
- Figures & images — embedded photos, logos, and screenshots
- Scanned & photographed pages — OCR over image-only PDFs
- Handwriting — handwritten notes and annotations
- Forms & invoices — fields, line items, stamps, and signatures
Extensions: .pdf
Word Documents
Memos, proposals, internal docs. Headings, lists, and tables are preserved so structure-aware queries (e.g. “what does section 3.2 say about…”) work correctly.
Extensions: .docx, .doc
Slides & Decks
Pitch decks, board decks, training material. Slide order is preserved and per-slide visuals (charts, images, diagrams) are extracted alongside speaker notes.
Extensions: .pptx, .ppt
Text
Plain Text
Logs, transcripts, notes, raw exports. Indexed line-by-line with no formatting loss.
Extensions: .txt
Markdown
READMEs, wikis, internal docs, AI-generated reports. Headings, code blocks, and lists are parsed natively, so structural citations point to the right section.
Extensions: .md
Images
Photos & Renders
Product photography, architecture diagrams, whiteboards, screenshots, scanned receipts. Polyvia runs visual understanding to extract text, structure, and entities; citations point to the image, with bounding boxes when applicable.
Extensions: .png, .jpg, .jpeg, .gif, .webp, .bmp
Audio
Recordings & Calls
Sales calls, interviews, podcasts, meeting recordings. Polyvia transcribes with timestamps and speaker turns. Citations link directly to the utterance, and queries can return clickable timestamps that seek the player to the cited moment.
Extensions: .wav, .mp3, .m4a
How ingestion works
Regardless of format, every file flows through the same pipeline:
- Parse: a format-specific parser extracts raw content (text, page layout, frames, transcript).
- Extract: VLM and LLM passes pull out facts, tables, charts, and entities.
- Index: facts are linked into the ontology graph with source provenance (document, page, bounding box, or timestamp).
- Query: agents can search across all formats in the same call; citations point back to the exact source location.
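To make the provenance idea concrete, here is a minimal sketch of what a source-location record and a citation string could look like. The `Provenance` class, its field names, and `format_citation` are hypothetical and chosen for illustration; this page does not specify Polyvia's actual schema or citation format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Provenance:
    """Hypothetical source-location record attached to an indexed fact."""
    document: str
    page: Optional[int] = None  # page number, for paged formats
    bounding_box: Optional[Tuple[float, float, float, float]] = None  # x0, y0, x1, y1
    timestamp_s: Optional[float] = None  # seconds from start, for audio


def format_citation(p: Provenance) -> str:
    """Render a provenance record as a short human-readable citation."""
    parts = [p.document]
    if p.page is not None:
        parts.append(f"p. {p.page}")
    if p.bounding_box is not None:
        x0, y0, x1, y1 = p.bounding_box
        parts.append(f"bbox ({x0:g}, {y0:g}, {x1:g}, {y1:g})")
    if p.timestamp_s is not None:
        # Whole seconds are enough for a seekable mm:ss label.
        minutes, seconds = divmod(int(p.timestamp_s), 60)
        parts.append(f"at {minutes:02d}:{seconds:02d}")
    return ", ".join(parts)
```

Keeping every location field optional lets one record type cover paged documents, images, and audio, which mirrors the "document, page, bounding box, or timestamp" wording above.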
Uploading
You can ingest any supported format through any of our interfaces:
- Polyvia Platform: drag-and-drop upload, no code required.
- REST API: single or batch upload via /api/v1/ingest.
- SDKs: Python and TypeScript SDKs with batch helpers.
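For the REST route, a single-file upload might be prepared roughly as follows. Only the `/api/v1/ingest` path comes from this page. The base URL, the bearer-token header, the multipart field name `file`, and the helper name `build_ingest_request` are all assumptions made for this sketch, so check the API reference for the real request shape. The function builds the request without sending it.

```python
import uuid
from urllib.request import Request


def build_ingest_request(base_url: str, api_key: str,
                         filename: str, content: bytes) -> Request:
    """Build (but do not send) a multipart POST to /api/v1/ingest.

    Assumed, not documented on this page: bearer-token auth and a
    multipart form field named "file".
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return Request(
        url=base_url.rstrip("/") + "/api/v1/ingest",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. The response format is not described on this page, so the sketch stops at building the request.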
Need a format we don’t list? Video, HTML, and email ingestion are on the roadmap. Email senyao@polyvia.ai and tell us what you’d like to throw at Polyvia.
