Supported Formats

Polyvia ingests text, visual, and audio source material into the same unified knowledge graph. You don’t need to pre-classify or pre-convert your files — drop in whatever your team produces and Polyvia routes each file through the right parser automatically.

Modalities

Live today — Visual Document Intelligence + Audio, ingesting now:

Charts
Graphs & plots
Infographics
Complex, multi-page tables
Slides & decks
Reports & filings
Scanned & photographed pages
Invoices & forms
Handwriting & annotations
Diagrams & flowcharts
Photos & images
Audio (calls, meetings, recordings)

Coming next:

Healthcare scans / EHR
Chemical & molecular data
CAD & technical drawings
Video
Heatmaps

Overview

Documents

PDF, DOCX, PPTX

Text

TXT, Markdown

Images

PNG, JPG, screenshots, scans

Audio

WAV, MP3, M4A, and more

Documents

PDF

Reports, filings, contracts, research papers, scanned books, invoices. Polyvia reads everything on the page — not just the text layer:

Text — body text, headings, footnotes, and multi-column layouts
Tables — including complex, multi-page, and nested tables
Charts & graphs — bar, line, pie, scatter, and combo charts read back as data
Infographics & diagrams — flowcharts, org charts, schematics, maps
Figures & images — embedded photos, logos, and screenshots
Scanned & photographed pages — OCR over image-only PDFs
Handwriting — handwritten notes and annotations
Forms & invoices — fields, line items, stamps, and signatures

Page numbers and layout are preserved for citations.

Extensions: .pdf

Word Documents

Memos, proposals, internal docs. Headings, lists, and tables are preserved so structure-aware queries (e.g. “what does section 3.2 say about…”) work correctly.

Extensions: .docx, .doc

Slides & Decks

Pitch decks, board decks, training material. Slide order is preserved and per-slide visuals (charts, images, diagrams) are extracted alongside speaker notes.

Extensions: .pptx, .ppt

Text

Plain Text

Logs, transcripts, notes, raw exports. Indexed line-by-line with no formatting loss.

Extensions: .txt

Markdown

READMEs, wikis, internal docs, AI-generated reports. Headings, code blocks, and lists are parsed natively, so structural citations point to the right section.

Extensions: .md

Images

Photos & Renders

Product photography, architecture diagrams, whiteboards, screenshots, scanned receipts. Polyvia runs visual understanding to extract text, structure, and entities. Citations point to the image, with bounding boxes when applicable.

Extensions: .png, .jpg, .jpeg, .gif, .webp, .bmp

Audio

Recordings & Calls

Sales calls, interviews, podcasts, meeting recordings. Polyvia transcribes with timestamps and speaker turns. Citations link directly to the utterance, and queries can return clickable timestamps that seek the player to the cited moment.

Extensions: .wav, .mp3, .m4a
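
The extensions listed in the format sections above can be gathered into a small lookup table, which may be useful for pre-filtering a folder before upload. This is a client-side sketch only: `MODALITY_BY_EXTENSION` and `modality_for` are hypothetical names, not part of any Polyvia SDK, and Polyvia still routes each file to its parser on its own. The table is also not exhaustive, since the audio overview mentions further formats that are not named on this page.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the format sections of this page.
# The modality labels and the table itself are illustrative assumptions.
MODALITY_BY_EXTENSION = {
    ".pdf": "document", ".docx": "document", ".doc": "document",
    ".pptx": "document", ".ppt": "document",
    ".txt": "text", ".md": "text",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".gif": "image", ".webp": "image", ".bmp": "image",
    ".wav": "audio", ".mp3": "audio", ".m4a": "audio",
}


def modality_for(filename: str) -> Optional[str]:
    """Return the modality for a filename, or None if its extension is not listed."""
    return MODALITY_BY_EXTENSION.get(Path(filename).suffix.lower())
```

A call such as `modality_for("Q3-report.PDF")` matches case-insensitively, while a file with an unlisted extension yields `None` and can be set aside for manual review.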

How ingestion works

Regardless of format, every file flows through the same pipeline:

Parse — format-specific parser extracts raw content (text, page layout, frames, transcript).
Extract — VLM and LLM passes pull out facts, tables, charts, and entities.
Index — facts are linked into the ontology graph with source provenance (document, page, bounding box, or timestamp).
Query — agents can search across all formats in the same call; citations point back to the exact source location.
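
The provenance attached at the Index step (document, page, bounding box, or timestamp) could be modeled on the client side roughly as follows. This is a minimal sketch under stated assumptions: the `Provenance` class, its field names, and `format_citation` are hypothetical and do not describe Polyvia's actual schema or response format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Provenance:
    """Hypothetical source location for one extracted fact (not Polyvia's schema)."""
    document: str
    page: Optional[int] = None                                # paged documents
    bbox: Optional[Tuple[float, float, float, float]] = None  # x0, y0, x1, y1
    timestamp_s: Optional[float] = None                       # audio offset in seconds


def format_citation(p: Provenance) -> str:
    """Render a provenance record as a short human-readable citation."""
    parts = [p.document]
    if p.page is not None:
        parts.append(f"p. {p.page}")
    if p.bbox is not None:
        parts.append("bbox " + ",".join(f"{v:g}" for v in p.bbox))
    if p.timestamp_s is not None:
        # Truncate to whole seconds and show HH:MM:SS, as a player seek label might.
        hours, rem = divmod(int(p.timestamp_s), 3600)
        minutes, seconds = divmod(rem, 60)
        parts.append(f"{hours:02d}:{minutes:02d}:{seconds:02d}")
    return ", ".join(parts)
```

Only the fields that apply to a given format are filled in: a PDF fact would carry a page (and perhaps a bounding box), while a fact from a call recording would carry a timestamp.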

Uploading

You can ingest any supported format through any of our interfaces:

Polyvia Platform

Drag-and-drop upload — no code required.

REST API

Single or batch upload via /api/v1/ingest.

SDKs

Python and TypeScript SDKs with batch helpers.
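
For the REST route, a single-file upload to /api/v1/ingest might be assembled along these lines using only the Python standard library. Everything except the endpoint path is an assumption made for illustration: the host `polyvia.example`, bearer-token authentication, the multipart encoding, and the `file` form field name are placeholders, so check the API reference for the real values. The function builds the request without sending it.

```python
import urllib.request
import uuid

# Assumptions: the host, the auth scheme, and the "file" field are placeholders.
BASE_URL = "https://polyvia.example"  # hypothetical host
INGEST_PATH = "/api/v1/ingest"        # endpoint path as given on this page


def build_ingest_request(filename: str, content: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying one file."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        BASE_URL + INGEST_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once the real host and credentials are known, the returned object can be passed to `urllib.request.urlopen`. Batch upload could repeat the same call per file, although the actual batch contract is not described on this page.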

Need a format we don’t list? Video, HTML, and email ingestion are on the roadmap. Email senyao@polyvia.ai and tell us what you’d like to throw at Polyvia.
