Polyvia ingests text, visual, and audio source material into the same unified knowledge graph. You don’t need to pre-classify or pre-convert your files — drop in whatever your team produces and Polyvia routes each file through the right parser automatically.

Modalities

Live today (Visual Document Intelligence and Audio):
  • Charts
  • Graphs & plots
  • Infographics
  • Complex, multi-page tables
  • Slides & decks
  • Reports & filings
  • Scanned & photographed pages
  • Invoices & forms
  • Handwriting & annotations
  • Diagrams & flowcharts
  • Photos & images
  • Audio (calls, meetings, recordings)
Coming next:
  • Healthcare scans / EHR
  • Chemical & molecular data
  • CAD & technical drawings
  • Video
  • Heatmaps

Overview

  • Documents: PDF, DOCX, PPTX
  • Text: TXT, Markdown
  • Images: PNG, JPG, screenshots, scans
  • Audio: WAV, MP3, M4A, and more
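Polyvia picks the parser for you, but a client-side pre-flight check can still be handy for skipping files that would be rejected. The following is a minimal sketch that groups filenames by the extensions listed on this page. The modality names and the helper functions `modality_for` and `split_supported` are hypothetical and not part of any Polyvia SDK.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the format sections of this page. The grouping
# into modality names is this example's own convention, not an API contract.
SUPPORTED_EXTENSIONS = {
    "documents": {".pdf", ".docx", ".doc", ".pptx", ".ppt"},
    "text": {".txt", ".md"},
    "images": {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"},
    "audio": {".wav", ".mp3", ".m4a"},
}


def modality_for(filename: str) -> Optional[str]:
    """Return the modality for a filename, or None if the extension is unlisted."""
    suffix = Path(filename).suffix.lower()
    for modality, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return modality
    return None


def split_supported(filenames):
    """Partition filenames into (supported, unsupported) lists."""
    supported, unsupported = [], []
    for name in filenames:
        (supported if modality_for(name) else unsupported).append(name)
    return supported, unsupported
```

The lookup is case-insensitive, so `Q3-report.PDF` and `q3-report.pdf` are treated the same way.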

Documents

Reports, filings, contracts, research papers, scanned books, invoices. Polyvia reads everything on the page — not just the text layer:
  • Text — body text, headings, footnotes, and multi-column layouts
  • Tables — including complex, multi-page, and nested tables
  • Charts & graphs — bar, line, pie, scatter, and combo charts read back as data
  • Infographics & diagrams — flowcharts, org charts, schematics, maps
  • Figures & images — embedded photos, logos, and screenshots
  • Scanned & photographed pages — OCR over image-only PDFs
  • Handwriting — handwritten notes and annotations
  • Forms & invoices — fields, line items, stamps, and signatures
Page numbers and layout are preserved for citations.
Extensions: .pdf

Memos, proposals, internal docs. Headings, lists, and tables are preserved so structure-aware queries (e.g. “what does section 3.2 say about…”) work correctly.
Extensions: .docx, .doc

Pitch decks, board decks, training material. Slide order is preserved and per-slide visuals (charts, images, diagrams) are extracted alongside speaker notes.
Extensions: .pptx, .ppt

Text

Logs, transcripts, notes, raw exports. Indexed line-by-line with no formatting loss.
Extensions: .txt

READMEs, wikis, internal docs, AI-generated reports. Headings, code blocks, and lists are parsed natively, so structural citations point to the right section.
Extensions: .md

Images

Product photography, architecture diagrams, whiteboards, screenshots, scanned receipts. Polyvia runs visual understanding to extract text, structure, and entities. Citations point to the image, with bounding boxes when applicable.
Extensions: .png, .jpg, .jpeg, .gif, .webp, .bmp

Audio

Sales calls, interviews, podcasts, meeting recordings. Polyvia transcribes with timestamps and speaker turns. Citations link directly to the utterance, and queries can return clickable timestamps that seek the player to the cited moment.
Extensions: .wav, .mp3, .m4a

How ingestion works

Regardless of format, every file flows through the same pipeline:
  1. Parse — format-specific parser extracts raw content (text, page layout, frames, transcript).
  2. Extract — VLM and LLM passes pull out facts, tables, charts, and entities.
  3. Index — facts are linked into the ontology graph with source provenance (document, page, bounding box, or timestamp).
  4. Query — agents can search across all formats in the same call; citations point back to the exact source location.
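
To make the provenance described in step 3 concrete, here is one way a source pointer could be modeled on the client side. This is a sketch under assumptions: the `Provenance` class, its field names, and the label format are hypothetical and do not reflect Polyvia's actual response schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Provenance:
    """Hypothetical source pointer attached to an indexed fact."""

    document: str
    page: Optional[int] = None  # paged documents
    bounding_box: Optional[Tuple[float, float, float, float]] = None  # image or page region
    timestamp_seconds: Optional[float] = None  # audio

    def label(self) -> str:
        """Render a human-readable citation string from the fields that are set."""
        parts = [self.document]
        if self.page is not None:
            parts.append(f"p. {self.page}")
        if self.bounding_box is not None:
            x0, y0, x1, y1 = self.bounding_box
            parts.append(f"box ({x0:g}, {y0:g}, {x1:g}, {y1:g})")
        if self.timestamp_seconds is not None:
            # Truncate to whole seconds and format as MM:SS.
            minutes, seconds = divmod(int(self.timestamp_seconds), 60)
            parts.append(f"{minutes:02d}:{seconds:02d}")
        return ", ".join(parts)
```

For example, `Provenance("call.mp3", timestamp_seconds=125.4).label()` yields `call.mp3, 02:05`, and `Provenance("10-K.pdf", page=12).label()` yields `10-K.pdf, p. 12`.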

Uploading

You can ingest any supported format through any of our interfaces:

  • Polyvia Platform: drag-and-drop upload, no code required.
  • REST API: single or batch upload via /api/v1/ingest.
  • SDKs: Python and TypeScript SDKs with batch helpers.
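
As a rough illustration of a single-file upload to the REST endpoint, the sketch below builds a multipart POST with Python's standard library. Only the `/api/v1/ingest` path comes from this page. The base URL, the Bearer authorization header, the `file` form field name, and the `build_ingest_request` helper are assumptions, so check the API reference for the real request shape before relying on it.

```python
import mimetypes
import uuid
from urllib.request import Request

INGEST_PATH = "/api/v1/ingest"  # path taken from this page


def build_ingest_request(base_url: str, api_key: str, filename: str, data: bytes) -> Request:
    """Build (but do not send) a multipart POST carrying one file.

    Assumptions: the "file" field name and the Bearer auth header are
    guesses made for illustration, not documented Polyvia behavior.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return Request(
        url=base_url.rstrip("/") + INGEST_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect or adapt once the actual field names are known.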
Need a format we don’t list? Video, HTML, and email ingestion are on the roadmap. Email senyao@polyvia.ai and tell us what you’d like to throw at Polyvia.