The Index API handles vector embedding and indexing of parsed documents, making them searchable through semantic queries.

Create Index

Trigger indexing for a parsed document to make it searchable.

Endpoint

POST /api/index

Request Body

document_id (string, required)
The ID of the document to index. The document must already be parsed.
embedding_model (string, default: "text-embedding-3-large")
The embedding model to use for vectorization. See Embedding Models below for the supported options.
chunk_size (integer, default: 512)
Size of each text chunk for embedding, in tokens.
chunk_overlap (integer, default: 50)
Number of tokens shared by consecutive chunks.

Response

index_id (string)
Unique identifier for the indexing job.
document_id (string)
ID of the document being indexed.
status (string)
Indexing status: one of pending, processing, completed, or failed.
chunks_created (integer)
Number of chunks created from the document.

Example

curl -X POST https://api.polyvia.ai/api/index \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "doc_abc123xyz",
    "embedding_model": "text-embedding-3-large",
    "chunk_size": 512
  }'

Example response:

{
  "index_id": "idx_xyz789",
  "document_id": "doc_abc123xyz",
  "status": "processing",
  "chunks_created": 0
}
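For clients written in Python rather than shell, the same request can be built with the standard library. This is a minimal sketch, not an official Polyvia SDK: the helper names `build_create_index_request` and `create_index` are hypothetical, and only the URL, headers, and body fields come from the curl example and request body above.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://api.polyvia.ai"


def build_create_index_request(
    api_key: str,
    document_id: str,
    embedding_model: str = "text-embedding-3-large",
    chunk_size: int = 512,
    chunk_overlap: int = 50,
) -> urllib.request.Request:
    """Build, but do not send, a POST /api/index request (hypothetical helper)."""
    body = {
        "document_id": document_id,
        "embedding_model": embedding_model,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
    }
    return urllib.request.Request(
        f"{API_BASE}/api/index",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_index(api_key: str, document_id: str, **options) -> dict:
    """Send the request and return the parsed JSON response body (hypothetical helper)."""
    request = build_create_index_request(api_key, document_id, **options)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `create_index("YOUR_API_KEY", "doc_abc123xyz")` sends the documented defaults explicitly; pass keyword arguments such as `chunk_size=256` to override them.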

Get Index Status

Check the status of an indexing job and retrieve statistics.

Endpoint

GET /api/index/{index_id}

Path Parameters

index_id (string, required)
The indexing job identifier returned by Create Index.

Response

index_id (string)
Indexing job identifier.
document_id (string)
ID of the document being indexed.
status (string)
Current status: one of pending, processing, completed, or failed.
chunks_created (integer)
Total number of chunks created.
vectors_embedded (integer)
Number of vectors successfully embedded.
progress (number)
Completion percentage, from 0 to 100.
error (string | null)
Error message if indexing failed; otherwise null.

Example

curl https://api.polyvia.ai/api/index/idx_xyz789 \
  -H "Authorization: Bearer YOUR_API_KEY"

Example response:

{
  "index_id": "idx_xyz789",
  "document_id": "doc_abc123xyz",
  "status": "completed",
  "chunks_created": 45,
  "vectors_embedded": 45,
  "progress": 100,
  "error": null
}
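Because Create Index can return while the job is still processing (with chunks_created at 0), a client typically polls this endpoint until the status is completed or failed. The sketch below is a minimal polling loop under stated assumptions: the 2-second interval and 5-minute timeout are placeholder values, not Polyvia recommendations, and `fetch_index_status` and `wait_for_index` are hypothetical helpers. The fetch function is passed in so the loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.polyvia.ai"


def fetch_index_status(api_key: str, index_id: str) -> dict:
    """GET /api/index/{index_id} and return the parsed JSON body (hypothetical helper)."""
    request = urllib.request.Request(
        f"{API_BASE}/api/index/{index_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_index(
    fetch: Callable[[], dict],
    poll_interval: float = 2.0,   # assumed value; tune for your documents
    timeout: float = 300.0,       # assumed value; tune for your documents
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job completes, fails, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch()
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            # The error field carries the failure message when indexing fails.
            raise RuntimeError(f"indexing failed: {status.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index {status.get('index_id')} still {status['status']}")
        sleep(poll_interval)
```

Usage: `wait_for_index(lambda: fetch_index_status("YOUR_API_KEY", "idx_xyz789"))` returns the final status body once progress reaches completion.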

Re-index Document

Re-index an existing document with new settings or after content updates.

Endpoint

PUT /api/index/{document_id}

Path Parameters

document_id (string, required)
The ID of the document to re-index.

Request Body

force (boolean, default: false)
Force re-indexing even if the document has not changed.
embedding_model (string, optional)
Embedding model to use for the new index.

Example

curl -X PUT https://api.polyvia.ai/api/index/doc_abc123xyz \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"force": true}'
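A Python sketch of the same PUT request, using only the standard library. `build_reindex_request` and `reindex` are hypothetical helpers. The sketch sends embedding_model only when you pass one; what the server does when it is omitted is not documented here. Because the response body for this endpoint is not documented either, `reindex` returns only the HTTP status code.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.polyvia.ai"


def build_reindex_request(
    api_key: str,
    document_id: str,
    force: bool = False,
    embedding_model: Optional[str] = None,
) -> urllib.request.Request:
    """Build, but do not send, a PUT /api/index/{document_id} request (hypothetical helper)."""
    body: dict = {"force": force}
    if embedding_model is not None:
        # Include the model only when switching to a different one.
        body["embedding_model"] = embedding_model
    return urllib.request.Request(
        f"{API_BASE}/api/index/{document_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def reindex(api_key: str, document_id: str, **options) -> int:
    """Send the re-index request and return the HTTP status code."""
    request = build_reindex_request(api_key, document_id, **options)
    with urllib.request.urlopen(request) as response:
        return response.status
```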

Embedding Models

OpenAI

  • text-embedding-3-large (3072 dimensions)
  • text-embedding-3-small (1536 dimensions)

Custom

  • Bring your own embedding model
  • Contact support for integration
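The dimension counts above set the size of every stored vector. The sketch below records them in a lookup table and estimates raw vector size. It assumes each value is a 4-byte float32. Polyvia does not document its storage format, so treat the result as a rough client-side estimate. The helper names are hypothetical.

```python
# Dimensions of the built-in OpenAI models listed above.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
}


def embedding_dimensions(model: str) -> int:
    """Return the vector size for a built-in model; unknown names raise ValueError."""
    try:
        return EMBEDDING_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"unknown embedding model {model!r}; custom models need support setup") from None


def vector_storage_bytes(model: str, num_vectors: int, bytes_per_value: int = 4) -> int:
    """Raw vector size, assuming float32 (4 bytes) per dimension by default."""
    return embedding_dimensions(model) * num_vectors * bytes_per_value
```

For the 45 vectors in the status example, text-embedding-3-large works out to 45 × 3072 × 4 = 552,960 bytes (540 KiB). Under the same assumption, text-embedding-3-small needs half that.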

Chunking Strategies

Fixed-size chunking (default)
Splits text into fixed-size chunks with overlap. Best for general-purpose indexing. Recommended settings, which match the chunk_size and chunk_overlap defaults:
  • Chunk size: 512 tokens
  • Overlap: 50 tokens
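Fixed-size chunking with overlap can be pictured as a sliding window: each chunk starts chunk_size minus chunk_overlap tokens after the previous one. This illustrative sketch is not Polyvia's implementation. It assumes the text has already been split into tokens, since the tokenizer Polyvia uses is not documented here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fixed_size_chunks(tokens: Sequence[T], chunk_size: int = 512, chunk_overlap: int = 50) -> List[List[T]]:
    """Split tokens into windows of chunk_size; consecutive windows share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    if not tokens:
        return []
    stride = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[List[T]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window reached the end of the input
        start += stride
    return chunks
```

With the recommended settings, a 1,000-token document yields 3 chunks starting at tokens 0, 462, and 924. The last chunk holds only the remaining 76 tokens.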
Semantic chunking
Splits text on semantic boundaries such as paragraphs and sections, which better preserves context. Available in Pro and Enterprise plans.
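The sketch below is a simplified illustration of boundary-based chunking. It splits text on blank lines and packs whole paragraphs into chunks up to a token budget. It is not Polyvia's semantic chunker, whose algorithm is not documented. It assumes blank lines mark paragraph boundaries and uses a whitespace word count as a stand-in for real token counts. `paragraph_chunks` and `word_count` are hypothetical names.

```python
import re
from typing import Callable, List


def word_count(text: str) -> int:
    """Whitespace word count, a rough stand-in for a real tokenizer."""
    return len(text.split())


def paragraph_chunks(
    text: str,
    max_tokens: int = 512,
    count_tokens: Callable[[str], int] = word_count,
) -> List[str]:
    """Pack whole paragraphs into chunks of at most max_tokens.

    A paragraph longer than max_tokens is kept intact as its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for paragraph in paragraphs:
        size = count_tokens(paragraph)
        # Close the current chunk if this paragraph would push it over budget.
        if current and current_tokens + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Unlike the fixed-size window, this never cuts a paragraph in half. The trade-off is that chunk sizes vary with paragraph length.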
Choosing a chunk size
Smaller chunks give more precise retrieval but can lose surrounding context. Larger chunks preserve context but can reduce precision. Try a few chunk_size and chunk_overlap values against representative queries to find the balance that suits your use case.