Index

The Index API handles vector embedding and indexing of parsed documents, making them searchable through semantic queries.

Create Index

Trigger indexing for a parsed document to make it searchable.

Endpoint

POST /api/index

Request Body

document_id (string, required)
The ID of the document to index. The document must already be parsed.

embedding_model (string, default: "text-embedding-3-large")
The embedding model to use for vectorization.

chunk_size (integer, default: 512)
Size of each text chunk, in tokens.

chunk_overlap (integer, default: 50)
Number of tokens shared between consecutive chunks.

Response

index_id (string)
Unique identifier for the indexing job.

document_id (string)
The document being indexed.

status (string)
Indexing status. One of: pending, processing, completed, failed.

chunks_created (integer)
Number of chunks created from the document.

Example

curl -X POST https://api.polyvia.ai/api/index \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "doc_abc123xyz",
    "embedding_model": "text-embedding-3-large",
    "chunk_size": 512
  }'

{
  "index_id": "idx_xyz789",
  "document_id": "doc_abc123xyz",
  "status": "processing",
  "chunks_created": 0
}
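The same request can be assembled from Python. The following is a minimal sketch using only the standard library; the helper name `build_create_index_request` and the client-side overlap check are illustrative assumptions, not part of the API. It builds the request without sending it, so it runs without a real key.

```python
import json
import urllib.request

API_BASE = "https://api.polyvia.ai"  # host taken from the curl example above


def build_create_index_request(api_key, document_id,
                               embedding_model="text-embedding-3-large",
                               chunk_size=512, chunk_overlap=50):
    """Build (but do not send) a POST /api/index request."""
    # Client-side sanity check. Whether the server enforces the same rule
    # is an assumption; it is not stated in this reference.
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    payload = {
        "document_id": document_id,
        "embedding_model": embedding_model,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
    }
    return urllib.request.Request(
        f"{API_BASE}/api/index",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_index_request("YOUR_API_KEY", "doc_abc123xyz")
    print(req.get_method(), req.full_url)
    # To send it with a real key:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Because status is typically pending or processing in the immediate response, keep the returned index_id to check on the job later.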

Get Index Status

Check the status of an indexing job and retrieve statistics.

Endpoint

GET /api/index/{index_id}

Path Parameters

index_id (string, required)
The indexing job identifier.

Response

index_id (string)
Indexing job identifier.

document_id (string)
The document being indexed.

status (string)
Current status. One of: pending, processing, completed, failed.

chunks_created (integer)
Total number of chunks created.

vectors_embedded (integer)
Number of vectors successfully embedded.

progress (number)
Completion percentage (0-100).

error (string | null)
Error message if indexing failed, otherwise null.

Example

curl https://api.polyvia.ai/api/index/idx_xyz789 \
  -H "Authorization: Bearer YOUR_API_KEY"

{
  "index_id": "idx_xyz789",
  "document_id": "doc_abc123xyz",
  "status": "completed",
  "chunks_created": 45,
  "vectors_embedded": 45,
  "progress": 100,
  "error": null
}
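Since indexing runs asynchronously, a client usually polls this endpoint until the job reaches completed or failed. Below is a hedged sketch of such a loop. The function name `wait_for_index`, the polling interval, and the timeout are assumptions chosen for illustration. The HTTP call is injected as `fetch_status` so the loop can be exercised without network access.

```python
import time


def wait_for_index(fetch_status, index_id, interval=2.0, timeout=300.0,
                   sleep=time.sleep):
    """Poll an indexing job until it completes or fails.

    fetch_status: callable taking an index_id and returning the parsed JSON
    body of GET /api/index/{index_id} as a dict.
    """
    waited = 0.0
    while True:
        job = fetch_status(index_id)
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"indexing failed: {job.get('error')}")
        if waited >= timeout:
            raise TimeoutError(f"index {index_id} not finished after {timeout}s")
        # Still pending or processing: wait, then ask again.
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Canned responses stand in for real API calls.
    canned = iter([
        {"status": "pending", "progress": 0},
        {"status": "processing", "progress": 40},
        {"status": "completed", "progress": 100, "chunks_created": 45},
    ])
    result = wait_for_index(lambda _id: next(canned), "idx_xyz789",
                            sleep=lambda seconds: None)
    print(result["status"], result["chunks_created"])
```

In real use, `fetch_status` would issue the GET request shown in the curl example and decode the JSON response.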

Re-index Document

Re-index an existing document with new settings or after content updates.

Endpoint

PUT /api/index/{document_id}

Request Body

force (boolean, default: false)
Force re-indexing even if the document hasn’t changed.

embedding_model (string)
Embedding model to switch to when re-indexing.

Example

curl -X PUT https://api.polyvia.ai/api/index/doc_abc123xyz \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"force": true}'
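As a rough Python equivalent of the curl call above, the sketch below builds the PUT request and only includes embedding_model when one is supplied. The helper name `build_reindex_request` is hypothetical, and how the server treats an omitted embedding_model is not specified in this reference.

```python
import json
import urllib.request
from urllib.parse import quote


def build_reindex_request(api_key, document_id, force=False, embedding_model=None):
    """Build (but do not send) a PUT /api/index/{document_id} request."""
    body = {"force": force}
    if embedding_model is not None:
        body["embedding_model"] = embedding_model
    # Percent-encode the ID so unexpected characters cannot alter the path.
    url = f"https://api.polyvia.ai/api/index/{quote(document_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


if __name__ == "__main__":
    req = build_reindex_request("YOUR_API_KEY", "doc_abc123xyz", force=True)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```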

Embedding Models

OpenAI

text-embedding-3-large (3072 dimensions)
text-embedding-3-small (1536 dimensions)
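The dimension count drives how much raw vector data an index holds. The sketch below estimates that size under the assumption that each value is stored as a 4-byte float. The storage format is not documented here, so treat the figures as a back-of-the-envelope comparison only. The function name `raw_vector_bytes` is illustrative.

```python
# Dimensions as listed above.
DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
}


def raw_vector_bytes(model, num_vectors, bytes_per_value=4):
    """Estimate raw vector storage, assuming bytes_per_value bytes per dimension."""
    return DIMENSIONS[model] * num_vectors * bytes_per_value


if __name__ == "__main__":
    # 45 vectors, matching the status example earlier on this page.
    for model in DIMENSIONS:
        print(model, raw_vector_bytes(model, 45), "bytes")
```

Under this assumption, the small model needs half the raw storage of the large model for the same number of chunks.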

Custom

Bring your own embedding model
Contact support for integration

Chunking Strategies

Fixed-size chunking

Default strategy that splits text into fixed-size chunks with overlap. Best for general-purpose indexing.

Recommended settings:

Chunk size: 512 tokens
Overlap: 50 tokens
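To make the interaction between chunk size and overlap concrete, here is a minimal sketch of fixed-size chunking over a list of tokens. It illustrates the general technique, not the service's actual implementation. The tokenizer is out of scope, so plain integers stand in for tokens, and the function name `chunk_tokens` is an assumption.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=50):
    """Split a token list into fixed-size chunks that share chunk_overlap tokens."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # each chunk starts this far after the last
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reached the end of the document
        start += step
    return chunks


if __name__ == "__main__":
    tokens = list(range(1000))  # stand-in for 1000 real tokens
    chunks = chunk_tokens(tokens)
    print(len(chunks), [len(c) for c in chunks])
```

With the recommended settings, each chunk starts 462 tokens after the previous one, so a 1000-token document yields three chunks, the last one shorter than the rest.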

Semantic chunking

Splits text based on semantic boundaries (paragraphs, sections). Better for preserving context.

Available in Pro and Enterprise plans.
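As a rough illustration of boundary-aware splitting, the sketch below groups whole paragraphs into chunks without exceeding a size budget. It is a simplified stand-in, not the service's algorithm: it treats blank lines as the only boundary, counts whitespace-separated words instead of tokens, and the name `chunk_by_paragraph` is hypothetical.

```python
import re


def chunk_by_paragraph(text, max_words=512):
    """Group whole paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept intact as its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk rather than split a paragraph across two chunks.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "First paragraph here.\n\nSecond one.\n\nA third, longer paragraph follows."
    for chunk in chunk_by_paragraph(sample, max_words=5):
        print(repr(chunk))
```

Unlike fixed-size chunking, chunk lengths vary here, which is the trade-off for never cutting a paragraph in half.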

Smaller chunks provide more precise retrieval but may lose context. Larger chunks preserve context but may reduce precision. Experiment to find the optimal balance for your use case.
