Retrieval

The Retrieval API provides semantic search capabilities across your indexed documents, enabling precise information retrieval with citations and source tracking.

Semantic Search

Perform semantic search across your entire knowledge base using natural language queries.

Endpoint

POST /api/retrieval/search

Request Body

query (string, required): Natural language search query.

limit (integer, default: 10): Maximum number of results to return (1-100).

filters (object): Filter results by document metadata.

  document_ids (array): Limit search to specific documents.
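As a rough illustration of the request body described above, the following Python sketch assembles the JSON payload and checks the documented constraints on the client side. The helper name build_search_payload is hypothetical, and the server's behavior for out-of-range values is not documented here, so the local validation is an assumption about what is useful rather than a mirror of the API's own checks.

```python
from __future__ import annotations

from typing import Any, Optional


def build_search_payload(
    query: str,
    limit: int = 10,
    document_ids: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build a JSON-serializable body for POST /api/retrieval/search.

    Hypothetical helper: only the field names come from the reference.
    """
    if not query.strip():
        raise ValueError("query is required and must not be empty")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")

    payload: dict[str, Any] = {"query": query, "limit": limit}
    if document_ids:
        # filters.document_ids restricts the search to the listed documents.
        payload["filters"] = {"document_ids": list(document_ids)}
    return payload
```

The returned dictionary can be passed to json.dumps and sent as the request body with any HTTP client.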

Response

results (array): Array of search results ordered by relevance. Each result object has the following fields:

  chunk_id (string): Unique identifier for the text chunk.

  content (string): The relevant text content.

  score (number): Relevance score (0-1).

  document_id (string): Source document identifier.

  document_name (string): Source document name.

  page_number (integer): Page number in source document (if applicable).

  metadata (object): Additional metadata from the source document.

total_results (integer): Total number of matching results.

query_time_ms (integer): Query execution time in milliseconds.

Example

curl -X POST https://api.polyvia.ai/api/retrieval/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the security best practices?",
    "limit": 5,
    "search_mode": "hybrid",
    "include_citations": true
  }'

{
  "results": [
    {
      "chunk_id": "chunk_abc123",
      "content": "Security best practices include implementing multi-factor authentication, encrypting data at rest and in transit, and conducting regular security audits.",
      "score": 0.94,
      "document_id": "doc_xyz789",
      "document_name": "Security Guidelines 2024",
      "page_number": 12,
      "metadata": {
        "tags": ["security", "compliance"],
        "author": "Security Team"
      }
    }
  ],
  "total_results": 15,
  "query_time_ms": 127
}
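A response shaped like the one above could be mapped onto typed objects along these lines. This is a minimal sketch: the SearchResult class, its citation method, and parse_search_response are hypothetical names, and treating page_number and metadata as optional is an assumption based on the "if applicable" wording in the field list.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SearchResult:
    chunk_id: str
    content: str
    score: float
    document_id: str
    document_name: str
    page_number: Optional[int] = None  # assumed absent for unpaginated sources
    metadata: dict[str, Any] = field(default_factory=dict)

    def citation(self) -> str:
        """Format a short human-readable source reference."""
        if self.page_number is None:
            return self.document_name
        return f"{self.document_name}, p. {self.page_number}"


def parse_search_response(body: str) -> list[SearchResult]:
    """Turn the raw JSON response text into a list of SearchResult objects."""
    data = json.loads(body)
    return [
        SearchResult(
            chunk_id=r["chunk_id"],
            content=r["content"],
            score=r["score"],
            document_id=r["document_id"],
            document_name=r["document_name"],
            page_number=r.get("page_number"),
            metadata=r.get("metadata", {}),
        )
        for r in data.get("results", [])
    ]
```

Results keep the order returned by the API, so the first element is the most relevant match.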

Hybrid Search

Combine semantic and keyword search for optimal results.

How It Works

Hybrid search leverages both:

Semantic search: Understands meaning and context using vector embeddings
Keyword search: Matches exact terms and phrases using the BM25 algorithm

Results are ranked using a weighted combination of both scores.
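The weighted combination can be pictured as a linear blend of the two scores. The sketch below is illustrative only: the reference does not state the exact formula or how BM25 scores are normalized, so it assumes both inputs are already scaled to the 0-1 range, and the function names are hypothetical.

```python
from __future__ import annotations


def combined_score(
    semantic_score: float,
    keyword_score: float,
    semantic_weight: float = 0.7,
    keyword_weight: float = 0.3,
) -> float:
    """Linear blend of two scores, both assumed to be normalized to 0-1."""
    return semantic_weight * semantic_score + keyword_weight * keyword_score


def rank(
    candidates: list[tuple[str, float, float]],
    semantic_weight: float = 0.7,
    keyword_weight: float = 0.3,
) -> list[str]:
    """Order (chunk_id, semantic_score, keyword_score) tuples by blended score."""
    ordered = sorted(
        candidates,
        key=lambda c: combined_score(c[1], c[2], semantic_weight, keyword_weight),
        reverse=True,
    )
    return [chunk_id for chunk_id, _, _ in ordered]
```

Under this assumption, a chunk with a strong exact-term match can outrank one that is only semantically close, and shifting weight toward the semantic score reverses that.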

Configuration

semantic_weight (number, default: 0.7): Weight for semantic search (0-1).

keyword_weight (number, default: 0.3): Weight for keyword search (0-1).

The two weights must sum to 1.0.

Example

{
  "query": "machine learning algorithms",
  "search_mode": "hybrid",
  "semantic_weight": 0.6,
  "keyword_weight": 0.4
}
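Because the two weights must each lie in 0-1 and sum to 1.0, it can help to check them before sending a request. The following is a small sketch of such a check; hybrid_options is a hypothetical helper, and the floating-point tolerance is an assumption, since the reference does not say how strictly the server compares the sum.

```python
from __future__ import annotations

import math


def hybrid_options(semantic_weight: float = 0.7, keyword_weight: float = 0.3) -> dict:
    """Return the hybrid-search fields after validating the weights locally."""
    for name, weight in (
        ("semantic_weight", semantic_weight),
        ("keyword_weight", keyword_weight),
    ):
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Compare with a small tolerance to absorb floating-point rounding.
    if not math.isclose(semantic_weight + keyword_weight, 1.0, abs_tol=1e-9):
        raise ValueError("semantic_weight and keyword_weight must sum to 1.0")
    return {
        "search_mode": "hybrid",
        "semantic_weight": semantic_weight,
        "keyword_weight": keyword_weight,
    }
```

The result can be merged into a search request body, for example {"query": "machine learning algorithms", **hybrid_options(0.6, 0.4)}.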

Retrieve by Document

Retrieve specific chunks from a document with optional filtering.

Endpoint

GET /api/retrieval/document/{document_id}

Path Parameters

document_id (string, required): Document identifier.

Query Parameters

page (integer): Retrieve chunks from a specific page.

limit (integer, default: 50): Maximum chunks to return.

Example

curl "https://api.polyvia.ai/api/retrieval/document/doc_xyz789?page=5&limit=10" \
  -H "Authorization: Bearer YOUR_API_KEY"
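The same request URL can be assembled programmatically. The sketch below uses only the Python standard library; document_chunks_url is a hypothetical helper, and percent-encoding the document identifier is a defensive assumption, since the reference does not describe which characters identifiers may contain.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.polyvia.ai"


def document_chunks_url(
    document_id: str,
    page: Optional[int] = None,
    limit: int = 50,
) -> str:
    """Build the GET URL for /api/retrieval/document/{document_id}."""
    if not document_id:
        raise ValueError("document_id is required")
    if limit < 1:
        raise ValueError("limit must be at least 1")

    params: dict[str, int] = {}
    if page is not None:
        params["page"] = page
    params["limit"] = limit

    # Encode the path segment so unusual characters cannot alter the path.
    path = f"/api/retrieval/document/{quote(document_id, safe='')}"
    return f"{BASE_URL}{path}?{urlencode(params)}"
```

Send the resulting URL with the same Authorization header shown in the curl example.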

Advanced Filtering

Filter by Date Range

{
  "query": "quarterly reports",
  "filters": {
    "date_range": {
      "start": "2024-01-01",
      "end": "2024-03-31"
    }
  }
}

Filter by Tags

{
  "query": "compliance requirements",
  "filters": {
    "tags": ["legal", "compliance"]
  }
}

Filter by Multiple Criteria

{
  "query": "financial analysis",
  "filters": {
    "tags": ["finance"],
    "document_ids": ["doc_123", "doc_456"],
    "date_range": {
      "start": "2024-01-01"
    }
  }
}
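The filter objects above share a common shape, so they can be generated by one small function. This is a sketch under assumptions: build_filters is a hypothetical helper, dates are serialized as ISO 8601 (YYYY-MM-DD) to match the examples, and an open-ended range is expressed by omitting start or end, as the last example does.

```python
from __future__ import annotations

from datetime import date
from typing import Any, Optional


def build_filters(
    tags: Optional[list[str]] = None,
    document_ids: Optional[list[str]] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> dict[str, Any]:
    """Assemble the `filters` object, including only the criteria provided."""
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be later than end")

    filters: dict[str, Any] = {}
    if tags:
        filters["tags"] = list(tags)
    if document_ids:
        filters["document_ids"] = list(document_ids)

    date_range: dict[str, str] = {}
    if start is not None:
        date_range["start"] = start.isoformat()
    if end is not None:
        date_range["end"] = end.isoformat()
    if date_range:
        filters["date_range"] = date_range
    return filters
```

For instance, build_filters(tags=["finance"], document_ids=["doc_123", "doc_456"], start=date(2024, 1, 1)) produces the filters object from the multiple-criteria example.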

Search Modes Comparison

Semantic

Best for:

Conceptual queries
Paraphrased questions
Cross-lingual search

Example: “How to improve customer satisfaction”

Keyword

Best for:

Exact term matching
Technical identifiers
Product codes

Example: “SKU-12345”

Hybrid

Best for:

General queries
Mixed intent
Balanced precision/recall

Example: “Q4 2024 revenue report”

Ranking and Relevance

Results are ranked using multiple signals:

Semantic similarity: Cosine similarity between query and document embeddings
Keyword match: BM25 score for term frequency and document length
Recency: Newer documents receive a slight boost
Source authority: Documents with more citations rank higher

Use semantic mode for conceptual questions and keyword mode for exact matches. Hybrid mode provides the best balance for most use cases.
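One way to apply this guidance automatically is a simple client-side heuristic that picks a mode from the shape of the query. The sketch below is not part of the API: suggest_search_mode and its rules are hypothetical, and the "semantic" and "keyword" values for search_mode are assumed by analogy with the "hybrid" value used in the examples.

```python
from __future__ import annotations

import re

# A single token such as "SKU-12345": letters, optional separator, then digits.
IDENTIFIER = re.compile(r"^[A-Za-z]+[-_]?\d+[\w-]*$")
QUESTION_WORDS = {"how", "what", "why", "when", "where", "who", "which"}


def suggest_search_mode(query: str) -> str:
    """Guess a search mode from the query text (illustrative heuristic only)."""
    text = query.strip()
    if not text:
        raise ValueError("query must not be empty")
    tokens = text.split()
    if len(tokens) == 1 and IDENTIFIER.match(text):
        return "keyword"  # looks like a product code or technical identifier
    if text.endswith("?") or tokens[0].lower() in QUESTION_WORDS:
        return "semantic"  # phrased as a question, likely conceptual
    return "hybrid"  # default for general or mixed-intent queries
```

With these rules, the three example queries from the comparison above map to keyword, semantic, and hybrid respectively.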

Rate Limits

Free tier: 100 queries per day
Pro tier: 1,000 queries per day
Enterprise: Unlimited queries

Large result sets (limit > 50) count as multiple queries against your quota.
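A client can keep a local estimate of its remaining quota to avoid hitting the limit unexpectedly. The sketch below uses the daily limits listed above; the QuotaTracker class is hypothetical, and because the reference does not say how many queries a large result set counts as, the large_query_cost value is a placeholder assumption that should be replaced with the real figure. Resetting the counter each day is left out for brevity.

```python
from __future__ import annotations

from typing import Optional

# Daily query limits per tier, from the list above; None means unlimited.
DAILY_QUOTA: dict[str, Optional[int]] = {"free": 100, "pro": 1000, "enterprise": None}


class QuotaTracker:
    """Local, approximate bookkeeping of queries used today."""

    def __init__(self, tier: str, large_query_cost: int = 2) -> None:
        if tier not in DAILY_QUOTA:
            raise ValueError(f"unknown tier: {tier}")
        self.quota = DAILY_QUOTA[tier]
        # ASSUMPTION: the real cost of a limit > 50 request is not documented.
        self.large_query_cost = large_query_cost
        self.used = 0

    def cost(self, limit: int) -> int:
        """Estimated number of quota units a request with this limit consumes."""
        return self.large_query_cost if limit > 50 else 1

    def try_consume(self, limit: int = 10) -> bool:
        """Record a request if it fits in the remaining quota; return success."""
        units = self.cost(limit)
        if self.quota is not None and self.used + units > self.quota:
            return False
        self.used += units
        return True
```

Call try_consume before each search and skip or defer the request when it returns False.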
