
Document Reranking

Improve search quality by reranking documents based on their relevance to a query. Use this as a second-stage ranker after initial retrieval.

Endpoint

POST https://api.assisters.dev/v1/rerank

Request Body

model
string
required
The reranking model to use. See available models. Options: bge-reranker-v2, jina-reranker
query
string
required
The search query to rank documents against.
documents
array
required
Array of documents to rerank. Each can be a string or an object with a text field. Maximum: 100 documents per request.
top_n
integer
Return only the top N results. Defaults to returning all documents.
return_documents
boolean
default:"true"
Whether to include the document text in the response.
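
Since each entry in documents may be either a plain string or an object with a text field, a minimal request body mixing both forms looks like this (the values are illustrative):

```json
{
  "model": "bge-reranker-v2",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of artificial intelligence.",
    { "text": "Deep learning uses neural networks." }
  ],
  "top_n": 2,
  "return_documents": true
}
```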

Request Examples

Basic Reranking

import requests

response = requests.post(
    "https://api.assisters.dev/v1/rerank",
    headers={
        "Authorization": "Bearer ask_your_api_key",
        "Content-Type": "application/json"
    },
    json={
        "model": "bge-reranker-v2",
        "query": "What is machine learning?",
        "documents": [
            "Machine learning is a subset of artificial intelligence.",
            "The weather today is sunny and warm.",
            "Deep learning uses neural networks.",
            "Cats are popular pets worldwide."
        ]
    }
)

results = response.json()["results"]
for r in results:
    print(f"Score: {r['relevance_score']:.4f} - {r['document']['text'][:50]}...")

With Top N

# "documents" here is your candidate list: strings or {"text": ...} objects
response = requests.post(
    "https://api.assisters.dev/v1/rerank",
    headers={"Authorization": "Bearer ask_your_api_key"},
    json={
        "model": "bge-reranker-v2",
        "query": "Python programming",
        "documents": documents,
        "top_n": 3  # Only return the top 3 results
    }
)

Two-Stage Retrieval

# Stage 1: fast candidate retrieval with embeddings
# (embed() and vector_db are placeholders for your embedding model and vector store)
query_embedding = embed(query)
candidates = vector_db.search(query_embedding, limit=100)

# Stage 2: precise reranking of the candidates
reranked = rerank(
    model="bge-reranker-v2",
    query=query,
    documents=[c.text for c in candidates],
    top_n=10
)

# Return the top reranked results
final_results = reranked["results"]
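
The rerank() call above is shorthand. A minimal sketch of such a helper against the documented endpoint might look like this; build_rerank_payload and the error handling are illustrative, not part of any official SDK:

```python
import requests

API_URL = "https://api.assisters.dev/v1/rerank"

def build_rerank_payload(model, query, documents, top_n=None):
    """Assemble the documented request body, omitting top_n when unset."""
    payload = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        payload["top_n"] = top_n
    return payload

def rerank(model, query, documents, top_n=None, api_key="ask_your_api_key"):
    """POST to /v1/rerank and return the parsed response body."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_rerank_payload(model, query, documents, top_n),
    )
    response.raise_for_status()  # surface HTTP errors early
    return response.json()
```

This mirrors the request and response shapes documented on this page; swap in your own key management and retry logic.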

Response

{
  "id": "rerank-abc123xyz",
  "model": "bge-reranker-v2",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9823,
      "document": {
        "text": "Machine learning is a subset of artificial intelligence."
      }
    },
    {
      "index": 2,
      "relevance_score": 0.8156,
      "document": {
        "text": "Deep learning uses neural networks."
      }
    },
    {
      "index": 1,
      "relevance_score": 0.0234,
      "document": {
        "text": "The weather today is sunny and warm."
      }
    },
    {
      "index": 3,
      "relevance_score": 0.0089,
      "document": {
        "text": "Cats are popular pets worldwide."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "total_tokens": 45
  }
}

Response Fields

id
string
Unique identifier for the rerank request
model
string
The model used for reranking
results
array
Array of reranked documents, sorted by relevance (highest first):
  • index: Original position in the input array
  • relevance_score: Relevance score between 0 and 1
  • document: The document object (if return_documents is true)
usage
object
Token usage for billing
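
Because every result carries the original index, documents can be recovered locally even when return_documents is false. A sketch using the response shape shown above (the response body here is hand-written sample data, not a live API reply):

```python
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "The weather today is sunny and warm.",
    "Deep learning uses neural networks.",
]

# Sample body as it would look with return_documents=false:
# results keep their scores and indices but omit the "document" field.
response_body = {
    "results": [
        {"index": 0, "relevance_score": 0.98},
        {"index": 2, "relevance_score": 0.81},
        {"index": 1, "relevance_score": 0.02},
    ]
}

# Map each result back to the text it was scored against.
ranked = [
    (r["relevance_score"], documents[r["index"]])
    for r in response_body["results"]
]
```

Setting return_documents to false keeps responses small when you already hold the documents client-side.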

Available Models

Model            Description                   Max Tokens  Price
bge-reranker-v2  BAAI’s high-quality reranker  512         $0.05/M tokens
jina-reranker    Jina AI’s efficient reranker  8192        $0.08/M tokens

Compare Reranking Models

See detailed model specifications and benchmarks

Use Cases

Use reranking after initial vector search to improve precision:
# Vector search returns ~100 candidates
candidates = vector_search(query, limit=100)

# Rerank to find the best 10
reranked = rerank(query, candidates, top_n=10)

# Much better precision than vector search alone
return reranked
Improve RAG by reranking retrieved context:
# Retrieve potentially relevant chunks
chunks = retrieve_chunks(question, limit=20)

# Rerank to find the most relevant
reranked = rerank(question, chunks, top_n=5)

# Use the top chunks as context (each result carries its document text)
context = "\n".join(r["document"]["text"] for r in reranked["results"])

response = chat(
    messages=[
        {"role": "system", "content": f"Context: {context}"},
        {"role": "user", "content": question}
    ]
)
Get precise relevance scores for document pairs:
# Score relevance of specific documents
results = rerank(
    model="bge-reranker-v2",
    query="API documentation",
    documents=[
        "REST API guide for developers",
        "Company holiday schedule",
        "API authentication methods"
    ]
)

# Use scores for filtering or display
relevant = [r for r in results["results"] if r["relevance_score"] > 0.5]

Best Practices

Two-Stage Retrieval

Use fast retrieval first, then rerank the top candidates

Limit Candidate Size

Rerank 50-100 candidates for best speed/quality tradeoff

Use for RAG

Rerank retrieved chunks before feeding to LLM

Score Thresholds

Filter results below a relevance threshold for quality

Performance Tips

Candidates  Latency  Recommendation
10-20       ~100ms   Good for real-time
50-100      ~300ms   Best quality/speed
100+        500ms+   Consider batching
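
For candidate sets above the 100-document limit, the batching suggestion can be sketched as follows. rerank_in_batches and score_fn are hypothetical names; score_fn stands in for a real rerank API call. Note that merging scores across separate requests assumes the model's scores are comparable between calls:

```python
MAX_DOCS = 100  # documented per-request document limit

def rerank_in_batches(query, documents, score_fn, top_n=10):
    """Score documents in chunks of MAX_DOCS, then merge globally.

    score_fn(query, batch) must return one relevance score per
    document in the batch, mirroring the API's relevance_score.
    """
    scored = []
    for start in range(0, len(documents), MAX_DOCS):
        batch = documents[start:start + MAX_DOCS]
        for offset, score in enumerate(score_fn(query, batch)):
            scored.append((score, start + offset))  # keep the global index
    scored.sort(reverse=True)  # highest relevance first
    return scored[:top_n]
```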

Error Responses

{
  "error": {
    "message": "Too many documents. Maximum is 100.",
    "type": "invalid_request_error",
    "code": "too_many_documents"
  }
}
{
  "error": {
    "message": "Query cannot be empty",
    "type": "invalid_request_error",
    "code": "empty_query"
  }
}
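
Clients can branch on the documented code field when handling these errors. A small illustrative helper (the wording of the hints is ours, not the API's):

```python
def describe_error(body):
    """Map a documented error code to an actionable hint."""
    err = body.get("error", {})
    code = err.get("code")
    if code == "too_many_documents":
        return "Trim the candidate list to 100 documents or fewer."
    if code == "empty_query":
        return "Provide a non-empty query string."
    # Fall back to the server's own message when the code is unknown.
    return err.get("message", "Unknown rerank error")
```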