
Document Reranking

Improve search quality by reranking documents based on their relevance to a query. Use this as a second-stage ranker after initial retrieval.

Endpoint

POST https://api.assisters.dev/v1/rerank

Request Body

model
string
required
The reranking model to use. See available models. Options: bge-reranker-v2, jina-reranker
query
string
required
The search query to rank documents against.
documents
array
required
Array of documents to rerank. Each can be a string or an object with a text field. Maximum: 100 documents per request.
top_n
integer
Return only the top N results. Defaults to returning all documents.
return_documents
boolean
default:"true"
Whether to include the document text in the response.
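
Since each entry in documents may be either a plain string or an object with a text field, a minimal request body mixing both forms looks like this (the values are illustrative):

```json
{
  "model": "bge-reranker-v2",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of artificial intelligence.",
    { "text": "Deep learning uses neural networks." }
  ],
  "top_n": 2,
  "return_documents": true
}
```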

Request Examples

Basic Reranking

import requests

response = requests.post(
    "https://api.assisters.dev/v1/rerank",
    headers={
        "Authorization": "Bearer ask_your_api_key",
        "Content-Type": "application/json"
    },
    json={
        "model": "bge-reranker-v2",
        "query": "What is machine learning?",
        "documents": [
            "Machine learning is a subset of artificial intelligence.",
            "The weather today is sunny and warm.",
            "Deep learning uses neural networks.",
            "Cats are popular pets worldwide."
        ]
    }
)

results = response.json()["results"]
for r in results:
    print(f"Score: {r['relevance_score']:.4f} - {r['document']['text'][:50]}...")

With Top N

# "documents" here is your candidate list: strings or {"text": ...} objects
response = requests.post(
    "https://api.assisters.dev/v1/rerank",
    headers={"Authorization": "Bearer ask_your_api_key"},
    json={
        "model": "bge-reranker-v2",
        "query": "Python programming",
        "documents": documents,
        "top_n": 3  # Only return the top 3 results
    }
)

Two-Stage Retrieval

# Stage 1: fast candidate retrieval with embeddings
# (embed() and vector_db are placeholders for your embedding model and vector store)
query_embedding = embed(query)
candidates = vector_db.search(query_embedding, limit=100)

# Stage 2: precise reranking of the candidates
reranked = rerank(
    model="bge-reranker-v2",
    query=query,
    documents=[c.text for c in candidates],
    top_n=10
)

# Return the top reranked results
final_results = reranked["results"]
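
The rerank() call above is shorthand. A minimal sketch of such a helper against the documented endpoint might look like this; build_rerank_payload and the error handling are illustrative, not part of any official SDK:

```python
import requests

API_URL = "https://api.assisters.dev/v1/rerank"

def build_rerank_payload(model, query, documents, top_n=None):
    """Assemble the documented request body, omitting top_n when unset."""
    payload = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        payload["top_n"] = top_n
    return payload

def rerank(model, query, documents, top_n=None, api_key="ask_your_api_key"):
    """POST to /v1/rerank and return the parsed response body."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_rerank_payload(model, query, documents, top_n),
    )
    response.raise_for_status()  # surface HTTP errors early
    return response.json()
```

This mirrors the request and response shapes documented on this page; swap in your own key management and retry logic.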

Response

{
  "id": "rerank-abc123xyz",
  "model": "bge-reranker-v2",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9823,
      "document": {
        "text": "Machine learning is a subset of artificial intelligence."
      }
    },
    {
      "index": 2,
      "relevance_score": 0.8156,
      "document": {
        "text": "Deep learning uses neural networks."
      }
    },
    {
      "index": 1,
      "relevance_score": 0.0234,
      "document": {
        "text": "The weather today is sunny and warm."
      }
    },
    {
      "index": 3,
      "relevance_score": 0.0089,
      "document": {
        "text": "Cats are popular pets worldwide."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "total_tokens": 45
  }
}

Response Fields

id
string
Unique identifier for the rerank request
model
string
The model used for reranking
results
array
Array of reranked documents, sorted by relevance (highest first):
  • index: Original position in the input array
  • relevance_score: Relevance score between 0 and 1
  • document: The document object (if return_documents is true)
usage
object
Token usage for billing
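
Because every result carries the original index, documents can be recovered locally even when return_documents is false. A sketch using the response shape shown above (the response body here is hand-written sample data, not a live API reply):

```python
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "The weather today is sunny and warm.",
    "Deep learning uses neural networks.",
]

# Sample body as it would look with return_documents=false:
# results keep their scores and indices but omit the "document" field.
response_body = {
    "results": [
        {"index": 0, "relevance_score": 0.98},
        {"index": 2, "relevance_score": 0.81},
        {"index": 1, "relevance_score": 0.02},
    ]
}

# Map each result back to the text it was scored against.
ranked = [
    (r["relevance_score"], documents[r["index"]])
    for r in response_body["results"]
]
```

Setting return_documents to false keeps responses small when you already hold the documents client-side.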

Available Models

Model            Description                   Max Tokens  Price
bge-reranker-v2  BAAI’s high-quality reranker  512         $0.05/M tokens
jina-reranker    Jina AI’s efficient reranker  8192        $0.08/M tokens

Compare Reranking Models

See detailed model specifications and benchmarks

Use Cases

Use reranking after initial vector search to improve precision:
# Vector search returns ~100 candidates
candidates = vector_search(query, limit=100)

# Rerank to find the best 10
reranked = rerank(query, candidates, top_n=10)

# Much better precision than vector search alone
return reranked
Improve RAG by reranking retrieved context:
# Retrieve potentially relevant chunks
chunks = retrieve_chunks(question, limit=20)

# Rerank to find the most relevant
reranked = rerank(question, chunks, top_n=5)

# Use the top chunks as context (each result carries its document text)
context = "\n".join(r["document"]["text"] for r in reranked["results"])

response = chat(
    messages=[
        {"role": "system", "content": f"Context: {context}"},
        {"role": "user", "content": question}
    ]
)
Get precise relevance scores for document pairs:
# Score relevance of specific documents
results = rerank(
    model="bge-reranker-v2",
    query="API documentation",
    documents=[
        "REST API guide for developers",
        "Company holiday schedule",
        "API authentication methods"
    ]
)

# Use scores for filtering or display
relevant = [r for r in results["results"] if r["relevance_score"] > 0.5]

Best Practices

Two-Stage Retrieval

Use fast retrieval first, then rerank the top candidates

Limit Candidate Size

Rerank 50-100 candidates for best speed/quality tradeoff

Use for RAG

Rerank retrieved chunks before feeding to LLM

Score Thresholds

Filter results below a relevance threshold for quality

Performance Tips

Candidates  Latency  Recommendation
10-20       ~100ms   Good for real-time
50-100      ~300ms   Best quality/speed
100+        500ms+   Consider batching
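
For candidate sets above the 100-document limit, the batching suggestion can be sketched as follows. rerank_in_batches and score_fn are hypothetical names; score_fn stands in for a real rerank API call. Note that merging scores across separate requests assumes the model's scores are comparable between calls:

```python
MAX_DOCS = 100  # documented per-request document limit

def rerank_in_batches(query, documents, score_fn, top_n=10):
    """Score documents in chunks of MAX_DOCS, then merge globally.

    score_fn(query, batch) must return one relevance score per
    document in the batch, mirroring the API's relevance_score.
    """
    scored = []
    for start in range(0, len(documents), MAX_DOCS):
        batch = documents[start:start + MAX_DOCS]
        for offset, score in enumerate(score_fn(query, batch)):
            scored.append((score, start + offset))  # keep the global index
    scored.sort(reverse=True)  # highest relevance first
    return scored[:top_n]
```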

Error Responses

{
  "error": {
    "message": "Too many documents. Maximum is 100.",
    "type": "invalid_request_error",
    "code": "too_many_documents"
  }
}
{
  "error": {
    "message": "Query cannot be empty",
    "type": "invalid_request_error",
    "code": "empty_query"
  }
}
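
Clients can branch on the documented code field when handling these errors. A small illustrative helper (the wording of the hints is ours, not the API's):

```python
def describe_error(body):
    """Map a documented error code to an actionable hint."""
    err = body.get("error", {})
    code = err.get("code")
    if code == "too_many_documents":
        return "Trim the candidate list to 100 documents or fewer."
    if code == "empty_query":
        return "Provide a non-empty query string."
    # Fall back to the server's own message when the code is unknown.
    return err.get("message", "Unknown rerank error")
```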