Reranking Models

Improve search quality by reranking documents based on relevance to a query. Use rerankers as a second stage after initial retrieval: a fast first-stage search returns candidates, and the reranker orders them precisely.

Available Models

BGE Reranker v2

BAAI’s high-quality cross-encoder reranker with excellent accuracy.
Specification    Value
Provider         BAAI
Type             Cross-encoder
Max Tokens       512
Price            $0.05 / million tokens
Latency          ~50ms per 10 docs
Best for:
  • High-accuracy requirements
  • RAG pipelines
  • Production search systems
  • Short to medium documents
import requests

response = requests.post(
    "https://api.assisters.dev/v1/rerank",
    headers={"Authorization": "Bearer ask_your_key"},
    json={
        "model": "bge-reranker-v2",
        "query": "machine learning basics",
        "documents": [
            "ML is a subset of AI",
            "Weather forecast for today",
            "Deep learning uses neural networks"
        ],
        "top_n": 2
    }
)
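Once the call returns, the reranked results can be read back directly. A minimal sketch of extracting them, assuming a response shape with `index`, `relevance_score`, and `document.text` fields (only the top-level `results` key is shown elsewhere in these docs; the rest is an assumption):

```python
# Hypothetical response body for the request above; the field names
# inside "results" are assumed, not confirmed by these docs.
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.92,
         "document": {"text": "ML is a subset of AI"}},
        {"index": 2, "relevance_score": 0.71,
         "document": {"text": "Deep learning uses neural networks"}},
    ]
}

def ranked_texts(payload):
    """Return document texts ordered by relevance score, highest first."""
    results = sorted(payload["results"],
                     key=lambda r: r["relevance_score"], reverse=True)
    return [r["document"]["text"] for r in results]
```

Note that with `top_n: 2`, the weather document is dropped entirely: only the two most relevant documents come back.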

Jina Reranker

Jina AI’s reranker with extended context support for longer documents.
Specification    Value
Provider         Jina AI
Type             Cross-encoder
Max Tokens       8192
Price            $0.08 / million tokens
Latency          ~80ms per 10 docs
Best for:
  • Long documents
  • Full-page reranking
  • Academic papers
  • Legal documents
import requests

# long_legal_documents: a list of document strings from your own pipeline
response = requests.post(
    "https://api.assisters.dev/v1/rerank",
    headers={"Authorization": "Bearer ask_your_key"},
    json={
        "model": "jina-reranker",
        "query": "contract termination clause",
        "documents": long_legal_documents,
        "top_n": 5
    }
)

Model Comparison

Feature       BGE Reranker v2    Jina Reranker
Accuracy      ★★★★★              ★★★★☆
Speed         ★★★★★              ★★★★☆
Max Tokens    512                8192
Price         $0.05/M            $0.08/M
Best For      Short docs         Long docs

Benchmark Results

BEIR (Benchmark for Information Retrieval)

Model              nDCG@10    Recall@100
bge-reranker-v2    54.2       78.3
jina-reranker      52.8       76.1

How Reranking Works

  1. Initial Retrieval: Fast search (vector/keyword) returns ~100 candidates
  2. Reranking: Cross-encoder scores each query-document pair precisely
  3. Final Results: Return the top N most relevant documents
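The reranking step in this pipeline is a single API call between retrieval and final results. A minimal sketch of a `rerank()` helper wrapping the /v1/rerank endpoint (the helper and its payload builder are illustrative, not part of any official SDK):

```python
def build_rerank_payload(model, query, documents, top_n=None):
    """Assemble the JSON body for /v1/rerank; top_n is optional."""
    payload = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        payload["top_n"] = top_n
    return payload

def rerank(model, query, documents, top_n=None, api_key="ask_your_key"):
    """Score each query-document pair and return the ranked results."""
    import requests  # imported here so the payload builder has no hard dependency

    resp = requests.post(
        "https://api.assisters.dev/v1/rerank",
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_rerank_payload(model, query, documents, top_n),
    )
    resp.raise_for_status()
    return resp.json()["results"]
```

The use-case snippets below call a helper with this shape.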

Use Cases

Improve context quality for LLM responses:
def enhanced_rag(question):
    # retrieve_chunks, rerank, and generate_with_context are placeholders
    # for your own first-stage retrieval, rerank-API wrapper, and LLM call
    chunks = retrieve_chunks(question, limit=20)

    # Rerank for relevance
    reranked = rerank(
        model="bge-reranker-v2",
        query=question,
        documents=[c.text for c in chunks],
        top_n=5
    )

    # Use the best chunks as context
    context = "\n".join([r.document.text for r in reranked])

    # Generate answer
    return generate_with_context(question, context)
Get precise relevance scores:
def score_relevance(query, documents):
    # rerank() is a placeholder wrapper around the /v1/rerank endpoint
    results = rerank(
        model="bge-reranker-v2",
        query=query,
        documents=documents
    )

    # Use scores for filtering
    high_relevance = [
        r for r in results
        if r.relevance_score > 0.7
    ]

    return high_relevance

Best Practices

Optimal Candidate Size

Rerank 50-100 candidates for best speed/quality tradeoff

Use for RAG

Reranking before LLM context significantly improves answer quality

Score Thresholds

Filter by relevance_score > 0.5 to remove noise

Match Document Length

Use Jina for long docs, BGE for short docs

Performance Tips

Candidates    BGE Latency    Jina Latency
10            ~50ms          ~80ms
50            ~150ms         ~250ms
100           ~300ms         ~500ms
Reranking is O(n) with document count. For real-time applications, limit candidates to 50-100.
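Since latency grows linearly with candidate count, capping the first-stage results before the rerank call keeps response times predictable. A minimal sketch, assuming first-stage candidates arrive as (document, score) pairs (`cap_candidates` is illustrative, not part of the API):

```python
def cap_candidates(candidates, limit=100):
    """Keep only the `limit` highest-scoring first-stage candidates.

    Assumes candidates are (document, score) pairs from the initial
    retrieval step; returns just the document strings to rerank.
    """
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:limit]
    return [doc for doc, _ in top]

# e.g. 250 first-stage hits trimmed to a rerank-friendly 100
hits = [(f"doc-{i}", 1.0 / (i + 1)) for i in range(250)]
docs_to_rerank = cap_candidates(hits, limit=100)
```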

Choosing a Model

Default to BGE Reranker v2 for the best accuracy and latency on short-to-medium documents; switch to Jina Reranker when your documents exceed BGE's 512-token limit.
Integration Example

Complete two-stage retrieval system:
from openai import OpenAI
import math
import requests

client = OpenAI(
    api_key="ask_your_key",
    base_url="https://api.assisters.dev/v1"
)

def cosine_sim(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query: str, documents: list[str], top_k: int = 5):
    # Stage 1: Embed and retrieve
    query_embedding = client.embeddings.create(
        model="e5-large-v2",
        input=query
    ).data[0].embedding

    doc_embeddings = client.embeddings.create(
        model="e5-large-v2",
        input=documents
    ).data

    # Simple similarity search
    scores = [cosine_sim(query_embedding, d.embedding) for d in doc_embeddings]
    candidates = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)[:20]

    # Stage 2: Rerank
    response = requests.post(
        "https://api.assisters.dev/v1/rerank",
        headers={"Authorization": "Bearer ask_your_key"},
        json={
            "model": "bge-reranker-v2",
            "query": query,
            "documents": [c[0] for c in candidates],
            "top_n": top_k
        }
    )

    return response.json()["results"]