Embedding Models

Create vector representations of text for semantic search, clustering, recommendations, and RAG applications.

Available Models

BGE-M3

BAAI’s state-of-the-art multilingual embedding model supporting 100+ languages. FREE via HuggingFace.
Specification      Value
Provider           BAAI (via HuggingFace)
Dimensions         1024
Max Tokens         8192
Price              FREE
Similarity Metric  Cosine
Best for:
  • Multilingual semantic search
  • Production RAG systems
  • Cross-lingual retrieval
  • Cost-free deployment
response = client.embeddings.create(
    model="bge-m3",
    input="The quick brown fox jumps over the lazy dog"
)
# Returns 1024-dimensional vector
Free Model: BGE-M3 is our recommended model for all embedding use cases. It offers superior multilingual support and quality at zero cost.
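
All models on this page use cosine similarity. As a minimal sketch of comparing two texts, assuming an OpenAI-compatible response shape (the numpy math below is illustrative, not part of the client):
import numpy as np

a = client.embeddings.create(model="bge-m3", input="How do I reset my password?").data[0].embedding
b = client.embeddings.create(model="bge-m3", input="password reset instructions").data[0].embedding

# Cosine similarity: dot product divided by the product of the norms
a, b = np.asarray(a), np.asarray(b)
score = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(score)  # closer to 1.0 means more semantically similar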

E5-large-v2

Microsoft’s flagship text embedding model with state-of-the-art performance.
Specification      Value
Provider           Microsoft
Dimensions         1024
Max Tokens         512
Price              $0.01 / million tokens
Similarity Metric  Cosine
Best for:
  • Semantic search
  • Document retrieval
  • Question answering
  • High-accuracy requirements
response = client.embeddings.create(
    model="e5-large-v2",
    input="The quick brown fox jumps over the lazy dog"
)
# Returns 1024-dimensional vector

BGE-base-en

BAAI’s balanced embedding model with excellent English performance.
Specification      Value
Provider           BAAI
Dimensions         768
Max Tokens         512
Price              $0.01 / million tokens
Similarity Metric  Cosine
Best for:
  • Cost-effective search
  • English-only applications
  • RAG systems
  • Production deployments
response = client.embeddings.create(
    model="bge-base-en",
    input="Machine learning is transforming industries"
)
# Returns 768-dimensional vector

Jina Embeddings v2

Jina AI’s long-context embedding model for entire documents.
Specification      Value
Provider           Jina AI
Dimensions         768
Max Tokens         8192
Price              $0.02 / million tokens
Similarity Metric  Cosine
Best for:
  • Long documents
  • Full-page embeddings
  • Reduced chunking needs
  • Document comparison
response = client.embeddings.create(
    model="jina-embeddings-v2",
    input=long_document  # Up to 8192 tokens
)
# Returns 768-dimensional vector

Nomic Embed Text

Nomic AI’s efficient embedding model with long context support.
Specification      Value
Provider           Nomic AI
Dimensions         768
Max Tokens         8192
Price              $0.01 / million tokens
Similarity Metric  Cosine
Best for:
  • Long-context on budget
  • Open-source preference
  • General-purpose search
  • Academic applications
response = client.embeddings.create(
    model="nomic-embed-text",
    input="Analyze this research paper..."
)
# Returns 768-dimensional vector

GTE-large

Alibaba’s general text embeddings model with high dimensionality.
Specification      Value
Provider           Alibaba
Dimensions         1024
Max Tokens         512
Price              $0.01 / million tokens
Similarity Metric  Cosine
Best for:
  • High-dimensional search
  • Multilingual content
  • Cross-lingual retrieval
  • Asian language content
response = client.embeddings.create(
    model="gte-large",
    input="这是一个中文文本示例"
)
# Returns 1024-dimensional vector

Model Comparison

Model               Dimensions  Max Tokens  Quality  Price
bge-m3              1024        8192        ★★★★★    FREE
e5-large-v2         1024        512         ★★★★★    $0.01/M
bge-base-en         768         512         ★★★★☆    $0.01/M
jina-embeddings-v2  768         8192        ★★★★☆    $0.02/M
nomic-embed-text    768         8192        ★★★☆☆    $0.01/M
gte-large           1024        512         ★★★★☆    $0.01/M

Benchmark Results

MTEB (Massive Text Embedding Benchmark)

Model               Average Score  Retrieval  STS   Price
bge-m3              66.1           58.2       86.4  FREE
e5-large-v2         64.2           56.8       85.6  $0.01/M
bge-base-en         63.4           55.2       84.1  $0.01/M
gte-large           63.1           54.9       83.7  $0.01/M
jina-embeddings-v2  62.8           54.3       82.9  $0.02/M
nomic-embed-text    61.5           53.1       81.4  $0.01/M

Use Cases

RAG (Retrieval-Augmented Generation)

Retrieve relevant context for LLM responses:
# 1. Embed the question (embed() wraps the embeddings endpoint;
#    the .data[0].embedding shape assumes an OpenAI-compatible client)
def embed(text):
    return client.embeddings.create(model="bge-m3", input=text).data[0].embedding

q_embedding = embed("What is the return policy?")

# 2. Find relevant docs in a vector DB (the search API depends on your database)
relevant_docs = vector_db.search(q_embedding, top_k=5)

# 3. Generate an answer grounded in the retrieved context
response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "system", "content": f"Context: {relevant_docs}"},
        {"role": "user", "content": "What is the return policy?"}
    ]
)

Clustering

Group similar content together:
from sklearn.cluster import KMeans

# Embed all documents (embed() as defined in the RAG example above)
embeddings = [embed(doc) for doc in documents]

# Cluster into 5 groups
kmeans = KMeans(n_clusters=5, random_state=0)
clusters = kmeans.fit_predict(embeddings)

# Group documents by cluster
for doc, cluster in zip(documents, clusters):
    print(f"Cluster {cluster}: {doc[:50]}...")

Deduplication

Find and remove duplicate content:
import numpy as np

threshold = 0.95

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_duplicates(documents):
    embeddings = [embed(doc) for doc in documents]
    duplicates = []

    # Compare every pair; anything above the threshold is a likely duplicate
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = cosine_similarity(embeddings[i], embeddings[j])
            if sim > threshold:
                duplicates.append((i, j, sim))

    return duplicates

Best Practices

Batch Requests

Embed multiple texts in one request for better throughput
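
For example, a single request can embed a list of texts (this assumes the endpoint accepts a list of inputs, as OpenAI-compatible embeddings APIs typically do):
texts = ["first document", "second document", "third document"]
response = client.embeddings.create(model="bge-m3", input=texts)
# One embedding per input, in the same order as the request
vectors = [item.embedding for item in response.data]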

Cache Embeddings

Store embeddings to avoid recomputing for the same text
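
A minimal in-memory sketch, keyed on a hash of the text (swap in Redis or a database for persistence; embed() is the helper defined in the RAG example above):
import hashlib

_cache = {}

def embed_cached(text):
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = embed(text)
    return _cache[key]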

Normalize Vectors

Most models output normalized vectors; verify for your use case
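
A quick way to verify, plus a safe normalization step if your model's output is not unit-length (plain numpy; embed() as defined in the RAG example above):
import numpy as np

vec = np.asarray(embed("hello world"))
print(np.linalg.norm(vec))  # ~1.0 means the model already returns unit vectors

def normalize(v):
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v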

Match Query/Doc Models

Use the same model for queries and documents
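
A simple guard is to read the model name from a single constant so queries and documents can never drift apart (sketch):
MODEL = "bge-m3"  # one source of truth for both sides

doc_vec = client.embeddings.create(model=MODEL, input="a stored document").data[0].embedding
query_vec = client.embeddings.create(model=MODEL, input="a user query").data[0].embedding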

Vector Databases

Store and search embeddings efficiently:
Database  Type         Features
Pinecone  Managed      Fast, scalable, serverless
Weaviate  Self-hosted  Open-source, hybrid search
Qdrant    Self-hosted  Rust-based, efficient
Milvus    Self-hosted  Distributed, GPU support
pgvector  Extension    PostgreSQL integration
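
As a concrete example, pgvector keeps embeddings inside PostgreSQL. A minimal sketch using the pgvector Python package (table schema and connection string are illustrative; embed() as defined in the RAG example above):
# pip install pgvector psycopg2-binary
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("dbname=mydb")  # illustrative connection string
register_vector(conn)  # lets psycopg2 send/receive vectors as numpy arrays

with conn.cursor() as cur:
    # Assumes: CREATE EXTENSION vector;
    #          CREATE TABLE docs (id serial PRIMARY KEY, text text, embedding vector(1024));
    cur.execute("INSERT INTO docs (text, embedding) VALUES (%s, %s)",
                ("hello world", np.asarray(embed("hello world"))))
    conn.commit()
    # <=> is pgvector's cosine-distance operator
    cur.execute("SELECT text FROM docs ORDER BY embedding <=> %s LIMIT 5",
                (np.asarray(embed("greeting")),))
    print(cur.fetchall())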

Choosing a Model

Recommendation: Start with bge-m3 for most use cases. It’s free, supports 100+ languages, handles long documents (8192 tokens), and offers top-tier quality.