Embeddings

Create vector representations of text for semantic search, clustering, recommendations, and similarity matching.

Endpoint

POST https://api.assisters.dev/v1/embeddings
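
The equivalent raw HTTP request (the Authorization header is assumed to match the Bearer-token auth the Python client below uses):

curl --request POST \
  --url https://api.assisters.dev/v1/embeddings \
  --header 'Authorization: Bearer ask_your_api_key' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "e5-large-v2",
    "input": "The quick brown fox jumps over the lazy dog",
    "encoding_format": "float"
  }'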

Request Body

model
string
required
The embedding model to use. See Available Models below. Examples: e5-large-v2, bge-base-en, jina-embeddings-v2
input
string | array
required
The text to embed. Can be a single string or an array of up to 100 strings.
encoding_format
string
default:"float"
The format for the embedding values. Options: float, base64
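
With encoding_format="base64", each embedding is returned as a base64 string instead of a float array. A decoding sketch, assuming the string packs little-endian float32 values (the common OpenAI-compatible convention; verify against your own responses):

import base64
import numpy as np

# client as constructed in the Request Examples below
response = client.embeddings.create(
    model="e5-large-v2",
    input="some text",
    encoding_format="base64"
)

raw = base64.b64decode(response.data[0].embedding)
embedding = np.frombuffer(raw, dtype="<f4")  # assumed little-endian float32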

Request Examples

Single Text

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.embeddings.create(
    model="e5-large-v2",
    input="The quick brown fox jumps over the lazy dog"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")

Batch Embeddings

response = client.embeddings.create(
    model="e5-large-v2",
    input=[
        "First document to embed",
        "Second document to embed",
        "Third document to embed"
    ]
)

for i, data in enumerate(response.data):
    print(f"Document {i}: {len(data.embedding)} dimensions")

Semantic Search Example

import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

# Your documents
documents = [
    "Python is a programming language",
    "JavaScript runs in the browser",
    "Machine learning uses algorithms",
    "Cats are furry pets"
]

# Embed all documents
doc_response = client.embeddings.create(
    model="e5-large-v2",
    input=documents
)
doc_embeddings = [d.embedding for d in doc_response.data]

# Embed the query
query = "What programming languages are there?"
query_response = client.embeddings.create(
    model="e5-large-v2",
    input=query
)
query_embedding = query_response.data[0].embedding

# Calculate cosine similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Find most similar documents
similarities = [cosine_similarity(query_embedding, doc) for doc in doc_embeddings]
ranked = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)

for doc, score in ranked:
    print(f"{score:.4f}: {doc}")

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0023064255,
        -0.009327292,
        0.015797347,
        ...
      ]
    }
  ],
  "model": "e5-large-v2",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}

Response Fields

object
string
Always list
data
array
Array of embedding objects, one per input text:
  • object: Always embedding
  • index: Position in the input array
  • embedding: Array of floats representing the vector
model
string
The model used to generate embeddings
usage
object
Token usage for billing:
  • prompt_tokens: Tokens in the input
  • total_tokens: Same as prompt_tokens for embeddings
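
Since billing is per token, a call's cost follows directly from usage; for example, with e5-large-v2 at $0.01 per million tokens (see Available Models below):

# Cost of one call at $0.01 per million tokens
cost_usd = response.usage.total_tokens / 1_000_000 * 0.01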

Available Models

Model               Dimensions   Max Tokens   Price
e5-large-v2         1024         512          $0.01/M
bge-base-en         768          512          $0.01/M
jina-embeddings-v2  768          8192         $0.02/M
nomic-embed-text    768          8192         $0.01/M
gte-large           1024         512          $0.01/M

Compare All Models

See detailed specifications for each embedding model

Use Cases

Retrieval-Augmented Generation (RAG)

Use embeddings to retrieve relevant context before generating responses with a chat model.

# 1. Embed the user's question
query_embedding = client.embeddings.create(
    model="e5-large-v2",
    input=question
).data[0].embedding

# 2. Find relevant documents (vector_search is your own
#    vector-store lookup; see Vector Databases below)
relevant_docs = vector_search(query_embedding)

# 3. Generate an answer grounded in the retrieved context
response = client.chat.completions.create(
    model="<your-chat-model>",
    messages=[
        {"role": "system", "content": f"Context: {relevant_docs}"},
        {"role": "user", "content": question}
    ]
)

Clustering

Group similar texts together by clustering their embeddings.

from sklearn.cluster import KMeans

# embed_documents: any helper returning one vector per document
# (see the Batch Embeddings example above)
embeddings = embed_documents(documents)

kmeans = KMeans(n_clusters=5)
clusters = kmeans.fit_predict(embeddings)

Deduplication

Find and remove duplicate or near-duplicate content by comparing embedding similarity.

threshold = 0.95
unique_docs = []
unique_embeddings = []

for doc, embedding in zip(documents, embeddings):
    is_duplicate = any(
        cosine_similarity(embedding, e) > threshold
        for e in unique_embeddings
    )
    if not is_duplicate:
        unique_docs.append(doc)
        unique_embeddings.append(embedding)

Best Practices

Batch Requests

Embed multiple texts in one request for better throughput (up to 100 inputs per request; see the chunking sketch under Batch Embeddings)

Cache Embeddings

Store embeddings in a vector database to avoid re-computation

Normalize Vectors

Most embedding models output normalized vectors, but verify for your use case (see the quick check below)

Choose the Right Model

Larger models are more accurate but slower and more expensive
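
The quick normalization check referenced above; if the norm is not approximately 1.0, divide by it before cosine or dot-product comparisons:

import numpy as np

v = np.asarray(embedding)
norm = np.linalg.norm(v)
print(f"L2 norm: {norm:.6f}")  # ~1.0 means already normalized

v_unit = v / norm  # normalize if needed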

Vector Databases

Store and query embeddings efficiently with these compatible databases:
  • Pinecone - Managed vector database
  • Weaviate - Open-source vector search
  • Qdrant - Open-source vector database
  • Milvus - Cloud-native vector database
  • pgvector - PostgreSQL extension
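
As one illustration, a minimal pgvector sketch using psycopg and raw SQL, reusing documents, doc_embeddings, and query_embedding from the semantic search example. The table, connection string, and 1024-dimension column (sized for e5-large-v2) are assumptions for this example, not part of the API:

import psycopg

conn = psycopg.connect("dbname=embeddings_demo")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS docs "
        "(id bigserial PRIMARY KEY, body text, embedding vector(1024))"
    )
    # Vectors are passed as '[...]' literals and cast to the vector type
    cur.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
        (documents[0], str(doc_embeddings[0])),
    )
    # <=> is pgvector's cosine-distance operator (smaller = more similar)
    cur.execute(
        "SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 5",
        (str(query_embedding),),
    )
    for (body,) in cur.fetchall():
        print(body)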

Error Responses

Returned when a request includes more than 100 inputs:

{
  "error": {
    "message": "Too many inputs. Maximum is 100.",
    "type": "invalid_request_error",
    "code": "too_many_inputs"
  }
}

Returned when an input exceeds the model's maximum token count (see Available Models):

{
  "error": {
    "message": "Input exceeds maximum token limit for this model",
    "type": "invalid_request_error",
    "code": "input_too_long"
  }
}
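
Client-side, these errors surface as exceptions in the OpenAI SDK; a sketch assuming the SDK's standard mapping of 4xx responses to openai.BadRequestError:

import openai

try:
    response = client.embeddings.create(
        model="e5-large-v2",
        input=texts
    )
except openai.BadRequestError as e:
    # e.g. too_many_inputs or input_too_long
    print(f"Request rejected: {e}")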