Embeddings

Create vector representations of text for semantic search, clustering, recommendations, and similarity matching.

Endpoint

POST https://api.assisters.dev/v1/embeddings
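
The equivalent raw HTTP request (the Authorization header is assumed to match the Bearer-token auth the Python client below uses):

curl --request POST \
  --url https://api.assisters.dev/v1/embeddings \
  --header 'Authorization: Bearer ask_your_api_key' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "e5-large-v2",
    "input": "The quick brown fox jumps over the lazy dog",
    "encoding_format": "float"
  }'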

Request Body

model
string
required
The embedding model to use. See Available Models below. Examples: e5-large-v2, bge-base-en, jina-embeddings-v2
input
string | array
required
The text to embed. Can be a single string or an array of up to 100 strings.
encoding_format
string
default:"float"
The format for the embedding values. Options: float, base64
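
With encoding_format="base64", each embedding is returned as a base64 string instead of a float array. A decoding sketch, assuming the string packs little-endian float32 values (the common OpenAI-compatible convention; verify against your own responses):

import base64
import numpy as np

# client as constructed in the Request Examples below
response = client.embeddings.create(
    model="e5-large-v2",
    input="some text",
    encoding_format="base64"
)

raw = base64.b64decode(response.data[0].embedding)
embedding = np.frombuffer(raw, dtype="<f4")  # assumed little-endian float32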

Request Examples

Single Text

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.embeddings.create(
    model="e5-large-v2",
    input="The quick brown fox jumps over the lazy dog"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")

Batch Embeddings

response = client.embeddings.create(
    model="e5-large-v2",
    input=[
        "First document to embed",
        "Second document to embed",
        "Third document to embed"
    ]
)

for i, data in enumerate(response.data):
    print(f"Document {i}: {len(data.embedding)} dimensions")

Semantic Search Example

import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

# Your documents
documents = [
    "Python is a programming language",
    "JavaScript runs in the browser",
    "Machine learning uses algorithms",
    "Cats are furry pets"
]

# Embed all documents
doc_response = client.embeddings.create(
    model="e5-large-v2",
    input=documents
)
doc_embeddings = [d.embedding for d in doc_response.data]

# Embed the query
query = "What programming languages are there?"
query_response = client.embeddings.create(
    model="e5-large-v2",
    input=query
)
query_embedding = query_response.data[0].embedding

# Calculate cosine similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Find most similar documents
similarities = [cosine_similarity(query_embedding, doc) for doc in doc_embeddings]
ranked = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)

for doc, score in ranked:
    print(f"{score:.4f}: {doc}")

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0023064255,
        -0.009327292,
        0.015797347,
        ...
      ]
    }
  ],
  "model": "e5-large-v2",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}

Response Fields

object
string
Always list
data
array
Array of embedding objects, one per input text:
  • object: Always embedding
  • index: Position in the input array
  • embedding: Array of floats representing the vector
model
string
The model used to generate embeddings
usage
object
Token usage for billing:
  • prompt_tokens: Tokens in the input
  • total_tokens: Same as prompt_tokens for embeddings
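
Since billing is per token, a call's cost follows directly from usage; for example, with e5-large-v2 at $0.01 per million tokens (see Available Models below):

# Cost of one call at $0.01 per million tokens
cost_usd = response.usage.total_tokens / 1_000_000 * 0.01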

Available Models

Model               Dimensions   Max Tokens   Price
e5-large-v2         1024         512          $0.01/M
bge-base-en         768          512          $0.01/M
jina-embeddings-v2  768          8192         $0.02/M
nomic-embed-text    768          8192         $0.01/M
gte-large           1024         512          $0.01/M

Compare All Models

See detailed specifications for each embedding model

Use Cases

Retrieval-Augmented Generation (RAG)

Use embeddings to retrieve relevant context before generating responses with a chat model.

# 1. Embed the user's question
query_embedding = client.embeddings.create(
    model="e5-large-v2",
    input=question
).data[0].embedding

# 2. Find relevant documents (vector_search is your own
#    vector-store lookup; see Vector Databases below)
relevant_docs = vector_search(query_embedding)

# 3. Generate an answer grounded in the retrieved context
response = client.chat.completions.create(
    model="<your-chat-model>",
    messages=[
        {"role": "system", "content": f"Context: {relevant_docs}"},
        {"role": "user", "content": question}
    ]
)

Clustering

Group similar texts together by clustering their embeddings.

from sklearn.cluster import KMeans

# embed_documents: any helper returning one vector per document
# (see the Batch Embeddings example above)
embeddings = embed_documents(documents)

kmeans = KMeans(n_clusters=5)
clusters = kmeans.fit_predict(embeddings)

Deduplication

Find and remove duplicate or near-duplicate content by comparing embedding similarity.

threshold = 0.95
unique_docs = []
unique_embeddings = []

for doc, embedding in zip(documents, embeddings):
    is_duplicate = any(
        cosine_similarity(embedding, e) > threshold
        for e in unique_embeddings
    )
    if not is_duplicate:
        unique_docs.append(doc)
        unique_embeddings.append(embedding)

Best Practices

Batch Requests

Embed multiple texts in one request for better throughput (up to 100 inputs per request; see the chunking sketch under Batch Embeddings)

Cache Embeddings

Store embeddings in a vector database to avoid re-computation

Normalize Vectors

Most embedding models output normalized vectors, but verify for your use case (see the quick check below)

Choose the Right Model

Larger models are more accurate but slower and more expensive
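
The quick normalization check referenced above; if the norm is not approximately 1.0, divide by it before cosine or dot-product comparisons:

import numpy as np

v = np.asarray(embedding)
norm = np.linalg.norm(v)
print(f"L2 norm: {norm:.6f}")  # ~1.0 means already normalized

v_unit = v / norm  # normalize if needed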

Vector Databases

Store and query embeddings efficiently with these compatible databases:
  • Pinecone - Managed vector database
  • Weaviate - Open-source vector search
  • Qdrant - Open-source vector database
  • Milvus - Cloud-native vector database
  • pgvector - PostgreSQL extension
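
As one illustration, a minimal pgvector sketch using psycopg and raw SQL, reusing documents, doc_embeddings, and query_embedding from the semantic search example. The table, connection string, and 1024-dimension column (sized for e5-large-v2) are assumptions for this example, not part of the API:

import psycopg

conn = psycopg.connect("dbname=embeddings_demo")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS docs "
        "(id bigserial PRIMARY KEY, body text, embedding vector(1024))"
    )
    # Vectors are passed as '[...]' literals and cast to the vector type
    cur.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
        (documents[0], str(doc_embeddings[0])),
    )
    # <=> is pgvector's cosine-distance operator (smaller = more similar)
    cur.execute(
        "SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 5",
        (str(query_embedding),),
    )
    for (body,) in cur.fetchall():
        print(body)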

Error Responses

Returned when a request includes more than 100 inputs:

{
  "error": {
    "message": "Too many inputs. Maximum is 100.",
    "type": "invalid_request_error",
    "code": "too_many_inputs"
  }
}

Returned when an input exceeds the model's maximum token count (see Available Models):

{
  "error": {
    "message": "Input exceeds maximum token limit for this model",
    "type": "invalid_request_error",
    "code": "input_too_long"
  }
}
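
Client-side, these errors surface as exceptions in the OpenAI SDK; a sketch assuming the SDK's standard mapping of 4xx responses to openai.BadRequestError:

import openai

try:
    response = client.embeddings.create(
        model="e5-large-v2",
        input=texts
    )
except openai.BadRequestError as e:
    # e.g. too_many_inputs or input_too_long
    print(f"Request rejected: {e}")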