
API Overview

The Assisters API is fully compatible with the OpenAI API specification, making it easy to migrate existing applications or use familiar SDKs.

Base URL

All API requests should be made to:
https://api.assisters.dev/v1

Authentication

Authenticate using your API key in the Authorization header:
Authorization: Bearer ask_your_api_key_here

You can create an API key in your dashboard.

Request Format

All requests should:
  • Use HTTPS
  • Include Content-Type: application/json header
  • Send JSON-encoded request bodies
curl https://api.assisters.dev/v1/chat/completions \
  -H "Authorization: Bearer ask_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Response Format

All responses are JSON-encoded and follow the OpenAI response format:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706745600,
  "model": "llama-3.1-8b",
  "choices": [...],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}
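
For example, here is a minimal sketch (Python with the requests library) that pulls the reply text and token counts out of that JSON; replace the placeholder key with your own:

import requests

API_KEY = "ask_your_api_key_here"

resp = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-8b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
data = resp.json()
# The assistant reply lives in choices[0].message.content;
# token accounting is in the usage object
print(data["choices"][0]["message"]["content"])
print(data["usage"]["total_tokens"])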

Available Endpoints

Endpoint              Method  Description
/v1/chat/completions  POST    Generate chat completions
/v1/embeddings        POST    Create text embeddings
/v1/moderate          POST    Content moderation
/v1/rerank            POST    Document reranking
/v1/models            GET     List available models
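
For example, to call the embeddings endpoint (a sketch with requests; the model name below is a placeholder, so check GET /v1/models for what's actually available):

import requests

API_KEY = "ask_your_api_key_here"

# The request body follows the OpenAI embeddings format
resp = requests.post(
    "https://api.assisters.dev/v1/embeddings",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "your-embedding-model", "input": "Hello, world"},
)
print(resp.json())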

Response Headers

Every response includes useful headers:
Header                     Description
X-Request-ID               Unique request identifier for debugging
X-Processing-Time-Ms       Request processing time in milliseconds
X-RateLimit-Limit-RPM      Your requests-per-minute limit
X-RateLimit-Remaining-RPM  Remaining requests this minute
X-RateLimit-Limit-TPM      Your tokens-per-minute limit
X-RateLimit-Remaining-TPM  Remaining tokens this minute
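
These are plain HTTP headers, so any client can read them. A minimal sketch with requests:

import requests

API_KEY = "ask_your_api_key_here"

resp = requests.get(
    "https://api.assisters.dev/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
# Log the request ID; it's what you'd quote when debugging with support
print(resp.headers.get("X-Request-ID"))
print(resp.headers.get("X-RateLimit-Remaining-RPM"))
print(resp.headers.get("X-RateLimit-Remaining-TPM"))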

Rate Limits

Rate limits depend on your subscription tier:
Tier        RPM (requests/min)  TPM (tokens/min)
Free        10                  100,000
Developer   100                 1,000,000
Startup     500                 5,000,000
Enterprise  Custom              Custom

When you hit a rate limit, you’ll receive a 429 Too Many Requests response with a Retry-After header.
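
A simple retry sketch that honors Retry-After (the helper name and the one-second fallback are our own choices, not part of the API):

import time

import requests

API_KEY = "ask_your_api_key_here"

def post_with_retry(url, payload, max_retries=3):
    """Retry a POST on 429, waiting as long as Retry-After asks."""
    for _ in range(max_retries):
        resp = requests.post(
            url,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
        )
        if resp.status_code != 429:
            return resp
        # Fall back to one second if the header is somehow missing
        time.sleep(float(resp.headers.get("Retry-After", 1)))
    return resp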

Streaming

For chat completions, you can enable streaming to receive tokens as they are generated:
{
  "model": "llama-3.1-8b",
  "messages": [...],
  "stream": true
}
Streaming responses use Server-Sent Events (SSE):
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"}}]}

data: [DONE]
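
Here is a minimal sketch that parses that SSE stream by hand with requests; the OpenAI SDK (see SDKs below) handles this parsing for you:

import json

import requests

API_KEY = "ask_your_api_key_here"

resp = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-8b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue  # skip the blank lines between events
    payload = line.decode().removeprefix("data: ")
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"].get("content")
    if delta:
        print(delta, end="", flush=True)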

Idempotency

For POST requests, you can include an Idempotency-Key header to safely retry failed requests without performing the same operation twice:
curl https://api.assisters.dev/v1/chat/completions \
  -H "Authorization: Bearer ask_your_api_key" \
  -H "Idempotency-Key: unique-request-id-123" \
  -H "Content-Type: application/json" \
  -d '{...}'
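
In Python this is one extra header; generating the key with uuid4 is our choice here, and any unique string works:

import uuid

import requests

API_KEY = "ask_your_api_key_here"

# Reuse the same key when retrying so the server can deduplicate
idempotency_key = str(uuid.uuid4())

resp = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Idempotency-Key": idempotency_key,
    },
    json={
        "model": "llama-3.1-8b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)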

SDKs

Use the official OpenAI SDK with our base URL:
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)
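
From there, calls look exactly like standard OpenAI SDK calls, for example:

completion = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)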

Versioning

The API is currently at version v1. We follow semantic versioning and will communicate any breaking changes well in advance.

View the changelog for the latest API updates and changes.