
API Overview

The Assisters API is fully compatible with the OpenAI API specification, making it easy to migrate existing applications or use familiar SDKs.

Base URL

All API requests should be made to:
https://api.assisters.dev/v1

Authentication

Authenticate using your API key in the Authorization header:
Authorization: Bearer ask_your_api_key_here

You can create an API key in your dashboard.

Request Format

All requests should:
  • Use HTTPS
  • Include Content-Type: application/json header
  • Send JSON-encoded request bodies
curl https://api.assisters.dev/v1/chat/completions \
  -H "Authorization: Bearer ask_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Response Format

All responses are JSON-encoded and follow the OpenAI response format:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706745600,
  "model": "llama-3.1-8b",
  "choices": [...],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}
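
For example, here is a minimal sketch (Python with the requests library) that pulls the reply text and token counts out of that JSON; replace the placeholder key with your own:

import requests

API_KEY = "ask_your_api_key_here"

resp = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-8b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
data = resp.json()
# The assistant reply lives in choices[0].message.content;
# token accounting is in the usage object
print(data["choices"][0]["message"]["content"])
print(data["usage"]["total_tokens"])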

Available Endpoints

Endpoint              Method  Description
/v1/chat/completions  POST    Generate chat completions
/v1/embeddings        POST    Create text embeddings
/v1/moderate          POST    Content moderation
/v1/rerank            POST    Document reranking
/v1/models            GET     List available models
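
For example, to call the embeddings endpoint (a sketch with requests; the model name below is a placeholder, so check GET /v1/models for what's actually available):

import requests

API_KEY = "ask_your_api_key_here"

# The request body follows the OpenAI embeddings format
resp = requests.post(
    "https://api.assisters.dev/v1/embeddings",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "your-embedding-model", "input": "Hello, world"},
)
print(resp.json())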

Response Headers

Every response includes useful headers:
Header                     Description
X-Request-ID               Unique request identifier for debugging
X-Processing-Time-Ms       Request processing time in milliseconds
X-RateLimit-Limit-RPM      Your requests-per-minute limit
X-RateLimit-Remaining-RPM  Remaining requests this minute
X-RateLimit-Limit-TPM      Your tokens-per-minute limit
X-RateLimit-Remaining-TPM  Remaining tokens this minute
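
These are plain HTTP headers, so any client can read them. A minimal sketch with requests:

import requests

API_KEY = "ask_your_api_key_here"

resp = requests.get(
    "https://api.assisters.dev/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
# Log the request ID; it's what you'd quote when debugging with support
print(resp.headers.get("X-Request-ID"))
print(resp.headers.get("X-RateLimit-Remaining-RPM"))
print(resp.headers.get("X-RateLimit-Remaining-TPM"))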

Rate Limits

Rate limits depend on your subscription tier:
Tier        RPM (requests/min)  TPM (tokens/min)
Free        10                  100,000
Developer   100                 1,000,000
Startup     500                 5,000,000
Enterprise  Custom              Custom

When you hit a rate limit, you’ll receive a 429 Too Many Requests response with a Retry-After header.
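
A simple retry sketch that honors Retry-After (the helper name and the one-second fallback are our own choices, not part of the API):

import time

import requests

API_KEY = "ask_your_api_key_here"

def post_with_retry(url, payload, max_retries=3):
    """Retry a POST on 429, waiting as long as Retry-After asks."""
    for _ in range(max_retries):
        resp = requests.post(
            url,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
        )
        if resp.status_code != 429:
            return resp
        # Fall back to one second if the header is somehow missing
        time.sleep(float(resp.headers.get("Retry-After", 1)))
    return resp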

Streaming

For chat completions, you can enable streaming to receive tokens as they are generated:
{
  "model": "llama-3.1-8b",
  "messages": [...],
  "stream": true
}
Streaming responses use Server-Sent Events (SSE):
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"}}]}

data: [DONE]
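
Here is a minimal sketch that parses that SSE stream by hand with requests; the OpenAI SDK (see SDKs below) handles this parsing for you:

import json

import requests

API_KEY = "ask_your_api_key_here"

resp = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-8b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue  # skip the blank lines between events
    payload = line.decode().removeprefix("data: ")
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"].get("content")
    if delta:
        print(delta, end="", flush=True)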

Idempotency

For POST requests, you can include an Idempotency-Key header to safely retry failed requests without performing the same operation twice:
curl https://api.assisters.dev/v1/chat/completions \
  -H "Authorization: Bearer ask_your_api_key" \
  -H "Idempotency-Key: unique-request-id-123" \
  -H "Content-Type: application/json" \
  -d '{...}'
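
In Python this is one extra header; generating the key with uuid4 is our choice here, and any unique string works:

import uuid

import requests

API_KEY = "ask_your_api_key_here"

# Reuse the same key when retrying so the server can deduplicate
idempotency_key = str(uuid.uuid4())

resp = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Idempotency-Key": idempotency_key,
    },
    json={
        "model": "llama-3.1-8b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)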

SDKs

Use the official OpenAI SDK with our base URL:
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)
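
From there, calls look exactly like standard OpenAI SDK calls, for example:

completion = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)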

Versioning

The API is currently at version v1. We follow semantic versioning and will communicate any breaking changes well in advance.

View the changelog for the latest API updates and changes.