Quickstart

This guide will help you make your first API call in under 5 minutes.

Prerequisites

  • An Assisters account (sign up free)
  • An API key from your dashboard
  • Python 3.8+ or Node.js 18+ installed

Step 1: Get Your API Key

  1. Log in to your Assisters Dashboard
  2. Navigate to API Keys
  3. Click Create New Key
  4. Copy your key (it starts with ask_)
Keep your API key secure! Never expose it in client-side code or commit it to version control.

Step 2: Install the SDK

Since the Assisters API is OpenAI-compatible, you can use the official OpenAI SDK:
pip install openai
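
The examples in this guide use Python. If you'd rather follow along in Node.js (per the prerequisites), the official OpenAI Node SDK works against the same endpoints:
npm install openai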

Step 3: Make Your First Request

from openai import OpenAI

# Initialize the client with Assisters API
client = OpenAI(
    api_key="ask_your_api_key_here",
    base_url="https://api.assisters.dev/v1"
)

# Create a chat completion
response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
# Output: The capital of France is Paris.
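
The endpoint also accepts the standard OpenAI-style sampling parameters. A minimal sketch, assuming Assisters passes these through as OpenAI-compatible APIs typically do (the temperature and max_tokens values here are arbitrary):

# Same request with explicit sampling controls
response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.2,   # lower = more deterministic
    max_tokens=100,    # cap the length of the reply
)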

Step 4: Use Streaming (Optional)

For real-time responses, enable streaming:
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key_here",
    base_url="https://api.assisters.dev/v1"
)

# Stream the response
stream = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "user", "content": "Write a haiku about coding"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
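
The content check guards against chunks whose delta carries no text. If you also need the full response after streaming (for logging, say), you can accumulate the deltas as they arrive; a minimal variant of the loop above:

full_text = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:                        # some chunks carry no text
        print(delta, end="", flush=True)
        full_text.append(delta)
print()
haiku = "".join(full_text)           # complete response, reassembled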

Step 5: Try Other Endpoints

Create Embeddings

response = client.embeddings.create(
    model="e5-large-v2",
    input="The quick brown fox jumps over the lazy dog"
)

print(f"Embedding dimensions: {len(response.data[0].embedding)}")
# Output: Embedding dimensions: 1024
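
Embeddings are typically compared with cosine similarity. A minimal sketch, assuming the endpoint accepts a list of inputs as OpenAI-compatible embedding APIs usually do (the sentences are made up for illustration):

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

sentences = [
    "The quick brown fox jumps over the lazy dog",
    "A fast auburn fox leaps above a sleepy hound",
    "Quarterly revenue grew by three percent",
]
resp = client.embeddings.create(model="e5-large-v2", input=sentences)
vectors = [item.embedding for item in resp.data]

# The paraphrase should score far higher than the unrelated sentence
for text, vec in zip(sentences[1:], vectors[1:]):
    print(f"{cosine_similarity(vectors[0], vec):.3f}  {text}")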

Content Moderation

response = client.moderations.create(
    model="llama-guard-3",
    input="Hello, how are you today?"
)

print(f"Flagged: {response.results[0].flagged}")
# Output: Flagged: False
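
A common pattern is to screen user input with the moderation endpoint before forwarding it to the chat model. A minimal sketch (the safe_chat helper and the refusal message are illustrative, not part of the API):

def safe_chat(user_message: str) -> str:
    # Screen the input first; bail out if llama-guard-3 flags it
    mod = client.moderations.create(model="llama-guard-3", input=user_message)
    if mod.results[0].flagged:
        return "Sorry, I can't help with that request."

    # Input looks safe: forward it to the chat model
    reply = client.chat.completions.create(
        model="llama-3.1-8b",
        messages=[{"role": "user", "content": user_message}],
    )
    return reply.choices[0].message.content

print(safe_chat("Hello, how are you today?"))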

Understanding the Response

A typical chat completion response looks like this:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706745600,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
The usage field shows token consumption, which determines your billing. See token counting for details.
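
In the Python SDK the same fields are available as attributes on the response object, which is handy for logging per-request cost:

u = response.usage
print(f"prompt={u.prompt_tokens} completion={u.completion_tokens} total={u.total_tokens}")
# prompt=25 completion=8 total=33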

Environment Variables

For production, use environment variables instead of hardcoding your API key:
ASSISTERS_API_KEY=ask_your_api_key_here
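
Then read the key at client construction time instead of pasting it into source. A minimal sketch using os.environ (note the SDK does not read ASSISTERS_API_KEY automatically, so it is passed explicitly):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ASSISTERS_API_KEY"],  # raises KeyError if unset
    base_url="https://api.assisters.dev/v1",
)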

Troubleshooting

If authentication fails:

  • Check that your API key is correct and starts with ask_
  • Ensure the key hasn’t been revoked in your dashboard
  • Verify you’re using the Authorization: Bearer header format

If you hit rate limits:

  • You’ve exceeded your tier’s rate limit (RPM or TPM)
  • Wait a moment and retry, or upgrade your plan
  • Check X-RateLimit-* headers for current limits

If a model isn’t found:

  • Check the model name matches exactly (case-sensitive)
  • Some models may require a paid plan
  • See available models for the full list
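
For transient rate-limit errors, a simple backoff-and-retry loop is usually enough. A minimal sketch using the error classes the OpenAI Python SDK exports (the three-attempt limit is arbitrary):

import time
from openai import AuthenticationError, RateLimitError

for attempt in range(3):
    try:
        response = client.chat.completions.create(
            model="llama-3.1-8b",
            messages=[{"role": "user", "content": "ping"}],
        )
        break
    except RateLimitError:
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s
    except AuthenticationError:
        raise  # a bad key won't fix itself; check the dashboard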