Chat Completions

Generate AI responses for conversational applications. This endpoint is fully compatible with the OpenAI Chat Completions API.

Endpoint

POST https://api.assisters.dev/v1/chat/completions

Request Body

model
string
required
The model to use for completion. See available models. Examples: llama-3.1-8b, llama-3.1-70b, mistral-7b
messages
array
required
An array of messages comprising the conversation so far. Each message object has:
  • role (string): system, user, or assistant
  • content (string): The content of the message
stream
boolean
default:"false"
If true, returns a stream of Server-Sent Events (SSE) for real-time responses.
max_tokens
integer
Maximum number of tokens to generate. Defaults to the model’s maximum.
temperature
number
default:"1.0"
Sampling temperature between 0 and 2. Higher values make output more random.
top_p
number
default:"1.0"
Nucleus sampling parameter. Adjust this or temperature, but not both.
stop
string | array
Up to 4 sequences where the API will stop generating tokens.
presence_penalty
number
default:"0"
Penalty for new tokens based on whether they appear in the text so far. Range: -2.0 to 2.0.
frequency_penalty
number
default:"0"
Penalty for new tokens based on their frequency in the text. Range: -2.0 to 2.0.
user
string
A unique identifier for the end-user, useful for monitoring and abuse detection.
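
A minimal sketch of how several of these parameters fit into a single request, using the OpenAI Python client shown in the examples below. The specific values are illustrative, not recommended defaults.

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "user", "content": "List three uses for a paperclip."}
    ],
    max_tokens=200,          # cap the length of the reply
    temperature=0.7,         # lower values give more deterministic output
    stop=["\n\n"],           # stop at the first blank line
    presence_penalty=0.5,    # nudge the model toward new topics
    user="user-1234"         # end-user identifier for monitoring
)

print(response.choices[0].message.content)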

Request Examples

Basic Request

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Japan?"}
    ]
)

print(response.choices[0].message.content)

Streaming Request

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

stream = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "user", "content": "Write a short poem about coding"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Multi-turn Conversation

messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
    {"role": "user", "content": "And what is that multiplied by 3?"}
]

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=messages
)
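
To continue the conversation, a common pattern is to append the assistant's reply to the message history before adding the next user turn. A sketch reusing the client, messages, and response objects from the example above:

# Append the assistant's reply, then the next user turn, and call the API again.
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})
messages.append({"role": "user", "content": "Now subtract 5 from that."})

follow_up = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=messages
)
print(follow_up.choices[0].message.content)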

Response

Non-Streaming Response

{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1706745600,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Japan is Tokyo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Streaming Response

Each chunk in the stream:
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion.chunk",
  "created": 1706745600,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "The"
      },
      "finish_reason": null
    }
  ]
}
Final chunk:
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion.chunk",
  "created": 1706745600,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
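
Client code typically concatenates the delta contents to rebuild the full reply and reads finish_reason from the final chunk. A sketch assuming a fresh stream created as in the streaming example above:

full_text = ""
finish_reason = None

for chunk in stream:
    if not chunk.choices:
        continue
    choice = chunk.choices[0]
    if choice.delta.content:
        full_text += choice.delta.content
    if choice.finish_reason is not None:
        finish_reason = choice.finish_reason

print(full_text)
print("finish_reason:", finish_reason)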

Response Fields

id
string
Unique identifier for the completion
object
string
chat.completion for standard responses, or chat.completion.chunk when streaming
created
integer
Unix timestamp of when the completion was created
model
string
The model used for completion
choices
array
Array of completion choices. Each choice contains:
  • index: The index of this choice
  • message: The generated message (non-streaming)
  • delta: The incremental content (streaming)
  • finish_reason: Why generation stopped (stop, length, content_filter)
usage
object
Token usage statistics (not included in streaming):
  • prompt_tokens: Tokens in the input
  • completion_tokens: Tokens in the output
  • total_tokens: Total tokens used
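
With the OpenAI Python client, these fields are exposed as attributes on the response object. A short sketch using the response from the basic example above:

print(response.id)                        # e.g. "chatcmpl-abc123xyz"
print(response.model)                     # e.g. "llama-3.1-8b"
print(response.choices[0].finish_reason)  # "stop", "length", or "content_filter"
print(response.usage.total_tokens)        # prompt + completion tokens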

Finish Reasons

Reason            Description
stop              Natural completion or stop sequence reached
length            max_tokens limit reached
content_filter    Content was filtered by moderation
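
Checking finish_reason lets callers detect truncated or filtered output. A small sketch, again using the response from the basic example:

finish_reason = response.choices[0].finish_reason
if finish_reason == "length":
    print("Reply was cut off; raise max_tokens or shorten the prompt.")
elif finish_reason == "content_filter":
    print("Reply was filtered by moderation.")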

Error Responses

Invalid Request

{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "code": "invalid_model"
  }
}

Authentication Error

{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

Rate Limit Error

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
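
Because the endpoint is OpenAI-compatible, these errors surface through the OpenAI Python SDK's exception classes when using the client shown above. A sketch assuming the v1.x exception names (openai.AuthenticationError, openai.RateLimitError, openai.BadRequestError):

import openai
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

try:
    response = client.chat.completions.create(
        model="llama-3.1-8b",
        messages=[{"role": "user", "content": "Hello"}]
    )
except openai.AuthenticationError:
    print("Invalid API key; check your credentials.")
except openai.RateLimitError:
    print("Rate limit exceeded; back off and retry.")
except openai.BadRequestError as e:
    print(f"Invalid request: {e}")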

Best Practices

Use System Messages

Set context and behavior with system messages for consistent responses

Stream Long Responses

Enable streaming for better UX with longer completions

Manage Conversation Length

Trim old messages to stay within token limits

Handle Errors Gracefully

Implement retry logic with exponential backoff
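
A sketch of the retry-with-exponential-backoff pattern around the client used throughout this page; the helper name, attempt count, and delays are illustrative:

import random
import time

import openai
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

def create_with_backoff(messages, max_attempts=5):
    """Retry rate-limited requests with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="llama-3.1-8b",
                messages=messages
            )
        except openai.RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep((2 ** attempt) + random.random())

response = create_with_backoff([{"role": "user", "content": "Hello"}])
print(response.choices[0].message.content)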