Chat Completions
Generate AI responses for conversational applications. This endpoint is fully compatible with the OpenAI Chat Completions API.
Endpoint
POST https://api.assisters.dev/v1/chat/completions
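If you are not using an SDK, the endpoint also accepts a plain JSON POST. A minimal sketch with Python's requests library, assuming the standard Bearer authorization scheme used by OpenAI-compatible APIs:

import requests

resp = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={"Authorization": "Bearer ask_your_api_key"},  # assumed Bearer auth
    json={
        "model": "llama-3.1-8b",
        "messages": [{"role": "user", "content": "Hello"}]
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])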
Request Body
model (string, required)
The model to use for completion. See available models. Examples: llama-3.1-8b, llama-3.1-70b, mistral-7b

messages (array, required)
An array of messages comprising the conversation so far. Each message object has:
role (string): system, user, or assistant
content (string): The content of the message

stream (boolean, optional)
If true, returns a stream of Server-Sent Events (SSE) for real-time responses.

max_tokens (integer, optional)
Maximum number of tokens to generate. Defaults to the model's maximum.

temperature (number, optional)
Sampling temperature between 0 and 2. Higher values make output more random.

top_p (number, optional)
Nucleus sampling parameter. Use either this or temperature, not both.

stop (string or array, optional)
Up to 4 sequences where the API will stop generating tokens.

presence_penalty (number, optional)
Penalty for new tokens based on whether they appear in the text so far. Range: -2.0 to 2.0.

frequency_penalty (number, optional)
Penalty for new tokens based on their frequency in the text. Range: -2.0 to 2.0.

user (string, optional)
A unique identifier for the end-user, useful for monitoring and abuse detection.
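As a sketch of how these parameters combine in one call (the values below are illustrative, not recommended defaults):

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Name three sorting algorithms."}],
    max_tokens=256,          # cap completion length
    temperature=0.7,         # moderate randomness; omit top_p when this is set
    stop=["\n\n"],           # stop at the first blank line
    frequency_penalty=0.5,   # discourage token repetition
    user="user-1234"         # end-user identifier for abuse monitoring
)
print(response.choices[0].message.content)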
Request Examples
Basic Request
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Japan?"}
    ]
)

print(response.choices[0].message.content)
Streaming Request
from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

stream = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "user", "content": "Write a short poem about coding"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Multi-turn Conversation
messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
    {"role": "user", "content": "And what is that multiplied by 3?"}
]

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=messages
)
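To continue the dialogue, append the assistant's reply and the next user turn to the same list before the next call; a minimal sketch continuing the example above:

# Carry the assistant's answer forward, then add the follow-up question.
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})
messages.append({"role": "user", "content": "Now subtract 5 from that."})

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=messages
)
print(response.choices[0].message.content)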
Response
Non-Streaming Response
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1706745600,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of Japan is Tokyo."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
Streaming Response
Each chunk in the stream:
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion.chunk",
  "created": 1706745600,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "The"
      },
      "finish_reason": null
    }
  ]
}
Final chunk:
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion.chunk",
  "created": 1706745600,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
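Putting the chunk format to work: content arrives in delta fields until the final chunk, whose empty delta carries the finish_reason. A sketch that accumulates the full reply:

full_text = ""
for chunk in stream:
    choice = chunk.choices[0]
    if choice.delta.content:
        full_text += choice.delta.content
    if choice.finish_reason is not None:
        break  # final chunk: empty delta, finish_reason set
print(full_text)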
Response Fields
id (string)
Unique identifier for the completion.

object (string)
Always chat.completion for non-streaming responses, or chat.completion.chunk for streaming.

created (integer)
Unix timestamp of when the completion was created.

model (string)
The model used for completion.

choices (array)
Array of completion choices. Each choice contains:
index: The index of this choice
message: The generated message (non-streaming)
delta: The incremental content (streaming)
finish_reason: Why generation stopped (stop, length, content_filter)

usage (object)
Token usage statistics (not included in streaming):
prompt_tokens: Tokens in the input
completion_tokens: Tokens in the output
total_tokens: Total tokens used
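These fields map directly onto attributes of the SDK response object; a quick sketch of reading them from a non-streaming response:

print(response.id)             # "chatcmpl-abc123xyz"
print(response.model)          # "llama-3.1-8b"

choice = response.choices[0]
print(choice.message.content)  # generated text
print(choice.finish_reason)    # "stop", "length", or "content_filter"

usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)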
Finish Reasons
Reason          Description
stop            Natural completion or a stop sequence was reached
length          The max_tokens limit was reached
content_filter  Content was filtered by moderation
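Since length means the output was cut off, it is worth checking finish_reason before trusting a completion; one way to handle it:

choice = response.choices[0]
if choice.finish_reason == "length":
    # Hit max_tokens: raise the limit or ask the model to continue.
    print("Warning: response was truncated.")
elif choice.finish_reason == "content_filter":
    print("Warning: response was filtered by moderation.")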
Error Responses
Invalid request (e.g., unknown model):

{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "code": "invalid_model"
  }
}
Authentication error:

{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
Rate limit error:

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
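Because the endpoint mirrors the OpenAI error format, the official openai Python SDK (v1+) raises its usual typed exceptions for these responses; a sketch of catching them:

import openai

try:
    response = client.chat.completions.create(
        model="llama-3.1-8b",
        messages=[{"role": "user", "content": "Hello"}]
    )
except openai.AuthenticationError as e:   # invalid_api_key
    print(f"Check your API key: {e}")
except openai.RateLimitError as e:        # rate_limit_exceeded
    print(f"Rate limited; retry later: {e}")
except openai.BadRequestError as e:       # invalid_request_error
    print(f"Invalid request: {e}")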
Best Practices
Use System Messages: Set context and behavior with system messages for consistent responses.
Stream Long Responses: Enable streaming for better UX with longer completions.
Manage Conversation Length: Trim old messages to stay within token limits.
Handle Errors Gracefully: Implement retry logic with exponential backoff (see the sketch below).
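For the last point, a minimal retry-with-backoff sketch (the retry count and base delay are arbitrary choices, not documented defaults):

import time
import openai

def create_with_retry(client, max_retries=5, base_delay=1.0, **kwargs):
    """Retry rate-limited requests with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

response = create_with_retry(
    client,
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello"}]
)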