
Best Practices

Follow these guidelines to build reliable, efficient, and cost-effective applications with the Assisters API.

API Key Security

Use Environment Variables

Never hardcode API keys in source code

Rotate Regularly

Create new keys and revoke old ones periodically

Separate Environments

Use different keys for dev, staging, and production

Restrict Domains

Set allowed domains for client-side usage
# Good: Environment variable
import os
api_key = os.environ["ASSISTERS_API_KEY"]

# Bad: Hardcoded
api_key = "ask_abc123..."  # Never do this!
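
The "separate environments" guideline above can be sketched as a small helper that picks the key for the current deployment. The `APP_ENV` and `ASSISTERS_API_KEY_*` variable names are a hypothetical convention, not something the API requires:

```python
import os

def load_api_key(env=None):
    """Return the Assisters API key for the given environment.

    Falls back to APP_ENV (default "development") when no environment
    is passed explicitly. Failing loudly at startup beats discovering a
    missing key on the first request.
    """
    env = env or os.environ.get("APP_ENV", "development")
    key = os.environ.get(f"ASSISTERS_API_KEY_{env.upper()}")
    if key is None:
        raise RuntimeError(f"No Assisters API key configured for {env!r}")
    return key
```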

Error Handling

Always handle errors gracefully:
import time

from openai import OpenAI, APIError, RateLimitError, AuthenticationError

client = OpenAI(api_key="ask_...", base_url="https://api.assisters.dev/v1")

def safe_completion(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="llama-3.1-8b",
                messages=messages
            )

        except AuthenticationError:
            # Don't retry auth errors
            raise

        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            wait = int(e.response.headers.get("Retry-After", 5))
            time.sleep(wait)

        except APIError as e:
            status = getattr(e, "status_code", None)
            if status and status >= 500 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff on server errors
            else:
                raise

Prompt Engineering

Use System Messages

Set consistent behavior with system messages:
messages = [
    {
        "role": "system",
        "content": """You are a helpful customer support agent for TechCorp.
        - Be friendly and professional
        - Only answer questions about our products
        - If unsure, say you'll escalate to a human"""
    },
    {"role": "user", "content": user_question}
]

Be Specific

# Vague (unpredictable results)
"Summarize this"

# Specific (better results)
"Summarize this article in 3 bullet points, each under 20 words"

Provide Examples

messages = [
    {
        "role": "system",
        "content": """Extract entities from text. Format as JSON.

Example:
Input: "John Smith called from New York about order #12345"
Output: {"person": "John Smith", "location": "New York", "order_id": "12345"}"""
    },
    {"role": "user", "content": user_input}
]

Performance Optimization

Enable Streaming

For better UX in chat applications:
stream = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=messages,
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Batch Requests

For embeddings and moderation, batch multiple inputs:
# Efficient: Single request with multiple inputs
response = client.embeddings.create(
    model="e5-large-v2",
    input=["text1", "text2", "text3", ...]  # Up to 100
)

# Inefficient: Multiple requests
for text in texts:
    response = client.embeddings.create(model="e5-large-v2", input=text)

Cache Results

Don’t re-request the same data:
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text):
    # Identical text is served from the cache instead of a new API call.
    # lru_cache keys on the text argument itself, so no manual hashing
    # is needed.
    response = client.embeddings.create(model="e5-large-v2", input=text)
    return response.data[0].embedding

Cost Management

Set Token Limits

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=messages,
    max_tokens=500  # Prevent runaway responses
)

Choose the Right Model

| Task | Recommended Model | Why |
| --- | --- | --- |
| Simple Q&A | phi-3-mini | Cheapest, fastest |
| General chat | llama-3.1-8b | Best value |
| Complex reasoning | llama-3.1-70b | Highest quality |
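
The table above can be encoded as a small routing helper, so call sites declare task complexity instead of hardcoding model names. A minimal sketch — the tier names and the fallback choice are ours, not part of the API:

```python
# Tiers mirror the model table above.
MODEL_TIERS = {
    "simple": "phi-3-mini",       # cheapest, fastest
    "general": "llama-3.1-8b",    # best value
    "complex": "llama-3.1-70b",   # highest quality
}

def pick_model(tier):
    # Unknown tiers fall back to the best-value general model.
    return MODEL_TIERS.get(tier, "llama-3.1-8b")
```

Centralizing the mapping means a pricing or model change is a one-line edit rather than a codebase-wide search.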

Monitor Usage

# Track usage after each request
response = client.chat.completions.create(...)

print(f"Tokens used: {response.usage.total_tokens}")
print(f"Cost: ${response.usage.total_tokens * 0.10 / 1_000_000:.6f}")  # assumes $0.10 per 1M tokens

Reliability

Implement Timeouts

from openai import OpenAI

client = OpenAI(
    api_key="ask_...",
    base_url="https://api.assisters.dev/v1",
    timeout=30.0  # 30 second timeout
)

Use Idempotency Keys

For critical operations:
import uuid

import requests

response = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={
        "Authorization": "Bearer ask_...",
        "Idempotency-Key": str(uuid.uuid4())
    },
    json={...}
)

Implement Circuit Breakers

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure = None
        self.state = "closed"

    def call(self, func):
        if self.state == "open":
            if time.time() - self.last_failure > self.reset_timeout:
                self.state = "half-open"
            else:
                raise Exception("Circuit breaker is open")

        try:
            result = func()
            self.failures = 0
            self.state = "closed"
            return result
        except Exception as e:
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.threshold:
                self.state = "open"
            raise

Content Safety

Moderate Inputs

def safe_chat(user_message):
    # Check user input
    moderation = client.moderations.create(
        model="llama-guard-3",
        input=user_message
    )

    if moderation.results[0].flagged:
        return "I can't respond to that message."

    # Generate response
    response = client.chat.completions.create(
        model="llama-3.1-8b",
        messages=[{"role": "user", "content": user_message}]
    )

    return response.choices[0].message.content

Validate Outputs

def validated_chat(user_message):
    response = client.chat.completions.create(...)
    content = response.choices[0].message.content

    # Check output
    output_mod = client.moderations.create(
        model="llama-guard-3",
        input=content
    )

    if output_mod.results[0].flagged:
        return "I need to rephrase my response."

    return content

Logging & Monitoring

Log Important Data

import logging
import time

logger = logging.getLogger(__name__)

def logged_completion(messages):
    start_time = time.time()

    try:
        response = client.chat.completions.create(
            model="llama-3.1-8b",
            messages=messages
        )

        logger.info(
            "API call successful",
            extra={
                "model": "llama-3.1-8b",
                "tokens": response.usage.total_tokens,
                "latency_ms": (time.time() - start_time) * 1000
            }
        )

        return response

    except Exception as e:
        logger.error(f"API call failed: {e}")
        raise

Track Metrics

Key metrics to monitor:
  • Request latency
  • Token usage
  • Error rates
  • Cost per request
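
Those four signals can be collected in-process with a small tracker. This is a hypothetical helper, not part of any SDK; in production you would forward these values to your metrics system:

```python
import statistics
from collections import defaultdict

class Metrics:
    """Minimal in-process tracker for latency, tokens, errors, and cost."""

    def __init__(self):
        self.latencies_ms = []
        self.counters = defaultdict(int)

    def record(self, latency_ms, tokens, error=False, cost=0.0):
        self.latencies_ms.append(latency_ms)
        self.counters["requests"] += 1
        self.counters["tokens"] += tokens
        if error:
            self.counters["errors"] += 1
        # Store cost in micro-dollars to avoid float accumulation drift.
        self.counters["cost_microdollars"] += int(cost * 1_000_000)

    def summary(self):
        return {
            "p50_latency_ms": statistics.median(self.latencies_ms),
            "error_rate": self.counters["errors"] / self.counters["requests"],
            "total_tokens": self.counters["tokens"],
        }
```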

Checklist

1. Security
   ✅ API keys in environment variables
   ✅ Keys rotated regularly
   ✅ Domain restrictions set

2. Reliability
   ✅ Error handling with retries
   ✅ Timeouts configured
   ✅ Circuit breakers for dependencies

3. Performance
   ✅ Streaming enabled for chat
   ✅ Batching for embeddings
   ✅ Caching for repeated queries

4. Cost
   ✅ Token limits set
   ✅ Right model for the task
   ✅ Usage monitoring enabled

5. Safety
   ✅ Input moderation
   ✅ Output validation
   ✅ Logging for audit