
Best Practices

Follow these guidelines to build reliable, efficient, and cost-effective applications with the Assisters API.

API Key Security

Use Environment Variables

Never hardcode API keys in source code

Rotate Regularly

Create new keys and revoke old ones periodically

Separate Environments

Use different keys for dev, staging, and production

Restrict Domains

Set allowed domains for client-side usage
# Good: Environment variable
import os
api_key = os.environ["ASSISTERS_API_KEY"]

# Bad: Hardcoded
api_key = "ask_abc123..."  # Never do this!
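
The "separate environments" guideline above can be sketched as a small helper that picks the key for the current deployment. The `APP_ENV` and `ASSISTERS_API_KEY_*` variable names are a hypothetical convention, not something the API requires:

```python
import os

def load_api_key(env=None):
    """Return the Assisters API key for the given environment.

    Falls back to APP_ENV (default "development") when no environment
    is passed explicitly. Failing loudly at startup beats discovering a
    missing key on the first request.
    """
    env = env or os.environ.get("APP_ENV", "development")
    key = os.environ.get(f"ASSISTERS_API_KEY_{env.upper()}")
    if key is None:
        raise RuntimeError(f"No Assisters API key configured for {env!r}")
    return key
```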

Error Handling

Always handle errors gracefully:
import time

from openai import OpenAI, APIError, RateLimitError, AuthenticationError

client = OpenAI(api_key="ask_...", base_url="https://api.assisters.dev/v1")

def safe_completion(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="llama-3.1-8b",
                messages=messages
            )

        except AuthenticationError:
            # Don't retry auth errors
            raise

        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            wait = int(e.response.headers.get("Retry-After", 5))
            time.sleep(wait)

        except APIError as e:
            status = getattr(e, "status_code", None)
            if status and status >= 500 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff on server errors
            else:
                raise

Prompt Engineering

Use System Messages

Set consistent behavior with system messages:
messages = [
    {
        "role": "system",
        "content": """You are a helpful customer support agent for TechCorp.
        - Be friendly and professional
        - Only answer questions about our products
        - If unsure, say you'll escalate to a human"""
    },
    {"role": "user", "content": user_question}
]

Be Specific

# Vague (unpredictable results)
"Summarize this"

# Specific (better results)
"Summarize this article in 3 bullet points, each under 20 words"

Provide Examples

messages = [
    {
        "role": "system",
        "content": """Extract entities from text. Format as JSON.

Example:
Input: "John Smith called from New York about order #12345"
Output: {"person": "John Smith", "location": "New York", "order_id": "12345"}"""
    },
    {"role": "user", "content": user_input}
]

Performance Optimization

Enable Streaming

For better UX in chat applications:
stream = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=messages,
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Batch Requests

For embeddings and moderation, batch multiple inputs:
# Efficient: Single request with multiple inputs
response = client.embeddings.create(
    model="e5-large-v2",
    input=["text1", "text2", "text3", ...]  # Up to 100
)

# Inefficient: Multiple requests
for text in texts:
    response = client.embeddings.create(model="e5-large-v2", input=text)

Cache Results

Don’t re-request the same data:
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text):
    # Identical text is served from the cache instead of a new API call.
    # lru_cache keys on the text argument itself, so no manual hashing
    # is needed.
    response = client.embeddings.create(model="e5-large-v2", input=text)
    return response.data[0].embedding

Cost Management

Set Token Limits

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=messages,
    max_tokens=500  # Prevent runaway responses
)

Choose the Right Model

| Task | Recommended Model | Why |
| --- | --- | --- |
| Simple Q&A | phi-3-mini | Cheapest, fastest |
| General chat | llama-3.1-8b | Best value |
| Complex reasoning | llama-3.1-70b | Highest quality |
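
The table above can be encoded as a small routing helper, so call sites declare task complexity instead of hardcoding model names. A minimal sketch — the tier names and the fallback choice are ours, not part of the API:

```python
# Tiers mirror the model table above.
MODEL_TIERS = {
    "simple": "phi-3-mini",       # cheapest, fastest
    "general": "llama-3.1-8b",    # best value
    "complex": "llama-3.1-70b",   # highest quality
}

def pick_model(tier):
    # Unknown tiers fall back to the best-value general model.
    return MODEL_TIERS.get(tier, "llama-3.1-8b")
```

Centralizing the mapping means a pricing or model change is a one-line edit rather than a codebase-wide search.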

Monitor Usage

# Track usage after each request
response = client.chat.completions.create(...)

print(f"Tokens used: {response.usage.total_tokens}")
print(f"Cost: ${response.usage.total_tokens * 0.10 / 1_000_000:.6f}")  # assumes $0.10 per 1M tokens

Reliability

Implement Timeouts

from openai import OpenAI

client = OpenAI(
    api_key="ask_...",
    base_url="https://api.assisters.dev/v1",
    timeout=30.0  # 30 second timeout
)

Use Idempotency Keys

For critical operations:
import uuid

import requests

response = requests.post(
    "https://api.assisters.dev/v1/chat/completions",
    headers={
        "Authorization": "Bearer ask_...",
        "Idempotency-Key": str(uuid.uuid4())
    },
    json={...}
)

Implement Circuit Breakers

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure = None
        self.state = "closed"

    def call(self, func):
        if self.state == "open":
            if time.time() - self.last_failure > self.reset_timeout:
                self.state = "half-open"
            else:
                raise Exception("Circuit breaker is open")

        try:
            result = func()
            self.failures = 0
            self.state = "closed"
            return result
        except Exception as e:
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.threshold:
                self.state = "open"
            raise

Content Safety

Moderate Inputs

def safe_chat(user_message):
    # Check user input
    moderation = client.moderations.create(
        model="llama-guard-3",
        input=user_message
    )

    if moderation.results[0].flagged:
        return "I can't respond to that message."

    # Generate response
    response = client.chat.completions.create(
        model="llama-3.1-8b",
        messages=[{"role": "user", "content": user_message}]
    )

    return response.choices[0].message.content

Validate Outputs

def validated_chat(user_message):
    response = client.chat.completions.create(...)
    content = response.choices[0].message.content

    # Check output
    output_mod = client.moderations.create(
        model="llama-guard-3",
        input=content
    )

    if output_mod.results[0].flagged:
        return "I need to rephrase my response."

    return content

Logging & Monitoring

Log Important Data

import logging
import time

logger = logging.getLogger(__name__)

def logged_completion(messages):
    start_time = time.time()

    try:
        response = client.chat.completions.create(
            model="llama-3.1-8b",
            messages=messages
        )

        logger.info(
            "API call successful",
            extra={
                "model": "llama-3.1-8b",
                "tokens": response.usage.total_tokens,
                "latency_ms": (time.time() - start_time) * 1000
            }
        )

        return response

    except Exception as e:
        logger.error(f"API call failed: {e}")
        raise

Track Metrics

Key metrics to monitor:
  • Request latency
  • Token usage
  • Error rates
  • Cost per request
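
Those four signals can be collected in-process with a small tracker. This is a hypothetical helper, not part of any SDK; in production you would forward these values to your metrics system:

```python
import statistics
from collections import defaultdict

class Metrics:
    """Minimal in-process tracker for latency, tokens, errors, and cost."""

    def __init__(self):
        self.latencies_ms = []
        self.counters = defaultdict(int)

    def record(self, latency_ms, tokens, error=False, cost=0.0):
        self.latencies_ms.append(latency_ms)
        self.counters["requests"] += 1
        self.counters["tokens"] += tokens
        if error:
            self.counters["errors"] += 1
        # Store cost in micro-dollars to avoid float accumulation drift.
        self.counters["cost_microdollars"] += int(cost * 1_000_000)

    def summary(self):
        return {
            "p50_latency_ms": statistics.median(self.latencies_ms),
            "error_rate": self.counters["errors"] / self.counters["requests"],
            "total_tokens": self.counters["tokens"],
        }
```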

Checklist

1. Security
   ✅ API keys in environment variables
   ✅ Keys rotated regularly
   ✅ Domain restrictions set

2. Reliability
   ✅ Error handling with retries
   ✅ Timeouts configured
   ✅ Circuit breakers for dependencies

3. Performance
   ✅ Streaming enabled for chat
   ✅ Batching for embeddings
   ✅ Caching for repeated queries

4. Cost
   ✅ Token limits set
   ✅ Right model for the task
   ✅ Usage monitoring enabled

5. Safety
   ✅ Input moderation
   ✅ Output validation
   ✅ Logging for audit