Usage Tracking

Monitor your token consumption, track costs, and optimize your API usage.

Dashboard Overview

View your usage at assisters.dev/dashboard/usage:
  • Current Month Usage: Tokens used vs. limit
  • Usage by Model: Breakdown by model type
  • Daily Trends: Usage patterns over time
  • Top Endpoints: Most used API endpoints
  • Cost Breakdown: Detailed billing information

Usage in API Responses

Every API response includes usage information:
{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 100,
    "total_tokens": 125
  }
}
  • prompt_tokens: Tokens in your input
  • completion_tokens: Tokens in the response
  • total_tokens: Total tokens (the amount you are billed for)

Tracking Usage in Code

Python Example

from openai import OpenAI

client = OpenAI(api_key="ask_...", base_url="https://api.assisters.dev/v1")

class UsageTracker:
    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0
        self.requests = 0

    def record(self, usage, price_per_million=0.10):
        self.total_tokens += usage.total_tokens
        self.total_cost += usage.total_tokens * price_per_million / 1_000_000
        self.requests += 1

    def report(self):
        return {
            "total_requests": self.requests,
            "total_tokens": self.total_tokens,
            "total_cost": f"${self.total_cost:.4f}",
            "avg_tokens_per_request": self.total_tokens / max(self.requests, 1)
        }

tracker = UsageTracker()

# Track each request
response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello!"}]
)
tracker.record(response.usage)

print(tracker.report())

JavaScript Example

class UsageTracker {
  constructor() {
    this.totalTokens = 0;
    this.totalCost = 0;
    this.requests = 0;
  }

  record(usage, pricePerMillion = 0.10) {
    this.totalTokens += usage.total_tokens;
    this.totalCost += usage.total_tokens * pricePerMillion / 1_000_000;
    this.requests += 1;
  }

  report() {
    return {
      totalRequests: this.requests,
      totalTokens: this.totalTokens,
      totalCost: `$${this.totalCost.toFixed(4)}`,
      avgTokensPerRequest: this.totalTokens / Math.max(this.requests, 1)
    };
  }
}

const tracker = new UsageTracker();

// 'client' is an OpenAI-compatible client configured with your API key
// and base URL, as in the Python example above
const response = await client.chat.completions.create({
  model: 'llama-3.1-8b',
  messages: [{ role: 'user', content: 'Hello!' }]
});

tracker.record(response.usage);
console.log(tracker.report());

Rate Limit Headers

Monitor your rate limits in response headers:
X-RateLimit-Limit-RPM: 100
X-RateLimit-Remaining-RPM: 95
X-RateLimit-Reset-RPM: 1706745660

X-RateLimit-Limit-TPM: 1000000
X-RateLimit-Remaining-TPM: 995000
X-RateLimit-Reset-TPM: 1706745660
  • X-RateLimit-Limit-*: Your current limit
  • X-RateLimit-Remaining-*: Remaining quota in the current window
  • X-RateLimit-Reset-*: Unix timestamp when the quota resets
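These headers can drive a simple backoff decision. The sketch below is a hypothetical helper (`seconds_until_reset` is not part of any SDK) that reads the RPM headers shown above and returns how long to wait before retrying:

```python
import time

def seconds_until_reset(headers, now=None):
    """Given rate-limit response headers, return how long to wait
    before retrying. Returns 0.0 while quota remains."""
    now = now if now is not None else time.time()
    remaining = int(headers.get("X-RateLimit-Remaining-RPM", "1"))
    if remaining > 0:
        return 0.0  # quota left; no need to wait
    reset = int(headers.get("X-RateLimit-Reset-RPM", str(int(now))))
    return max(0.0, reset - now)

# Example using the header values from the sample above
headers = {
    "X-RateLimit-Remaining-RPM": "0",
    "X-RateLimit-Reset-RPM": "1706745660",
}
print(seconds_until_reset(headers, now=1706745650))  # 10.0
```

The same pattern works for the TPM headers; check whichever limit you hit first.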

Usage Alerts

Set up alerts in your dashboard to get notified when:
  • You reach 80% of your monthly tokens
  • You exceed rate limits frequently
  • Unusual usage patterns are detected

Estimating Future Usage

Before Sending Requests

import tiktoken

def estimate_tokens(text):
    # cl100k_base is an approximation; Llama models use their own
    # tokenizer, so treat these counts as estimates, not exact billing
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

def estimate_request_cost(messages, max_output_tokens=500, price_per_million=0.10):
    input_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    input_tokens += len(messages) * 4  # Message overhead

    # Estimate total (input + expected output)
    estimated_total = input_tokens + max_output_tokens

    cost = estimated_total * price_per_million / 1_000_000
    return {
        "estimated_input_tokens": input_tokens,
        "max_output_tokens": max_output_tokens,
        "estimated_cost": f"${cost:.6f}"
    }

# Example
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Explain quantum computing."}
]

print(estimate_request_cost(messages))

Usage by Model

Track usage separately by model for optimization:
from collections import defaultdict

usage_by_model = defaultdict(lambda: {"tokens": 0, "cost": 0, "requests": 0})

MODEL_PRICES = {
    "llama-3.1-8b": 0.10,
    "llama-3.1-70b": 0.90,
    "e5-large-v2": 0.01,
}

def track_by_model(model, usage):
    price = MODEL_PRICES.get(model, 0.10)
    usage_by_model[model]["tokens"] += usage.total_tokens
    usage_by_model[model]["cost"] += usage.total_tokens * price / 1_000_000
    usage_by_model[model]["requests"] += 1

# After each request
track_by_model("llama-3.1-8b", response.usage)

# Report
for model, data in usage_by_model.items():
    print(f"{model}: {data['tokens']} tokens, ${data['cost']:.4f}")

Optimizing Usage

Choose Efficient Models

Use smaller models for simple tasks (e.g., phi-3-mini instead of llama-3.1-70b)

Limit Output Tokens

Set max_tokens to prevent runaway responses
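One way to pick a cap is to budget tokens per request. The helper below is a hypothetical sketch (`choose_max_tokens` is not an API function): given a per-request token budget and an estimated input size, it returns a `max_tokens` value that keeps the total within budget.

```python
def choose_max_tokens(budget_tokens, input_tokens, hard_cap=500):
    """Cap completion length so input + output stays within a
    per-request token budget, never exceeding hard_cap."""
    return max(1, min(hard_cap, budget_tokens - input_tokens))

print(choose_max_tokens(600, 200))  # 400
```

Pass the result as the `max_tokens` parameter on your completion request.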

Trim Context

Remove old messages from conversation history
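A minimal sketch of history trimming, assuming the message format used in the examples above (`trim_history` is a hypothetical helper): keep the system prompt but drop all except the most recent messages.

```python
def trim_history(messages, max_messages=10):
    """Keep system prompts plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

history = [{"role": "system", "content": "You are helpful."}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(20)]
print(len(trim_history(history)))  # 11
```

For longer conversations, summarizing older turns into a single message preserves more context than dropping them outright.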

Cache Responses

Cache repeated queries to avoid re-computation
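A simple in-memory cache might look like the sketch below (`cached_completion` and `cache_key` are hypothetical helpers, not SDK functions). It hashes the request payload and only calls the API on a cache miss:

```python
import hashlib
import json

_cache = {}

def cache_key(model, messages):
    # Stable hash of the request payload
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(create_fn, model, messages):
    """Return a cached response for identical (model, messages) pairs.

    create_fn is any callable with the client.chat.completions.create
    keyword signature, e.g. a wrapper around the client shown earlier.
    """
    key = cache_key(model, messages)
    if key not in _cache:
        _cache[key] = create_fn(model=model, messages=messages)
    return _cache[key]
```

Caching is only appropriate when identical answers are acceptable (e.g., deterministic or low-temperature requests); include any parameters that affect output in the cache key.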

Monthly Limits

Your subscription includes a monthly token allocation:
  • Free: 100K tokens/month, no overage allowed
  • Developer: 5M tokens/month, overage billed at model prices
  • Startup: 25M tokens/month, overage billed at model prices
  • Enterprise: Unlimited, no overage applies
Free tier users cannot exceed their monthly limit. Upgrade to enable pay-as-you-go overage.

API for Usage Data

Check your usage programmatically (coming soon):
curl https://api.assisters.dev/v1/usage \
  -H "Authorization: Bearer ask_your_api_key"

Dashboard Features

Usage Charts

Visual breakdown of token consumption over time

Export Data

Export usage data as CSV for accounting

Team Usage

View usage by team member (Enterprise)

Budget Alerts

Get notified before hitting limits

View Your Usage

Check your current usage and remaining tokens at assisters.dev/dashboard/usage