Usage Tracking
Monitor your token consumption, track costs, and optimize your API usage.
Dashboard Overview
View your usage at assisters.dev/dashboard/usage:
- Current Month Usage: Tokens used vs. limit
- Usage by Model: Breakdown by model type
- Daily Trends: Usage patterns over time
- Top Endpoints: Most used API endpoints
- Cost Breakdown: Detailed billing information
Usage in API Responses
Every API response includes usage information:
{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 100,
    "total_tokens": 125
  }
}
| Field | Description |
|---|---|
| prompt_tokens | Tokens in your input |
| completion_tokens | Tokens in the response |
| total_tokens | Total tokens (the amount billed) |
Tracking Usage in Code
Python Example
from openai import OpenAI

client = OpenAI(api_key="ask_...", base_url="https://api.assisters.dev/v1")

class UsageTracker:
    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0
        self.requests = 0

    def record(self, usage, price_per_million=0.10):
        self.total_tokens += usage.total_tokens
        self.total_cost += usage.total_tokens * price_per_million / 1_000_000
        self.requests += 1

    def report(self):
        return {
            "total_requests": self.requests,
            "total_tokens": self.total_tokens,
            "total_cost": f"${self.total_cost:.4f}",
            "avg_tokens_per_request": self.total_tokens / max(self.requests, 1),
        }

tracker = UsageTracker()

# Track each request
response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello!"}]
)
tracker.record(response.usage)
print(tracker.report())
JavaScript Example
class UsageTracker {
  constructor() {
    this.totalTokens = 0;
    this.totalCost = 0;
    this.requests = 0;
  }

  record(usage, pricePerMillion = 0.10) {
    this.totalTokens += usage.total_tokens;
    this.totalCost += usage.total_tokens * pricePerMillion / 1_000_000;
    this.requests += 1;
  }

  report() {
    return {
      totalRequests: this.requests,
      totalTokens: this.totalTokens,
      totalCost: `$${this.totalCost.toFixed(4)}`,
      avgTokensPerRequest: this.totalTokens / Math.max(this.requests, 1),
    };
  }
}

const tracker = new UsageTracker();

const response = await client.chat.completions.create({
  model: 'llama-3.1-8b',
  messages: [{ role: 'user', content: 'Hello!' }],
});
tracker.record(response.usage);
console.log(tracker.report());
Rate Limit Headers
Monitor your rate limits via the response headers returned with every request:
X-RateLimit-Limit-RPM: 100
X-RateLimit-Remaining-RPM: 95
X-RateLimit-Reset-RPM: 1706745660
X-RateLimit-Limit-TPM: 1000000
X-RateLimit-Remaining-TPM: 995000
X-RateLimit-Reset-TPM: 1706745660
| Header | Description |
|---|---|
| X-RateLimit-Limit-* | Your current limit |
| X-RateLimit-Remaining-* | Remaining quota in the current window |
| X-RateLimit-Reset-* | Unix timestamp when the quota resets |
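If you call the API over raw HTTP rather than through an SDK, you can read these headers directly. A minimal sketch using Python's requests library, assuming the OpenAI-compatible /chat/completions path from the examples above:
import requests

def check_rate_limits(api_key):
    # Any successful request returns the rate-limit headers listed above.
    resp = requests.post(
        "https://api.assisters.dev/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "llama-3.1-8b",
            "messages": [{"role": "user", "content": "Hello!"}],
        },
    )
    print("Requests left this minute:", resp.headers.get("X-RateLimit-Remaining-RPM"))
    print("Tokens left this minute:", resp.headers.get("X-RateLimit-Remaining-TPM"))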
Usage Alerts
Set up alerts in your dashboard to get notified when:
- You reach 80% of your monthly tokens
- You exceed rate limits frequently
- Unusual usage patterns are detected
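Dashboard alerts aside, you can mirror the 80% check client-side. A minimal sketch, assuming the Developer tier's allocation from the Monthly Limits table below and the UsageTracker from the Python example above:
MONTHLY_LIMIT = 5_000_000  # Developer tier allocation (see Monthly Limits below)
ALERT_THRESHOLD = 0.80     # mirrors the dashboard's 80% alert

def check_budget(total_tokens):
    used = total_tokens / MONTHLY_LIMIT
    if used >= ALERT_THRESHOLD:
        print(f"Warning: {used:.0%} of monthly tokens used")

check_budget(tracker.total_tokens)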
Estimating Future Usage
Before Sending Requests
import tiktoken

def estimate_tokens(text):
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

def estimate_request_cost(messages, max_output_tokens=500, price_per_million=0.10):
    input_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    input_tokens += len(messages) * 4  # Per-message formatting overhead
    # Estimate total (input + expected output)
    estimated_total = input_tokens + max_output_tokens
    cost = estimated_total * price_per_million / 1_000_000
    return {
        "estimated_input_tokens": input_tokens,
        "max_output_tokens": max_output_tokens,
        "estimated_cost": f"${cost:.6f}",
    }

# Example
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Explain quantum computing."},
]
print(estimate_request_cost(messages))
Usage by Model
Track usage separately by model for optimization:
from collections import defaultdict

usage_by_model = defaultdict(lambda: {"tokens": 0, "cost": 0, "requests": 0})

MODEL_PRICES = {
    "llama-3.1-8b": 0.10,
    "llama-3.1-70b": 0.90,
    "e5-large-v2": 0.01,
}

def track_by_model(model, usage):
    price = MODEL_PRICES.get(model, 0.10)
    usage_by_model[model]["tokens"] += usage.total_tokens
    usage_by_model[model]["cost"] += usage.total_tokens * price / 1_000_000
    usage_by_model[model]["requests"] += 1

# After each request
track_by_model("llama-3.1-8b", response.usage)

# Report
for model, data in usage_by_model.items():
    print(f"{model}: {data['tokens']} tokens, ${data['cost']:.4f}")
Optimizing Usage
- Choose Efficient Models: Use smaller models for simple tasks (e.g. phi-3-mini instead of llama-3.1-70b)
- Limit Output Tokens: Set max_tokens to prevent runaway responses
- Trim Context: Remove old messages from conversation history to keep input tokens down
- Cache Responses: Reuse answers to repeated queries instead of re-computing them (see the sketch below)
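A minimal sketch of the last two points, trimming history to the most recent turns and memoizing identical requests. The keep_last cutoff and cache key are illustrative choices, not a prescribed approach:
import hashlib
import json

_cache = {}

def trim_history(messages, keep_last=10):
    # Keep the system prompt (if any) plus the most recent turns.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

def cached_completion(client, model, messages, max_tokens=500):
    msgs = trim_history(messages)
    # Key on the model plus the exact trimmed messages.
    key = hashlib.sha256(json.dumps([model, msgs], sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = client.chat.completions.create(
            model=model,
            messages=msgs,
            max_tokens=max_tokens,  # also caps output tokens
        )
    return _cache[key]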
Monthly Limits
Your subscription includes a monthly token allocation:
| Tier | Monthly Tokens | Overage Rate |
|---|---|---|
| Free | 100K | No overage allowed |
| Developer | 5M | Model prices |
| Startup | 25M | Model prices |
| Enterprise | Unlimited | N/A |
Free tier users cannot exceed their monthly limit. Upgrade to enable pay-as-you-go overage.
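As a quick check of what overage costs, exceeding the Developer tier's 5M tokens by 2M tokens of llama-3.1-8b usage ($0.10 per million, per the prices above) comes to:
overage_tokens = 2_000_000
price_per_million = 0.10  # llama-3.1-8b
print(f"${overage_tokens * price_per_million / 1_000_000:.2f}")  # $0.20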
API for Usage Data
Check your usage programmatically (coming soon):
curl https://api.assisters.dev/v1/usage \
-H "Authorization: Bearer ask_your_api_key"
Dashboard Features
- Usage Charts: Visual breakdown of token consumption over time
- Export Data: Export usage data as CSV for accounting
- Team Usage: View usage by team member (Enterprise)
- Budget Alerts: Get notified before hitting limits
View Your Usage
Check your current usage and remaining tokens at assisters.dev/dashboard/usage.