Rate Limits
Rate limits protect the API from abuse and ensure fair access for all users. Learn how limits work and how to handle them gracefully.
Rate Limit Types
Assisters API enforces two types of rate limits:
| Type | Description |
|---|---|
| RPM | Requests Per Minute - total API calls |
| TPM | Tokens Per Minute - total tokens processed |
Limits by Tier
| Tier | RPM | TPM | Monthly Tokens |
|---|---|---|---|
| Free | 10 | 100,000 | 100,000 |
| Developer | 100 | 1,000,000 | 5,000,000 |
| Startup | 500 | 5,000,000 | 25,000,000 |
| Enterprise | Custom | Custom | Unlimited |
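One thing the table implies: on tiers with a finite monthly allowance, sustained traffic at the full TPM limit would use up the monthly tokens within minutes. The sketch below works through that arithmetic using only the figures from the table; the names `TIERS` and `minutes_to_exhaust` are illustrative.

```python
# Figures copied from the tier table: (TPM, monthly tokens).
TIERS = {
    "Free": (100_000, 100_000),
    "Developer": (1_000_000, 5_000_000),
    "Startup": (5_000_000, 25_000_000),
}


def minutes_to_exhaust(tier: str) -> float:
    """Minutes of sustained traffic at the full TPM limit before the
    monthly token allowance is used up."""
    tpm, monthly_tokens = TIERS[tier]
    return monthly_tokens / tpm


for name in TIERS:
    print(f"{name}: {minutes_to_exhaust(name):g} minute(s) at full TPM")
```

In practice the monthly allowance, not the per-minute limit, is likely to be the binding constraint for steady workloads on these tiers.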
Upgrade Your Plan
Need higher limits? Upgrade to a higher tier.
Rate Limit Headers
Every response includes rate limit information:
| Header | Description |
|---|---|
| X-RateLimit-Limit-* | Your current limit |
| X-RateLimit-Remaining-* | Remaining quota |
| X-RateLimit-Reset-* | Unix timestamp when quota resets |
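A minimal sketch of reading these headers on the client side. The source does not spell out what replaces the `*`, so the code groups values by whatever suffix follows each documented prefix; the function names, the 10% threshold, and the `example` suffix in the sample data are assumptions made for illustration.

```python
from typing import Dict, Mapping

# Documented header prefixes; whatever follows the prefix (the "*") is kept as-is.
PREFIXES = {
    "limit": "x-ratelimit-limit-",
    "remaining": "x-ratelimit-remaining-",
    "reset": "x-ratelimit-reset-",
}


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, Dict[str, int]]:
    """Group X-RateLimit-* values by suffix: {suffix: {"limit": ..., ...}}."""
    parsed: Dict[str, Dict[str, int]] = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        for field, prefix in PREFIXES.items():
            if lowered.startswith(prefix):
                try:
                    number = int(value)
                except ValueError:
                    continue  # ignore values that are not plain integers
                parsed.setdefault(lowered[len(prefix):], {})[field] = number
    return parsed


def should_slow_down(quota: Dict[str, int], threshold: float = 0.1) -> bool:
    """True when less than `threshold` of the limit remains for one quota."""
    limit = quota.get("limit", 0)
    remaining = quota.get("remaining", 0)
    return limit > 0 and remaining < limit * threshold


# "example" is a placeholder suffix, not a documented header name.
sample = {
    "X-RateLimit-Limit-example": "100",
    "X-RateLimit-Remaining-example": "7",
    "X-RateLimit-Reset-example": "1700000000",
}
quotas = parse_rate_limit_headers(sample)
print(quotas["example"], should_slow_down(quotas["example"]))
```

Checking `should_slow_down` after each response is one way to reduce request frequency before a limit is actually hit.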
Rate Limit Errors
When you exceed a limit, you’ll receive a 429 Too Many Requests response. The response includes a Retry-After header indicating how long to wait before retrying.
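The source does not show the format of the Retry-After value. HTTP allows either a number of seconds or an HTTP date, so the sketch below accepts both and falls back to a default when the header is missing or unparseable. The function name and the default of one second are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], default: float = 1.0) -> float:
    """Convert a Retry-After header value into a non-negative wait in seconds."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable value: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means there is nothing left to wait for.
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
```

For example, `retry_after_seconds("30")` gives a 30 second wait, and a missing header gives the default.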
Handling Rate Limits
Basic Retry Logic
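A minimal sketch of basic retry logic, not tied to any particular HTTP client. Here `send` is a hypothetical zero-argument callable that performs one request and returns a `(status, headers, body)` tuple; that shape, the function name, and the assumption that Retry-After holds whole seconds are all illustrative.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def call_with_retry(
    send: Callable[[], Response],
    max_retries: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); after a 429, wait and call again, up to max_retries times.

    Returns the last response, which is still a 429 if every retry was limited.
    """
    retries = 0
    while True:
        status, headers, body = send()
        if status != 429 or retries >= max_retries:
            return status, headers, body
        # Assumption: Retry-After, when present, is a whole number of seconds.
        # Real HTTP clients usually expose case-insensitive header mappings.
        raw = headers.get("Retry-After", "")
        sleep(float(raw) if raw.isdigit() else default_wait)
        retries += 1
```

The `sleep` parameter exists so the waiting behaviour can be replaced in tests; production code would leave it at the default.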
Exponential Backoff
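With exponential backoff, the wait doubles after each rate-limited attempt, up to a cap. The sketch below adds "full jitter" (a random fraction of the exponential bound) so that many clients do not retry in lockstep; the jitter strategy is a choice made here, not something the source specifies. `RateLimited` is a hypothetical exception that a client wrapper would raise on a 429 response.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception raised by a client wrapper on a 429 response."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound on the wait after failed attempt `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Run call(), retrying on RateLimited with exponentially growing waits."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            # Full jitter: wait a random fraction of the exponential bound.
            sleep(rng() * backoff_delay(attempt, base, cap))
    return call()  # final attempt; RateLimited propagates if it fails again
```

With the defaults, the bounds on successive waits are 1, 2, 4, 8 and 16 seconds, and no single wait exceeds the 60 second cap.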
JavaScript Implementation
Best Practices
Implement Retries
Always implement retry logic with exponential backoff.
Monitor Headers
Check rate limit headers to proactively slow down.
Queue Requests
Use a request queue to control throughput.
Cache Responses
Cache responses when possible to reduce API calls.
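As one illustration of the caching practice, the sketch below keeps responses in memory for a fixed time-to-live so that repeated identical requests do not consume quota. The class name, the default TTL, and the idea of keying on request parameters are assumptions; caching only makes sense for requests where a repeated answer is acceptable.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (stored_at, value)

    def get_or_call(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only on a miss."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = fetch()  # this is where the real API call would happen
        self._store[key] = (now, value)
        return value
```

A caller would build `key` from whatever uniquely identifies the request, such as a tuple of its parameters.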
Request Queuing
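A minimal single-threaded sketch of a request queue that spaces calls so their start times stay at or below a chosen requests-per-minute rate. The class and method names are illustrative, and each queued item is a hypothetical zero-argument callable that performs one API request.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, List


class RequestQueue:
    """FIFO queue that starts at most `rpm` queued calls per minute."""

    def __init__(self, rpm: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self._interval = 60.0 / rpm  # minimum gap between request starts
        self._clock = clock
        self._sleep = sleep
        self._pending: Deque[Callable[[], Any]] = deque()
        self._next_start = float("-inf")  # earliest allowed start time

    def submit(self, call: Callable[[], Any]) -> None:
        """Add one request (a zero-argument callable) to the queue."""
        self._pending.append(call)

    def drain(self) -> List[Any]:
        """Run every queued call in order, pausing to respect the rate."""
        results = []
        while self._pending:
            wait = self._next_start - self._clock()
            if wait > 0:
                self._sleep(wait)
            self._next_start = self._clock() + self._interval
            results.append(self._pending.popleft()())
        return results
```

Setting `rpm` somewhat below the tier's limit leaves headroom for other traffic that shares the same API key.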
For high-volume applications, implement a request queue to control throughput.
Token Management
TPM limits are based on total tokens (input + output). Manage them by:
1. Estimate Tokens Before Sending
2. Set Max Tokens
3. Trim Context
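The three steps above can be sketched together. Everything specific here is an assumption: the four-characters-per-token estimate is a rough heuristic rather than the API's tokenizer, the message shape (`{"role": ..., "content": ...}`) is hypothetical, and `max_tokens` is an assumed parameter name that should be checked against the API reference.

```python
from typing import Dict, List

Message = Dict[str, str]  # hypothetical shape: {"role": ..., "content": ...}


def estimate_tokens(text: str) -> int:
    """Step 1: rough estimate at about 4 characters per token (heuristic only)."""
    return (len(text) + 3) // 4


def trim_context(messages: List[Message], budget: int) -> List[Message]:
    """Step 3: keep the most recent messages whose estimated tokens fit the budget."""
    kept: List[Message] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def build_request(messages: List[Message], input_budget: int = 3000,
                  max_output_tokens: int = 500) -> dict:
    """Combine the steps: trim the input and cap the output (step 2)."""
    return {
        "messages": trim_context(messages, input_budget),
        # Assumed parameter name for the output cap; not confirmed by the source.
        "max_tokens": max_output_tokens,
    }
```

Because TPM counts input plus output, `input_budget + max_output_tokens` gives an approximate upper bound on what one request can consume.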
Monitoring Usage
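One way to monitor usage on the client side is to keep a sliding 60-second window of completed requests and the tokens each consumed, then compare the totals with your tier's RPM and TPM. This sketch does not call any usage endpoint; the class name is illustrative, and the caller is assumed to know the token count for each request.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class UsageTracker:
    """Client-side count of requests and tokens over a sliding window."""

    def __init__(self, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._window = window
        self._clock = clock
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def record(self, tokens: int) -> None:
        """Call once per completed request with the tokens it consumed."""
        self._events.append((self._clock(), tokens))

    def _prune(self) -> None:
        """Drop events that have aged out of the window."""
        cutoff = self._clock() - self._window
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def requests_in_window(self) -> int:
        self._prune()
        return len(self._events)

    def tokens_in_window(self) -> int:
        self._prune()
        return sum(tokens for _, tokens in self._events)
```

Comparing `requests_in_window()` and `tokens_in_window()` against the tier limits before sending lets a client pause before the server starts returning 429 responses.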
Track your usage proactively so you can slow down before hitting a limit.
Burst Handling
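A minimal sketch of pacing a batch job so that it never sends a burst. It handles items one at a time with an even gap between them, using only a fraction of the RPM limit to leave headroom. The function name, the `handle` callable (which would wrap one API request), and the 80% utilization default are assumptions.

```python
import time
from typing import Callable, Iterable, List, TypeVar

Item = TypeVar("Item")
Result = TypeVar("Result")


def process_batch(
    items: Iterable[Item],
    handle: Callable[[Item], Result],
    rpm: int,
    utilization: float = 0.8,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Result]:
    """Handle items sequentially, evenly spaced at rpm * utilization per minute."""
    interval = 60.0 / (rpm * utilization)  # seconds between consecutive items
    results: List[Result] = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(interval)  # no pause before the first item
        results.append(handle(item))
    return results
```

At `rpm=100` and the default utilization, this sends one request every 0.75 seconds, so 1,000 items take roughly twelve and a half minutes.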
For batch processing, spread requests over time rather than sending them all at once.
Need Higher Limits?
Contact Sales
Enterprise plans offer custom rate limits tailored to your needs.