
Pricing

Simple, transparent pricing with no hidden fees. Pay only for what you use.
Free models available! Get started at no cost with our free-tier models:
  • Chat: Llama 3.3 70B via Groq (FREE)
  • Embeddings: BGE-M3 via HuggingFace (FREE)
  • Moderation: Llama Guard 3 via Groq (FREE)

Subscription Tiers

Free

$0/month
  • 100K tokens/month
  • 10 requests/minute
  • Community support
  • All models available
Perfect for testing and small projects.

Developer

$29/month (or $24/mo annually)
  • 5M tokens/month
  • 100 requests/minute
  • Email support (48h)
  • Priority processing
Ideal for indie developers and startups.

Startup

$99/month (or $84/mo annually)
  • 25M tokens/month
  • 500 requests/minute
  • Priority email support (24h)
  • Advanced analytics
For growing teams and products.

Enterprise

Custom pricing
  • Unlimited tokens
  • Custom rate limits
  • Dedicated support (4h)
  • SLA guarantees
  • Custom models
Contact sales for volume pricing.

Model Pricing

All models are billed per million tokens (input + output combined):

Chat Models

| Model | Price per Million Tokens |
| --- | --- |
| llama-3.3-70b (via Groq) | FREE |
| llama-3.1-8b | $0.10 |
| llama-3.1-70b | $0.90 |
| mistral-7b | $0.10 |
| qwen2-7b | $0.10 |
| gemma-2-9b | $0.15 |
| phi-3-mini | $0.08 |

Embedding Models

| Model | Price per Million Tokens |
| --- | --- |
| bge-m3 | FREE |
| e5-large-v2 | $0.01 |
| bge-base-en | $0.01 |
| jina-embeddings-v2 | $0.02 |
| nomic-embed-text | $0.01 |
| gte-large | $0.01 |

Safety Models

| Model | Price per Million Tokens |
| --- | --- |
| llama-guard-3-8b (via Groq) | FREE |
| llama-guard-3 | $0.20 |
| shieldgemma | $0.15 |
| bge-reranker-v2 | $0.05 |
| jina-reranker | $0.08 |

How Billing Works

Token-Based Billing

You’re billed for total tokens (input + output):
Cost = total_tokens × price_per_million / 1,000,000
Example:
  • Input: 500 tokens
  • Output: 1,000 tokens
  • Total: 1,500 tokens
  • Model: llama-3.1-8b ($0.10/M)
  • Cost: 1,500 × $0.10 / 1,000,000 = $0.00015
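The formula above can be sketched in Python as follows. The function name `request_cost` and its parameters are illustrative only and are not part of any official SDK.

```python
def request_cost(input_tokens: int, output_tokens: int, price_per_million: float) -> float:
    """Return the dollar cost of one request under combined input + output billing."""
    total_tokens = input_tokens + output_tokens
    return total_tokens * price_per_million / 1_000_000


# Worked example from above: 500 input + 1,000 output tokens on llama-3.1-8b ($0.10/M)
print(f"${request_cost(500, 1_000, 0.10):.5f}")  # $0.00015
```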

Subscription vs. Pay-as-you-go

| Feature | Subscription | Pay-as-you-go |
| --- | --- | --- |
| Monthly tokens | Included | From wallet |
| Overage | Charged to wallet | Charged to wallet |
| Rate limits | By tier | By tier |
| Rollover | No | N/A |

Overage Pricing

If you exceed your tier’s monthly token allowance, additional usage is charged to your wallet at the per-model prices listed under Model Pricing.
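As a rough illustration of the overage rule, the sketch below computes the wallet charge for a month. It assumes all overage tokens are billed at a single model's price; with mixed usage you would sum the charge per model. The function name and the example usage figure are hypothetical.

```python
def overage_charge(tokens_used: int, included_tokens: int, price_per_million: float) -> float:
    """Dollar amount charged to the wallet for tokens beyond the monthly allowance."""
    overage_tokens = max(0, tokens_used - included_tokens)
    return overage_tokens * price_per_million / 1_000_000


# Developer tier includes 5M tokens; assume 6.5M used, all on llama-3.1-70b ($0.90/M)
print(f"${overage_charge(6_500_000, 5_000_000, 0.90):.2f}")  # $1.35
```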

Rate Limits

| Tier | Requests/Minute | Tokens/Minute |
| --- | --- | --- |
| Free | 10 | 100,000 |
| Developer | 100 | 1,000,000 |
| Startup | 500 | 5,000,000 |
| Enterprise | Custom | Custom |
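To check which tier's rate limits would cover a given peak load, a minimal sketch might look like the following. The limits are copied from the table above; the helper name `smallest_fitting_tier` is hypothetical, and the check looks only at per-minute limits, not at the monthly token allowance.

```python
from typing import Optional

# (tier, requests/minute, tokens/minute) from the rate-limit table.
# Enterprise is omitted because its limits are custom.
TIER_LIMITS = [
    ("Free", 10, 100_000),
    ("Developer", 100, 1_000_000),
    ("Startup", 500, 5_000_000),
]


def smallest_fitting_tier(peak_rpm: int, peak_tpm: int) -> Optional[str]:
    """Return the first tier whose per-minute limits cover the peak load, or None."""
    for name, rpm_limit, tpm_limit in TIER_LIMITS:
        if peak_rpm <= rpm_limit and peak_tpm <= tpm_limit:
            return name
    return None  # exceeds Startup limits, so custom (Enterprise) limits would be needed


print(smallest_fitting_tier(60, 400_000))  # Developer
```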

Cost Calculator

Estimate your monthly costs:
def estimate_monthly_cost(
    requests_per_day: int,
    avg_tokens_per_request: int,
    price_per_million: float
) -> float:
    daily_tokens = requests_per_day * avg_tokens_per_request
    monthly_tokens = daily_tokens * 30
    cost = monthly_tokens * price_per_million / 1_000_000
    return cost

# Example: 1000 requests/day, 500 tokens each, $0.10/M
cost = estimate_monthly_cost(1000, 500, 0.10)
print(f"Monthly cost: ${cost:.2f}")  # $1.50

Comparison with OpenAI

| Use Case | OpenAI | Assisters | Savings |
| --- | --- | --- | --- |
| 1M chat tokens (GPT-4) | ~$30 | FREE (Llama 3.3 70B) | 100% |
| 1M chat tokens (GPT-3.5) | ~$2 | $0.10 | 95% |
| 1M embeddings | ~$0.13 | FREE (BGE-M3) | 100% |
| 1M moderation | ~$0.002 | FREE (Llama Guard 3) | 100% |
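The savings column is the relative price difference against the baseline. A small sketch of that arithmetic, using a hypothetical helper name and the approximate prices from the table:

```python
def savings_percent(baseline_price: float, candidate_price: float) -> float:
    """Percentage saved relative to the baseline price."""
    if baseline_price <= 0:
        raise ValueError("baseline_price must be positive")
    return (baseline_price - candidate_price) / baseline_price * 100


# GPT-3.5 row: ~$2 per 1M tokens versus $0.10 per 1M tokens
print(f"{savings_percent(2.00, 0.10):.0f}%")  # 95%
```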

FAQ

What is a token?
Tokens are pieces of words. Roughly:
  • 1 token ≈ 4 characters in English
  • 1 token ≈ 0.75 words
  • 100 tokens ≈ 75 words
Both input and output tokens are counted.

Do unused tokens roll over?
No, subscription tokens reset monthly on your billing date. Consider upgrading if you consistently exceed your limit.

Can I downgrade my plan?
Yes, you can downgrade at any time. The change takes effect at the next billing cycle.

How do annual plans work?
Annual plans are billed once per year and include a 17% discount. Tokens still reset monthly.

What payment methods do you accept?
We accept all major credit cards through Stripe. Enterprise customers can pay via invoice.
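For quick budgeting, the 4-characters-per-token rule of thumb from the first answer can be turned into a rough estimator. This is only an approximation for English text: actual counts depend on each model's tokenizer, and the function name is hypothetical.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb for English."""
    return math.ceil(len(text) / 4)


prompt = "Simple, transparent pricing with no hidden fees."
print(f"~{estimate_tokens(prompt)} tokens")
```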
