Skip to main content

Model Overview

Assisters API provides access to 18+ open-source AI models across four categories. All models are served through our unified, OpenAI-compatible API.
Free Models Available: We offer completely free models powered by Groq and HuggingFace:
  • Chat: Llama 3.3 70B (FREE via Groq)
  • Embeddings: BGE-M3 (FREE via HuggingFace)
  • Moderation: Llama Guard 3 8B (FREE via Groq)

Model Categories

Quick Comparison

Chat Models

ModelProviderContextSpeedPrice
llama-3.1-8bMeta128KFast$0.10/M
llama-3.1-70bMeta128KMedium$0.90/M
mistral-7bMistral AI32KFast$0.10/M
qwen2-7bAlibaba32KFast$0.10/M
gemma-2-9bGoogle8KFast$0.15/M
phi-3-miniMicrosoft4KFastest$0.08/M

Embedding Models

ModelProviderDimensionsMax TokensPrice
bge-m3BAAI (HuggingFace)10248192FREE
e5-large-v2Microsoft1024512$0.01/M
bge-base-enBAAI768512$0.01/M
jina-embeddings-v2Jina AI7688192$0.02/M
nomic-embed-textNomic AI7688192$0.01/M
gte-largeAlibaba1024512$0.01/M

Safety Models

ModelTypeProviderPrice
llama-guard-3ModerationMeta$0.20/M
shieldgemmaModerationGoogle$0.15/M
bge-reranker-v2RerankingBAAI$0.05/M
jina-rerankerRerankingJina AI$0.08/M

Choosing the Right Model

For Chat Applications

Llama 3.1 70B - Best quality for complex tasks, reasoning, and creative writing. Higher latency and cost, but excellent results.
model="llama-3.1-70b"
Llama 3.1 8B - Great balance of quality and cost. Fast responses, good for most use cases.
model="llama-3.1-8b"
Phi-3 Mini - Lowest latency and cost. Best for simple tasks and high-volume applications.
model="phi-3-mini"
Llama 3.1 8B/70B - 128K context window for processing long documents.
model="llama-3.1-8b"  # 128K context

For Embeddings

BGE-M3 - BAAI’s multilingual embedding model. Best quality, long context (8192 tokens), 100+ languages. Completely free via HuggingFace.
model="bge-m3"
E5-large-v2 - Microsoft’s flagship embedding model. Best for accuracy-critical applications.
model="e5-large-v2"
Jina Embeddings v2 - 8192 token context for embedding entire documents.
model="jina-embeddings-v2"

For Moderation

Llama Guard 3 - Meta’s latest safety model with the best accuracy.
model="llama-guard-3"
ShieldGemma - Google’s efficient safety model, 25% cheaper.
model="shieldgemma"

Model Selection Guide

Pricing Tiers

All models are billed per million tokens. Free tier models available!
CategoryPrice RangeFree Option
ChatFREE - $0.90/Mllama-3.3-70b via Groq
EmbeddingsFREE - $0.02/Mbge-m3 via HuggingFace
ModerationFREE - $0.20/Mllama-guard-3-8b via Groq
Reranking0.050.05 - 0.08/M-

View Pricing Details

See full pricing breakdown and subscription tiers

Model Updates

We continuously update our model offerings:
  • New models are added as they become available
  • Deprecated models are announced 3+ months in advance
  • Performance improvements are applied without version changes

View Changelog

Stay updated on model changes and additions