Model Overview

Assisters API provides access to 18+ open-source AI models across four categories. All models are served through our unified, OpenAI-compatible API.

Free Models Available: We offer completely free models powered by Groq and HuggingFace:

Chat: Llama 3.3 70B (FREE via Groq)
Embeddings: BGE-M3 (FREE via HuggingFace)
Moderation: Llama Guard 3 8B (FREE via Groq)

Model Categories

Chat Models

Generate conversational responses with models like Llama, Mistral, and Qwen

Embedding Models

Create text embeddings for semantic search and similarity

Moderation Models

Detect harmful or inappropriate content

Reranking Models

Improve search quality by reranking results

Quick Comparison

Chat Models

Model	Provider	Context	Speed	Price
`llama-3.1-8b`	Meta	128K	Fast	$0.10/M
`llama-3.1-70b`	Meta	128K	Medium	$0.90/M
`mistral-7b`	Mistral AI	32K	Fast	$0.10/M
`qwen2-7b`	Alibaba	32K	Fast	$0.10/M
`gemma-2-9b`	Google	8K	Fast	$0.15/M
`phi-3-mini`	Microsoft	4K	Fastest	$0.08/M

Embedding Models

Model	Provider	Dimensions	Max Tokens	Price
`bge-m3`	BAAI (HuggingFace)	1024	8192	FREE
`e5-large-v2`	Microsoft	1024	512	$0.01/M
`bge-base-en`	BAAI	768	512	$0.01/M
`jina-embeddings-v2`	Jina AI	768	8192	$0.02/M
`nomic-embed-text`	Nomic AI	768	8192	$0.01/M
`gte-large`	Alibaba	1024	512	$0.01/M

Safety Models

Model	Type	Provider	Price
`llama-guard-3`	Moderation	Meta	$0.20/M
`shieldgemma`	Moderation	Google	$0.15/M
`bge-reranker-v2`	Reranking	BAAI	$0.05/M
`jina-reranker`	Reranking	Jina AI	$0.08/M

Choosing the Right Model

For Chat Applications

Best Overall

Llama 3.1 70B - Best quality for complex tasks, reasoning, and creative writing. Higher latency and cost, but excellent results.

model="llama-3.1-70b"

Best Value

Llama 3.1 8B - Great balance of quality and cost. Fast responses, good for most use cases.

model="llama-3.1-8b"

Fastest

Phi-3 Mini - Lowest latency and cost. Best for simple tasks and high-volume applications.

model="phi-3-mini"

Long Context

Llama 3.1 8B/70B - 128K context window for processing long documents.

model="llama-3.1-8b"  # 128K context

For Embeddings

FREE & Best Quality

BGE-M3 - BAAI’s multilingual embedding model. Best quality, long context (8192 tokens), 100+ languages. Completely free via HuggingFace.

model="bge-m3"

Alternative: E5-large-v2

E5-large-v2 - Microsoft’s flagship embedding model. Best for accuracy-critical applications.

model="e5-large-v2"

Long Documents

Jina Embeddings v2 - 8192 token context for embedding entire documents.

model="jina-embeddings-v2"

For Moderation

Most Accurate

Llama Guard 3 - Meta’s latest safety model with the best accuracy.

model="llama-guard-3"

Most Efficient

ShieldGemma - Google’s efficient safety model, 25% cheaper.

model="shieldgemma"

Model Selection Guide

Pricing Tiers

All models are billed per million tokens. Free tier models available!

Category	Price Range	Free Option
Chat	FREE - $0.90/M	`llama-3.3-70b` via Groq
Embeddings	FREE - $0.02/M	`bge-m3` via HuggingFace
Moderation	FREE - $0.20/M	`llama-guard-3-8b` via Groq
Reranking	$0.05 -$ 0.08/M	-

View Pricing Details

See full pricing breakdown and subscription tiers

Model Updates

We continuously update our model offerings:

New models are added as they become available
Deprecated models are announced 3+ months in advance
Performance improvements are applied without version changes

View Changelog

Stay updated on model changes and additions

Model Catalog

​Model Overview

​Model Categories

Chat Models

Embedding Models

Moderation Models

Reranking Models

​Quick Comparison

​Chat Models

​Embedding Models

​Safety Models

​Choosing the Right Model

​For Chat Applications

​For Embeddings

​For Moderation

​Model Selection Guide

​Pricing Tiers

View Pricing Details

​Model Updates

View Changelog

Model Overview

Model Categories

Quick Comparison

Chat Models

Embedding Models

Safety Models

Choosing the Right Model

For Chat Applications

For Embeddings

For Moderation

Model Selection Guide

Pricing Tiers

Model Updates