Moderation Models
Protect your users and platform with AI-powered content moderation. These models detect harmful, inappropriate, or policy-violating content.

Available Models
Llama Guard 3 8B (FREE - Recommended)
Meta’s efficient safety model for content moderation. FREE via Groq.

| Specification | Value |
|---|---|
| Provider | Meta (via Groq) |
| Base Model | Llama 3 8B |
| Categories | 11 safety categories |
| Price | FREE |
| Latency | ~100ms |
Best for:
- All content moderation use cases
- Production safety systems
- Real-time filtering
- Cost-free deployment
Free Model: This is our recommended model for content moderation. It offers excellent accuracy at zero cost via Groq’s free inference tier.
Llama Guard 3
Meta’s latest safety model built on Llama 3, offering the best accuracy for content moderation.

| Specification | Value |
|---|---|
| Provider | Meta |
| Base Model | Llama 3 |
| Categories | 11 safety categories |
| Price | $0.20 / million tokens |
| Latency | ~150ms |
Best for:
- High-accuracy requirements
- Comprehensive category detection
- Production safety systems
- Regulatory compliance
Detected categories include:
- Hate speech and discrimination
- Harassment and bullying
- Violence and threats
- Self-harm content
- Sexual content
- Illegal activities
- Personal information exposure
ShieldGemma
Google’s efficient safety model optimized for speed and cost.

| Specification | Value |
|---|---|
| Provider | Google |
| Base Model | Gemma |
| Categories | 8 safety categories |
| Price | $0.15 / million tokens |
| Latency | ~100ms |
Best for:
- Cost-sensitive applications
- High-volume moderation
- Real-time filtering
- Basic safety requirements
Model Comparison
| Feature | Llama Guard 3 8B (FREE) | Llama Guard 3 | ShieldGemma |
|---|---|---|---|
| Accuracy | ★★★★★ | ★★★★★ | ★★★★☆ |
| Speed | ★★★★★ | ★★★★☆ | ★★★★★ |
| Price | FREE | $0.20/M | $0.15/M |
| Categories | 11 | 11 | 8 |
| Best For | All use cases | High-stakes | High-volume |
Safety Categories
Both Llama Guard models detect these core categories:

| Category | Description |
|---|---|
| hate | Content expressing hatred toward protected groups |
| hate/threatening | Hateful content with threats of violence |
| harassment | Content meant to harass, bully, or intimidate |
| harassment/threatening | Harassment with explicit threats |
| self-harm | Content promoting or glorifying self-harm |
| self-harm/intent | Expression of intent to self-harm |
| self-harm/instructions | Instructions for self-harm |
| sexual | Sexually explicit content |
| sexual/minors | Sexual content involving minors |
| violence | Content depicting violence |
| violence/graphic | Graphic depictions of violence |
Response Format
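The exact wire format depends on the platform; the sketch below assumes an OpenAI-style moderation response, where each input gets a `results` entry with an overall `flagged` boolean, per-category booleans, and numeric `category_scores`. The field names, ids, and model identifier shown here are illustrative.

```json
{
  "id": "modr-abc123",
  "model": "llama-guard-3-8b",
  "results": [
    {
      "flagged": true,
      "categories": {
        "hate": false,
        "harassment": true,
        "violence": false
      },
      "category_scores": {
        "hate": 0.02,
        "harassment": 0.91,
        "violence": 0.08
      }
    }
  ]
}
```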
Use Cases
User Input Validation
Check user messages before processing:
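This is a minimal sketch rather than an official SDK call: it assumes a hypothetical POST endpoint at https://api.example.com/v1/moderations, the illustrative model id llama-guard-3-8b, and the response shape shown above.

```python
import requests

MODERATION_URL = "https://api.example.com/v1/moderations"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def is_safe(message: str) -> bool:
    """Return True if the user message passes moderation (assumed response shape)."""
    resp = requests.post(
        MODERATION_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "llama-guard-3-8b", "input": message},
        timeout=5,
    )
    resp.raise_for_status()
    return not resp.json()["results"][0]["flagged"]

user_message = "How do I reset my password?"
if is_safe(user_message):
    pass  # safe: hand the message to your chat model
else:
    print("Sorry, that message violates our content policy.")
```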
AI Output Safety
Verify AI responses before showing to users:
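A hedged sketch against the same hypothetical endpoint; the point is simply to run the model's reply through moderation before it reaches the user.

```python
import requests

MODERATION_URL = "https://api.example.com/v1/moderations"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def moderate(text: str) -> dict:
    """Return the first moderation result (assumed keys: flagged, categories, category_scores)."""
    resp = requests.post(
        MODERATION_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "llama-guard-3-8b", "input": text},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["results"][0]

ai_reply = "Here is the answer to your question..."  # output from your chat model
if moderate(ai_reply)["flagged"]:
    ai_reply = "Sorry, I can't share that response."  # swap in a safe fallback message
print(ai_reply)
```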
Custom Thresholds
Use category scores for fine-grained control:
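The overall flagged decision may be too strict or too lax for your platform. One possible pattern, applied to the category_scores field from the assumed response format above (threshold values are illustrative, not recommendations):

```python
# Illustrative per-category thresholds; tune them to your platform's risk tolerance.
THRESHOLDS = {
    "violence": 0.70,
    "harassment": 0.50,
    "sexual/minors": 0.01,  # near-zero tolerance
}

def violates_policy(category_scores: dict) -> bool:
    """Flag content when any category score meets or exceeds its custom threshold."""
    return any(
        category_scores.get(category, 0.0) >= threshold
        for category, threshold in THRESHOLDS.items()
    )

# Example category_scores matching the assumed response format above.
scores = {"violence": 0.82, "harassment": 0.12, "sexual/minors": 0.0}
print(violates_policy(scores))  # True: the violence score exceeds 0.70
```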
Batch Moderation
Moderate multiple items efficiently:
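If the API does not accept a list of inputs in a single request, a simple thread pool keeps batch moderation fast. The endpoint, model id, and response shape below are the same assumptions as in the earlier sketches.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

MODERATION_URL = "https://api.example.com/v1/moderations"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def is_flagged(text: str) -> bool:
    """Return True if the text is flagged (assumed response shape)."""
    resp = requests.post(
        MODERATION_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "llama-guard-3-8b", "input": text},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["results"][0]["flagged"]

comments = ["Great post!", "You are an idiot.", "Check out my blog."]

# Moderate items concurrently; size max_workers to your rate limits.
with ThreadPoolExecutor(max_workers=8) as pool:
    flags = list(pool.map(is_flagged, comments))

for comment, flagged in zip(comments, flags):
    print("BLOCKED" if flagged else "OK", "-", comment)
```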
Best Practices
- Moderate Both Directions: Check both user inputs and AI outputs for comprehensive safety.
- Use Custom Thresholds: Adjust thresholds on category_scores based on your platform's needs.
- Log for Review: Keep logs of flagged content for human review and model improvement.
- Graceful Degradation: Have fallback behavior when the moderation service is unavailable (see the sketch below).
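For the graceful degradation point above, one possible pattern, again assuming the hypothetical endpoint from the earlier sketches, is to decide explicitly whether to fail open (allow content) or fail closed (block content) when the moderation call errors out:

```python
import requests

MODERATION_URL = "https://api.example.com/v1/moderations"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def should_block(text: str, fail_open: bool = False) -> bool:
    """Return True if the text should be blocked.

    When the moderation service is unavailable, fail open (allow) or
    fail closed (block) depending on your platform's risk posture.
    """
    try:
        resp = requests.post(
            MODERATION_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": "llama-guard-3-8b", "input": text},
            timeout=2,
        )
        resp.raise_for_status()
        return resp.json()["results"][0]["flagged"]
    except requests.RequestException:
        # Log the outage for review, then apply the configured fallback.
        return not fail_open
```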
Performance Considerations
| Scenario | Recommended Model |
|---|---|
| All use cases (FREE) | Llama Guard 3 8B |
| Real-time chat | Llama Guard 3 8B (FREE) |
| User-generated content | Llama Guard 3 8B (FREE) |
| High-volume batches | Llama Guard 3 8B (FREE) |
| Regulatory compliance | Llama Guard 3 |