Content Moderation
Protect your users and platform by implementing content moderation for both user inputs and AI-generated outputs.

Why Moderate?
- **Protect Users**: Shield users from harmful, offensive, or inappropriate content.
- **Platform Safety**: Maintain community standards and brand reputation.
- **Legal Compliance**: Meet regulatory requirements (GDPR, DSA, etc.).
- **Reduce Abuse**: Prevent misuse of your AI-powered features.
Moderation Endpoint
Use the `/v1/moderate` endpoint:
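A minimal Python sketch using `requests`. The base URL, auth header, request fields (`model`, `input`), and the response shape (`flagged`, `categories`, `category_scores`) are assumptions here; check the API reference for the exact schema.

```python
import os
import requests

# Assumed base URL and bearer-token auth; replace with your provider's values.
API_URL = "https://api.example.com/v1/moderate"
API_KEY = os.environ["MODERATION_API_KEY"]

def moderate(text: str, model: str = "llama-guard-3") -> dict:
    """Submit text to the /v1/moderate endpoint and return the parsed JSON result."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "input": text},
        timeout=10,
    )
    response.raise_for_status()
    # Assumed response shape:
    # {"flagged": bool, "categories": {...}, "category_scores": {...}}
    return response.json()

print(moderate("some user message"))
```

The `moderate()` helper defined here is reused in the sketches below.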
Moderation Categories
| Category | Description |
|---|---|
| `hate` | Content expressing hatred toward protected groups |
| `hate/threatening` | Hateful content with violence threats |
| `harassment` | Content meant to harass or bully |
| `harassment/threatening` | Harassment with explicit threats |
| `self-harm` | Content promoting self-harm |
| `self-harm/intent` | Expression of self-harm intent |
| `self-harm/instructions` | Instructions for self-harm |
| `sexual` | Sexually explicit content |
| `sexual/minors` | Sexual content involving minors |
| `violence` | Content depicting violence |
| `violence/graphic` | Graphic depictions of violence |
Implementation Patterns
1. Moderate User Inputs (Pre-moderation)
Check all user messages before processing:
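A sketch reusing the `moderate()` helper from the endpoint example above; `generate_ai_response()` is a hypothetical stand-in for your generation call, and `flagged` is an assumed response field.

```python
def handle_user_message(user_message: str) -> str:
    # Reject the message before it ever reaches the model.
    result = moderate(user_message)
    if result.get("flagged"):
        return "Sorry, your message violates our content policy."
    return generate_ai_response(user_message)  # hypothetical downstream generation call
```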
2. Moderate AI Outputs (Post-moderation)
Verify AI responses before showing to users:
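A sketch under the same assumptions, generating first and checking the output before it reaches the user.

```python
def safe_ai_response(user_message: str) -> str:
    # Generate first, then verify the output before returning it.
    ai_output = generate_ai_response(user_message)  # hypothetical generation call
    if moderate(ai_output).get("flagged"):
        return "This response was withheld by our safety filters."
    return ai_output
```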
3. Bi-directional Moderation
Check both inputs AND outputs:
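A sketch combining the two previous patterns; same assumed helpers and response fields.

```python
def moderated_chat(user_message: str) -> str:
    # 1. Check the user's input.
    if moderate(user_message).get("flagged"):
        return "Sorry, your message violates our content policy."
    # 2. Generate, then check the model's output before returning it.
    ai_output = generate_ai_response(user_message)  # hypothetical generation call
    if moderate(ai_output).get("flagged"):
        return "This response was withheld by our safety filters."
    return ai_output
```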
4. Custom Thresholds
Use `category_scores` for fine-grained control:
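A sketch assuming `category_scores` maps each category name to a score between 0 and 1; lower thresholds make a category stricter. The threshold values shown are illustrative, not recommendations.

```python
# Example per-category thresholds; the exact values depend on your risk tolerance.
THRESHOLDS = {
    "violence": 0.30,
    "violence/graphic": 0.20,
    "hate": 0.40,
    "harassment": 0.70,
}

def violates_policy(text: str) -> bool:
    scores = moderate(text).get("category_scores", {})  # assumed response field
    return any(
        scores.get(category, 0.0) >= limit
        for category, limit in THRESHOLDS.items()
    )
```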
5. Batch Moderation
Moderate multiple items efficiently:
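A sketch that parallelizes single-item calls with a thread pool; if the endpoint accepts a list for `input`, a single batched request would be cheaper still (an assumption to verify against the API reference).

```python
from concurrent.futures import ThreadPoolExecutor

def moderate_batch(texts: list[str]) -> list[dict]:
    # Run single-item requests concurrently; results keep the input order.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(moderate, texts))

comments = ["first comment", "second comment", "third comment"]
flagged = [
    text for text, result in zip(comments, moderate_batch(comments))
    if result.get("flagged")
]
```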
Handling Violations

1. Block and Notify
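A sketch of the block-and-notify pattern; `log_violation()` and `notify_user()` are hypothetical application helpers.

```python
def block_and_notify(user_id: str, message: str) -> bool:
    """Return True if the message was allowed through, False if it was blocked."""
    result = moderate(message)
    if not result.get("flagged"):
        return True
    log_violation(user_id, message, result)  # hypothetical audit-log helper
    notify_user(user_id, "Your message was blocked for violating our content policy.")
    return False
```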
2. Review Queue
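A sketch that auto-blocks clear violations and routes borderline scores to humans; `enqueue_for_human_review()` is a hypothetical helper backed by whatever queue or ticketing system you use.

```python
REVIEW_THRESHOLD = 0.40  # scores above this, but not flagged, go to human review

def triage(message_id: str, text: str) -> str:
    result = moderate(text)
    if result.get("flagged"):
        return "blocked"
    scores = result.get("category_scores", {})  # assumed response field
    if any(score >= REVIEW_THRESHOLD for score in scores.values()):
        enqueue_for_human_review(message_id, text, scores)  # hypothetical queue helper
        return "pending_review"
    return "approved"
```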
3. User Warnings
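A sketch of a strike-based warning policy; the persistence and account helpers (`increment_strike_count()`, `suspend_user()`, `notify_user()`) are hypothetical.

```python
STRIKE_LIMIT = 3

def warn_or_suspend(user_id: str, text: str) -> None:
    if not moderate(text).get("flagged"):
        return
    strikes = increment_strike_count(user_id)  # hypothetical persistence helper
    if strikes >= STRIKE_LIMIT:
        suspend_user(user_id)  # hypothetical account-management helper
    else:
        notify_user(user_id, f"Warning {strikes}/{STRIKE_LIMIT}: this content violates our policy.")
```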
Best Practices
- **Moderate Both Directions**: Check user inputs AND AI outputs.
- **Use Custom Thresholds**: Tune sensitivity based on your use case.
- **Log Everything**: Keep audit trails for compliance.
- **Human Review**: Have humans review borderline cases.
Complete Example
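A minimal end-to-end sketch pulling the pieces together: bi-directional moderation with custom thresholds and an audit log. The endpoint URL, request fields, and response shape are the same assumptions as above; `generate_ai_response()` and `audit_log()` are hypothetical application helpers.

```python
import os
import requests

API_URL = "https://api.example.com/v1/moderate"  # assumed base URL
API_KEY = os.environ["MODERATION_API_KEY"]

THRESHOLDS = {"violence": 0.30, "hate": 0.40, "harassment": 0.70}  # illustrative values

def moderate(text: str, model: str = "llama-guard-3") -> dict:
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "input": text},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # assumed: {"flagged": ..., "category_scores": {...}}

def allowed(result: dict) -> bool:
    # Reject anything the model flags outright or that exceeds a custom threshold.
    if result.get("flagged"):
        return False
    scores = result.get("category_scores", {})
    return all(scores.get(cat, 0.0) < limit for cat, limit in THRESHOLDS.items())

def chat(user_id: str, user_message: str) -> str:
    input_check = moderate(user_message)
    if not allowed(input_check):
        audit_log(user_id, "input_blocked", input_check)  # hypothetical audit helper
        return "Sorry, your message violates our content policy."

    ai_output = generate_ai_response(user_message)  # hypothetical generation call

    output_check = moderate(ai_output)
    if not allowed(output_check):
        audit_log(user_id, "output_blocked", output_check)
        return "This response was withheld by our safety filters."

    return ai_output
```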
Pricing
| Model | Price per Million Tokens |
|---|---|
| `llama-guard-3` | $0.20 |
| `shieldgemma` | $0.15 |
View Moderation Models
Compare moderation model capabilities