Moderation Models

Protect your users and platform with AI-powered content moderation. These models detect harmful, inappropriate, or policy-violating content.

Available Models

Llama Guard 3 8B

Meta’s efficient safety model for content moderation. FREE via Groq.
Specification    Value
Provider         Meta (via Groq)
Base Model       Llama 3 8B
Categories       11 safety categories
Price            FREE
Latency          ~100ms
Best for:
  • All content moderation use cases
  • Production safety systems
  • Real-time filtering
  • Cost-free deployment
response = client.moderations.create(
    model="llama-guard-3-8b",
    input="Content to moderate"
)

if response.results[0].flagged:
    print("Content violates policy")
Free Model: This is our recommended model for content moderation. It offers excellent accuracy at zero cost via Groq’s free inference tier.

Llama Guard 3

Meta’s latest safety model built on Llama 3, offering the best accuracy for content moderation.
Specification    Value
Provider         Meta
Base Model       Llama 3
Categories       11 safety categories
Price            $0.20 / million tokens
Latency          ~150ms
Best for:
  • High-accuracy requirements
  • Comprehensive category detection
  • Production safety systems
  • Regulatory compliance
response = client.moderations.create(
    model="llama-guard-3",
    input="Content to moderate"
)

if response.results[0].flagged:
    print("Content violates policy")
Detected Categories:
  • Hate speech and discrimination
  • Harassment and bullying
  • Violence and threats
  • Self-harm content
  • Sexual content
  • Illegal activities
  • Personal information exposure
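To act on individual categories rather than the overall flag, you can inspect the per-category results. A minimal sketch, assuming categories is returned as a plain dict of booleans (as in the Response Format section below):
response = client.moderations.create(
    model="llama-guard-3",
    input="Content to moderate"
)
result = response.results[0]

if result.flagged:
    # Collect the names of every category the model marked as violated
    violated = [name for name, hit in result.categories.items() if hit]
    print("Policy violations:", ", ".join(violated))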

ShieldGemma

Google’s efficient safety model optimized for speed and cost.
Specification    Value
Provider         Google
Base Model       Gemma
Categories       8 safety categories
Price            $0.15 / million tokens
Latency          ~100ms
Best for:
  • Cost-sensitive applications
  • High-volume moderation
  • Real-time filtering
  • Basic safety requirements
response = client.moderations.create(
    model="shieldgemma",
    input="Content to moderate"
)

# Check category scores
scores = response.results[0].category_scores
if scores["violence"] > 0.5:
    flag_for_review()  # your application's escalation hook

Model Comparison

Feature      Llama Guard 3 8B (FREE)   Llama Guard 3   ShieldGemma
Accuracy     ★★★★★                     ★★★★★           ★★★★☆
Speed        ★★★★★                     ★★★★☆           ★★★★★
Price        FREE                      $0.20/M         $0.15/M
Categories   11                        11              8
Best For     All use cases             High-stakes     High-volume

Safety Categories

The Llama Guard models detect all eleven core categories below; ShieldGemma covers a reduced set of eight:
Category                 Description
hate                     Content expressing hatred toward protected groups
hate/threatening         Hateful content with threats of violence
harassment               Content meant to harass, bully, or intimidate
harassment/threatening   Harassment with explicit threats
self-harm                Content promoting or glorifying self-harm
self-harm/intent         Expression of intent to self-harm
self-harm/instructions   Instructions for self-harm
sexual                   Sexually explicit content
sexual/minors            Sexual content involving minors
violence                 Content depicting violence
violence/graphic         Graphic depictions of violence

Response Format

{
  "id": "modr-abc123",
  "model": "llama-guard-3",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "harassment": false,
        "self-harm": false,
        "sexual": false,
        "violence": false
      },
      "category_scores": {
        "hate": 0.0001,
        "harassment": 0.0023,
        "self-harm": 0.0001,
        "sexual": 0.0012,
        "violence": 0.0008
      }
    }
  ]
}
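Each entry in results carries the flagged verdict, per-category booleans, and per-category scores. A minimal sketch of reading them, assuming category_scores is a plain dict keyed by category name as shown above:
result = client.moderations.create(
    model="llama-guard-3",
    input="Content to moderate"
).results[0]

print("Flagged:", result.flagged)

# Report the strongest signal, useful for logging and for tuning thresholds
top_category = max(result.category_scores, key=result.category_scores.get)
print(f"Highest score: {top_category} ({result.category_scores[top_category]:.4f})")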

Use Cases

Check user messages before processing:
def validate_input(message):
    result = client.moderations.create(
        model="llama-guard-3",
        input=message
    ).results[0]

    if result.flagged:
        # ContentPolicyError is an application-defined exception
        raise ContentPolicyError(
            "Your message violates our content policy"
        )

    return message
Verify AI responses before showing to users:
def safe_generate(prompt):
    # Generate response
    response = client.chat.completions.create(
        model="llama-3.1-8b",
        messages=[{"role": "user", "content": prompt}]
    )
    content = response.choices[0].message.content

    # Moderate output
    moderation = client.moderations.create(
        model="llama-guard-3",
        input=content
    ).results[0]

    if moderation.flagged:
        return "I cannot provide that response."

    return content
Use category scores for fine-grained control:
def custom_moderation(text, thresholds):
    result = client.moderations.create(
        model="llama-guard-3",
        input=text
    ).results[0]

    violations = []
    for category, threshold in thresholds.items():
        score = result.category_scores.get(category, 0)
        if score > threshold:
            violations.append(category)

    return violations

# Strict for violence, lenient for mild language
thresholds = {
    "violence": 0.3,
    "harassment": 0.7,
    "hate": 0.5
}
Moderate multiple items efficiently:
comments = ["comment 1", "comment 2", "comment 3"]

result = client.moderations.create(
    model="shieldgemma",  # Faster for batches
    input=comments
)

for i, r in enumerate(result.results):
    if r.flagged:
        print(f"Comment {i} flagged: {comments[i]}")

Best Practices

Moderate Both Directions

Check both user inputs AND AI outputs for comprehensive safety

Use Custom Thresholds

Adjust category_scores based on your platform’s needs

Log for Review

Keep logs of flagged content for human review and model improvement
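A minimal sketch of such a log, assuming a local JSON-lines file and the response shape shown above; in production you would likely route these records to your existing logging or review tooling instead:
import json
import time

def log_flagged(content, result, path="moderation_flags.jsonl"):
    # Append flagged content and its scores to a JSON-lines file
    # so humans can audit decisions and thresholds later.
    record = {
        "timestamp": time.time(),
        "content": content,
        "flagged_categories": [k for k, v in result.categories.items() if v],
        "category_scores": result.category_scores,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

content = "Content to moderate"
result = client.moderations.create(model="llama-guard-3", input=content).results[0]
if result.flagged:
    log_flagged(content, result)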

Graceful Degradation

Have fallback behavior when moderation service is unavailable
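A minimal sketch of one fallback policy, assuming you prefer to fail closed and hold content for review when the moderation call errors out; failing open may be acceptable on lower-risk platforms:
def moderate_with_fallback(text):
    # Returns True when the content should be blocked or held for review.
    try:
        result = client.moderations.create(
            model="llama-guard-3",
            input=text
        ).results[0]
        return result.flagged
    except Exception:
        # Moderation service unavailable: fail closed by holding the
        # content for human review instead of publishing it unchecked.
        return True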

Performance Considerations

Scenario                 Recommended Model
All use cases (FREE)     Llama Guard 3 8B
Real-time chat           Llama Guard 3 8B (FREE)
User-generated content   Llama Guard 3 8B (FREE)
High-volume batches      Llama Guard 3 8B (FREE)
Regulatory compliance    Llama Guard 3

Choosing a Model

Recommendation: Use llama-guard-3-8b for all content moderation. It’s free via Groq, fast, and offers the same 11 safety categories as the paid version.