Content Moderation

Automatically detect and filter harmful, inappropriate, or policy-violating content. Use this endpoint to protect your users and maintain community standards.

Endpoint

POST https://api.assisters.dev/v1/moderate
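
For reference, the same request in curl, using the default model and a single input string (the Authorization header assumes the Bearer scheme used by the OpenAI-compatible client shown below):

curl --request POST \
  --url https://api.assisters.dev/v1/moderate \
  --header 'Authorization: Bearer ask_your_api_key' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "llama-guard-3",
    "input": "Hello, how are you today?"
  }'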

Request Body

model
string
default:"llama-guard-3"
The moderation model to use. See available models. Options: llama-guard-3, shieldgemma
input
string | array
required
The text to moderate. Can be a single string or an array of up to 100 strings.

Request Examples

Single Text

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.moderations.create(
    model="llama-guard-3",
    input="Hello, how are you today?"
)

result = response.results[0]
print(f"Flagged: {result.flagged}")
print(f"Categories: {result.categories}")

Batch Moderation

response = client.moderations.create(
    model="llama-guard-3",
    input=[
        "First message to check",
        "Second message to check",
        "Third message to check"
    ]
)

for i, result in enumerate(response.results):
    print(f"Message {i}: Flagged={result.flagged}")

Pre-moderation Pattern

def moderate_before_response(user_message):
    """Check user input before processing"""
    moderation = client.moderations.create(
        model="llama-guard-3",
        input=user_message
    )

    if moderation.results[0].flagged:
        return {
            "error": "Your message violates our content policy",
            "categories": moderation.results[0].categories
        }

    # Process the message normally
    response = client.chat.completions.create(
        model="llama-3.1-8b",
        messages=[{"role": "user", "content": user_message}]
    )

    return {"response": response.choices[0].message.content}

Response

{
  "id": "modr-abc123xyz",
  "model": "llama-guard-3",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "hate/threatening": false,
        "harassment": false,
        "harassment/threatening": false,
        "self-harm": false,
        "self-harm/intent": false,
        "self-harm/instructions": false,
        "sexual": false,
        "sexual/minors": false,
        "violence": false,
        "violence/graphic": false
      },
      "category_scores": {
        "hate": 0.00012,
        "hate/threatening": 0.00001,
        "harassment": 0.00034,
        "harassment/threatening": 0.00002,
        "self-harm": 0.00001,
        "self-harm/intent": 0.00001,
        "self-harm/instructions": 0.00001,
        "sexual": 0.00015,
        "sexual/minors": 0.00001,
        "violence": 0.00023,
        "violence/graphic": 0.00002
      }
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Response Fields

id
string
Unique identifier for the moderation request
model
string
The model used for moderation
results
array
Array of moderation results, one per input:
  • flagged: Boolean indicating if content violates policy
  • categories: Object with boolean for each category
  • category_scores: Object with confidence scores (0-1) for each category
usage
object
Token usage for billing
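
When a result is flagged, the categories object tells you which policies were triggered. One way to list them with the OpenAI Python client used in the examples above (assuming the response parses into the SDK's standard moderation types; model_dump(by_alias=True) keeps the slash-separated names such as hate/threatening):

result = response.results[0]

if result.flagged:
    # Collect the name of every category the model marked as true
    triggered = [
        name for name, hit in result.categories.model_dump(by_alias=True).items() if hit
    ]
    print(f"Triggered categories: {triggered}")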

Categories

hate: Content expressing hatred toward a group
hate/threatening: Hateful content with threats of violence
harassment: Content meant to harass or bully
harassment/threatening: Harassment with threats
self-harm: Content promoting self-harm
self-harm/intent: Expression of self-harm intent
self-harm/instructions: Instructions for self-harm
sexual: Sexually explicit content
sexual/minors: Sexual content involving minors
violence: Content depicting violence
violence/graphic: Graphic depictions of violence

Available Models

llama-guard-3: Meta’s latest safety model, best accuracy ($0.20/M tokens)
shieldgemma: Google’s efficient safety model ($0.15/M tokens)
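
To use a different model, pass its id in the model field:

response = client.moderations.create(
    model="shieldgemma",
    input="Message to check"
)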

Compare Moderation Models

See detailed model specifications and benchmarks

Use Cases
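
The snippets below assume a small helper, moderate(), built on the client from the examples above; a minimal sketch that returns the first moderation result:

def moderate(text):
    """Hypothetical helper: moderate a single text and return the first result."""
    response = client.moderations.create(
        model="llama-guard-3",
        input=text
    )
    return response.results[0]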

Check all user-submitted content before processing or displaying:

from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.post("/submit")
async def submit_content(content: str):
    moderation = moderate(content)
    if moderation.flagged:
        raise HTTPException(400, "Content violates policy")
    return process_content(content)

Verify AI-generated responses before showing to users:

def respond_safely(prompt):
    response = generate_ai_response(prompt)
    moderation = moderate(response)

    if moderation.flagged:
        return "I cannot provide that response."

    return response

Automatically filter inappropriate comments:

def process_comment(comment):
    result = moderate(comment)

    # Reject outright when the hate score is high
    if result.category_scores.hate > 0.5:
        return {"status": "rejected", "reason": "hate_speech"}

    # Anything else the model flags goes to human review
    if result.flagged:
        return {"status": "pending_review"}

    return {"status": "approved"}

Use category scores for fine-grained control:

result = moderate(text)

# Strict (low) threshold for violence: send borderline content to review
if result.category_scores.violence > 0.3:
    flag_for_review(text)

# Lenient (high) threshold for harassment: only reject clear-cut cases
if result.category_scores.harassment > 0.7:
    reject(text)

Best Practices

Moderate Both Inputs and Outputs

Check user messages AND AI responses for safety

Use Custom Thresholds

Adjust category_scores thresholds based on your use case

Batch for Efficiency

Send multiple texts in one request when possible

Cache Results

Cache moderation results for repeated content
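
A minimal sketch of the caching practice, assuming the client from the examples above and a per-process LRU cache keyed on the input text (a shared store such as Redis may suit multi-process deployments better):

from functools import lru_cache

@lru_cache(maxsize=10_000)
def moderate_cached(text):
    """Repeated identical inputs skip the API call (in-memory, per-process)."""
    response = client.moderations.create(model="llama-guard-3", input=text)
    return response.results[0]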

Error Responses

{
  "error": {
    "message": "Too many inputs. Maximum is 100.",
    "type": "invalid_request_error",
    "code": "too_many_inputs"
  }
}

{
  "error": {
    "message": "Input cannot be empty",
    "type": "invalid_request_error",
    "code": "empty_input"
  }
}
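
With the OpenAI Python client used in the examples above, 400-level errors like these are raised as exceptions rather than returned; a sketch of handling them (assuming the standard openai.BadRequestError mapping for invalid requests):

import openai

try:
    response = client.moderations.create(
        model="llama-guard-3",
        input=texts  # hypothetical list that may exceed the 100-item limit
    )
except openai.BadRequestError as e:
    # Covers errors such as too_many_inputs and empty_input
    print(f"Moderation request rejected: {e}")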