Content Moderation

Automatically detect and filter harmful, inappropriate, or policy-violating content. Use this endpoint to protect your users and maintain community standards.

Endpoint

POST https://api.assisters.dev/v1/moderate
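
For reference, the same request in curl, using the default model and a single input string (the Authorization header assumes the Bearer scheme used by the OpenAI-compatible client shown below):

curl --request POST \
  --url https://api.assisters.dev/v1/moderate \
  --header 'Authorization: Bearer ask_your_api_key' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "llama-guard-3",
    "input": "Hello, how are you today?"
  }'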

Request Body

model
string
default:"llama-guard-3"
The moderation model to use. See available models. Options: llama-guard-3, shieldgemma
input
string | array
required
The text to moderate. Can be a single string or an array of up to 100 strings.

Request Examples

Single Text

from openai import OpenAI

client = OpenAI(
    api_key="ask_your_api_key",
    base_url="https://api.assisters.dev/v1"
)

response = client.moderations.create(
    model="llama-guard-3",
    input="Hello, how are you today?"
)

result = response.results[0]
print(f"Flagged: {result.flagged}")
print(f"Categories: {result.categories}")

Batch Moderation

response = client.moderations.create(
    model="llama-guard-3",
    input=[
        "First message to check",
        "Second message to check",
        "Third message to check"
    ]
)

for i, result in enumerate(response.results):
    print(f"Message {i}: Flagged={result.flagged}")

Pre-moderation Pattern

def moderate_before_response(user_message):
    """Check user input before processing"""
    moderation = client.moderations.create(
        model="llama-guard-3",
        input=user_message
    )

    if moderation.results[0].flagged:
        return {
            "error": "Your message violates our content policy",
            "categories": moderation.results[0].categories
        }

    # Process the message normally
    response = client.chat.completions.create(
        model="llama-3.1-8b",
        messages=[{"role": "user", "content": user_message}]
    )

    return {"response": response.choices[0].message.content}

Response

{
  "id": "modr-abc123xyz",
  "model": "llama-guard-3",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "hate/threatening": false,
        "harassment": false,
        "harassment/threatening": false,
        "self-harm": false,
        "self-harm/intent": false,
        "self-harm/instructions": false,
        "sexual": false,
        "sexual/minors": false,
        "violence": false,
        "violence/graphic": false
      },
      "category_scores": {
        "hate": 0.00012,
        "hate/threatening": 0.00001,
        "harassment": 0.00034,
        "harassment/threatening": 0.00002,
        "self-harm": 0.00001,
        "self-harm/intent": 0.00001,
        "self-harm/instructions": 0.00001,
        "sexual": 0.00015,
        "sexual/minors": 0.00001,
        "violence": 0.00023,
        "violence/graphic": 0.00002
      }
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Response Fields

id
string
Unique identifier for the moderation request
model
string
The model used for moderation
results
array
Array of moderation results, one per input:
  • flagged: Boolean indicating if content violates policy
  • categories: Object with boolean for each category
  • category_scores: Object with confidence scores (0-1) for each category
usage
object
Token usage for billing
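
When a result is flagged, the categories object tells you which policies were triggered. One way to list them with the OpenAI Python client used in the examples above (assuming the response parses into the SDK's standard moderation types; model_dump(by_alias=True) keeps the slash-separated names such as hate/threatening):

result = response.results[0]

if result.flagged:
    # Collect the name of every category the model marked as true
    triggered = [
        name for name, hit in result.categories.model_dump(by_alias=True).items() if hit
    ]
    print(f"Triggered categories: {triggered}")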

Categories

hate: Content expressing hatred toward a group
hate/threatening: Hateful content with threats of violence
harassment: Content meant to harass or bully
harassment/threatening: Harassment with threats
self-harm: Content promoting self-harm
self-harm/intent: Expression of self-harm intent
self-harm/instructions: Instructions for self-harm
sexual: Sexually explicit content
sexual/minors: Sexual content involving minors
violence: Content depicting violence
violence/graphic: Graphic depictions of violence

Available Models

llama-guard-3: Meta’s latest safety model, best accuracy ($0.20/M tokens)
shieldgemma: Google’s efficient safety model ($0.15/M tokens)
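
To use a different model, pass its id in the model field:

response = client.moderations.create(
    model="shieldgemma",
    input="Message to check"
)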

Compare Moderation Models

See detailed model specifications and benchmarks

Use Cases
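
The snippets below assume a small helper, moderate(), built on the client from the examples above; a minimal sketch that returns the first moderation result:

def moderate(text):
    """Hypothetical helper: moderate a single text and return the first result."""
    response = client.moderations.create(
        model="llama-guard-3",
        input=text
    )
    return response.results[0]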

Check all user-submitted content before processing or displaying:

from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.post("/submit")
async def submit_content(content: str):
    moderation = moderate(content)
    if moderation.flagged:
        raise HTTPException(400, "Content violates policy")
    return process_content(content)

Verify AI-generated responses before showing to users:

def respond_safely(prompt):
    response = generate_ai_response(prompt)
    moderation = moderate(response)

    if moderation.flagged:
        return "I cannot provide that response."

    return response

Automatically filter inappropriate comments:

def process_comment(comment):
    result = moderate(comment)

    # Reject outright when the hate score is high
    if result.category_scores.hate > 0.5:
        return {"status": "rejected", "reason": "hate_speech"}

    # Anything else the model flags goes to human review
    if result.flagged:
        return {"status": "pending_review"}

    return {"status": "approved"}

Use category scores for fine-grained control:

result = moderate(text)

# Strict (low) threshold for violence: send borderline content to review
if result.category_scores.violence > 0.3:
    flag_for_review(text)

# Lenient (high) threshold for harassment: only reject clear-cut cases
if result.category_scores.harassment > 0.7:
    reject(text)

Best Practices

Moderate Both Inputs and Outputs

Check user messages AND AI responses for safety

Use Custom Thresholds

Adjust category_scores thresholds based on your use case

Batch for Efficiency

Send multiple texts in one request when possible

Cache Results

Cache moderation results for repeated content
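
A minimal sketch of the caching practice, assuming the client from the examples above and a per-process LRU cache keyed on the input text (a shared store such as Redis may suit multi-process deployments better):

from functools import lru_cache

@lru_cache(maxsize=10_000)
def moderate_cached(text):
    """Repeated identical inputs skip the API call (in-memory, per-process)."""
    response = client.moderations.create(model="llama-guard-3", input=text)
    return response.results[0]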

Error Responses

{
  "error": {
    "message": "Too many inputs. Maximum is 100.",
    "type": "invalid_request_error",
    "code": "too_many_inputs"
  }
}

{
  "error": {
    "message": "Input cannot be empty",
    "type": "invalid_request_error",
    "code": "empty_input"
  }
}
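
With the OpenAI Python client used in the examples above, 400-level errors like these are raised as exceptions rather than returned; a sketch of handling them (assuming the standard openai.BadRequestError mapping for invalid requests):

import openai

try:
    response = client.moderations.create(
        model="llama-guard-3",
        input=texts  # hypothetical list that may exceed the 100-item limit
    )
except openai.BadRequestError as e:
    # Covers errors such as too_many_inputs and empty_input
    print(f"Moderation request rejected: {e}")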