Detect harmful, inappropriate, or policy-violating content
Available models: llama-guard-3, shieldgemma

The response contains the following fields:

- `flagged`: Boolean indicating if the content violates policy
- `categories`: Object with a boolean for each category
- `category_scores`: Object with a confidence score (0-1) for each category

| Category | Description |
|---|---|
| hate | Content expressing hatred toward a group |
| hate/threatening | Hateful content with threats of violence |
| harassment | Content meant to harass or bully |
| harassment/threatening | Harassment with threats |
| self-harm | Content promoting self-harm |
| self-harm/intent | Expression of self-harm intent |
| self-harm/instructions | Instructions for self-harm |
| sexual | Sexually explicit content |
| sexual/minors | Sexual content involving minors |
| violence | Content depicting violence |
| violence/graphic | Graphic depictions of violence |
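For a flagged input, the parsed response might look like the sketch below. The field names follow the description above; the categories shown and every value are illustrative only.

```python
# Illustrative response shape using the documented fields
# flagged, categories, and category_scores. All values are made up.
example_response = {
    "flagged": True,
    "categories": {
        "hate": False,
        "harassment": True,
        "harassment/threatening": False,
        "self-harm": False,
        "sexual": False,
        "violence": False,
        # ...one boolean per category in the table above
    },
    "category_scores": {
        "hate": 0.02,
        "harassment": 0.91,
        "harassment/threatening": 0.08,
        "self-harm": 0.01,
        "sexual": 0.00,
        "violence": 0.03,
        # ...one score between 0 and 1 per category in the table above
    },
}
```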
| Model | Description | Price |
|---|---|---|
| llama-guard-3 | Meta’s latest safety model, best accuracy | $0.20/M tokens |
| shieldgemma | Google’s efficient safety model | $0.15/M tokens |
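A minimal request sketch in Python follows. The endpoint URL, the request body fields (`model`, `input`), the bearer-token header, and the `MODERATION_API_KEY` environment variable are assumptions for illustration; only the response fields (`flagged`, `categories`, `category_scores`) come from the reference above.

```python
import os
import requests

# Hypothetical endpoint and credentials; substitute your provider's values.
API_URL = "https://api.example.com/v1/moderations"
API_KEY = os.environ["MODERATION_API_KEY"]

def moderate(text: str, model: str = "llama-guard-3") -> dict:
    """Send one input to the moderation endpoint and return the parsed JSON."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "input": text},  # request field names are assumed
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

result = moderate("some user-submitted text")
if result["flagged"]:
    hits = [name for name, hit in result["categories"].items() if hit]
    print("Flagged categories:", hits)
```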
User Input Validation
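As a sketch, user-supplied text can be screened before it reaches the rest of the application; this reuses the hypothetical `moderate()` helper from the example above.

```python
# Reject unsafe user input up front; assumes the moderate() helper above.
def validate_user_input(text: str) -> str:
    result = moderate(text)
    if result["flagged"]:
        raise ValueError("Input rejected by content moderation")
    return text
```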
AI Output Safety
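Model-generated text can be screened the same way before it is shown to end users. The fallback message and the reuse of `moderate()` are illustrative.

```python
# Replace unsafe model output with a fallback; assumes moderate() from above.
FALLBACK_MESSAGE = "Sorry, I can't help with that."

def safe_output(generated_text: str) -> str:
    result = moderate(generated_text)
    return FALLBACK_MESSAGE if result["flagged"] else generated_text
```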
Comment Filtering
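A comment feed can be filtered by dropping anything the safety model flags. This sketch checks each comment individually with the hypothetical `moderate()` helper; shieldgemma is only an example model choice.

```python
# Keep only comments that pass moderation; assumes moderate() from above.
def filter_comments(comments: list[str]) -> list[str]:
    return [c for c in comments if not moderate(c, model="shieldgemma")["flagged"]]
```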
Custom Thresholds
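Instead of relying on the overall `flagged` boolean, the per-category `category_scores` can be compared against your own limits. The threshold values below are arbitrary examples, not recommendations.

```python
# Enforce stricter, per-category limits using category_scores;
# the threshold values are arbitrary examples. Assumes moderate() from above.
THRESHOLDS = {
    "hate": 0.3,
    "harassment": 0.5,
    "self-harm": 0.2,
    "sexual/minors": 0.01,
    "violence": 0.6,
}

def violates_custom_policy(text: str) -> bool:
    scores = moderate(text)["category_scores"]
    return any(scores.get(cat, 0.0) >= limit for cat, limit in THRESHOLDS.items())
```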
400 Bad Request - Too Many Inputs
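Returned when a single request contains more inputs than the endpoint accepts; the exact limit is provider-specific. One way to avoid it is to split large workloads into smaller requests, as in the sketch below, which reuses `API_URL` and `API_KEY` from the earlier example and assumes both the placeholder limit and a list-valued `input` field.

```python
# Avoid the "too many inputs" 400 by chunking; the limit below is a
# placeholder and the list-valued "input" field is an assumption.
MAX_INPUTS_PER_REQUEST = 32  # hypothetical limit; check the API reference

def moderate_batch(texts: list[str], model: str = "llama-guard-3") -> list[dict]:
    responses = []
    for start in range(0, len(texts), MAX_INPUTS_PER_REQUEST):
        chunk = texts[start:start + MAX_INPUTS_PER_REQUEST]
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": model, "input": chunk},
            timeout=30,
        )
        resp.raise_for_status()
        responses.append(resp.json())
    return responses
```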
400 Bad Request - Empty Input
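Returned when the request's input is empty. A simple client-side guard, again reusing the hypothetical `moderate()` helper, avoids the round trip.

```python
# Guard against the "empty input" 400 before calling the API.
def moderate_nonempty(text: str) -> dict:
    if not text.strip():
        raise ValueError("Moderation input must be non-empty")
    return moderate(text)
```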