Toxicity Screening
Identifies hate speech, harassment, and offensive content
API Field: toxicity_enabled
Overview
Toxicity Screening detects harmful content including hate speech, harassment, threats, and other forms of abusive language. This scanner helps maintain safe and respectful interactions in AI-powered applications, protecting both users and brand reputation.
What It Detects
- Hate speech and discrimination
- Threats and intimidation
- Harassment and bullying
- Profanity and vulgar language
- Sexual harassment
- Violent content
- Personal attacks and insults
Why It Matters
Toxic content can cause psychological harm to users, create hostile environments, expose your organization to legal liability, and severely damage brand reputation.
Technical Details
- Risk Score Range: 0.0 - 1.0 (high risk: > 0.6)
- Confidence Level: typically 0.85 - 0.97
- Processing Time: < 80 ms per scan
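To make the threshold concrete, here is a minimal client-side sketch in Python. Only the 0.6 high-risk cutoff comes from the range above; the 0.3 borderline tier is an illustrative assumption, not part of the API:

def classify_risk(risk_score: float) -> str:
    # Map a scanner risk score to an action tier. Only the 0.6 high-risk
    # cutoff comes from the documented range above; the 0.3 borderline
    # tier is an illustrative assumption.
    if risk_score > 0.6:
        return "high"
    if risk_score > 0.3:
        return "borderline"
    return "low"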
Detection Examples
- Direct hate speech targeting protected groups - immediate block required.
- Implicit threat suggesting physical harm or stalking.
- Personal attack and harassment language.
API Usage
Enable this scanner by setting toxicity_enabled to true in your API key settings, or include it directly in your request:
curl -X POST https://benguard.io/api/v1/scan \
-H "X-API-Key: ben_your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Your user input here"
}'
The scanner settings are configured per API key in your dashboard under Settings → Scanner Configuration.
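For reference, a Python sketch of the same request. Passing toxicity_enabled in the request body follows the API field name documented above, but its exact placement as a top-level body field is an assumption; per-key configuration in the dashboard does not require it:

import requests

# Minimal Python equivalent of the curl request above. Whether
# toxicity_enabled belongs at the top level of the body is an assumption.
response = requests.post(
    "https://benguard.io/api/v1/scan",
    headers={"X-API-Key": "ben_your_api_key_here"},
    json={"prompt": "Your user input here", "toxicity_enabled": True},
    timeout=10,
)
result = response.json()
print(result["is_valid"], result.get("risk_score"))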
Response Format
When this scanner detects a threat, the response will include:
{
"is_valid": false,
"status": "threat_detected",
"risk_score": 0.99,
"threat_types": ["toxicity"],
"details": {
"results": [
{
"scanner": "toxicity",
"threat_detected": true,
"risk_score": 0.99,
"confidence": 0.92,
"details": {
"reason": "Direct hate speech targeting protected groups - immediate block required.",
"evidence": ["detected pattern in input"]
}
}
]
},
"request_id": "req_abc123"
}
Best Practices
- Implement zero-tolerance policies for severe toxicity
- Use graduated responses for borderline content (see the sketch below)
- Consider cultural context in detection
- Maintain human review for edge cases
- Document all moderation actions
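As one way to combine these practices with the response format above, here is a hedged Python sketch of a graduated-response handler. The block/human_review/allow actions are assumptions; only the response fields (is_valid, risk_score, details.results) and the 0.6 high-risk threshold come from this page:

def moderate(scan_response: dict) -> str:
    # Decide an action from a scan response like the one shown above.
    # Routing sub-threshold detections to human review is an assumption,
    # not documented API behavior.
    if scan_response["is_valid"]:
        return "allow"
    toxicity_hits = [
        r for r in scan_response["details"]["results"]
        if r["scanner"] == "toxicity" and r["threat_detected"]
    ]
    if any(r["risk_score"] > 0.6 for r in toxicity_hits):
        return "block"  # zero tolerance for severe toxicity
    return "human_review"  # graduated response for borderline content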
Related Scanners
Consider enabling related scanners for comprehensive protection.