Prompt Injection
Detects attempts to override system instructions
API Field:prompt_injection_enabledOverview
Prompt Injection is one of the most critical security threats to LLM applications. It occurs when an attacker crafts input that attempts to override, modify, or bypass the system instructions given to an LLM. This can lead to unauthorized actions, data leakage, or manipulation of the AI's behavior.
What It Detects
- Direct instruction override attempts ("Ignore all previous instructions...")
- System prompt manipulation ("System: You are now...")
- Context injection attacks
- Delimiter-based attacks using special characters
- Multi-turn manipulation attempts
- Encoded/obfuscated injection payloads
Why It Matters
Prompt injection can completely bypass your application's safety measures, potentially exposing sensitive data, allowing unauthorized actions, or causing reputational damage through inappropriate responses.
Technical Details
Risk Score Range
0.0 - 1.0 (High risk: > 0.7)
Confidence Level
Typically 0.85 - 0.98
Processing Time
< 100ms per scan
Common Use Cases
Detection Examples
Classic direct injection attempting to override system instructions and extract confidential information.
Role-based injection trying to make the model assume an unrestricted persona.
Delimiter injection using fake boundaries to inject new instructions.
API Usage
Enable this scanner in your API request by setting prompt_injection_enabled to true in your API key settings, or include it in your request:
curl -X POST https://benguard.io/api/v1/scan \
-H "X-API-Key: ben_your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Your user input here"
}'The scanner settings are configured per API key in your dashboard under Settings → Scanner Configuration.
Response Format
When this scanner detects a threat, the response will include:
{
"is_valid": false,
"status": "threat_detected",
"risk_score": 0.95,
"threat_types": ["prompt_injection"],
"details": {
"results": [
{
"scanner": "prompt_injection",
"threat_detected": true,
"risk_score": 0.95,
"confidence": 0.92,
"details": {
"reason": "Classic direct injection attempting to override system instructions and extract confidential information.",
"evidence": ["detected pattern in input"]
}
}
]
},
"request_id": "req_abc123"
}Best Practices
- Always validate user input before sending to LLM
- Use strong system prompts with clear boundaries
- Implement output filtering as a secondary defense
- Monitor for unusual response patterns
- Keep your threat detection models updated
Related Scanners
Consider enabling these related scanners for comprehensive protection: