Hallucination Risk

Identifies prompts likely to elicit false information

API Field: hallucination_risk_enabled

Overview

Hallucination Risk Detection identifies prompts that are likely to cause the LLM to generate false, fabricated, or misleading information. This includes requests for specific citations, precise but unsourced statistics, and factual questions the model cannot reliably answer.

What It Detects

  • Requests for specific citations/references
  • Queries about fictional entities presented as real
  • Demands for precise statistics without sources
  • Questions outside model knowledge cutoff
  • Requests for expert opinions the model cannot verify
  • Leading questions with false premises (see the illustrative sketch after this list)
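
BenGuard does not document this scanner's internal detection method. Purely as an illustration, a toy client-side pre-filter for a few of the categories above could use surface patterns like these (the patterns and function name are hypothetical, not BenGuard's implementation):

import re

# Toy surface patterns loosely matching some of the categories above.
# Illustrative only; real detection needs semantic understanding
# (e.g., spotting fictional entities or knowledge-cutoff issues).
HEURISTIC_PATTERNS = [
    r"\bcite\s+\d+\b",                        # "cite 5 peer-reviewed papers"
    r"\bexact (percentage|number|figure)\b",  # precise unsourced statistics
    r"\bin (his|her|their) \d{4} paper\b",    # specific paper attributions
]

def looks_hallucination_prone(prompt: str) -> bool:
    """Return True if the prompt matches any toy pattern above."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in HEURISTIC_PATTERNS)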

Why It Matters

AI hallucinations can spread misinformation, damage credibility, lead to poor decisions, and create legal liability when AI-generated false information is acted upon.

Technical Details

Risk Score Range

0.0 - 1.0 (High risk: > 0.5)

Confidence Level

Typically 0.75 - 0.92

Processing Time

< 90ms per scan
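
Because scores above 0.5 are classified as high risk, clients can apply the same cutoff when deciding how to act on a result. A minimal Python sketch (the constant and helper name are illustrative, not part of the API):

HIGH_RISK_THRESHOLD = 0.5  # documented cutoff for this scanner

def is_high_risk(risk_score: float) -> bool:
    """Classify a score on the documented 0.0-1.0 scale."""
    return risk_score > HIGH_RISK_THRESHOLD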

Common Use Cases

  • Research assistants
  • Medical information systems
  • Legal document analysis
  • Academic writing tools
  • News and fact-checking

Detection Examples

Threat Example #1 (Risk: 88%)
Cite 5 peer-reviewed papers from 2024 that prove this theory.

Request for specific citations the model cannot verify: high hallucination risk.

Threat Example #2 (Risk: 82%)
What did Dr. John Smith from Harvard say about quantum computing in his 2023 paper?

Request for potentially fictional person/paper details.

Threat Example #3 (Risk: 75%)
Give me the exact percentage of users who prefer product A vs B.

Request for precise statistics without a verifiable source.

API Usage

Enable this scanner by setting hallucination_risk_enabled to true in your API key settings, then send prompts to the scan endpoint:

curl -X POST https://benguard.io/api/v1/scan \
  -H "X-API-Key: ben_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Your user input here"
  }'

The scanner settings are configured per API key in your dashboard under Settings → Scanner Configuration.
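
The same request can be sent from any HTTP client. A minimal Python sketch using the requests library (the key and prompt are placeholders, as in the curl example):

import requests

# Mirrors the curl example above; requests sets the JSON Content-Type header.
response = requests.post(
    "https://benguard.io/api/v1/scan",
    headers={"X-API-Key": "ben_your_api_key_here"},
    json={"prompt": "Your user input here"},
    timeout=10,
)
response.raise_for_status()
result = response.json()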

Response Format

When this scanner detects a threat, the response will include:

{
  "is_valid": false,
  "status": "threat_detected",
  "risk_score": 0.88,
  "threat_types": ["hallucination_risk"],
  "details": {
    "results": [
      {
        "scanner": "hallucination_risk",
        "threat_detected": true,
        "risk_score": 0.88,
        "confidence": 0.92,
        "details": {
          "reason": "Request for specific citations model cannot verify - high hallucination risk.",
          "evidence": ["detected pattern in input"]
        }
      }
    ]
  },
  "request_id": "req_abc123"
}
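
A minimal sketch of acting on this response shape in Python, assuming a result dict parsed as in the request sketch above (the handler name is illustrative):

def handle_scan(result: dict) -> None:
    """Surface hallucination-risk detections from a scan response."""
    if result.get("is_valid", False):
        return  # no threat detected; safe to pass the prompt through

    if "hallucination_risk" in result.get("threat_types", []):
        for item in result["details"]["results"]:
            if item["scanner"] == "hallucination_risk" and item["threat_detected"]:
                print(f"Blocked (risk {item['risk_score']:.2f}, "
                      f"confidence {item['confidence']:.2f}): "
                      f"{item['details']['reason']}")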

Best Practices

  • Train users to expect uncertainty in AI responses
  • Implement confidence indicators in outputs
  • Encourage source verification
  • Use retrieval-augmented generation (RAG)
  • Add disclaimers for factual claims (see the sketch after this list)
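
As one way to combine the last two practices, here is a sketch that routes high-risk prompts through retrieval-augmented generation and appends a disclaimer (rag_answer and llm_answer are hypothetical stand-ins for your own pipeline):

HIGH_RISK = 0.5  # documented high-risk cutoff for this scanner

def rag_answer(prompt: str) -> str:
    """Hypothetical stand-in for a retrieval-augmented (RAG) pipeline."""
    return "answer grounded in retrieved documents"

def llm_answer(prompt: str) -> str:
    """Hypothetical stand-in for a direct model call."""
    return "direct model answer"

DISCLAIMER = ("Note: this response may include factual claims; "
              "verify them against primary sources.")

def answer_with_safeguards(prompt: str, scan_result: dict) -> str:
    """Route high-risk prompts through RAG and attach a disclaimer."""
    if scan_result["risk_score"] > HIGH_RISK:
        return rag_answer(prompt) + "\n\n" + DISCLAIMER
    return llm_answer(prompt)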

Related Scanners

Consider enabling these related scanners for comprehensive protection: