Export Formats
Hallucinator can export validation results in five formats. The TUI supports all formats via its export dialog; the CLI writes text output by default (use --output to save to a file).
Formats
| Format | Extension | Best for |
|---|---|---|
| JSON | .json | Programmatic processing, data pipelines |
| CSV | .csv | Spreadsheets, bulk analysis |
| Markdown | .md | Reports, GitHub issues, documentation |
| Text | .txt | Plain-text records, email |
| HTML | .html | Standalone visual reports |
Sorting Order
All formats use the same reference ordering within each paper:
- Retracted — Highest priority (most critical)
- Not Found — Potential hallucinations
- Author Mismatch — Title found, wrong authors
- DOI/arXiv Issues — Verified but with invalid identifiers
- FP-overridden — User-verified false positives
- Clean Verified — Confirmed references
- Skipped — References excluded from validation
Within each category, references are ordered by their original reference number.
False Positive Handling
When a reference has a false-positive override (from the TUI):
- Original status is preserved (e.g.,
not_found) - Effective status becomes
verified - FP reason is included (e.g.,
broken_parse,exists_elsewhere) - Adjusted statistics move FP-overridden references from their original bucket into
verified
JSON Schema
The JSON export produces an array of paper objects:
[
{
"filename": "paper.pdf",
"verdict": "safe",
"stats": {
"total": 42,
"verified": 38,
"not_found": 3,
"author_mismatch": 1,
"retracted": 0,
"skipped": 5,
"problematic_pct": 10.8
},
"references": [
{
"index": 0,
"original_number": 1,
"title": "Attention Is All You Need",
"raw_citation": "[1] A. Vaswani et al., ...",
"status": "verified",
"effective_status": "verified",
"fp_reason": null,
"source": "CrossRef",
"ref_authors": ["A. Vaswani", "N. Shazeer"],
"found_authors": ["Ashish Vaswani", "Noam Shazeer"],
"paper_url": "https://doi.org/10.5555/3295222.3295349",
"failed_dbs": [],
"doi_info": {
"doi": "10.5555/3295222.3295349",
"valid": true,
"title": null
},
"arxiv_info": null,
"retraction_info": null,
"db_results": [
{
"db": "CrossRef",
"status": "match",
"elapsed_ms": 234,
"authors": ["Ashish Vaswani", "Noam Shazeer"],
"url": "https://doi.org/10.5555/3295222.3295349"
},
{
"db": "arXiv",
"status": "skipped",
"elapsed_ms": 0,
"authors": [],
"url": null
}
]
}
]
}
]
Per-Reference Fields
| Field | Type | Description |
|---|---|---|
index | number | Zero-based index in the results array |
original_number | number | Original reference number from the paper (1-based) |
title | string | Extracted reference title |
raw_citation | string | Full raw citation text from PDF |
status | string | Original status: verified, not_found, author_mismatch |
effective_status | string | Status after FP overrides |
fp_reason | string? | FP reason if overridden: broken_parse, exists_elsewhere, all_timed_out, known_good, non_academic |
source | string? | Database that verified the reference |
ref_authors | string[] | Authors extracted from the PDF |
found_authors | string[] | Authors returned by the verifying database |
paper_url | string? | URL to the paper in the source database |
failed_dbs | string[] | Databases that timed out or errored |
doi_info | object? | DOI validation: {doi, valid, title} |
arxiv_info | object? | arXiv validation: {arxiv_id, valid, title} |
retraction_info | object? | Retraction data: {is_retracted, retraction_doi, retraction_source} |
db_results | object[] | Per-database query results |
Skipped Reference Fields
Skipped references include a skip_reason field instead of validation data:
{
"index": 5,
"original_number": 6,
"title": "GitHub repo",
"status": "skipped",
"effective_status": "skipped",
"skip_reason": "url_only",
...
}
Per-DB Result Fields
| Field | Type | Description |
|---|---|---|
db | string | Database name |
status | string | match, no_match, author_mismatch, timeout, rate_limited, error, skipped |
elapsed_ms | number | Query time in milliseconds |
authors | string[] | Authors returned (if found) |
url | string? | Paper URL in this database |
CSV Schema
One row per reference, with these columns:
Filename,Verdict,Ref#,Title,Status,EffectiveStatus,FpReason,Source,Retracted,Authors,FoundAuthors,PaperURL,DOI,ArxivID,FailedDBs
Multi-value fields (Authors, FoundAuthors, FailedDBs) use semicolons as separators within the CSV field.
Markdown Structure
# Hallucinator Results
## paper.pdf [SAFE]
**42** references | **38** verified | **3** not found | ...
### Problematic References
**[7]** Suspicious Paper Title — ✗ Not Found
- [Google Scholar](...)
### Verified References
| # | Title | Source | URL |
|---|-------|--------|-----|
| 1 | Attention Is All You Need | CrossRef | [link](...) |
### Skipped References
| # | Title | Reason |
|---|-------|--------|
| 6 | GitHub repo | URL-only |
Sections are only included if they contain references (no empty “Problematic References” heading when everything is verified).
Text Format
Plain-text with fixed-width formatting:
Hallucinator Results
============================================================
paper.pdf [SAFE]
-----------------
42 total | 38 verified | 3 not found | 1 mismatch | 0 retracted | 5 skipped | 10.8% problematic
[1] Attention Is All You Need - Verified (CrossRef)
Authors (PDF): A. Vaswani, N. Shazeer
DOI: 10.5555/3295222.3295349 (valid)
URL: https://doi.org/...
[7] Suspicious Paper Title - NOT FOUND
Authors (PDF): J. Doe, A. Smith
Timed out: Semantic Scholar, Europe PMC
HTML Format
A self-contained HTML file with:
- Dark theme with CSS variables
- Stat cards showing totals across all papers
- Collapsible per-paper sections
- Color-coded badges (green: verified, red: not found, yellow: mismatch, dark red: retracted)
- Author comparison grid for mismatches
- Retraction warning boxes
- Google Scholar and paper URL links
- Raw citation in expandable details blocks
- Timestamp in footer
The HTML requires no external dependencies — all CSS is inlined.