Using Hallucinator as a Rust Library

This guide covers how to use hallucinator crates as dependencies in your own Rust project.

Which Crate to Depend On

| Use case | Crate | What you get |
| Validate references programmatically | hallucinator-core | check_references(), all DB backends, caching, rate limiting |
| Extract references from PDFs | hallucinator-parsing + hallucinator-pdf-mupdf | ReferenceExtractor, section detection, title/author extraction |
| Parse BBL/BIB files | hallucinator-bbl | extract_references_from_bbl(), extract_references_from_bib() |
| Unified file dispatch | hallucinator-ingest | Auto-detection (PDF/BBL/BIB/archive), streaming archive extraction |
| Export results | hallucinator-reporting | JSON, CSV, Markdown, Text, HTML export |
| Build offline DBLP | hallucinator-dblp | build_database(), DblpDatabase::search() |
| Build offline ACL | hallucinator-acl | build_database(), AclDatabase::search() |

Most users will want hallucinator-core for validation and hallucinator-ingest for file handling.

Minimal Example: Validate References

use hallucinator_core::{Config, ProgressEvent, RateLimiters, check_references};
use hallucinator_ingest::extract_references;
use std::sync::Arc;
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let path = std::path::Path::new("paper.pdf");

    // Extract references
    let extraction = extract_references(path)
        .map_err(|e| anyhow::anyhow!("{}", e))?;

    println!("Found {} references", extraction.references.len());

    // Build config with defaults
    let config = Config {
        rate_limiters: Arc::new(RateLimiters::new(false, false)),
        ..Default::default()
    };

    // Validate
    let cancel = CancellationToken::new();
    let results = check_references(
        extraction.references,
        config,
        |event| {
            if let ProgressEvent::Result { result, .. } = &event {
                println!("[{:?}] {}", result.status, result.title);
            }
        },
        cancel,
    ).await;

    println!("{} total, {} verified, {} not found",
        results.len(),
        results.iter().filter(|r| r.status == hallucinator_core::Status::Verified).count(),
        results.iter().filter(|r| r.status == hallucinator_core::Status::NotFound).count(),
    );

    Ok(())
}

Config Construction

The Config struct controls all runtime behavior:

use hallucinator_core::{Config, RateLimiters, QueryCache, build_query_cache};
use std::sync::Arc;

let rate_limiters = Arc::new(RateLimiters::new(
    true,  // has_crossref_mailto (enables 3/s instead of 1/s)
    true,  // has_s2_api_key (enables higher S2 rate)
));

let cache = build_query_cache(
    Some(std::path::Path::new("/tmp/cache.db")),
    604800,  // positive TTL: 7 days in seconds
    86400,   // negative TTL: 24 hours in seconds
);

let config = Config {
    openalex_key: Some("your-key".to_string()),
    s2_api_key: Some("your-key".to_string()),
    num_workers: 4,
    db_timeout_secs: 10,
    db_timeout_short_secs: 5,
    max_rate_limit_retries: 3,
    rate_limiters,
    query_cache: Some(cache),
    ..Default::default()
};
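
The two TTL arguments above are plain second counts, which are easy to mistype. A minimal sketch of deriving them from day counts follows; `ttl_secs` is a hypothetical helper, not part of hallucinator-core, and the `u64` type is an assumption about what `build_query_cache` accepts.

```rust
// Seconds in one day, spelled out so the arithmetic is visible.
const SECS_PER_DAY: u64 = 24 * 60 * 60;

// Hypothetical helper: express a cache TTL in days and get whole seconds back.
fn ttl_secs(days: u64) -> u64 {
    days * SECS_PER_DAY
}

fn main() {
    let positive_ttl = ttl_secs(7); // the 7-day positive TTL from the example above
    let negative_ttl = ttl_secs(1); // the 24-hour negative TTL from the example above
    println!("positive = {positive_ttl}s, negative = {negative_ttl}s");
}
```

Naming the unit at the call site keeps the intent readable if the TTLs are later tuned.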

ProgressEvent Variants

The progress callback receives these events during validation:

| Event | When | Key fields |
| Checking | Starting a reference | index, total, title |
| DatabaseQueryComplete | A single DB query finished | db_name, status, elapsed |
| RateLimitWait | Waiting for rate limiter | db_name, wait_time |
| RateLimitRetry | Retrying after 429 | db_name, attempt |
| Warning | DB timeouts for a reference | title, failed_dbs, message |
| Result | Reference validation complete | index, total, result: Box<ValidationResult> |
| RetryPass | Starting retry pass | |
| Retrying | Retrying a reference | index, title |
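
A callback usually cares about only a few of these events. The sketch below shows one way to turn selected events into log lines and ignore the rest. To stay self-contained it defines a local stand-in enum: the variant and field names come from the table, but every field type, and the field-less `RetryPass`, are assumptions rather than the real hallucinator-core definitions.

```rust
use std::time::Duration;

// Stand-in for hallucinator_core::ProgressEvent. Names follow the table above;
// all field types are assumptions made for illustration only.
#[allow(dead_code)]
enum ProgressEvent {
    Checking { index: usize, total: usize, title: String },
    RateLimitWait { db_name: String, wait_time: Duration },
    Warning { title: String, failed_dbs: Vec<String>, message: String },
    RetryPass,
}

// Map an event to a one-line log message; None means "not interesting".
fn describe(event: &ProgressEvent) -> Option<String> {
    match event {
        ProgressEvent::Checking { index, total, title } => {
            Some(format!("checking {index} of {total}: {title}"))
        }
        ProgressEvent::RateLimitWait { db_name, wait_time } => {
            Some(format!("{db_name}: rate limited, waiting {wait_time:?}"))
        }
        ProgressEvent::Warning { title, failed_dbs, .. } => {
            Some(format!("{title}: no answer from {}", failed_dbs.join(", ")))
        }
        // The wildcard arm covers every variant not handled explicitly.
        _ => None,
    }
}

fn main() {
    let events = [
        ProgressEvent::Checking { index: 3, total: 12, title: "Some Paper".into() },
        ProgressEvent::RetryPass,
    ];
    for event in &events {
        if let Some(line) = describe(event) {
            eprintln!("{line}");
        }
    }
}
```

With the real enum, the same `match` would sit inside the closure passed to check_references(); keeping the wildcard arm means the callback does not have to name every variant.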

PDF Extraction

Extract and parse references without validating:

use hallucinator_core::PdfBackend;
use hallucinator_parsing::ReferenceExtractor;
use hallucinator_pdf_mupdf::MupdfBackend;

// Raw text only. PdfBackend must be in scope to call extract_text().
// The `?` operators assume an enclosing function that returns a Result.
let text = MupdfBackend.extract_text(std::path::Path::new("paper.pdf"))?;

// Use ReferenceExtractor for the full pipeline
let extractor = ReferenceExtractor::new(MupdfBackend);
let result = extractor.extract(std::path::Path::new("paper.pdf"))?;

for reference in &result.references {
    println!("Title: {:?}", reference.title);
    println!("Authors: {:?}", reference.authors);
    println!("DOI: {:?}", reference.doi);
}

Adding a Custom PDF Backend

Implement PdfBackend (defined in hallucinator-core) to use a different PDF library:

use hallucinator_core::PdfBackend;

struct MyPdfBackend;

impl PdfBackend for MyPdfBackend {
    fn extract_text(&self, path: &std::path::Path) -> Result<String, String> {
        // Your PDF text extraction logic here
        // (my_pdf_library is a placeholder for the library you choose).
        let text = my_pdf_library::extract(path)
            .map_err(|e| format!("extraction failed: {}", e))?;
        Ok(text)
    }
}
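
For tests, a backend that skips PDF parsing can be handy. The sketch below implements the same method signature against a local stand-in trait so that it compiles without the hallucinator crates. `PlainTextBackend` is a hypothetical name, and the sketch assumes `extract_text` is the only method the real trait requires.

```rust
use std::path::Path;

// Local stand-in with the same method signature as the trait shown above.
// In a real project, import hallucinator_core::PdfBackend instead.
trait PdfBackend {
    fn extract_text(&self, path: &Path) -> Result<String, String>;
}

// Hypothetical backend that treats the input file as already-extracted UTF-8 text.
struct PlainTextBackend;

impl PdfBackend for PlainTextBackend {
    fn extract_text(&self, path: &Path) -> Result<String, String> {
        // Convert the io::Error into the String error type the trait expects.
        std::fs::read_to_string(path)
            .map_err(|e| format!("failed to read {}: {}", path.display(), e))
    }
}

fn main() {
    // Write a small fixture, then read it back through the backend.
    let path = std::env::temp_dir().join("plain_text_backend_demo.txt");
    std::fs::write(&path, "[1] A. Author. Some Title. 2020.\n").expect("write fixture");

    let text = PlainTextBackend.extract_text(&path).expect("extract text");
    println!("{}", text.trim_end());
}
```

Because errors are plain `String`s, any underlying error type can be adapted with `map_err`, as both this sketch and the example above do.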

Adding a Custom Database Backend

See Database Backends for the DatabaseBackend trait reference and a step-by-step guide.