Concurrency Model

Hallucinator’s validation engine is designed around a per-DB drainer pool architecture that maximizes throughput while respecting per-database rate limits. This document explains the concurrency primitives, task structure, and how they interact.

Design Goals

Maximize parallelism — Check multiple references simultaneously
Respect rate limits — Each database has its own rate limit; never exceed it
Minimize latency — Return results as soon as a verified match is found
Avoid contention — No shared rate limiter governor across tasks

Architecture Diagram

                         ┌──────────────────┐
                         │   Job Queue      │
                         │ (async_channel)  │
                         └────────┬─────────┘
                                  │
             ┌────────────────────┼────────────────────┐
             │                    │                    │
      ┌──────▼──────┐     ┌──────▼──────┐     ┌──────▼──────┐
      │ Coordinator │     │ Coordinator │     │ Coordinator │
      │   Task 1    │     │   Task 2    │     │   Task N    │
      └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
             │                    │                    │
             │  Local DBs inline  │                    │
             │  Cache pre-check   │                    │
             │                    │                    │
             └──────────┬─────────┘                    │
                        │ Fan out (cache misses only)  │
         ┌──────────────┼──────────────────────────┐   │
         │              │              │           │   │
   ┌─────▼─────┐ ┌──────▼─────┐ ┌─────▼────┐ ┌───▼───▼──┐
   │ CrossRef  │ │   arXiv    │ │   DBLP   │ │   ...    │
   │ Drainer   │ │  Drainer   │ │ Drainer  │ │ Drainers │
   │           │ │            │ │ (online) │ │          │
   │ Rate: 1/s │ │ Rate: 3/s  │ │ Rate:1/s │ │          │
   └───────────┘ └────────────┘ └──────────┘ └──────────┘

Task Types

Coordinator Tasks

Count: Configurable via num_workers (default: 4)
Role: Pick references from the shared job queue, run local DBs inline, pre-check cache, fan out to drainers
Concurrency: Multiple coordinators run in parallel, each pulling from the same async_channel

A coordinator’s lifecycle for each reference:

Receive RefJob from job queue
Emit ProgressEvent::Checking
Query local databases inline (DBLP offline, ACL offline) — sub-millisecond
If verified locally → emit result, skip remote phase
Pre-check cache for all remote DBs (synchronous, prevents race condition)
If verified from cache → emit result, skip drainers
Create RefCollector (shared aggregation hub)
Send DrainerJob to each cache-miss DB’s drainer queue

Drainer Tasks

Count: One per enabled remote database
Role: Process DB queries sequentially at the database’s natural rate
Rate limiting: Each drainer is the sole consumer of its DB’s AdaptiveDbLimiter

A drainer’s lifecycle for each job:

Check early-exit conditions (cancelled, already verified, no DOI for DOI-requiring backend)
Acquire rate limiter token
Check cache (within the rate-limited query path)
Execute HTTP query with timeout
Validate authors if title found
Update RefCollector state
Decrement remaining counter; if last, finalize the result

RefCollector

A per-reference aggregation hub, shared (via Arc) by all drainers working on that reference:

RefCollector
├── remaining: AtomicUsize    # Drainers left to report
├── verified: AtomicBool      # Early-exit flag
├── state: Mutex<AggState>    # Aggregation (held briefly)
│   ├── verified_info
│   ├── first_mismatch
│   ├── failed_dbs
│   ├── db_results
│   └── retraction
└── result_tx: Mutex<Option<oneshot::Sender>>

The last drainer to decrement remaining to zero calls finalize_collector(), which builds the final ValidationResult and sends it on the oneshot channel.

Concurrency Primitives

Primitive	Purpose
`async_channel::unbounded`	Job queue (coordinators) and per-DB drainer queues
`AtomicUsize` + `Ordering::AcqRel`	`remaining` counter for lock-free drainer coordination
`AtomicBool` + `Ordering::Release/Acquire`	`verified` flag for early exit
`Mutex<AggState>`	Per-reference aggregation state (single mutex, held briefly)
`tokio::sync::oneshot`	Return channel for each reference’s result
`CancellationToken`	Graceful shutdown (Ctrl+C handler)
`ArcSwap`	Atomic governor swapping during adaptive rate limit backoff
`DashMap`	Lock-free concurrent L1 cache reads

Cache Pre-Check: Preventing Race Conditions

A subtle race condition exists without the cache pre-check:

Reference R is dispatched to CrossRef (drainer A) and arXiv (drainer B)
Drainer A finishes first: CrossRef has a match → sets verified = true
Drainer B sees verified = true → skips arXiv query entirely
arXiv’s result is never cached for reference R

This means future runs will always miss the arXiv cache for this title.

Solution: Before dispatching to any drainer, the coordinator synchronously checks the cache for all remote DBs. Cache hits are recorded in AggState.db_results, and only cache-miss DBs are dispatched to drainers. This ensures every DB’s cached result is always captured regardless of verification order.

Early Exit

When a drainer verifies a reference:

Sets collector.verified to true (atomic store with Release ordering)
Other drainers check this flag before querying (Acquire ordering)
Drainers that see verified = true emit a Skipped status and decrement remaining

This avoids unnecessary API calls once a match is found.

SearxNG Fallback

If a reference is NotFound after all remote DBs have been checked and SearxNG is configured:

finalize_collector() runs a SearxNG web search as a last-resort fallback
If SearxNG finds the title, the status upgrades from NotFound to Verified (source: “Web Search”)
SearxNG results don’t undergo author validation (web search doesn’t return structured author data)

Shutdown Sequence

User presses Ctrl+C → CancellationToken is cancelled
job_tx (the job queue sender) is closed
Coordinators drain remaining jobs, checking cancel.is_cancelled() at each iteration
Drainers skip remaining jobs when cancelled
When all coordinators finish, they drop their Arc<drainer_txs> clones
Drainer channels close → drainers drain and exit
Pool handle completes

Performance Characteristics

Local DB queries: < 1ms (SQLite FTS5 lookups)
Cache hits: Sub-microsecond (DashMap L1) to ~1ms (SQLite L2)
Remote DB queries: 100ms–10s depending on database and network
Throughput: Scales linearly with num_workers for CPU-bound coordination; drainer throughput is rate-limit-bound per DB
Memory: One RefCollector per in-flight reference (small: a few KB each)

Keyboard shortcuts

Hallucinator Documentation