
rasin
/ra.zɛ̃/ Kreyòl n.
Root. Origin. Foundation.
A revolution that defeated Napoleon, ended slavery, and doubled the size of the United States almost vanished from the historical record. The documents survived, scattered across archives in France, the United States, and the Caribbean. But for two centuries, nothing connected them. This is the story of why they were separated, and what it took to make them speak to each other again.



Preserved, but never connected.
They won. And then the world pretended it hadn't happened. The Haitian historian Michel-Rolph Trouillot called it “an unthinkable history.” The entire intellectual framework of the Enlightenment, the system that produced the Declaration of the Rights of Man, ranked humanity on a ladder with Europeans at the top and Africans at the bottom. Enslaved people defeating Napoleon's army didn't just challenge a political order. It broke the categories through which the Western world understood who could be a political actor, who could wage war, who could govern. For thirteen years, every Western power cycled through denial, minimization, and prediction of collapse, until there was nothing left to deny. Hobsbawm's The Age of Revolution, 1789–1848 gives it barely a mention. The Penguin Dictionary of Modern History doesn't include an entry for Haiti at all.
The records survived: vast, public, and scattered across institutions on three continents. But most of them are photographs: scanned pages, microfilm transfers, ink fading on paper that spent two centuries in tropical humidity. Each collection lives on its own site, in its own language, behind its own interface. A plantation listed in a French indemnity claim might also appear in a fugitive advertisement and a gazette decree about the same parish, but nothing ties them together. The documents survived. The structure connecting them never existed.
Trouillot wrote that silences enter history at four moments — when sources are made, when archives are assembled, when narratives are constructed, when significance is assigned. The sources survived. The archives were assembled. But for two centuries, no one built the system that makes them legible, searchable, and connected. Rasin is that system — reading every page, unifying 110+ sources into a single index, and matching meaning across languages.

Seven stages from scan to citation.
Every answer on Rasin traces back through a reproducible pipeline. Each stage is documented at rasin.ai/methodology.
Collect: Custom scrapers pull from Gallica, LoC, DLOC, Internet Archive
Read: docTR OCR trained on historical typefaces and damaged handwriting
Chunk: Semantic segmentation preserving context and readability
Embed: BGE-M3 multilingual vectors for meaning, not keywords
Extract: GLiNER identifies people, places, events, dates across sources
Connect: Neo4j graph links people and events across archives
Answer: Hybrid RAG combining vector, keyword, and graph results, fused and verified
The entire pipeline — inference, embeddings, vector search, knowledge graph — runs on a single machine today. No cloud APIs. No third-party dependencies.
Collect
This is Article Premier of the 1805 Constitution of Haiti — the first national constitution to permanently abolish slavery. It sat in the Bibliothèque nationale de France for two centuries, digitized but buried in a catalog of millions.
Every archive has its own API, its own rate limits, its own format. Gallica's IIIF endpoint allows five requests per minute at full resolution — with circuit breakers and 90-second backoffs when it pushes back. Over 65 custom scrapers handle the differences, each tracking provenance back to the original institution.
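The throttling described above can be sketched in a few lines. This is a minimal illustration, not Rasin's scraper code: the class names, the failure threshold of three, and the injectable clock are assumptions made for the example. Only the five-requests-per-minute limit and the 90-second backoff come from the text.

```python
import time


class CircuitBreaker:
    """Blocks calls after repeated failures until a cool-down has passed."""

    def __init__(self, max_failures=3, cooldown_seconds=90.0, clock=time.monotonic):
        self.max_failures = max_failures          # assumed threshold
        self.cooldown_seconds = cooldown_seconds  # 90 s backoff from the text
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """Return True if a request may be attempted right now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_seconds:
            # Cool-down elapsed: close the breaker and allow a fresh attempt.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


class RateLimiter:
    """Spaces requests evenly, one every 60 / per_minute seconds."""

    def __init__(self, per_minute=5, clock=time.monotonic):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.next_allowed = None

    def wait_time(self):
        """Reserve the next slot and return how long the caller should sleep."""
        now = self.clock()
        if self.next_allowed is None or now >= self.next_allowed:
            self.next_allowed = now + self.interval
            return 0.0
        wait = self.next_allowed - now
        self.next_allowed += self.interval
        return wait
```

A scraper loop would call `time.sleep(limiter.wait_time())` before each request, skip the source while `breaker.allow()` is false, and report each outcome through `record_success()` or `record_failure()`.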
The full corpus is ~86 GB across 43 source categories. A PostgreSQL queue coordinates parallel downloads with resource-aware batching — heavy sources like Gallica run two workers; lighter APIs run five. Every download is resumable. The collection phase alone took weeks.
Python · httpx · Playwright · circuit breakers · PostgreSQL queue · 65+ scrapers


Le peuple habitant l'isle ci-devant appelée St. Domingue, convient ici de se former en état libre, souverain et indépendant de toute autre puissance de l'univers, sous le nom d'Empire d'Hayti.
Read
A search engine cannot read a photograph. This page is a scan — aging paper, faded ink, eighteenth-century typefaces that modern software wasn't built for. Until it's converted to text, it's invisible to any search.
Before OCR runs, every image passes through a preprocessing pipeline — deskewing rotated scans, denoising damaged pages, enhancing contrast on faded ink, sharpening text edges blurred by two centuries of tropical storage. Then docTR reads what's left, tracking per-word confidence so pages that fail can be retried automatically.
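How per-word confidence might drive an automatic retry can be shown with a small check. This is a sketch under stated assumptions: the function name and all three thresholds are hypothetical, and treating a page with no recognized words as a failure is a choice made for the example, not something the text specifies.

```python
def needs_retry(word_confidences, min_mean=0.6, low_cutoff=0.4, max_low_fraction=0.3):
    """Decide whether an OCR'd page should be sent back for another pass.

    word_confidences: one float in [0, 1] per recognized word.
    All thresholds are illustrative assumptions.
    """
    if not word_confidences:
        # Nothing was recognized: treat as a failed read (assumption).
        return True
    mean = sum(word_confidences) / len(word_confidences)
    low_count = sum(1 for c in word_confidences if c < low_cutoff)
    low_fraction = low_count / len(word_confidences)
    return mean < min_mean or low_fraction > max_low_fraction
```

A caller would normally cap the number of retries per page so that genuinely blank or unreadable scans do not loop forever.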
Multiple GPU workers process the corpus in parallel, coordinated through PostgreSQL row locks — no external queue, no Redis. If a worker hits an out-of-memory error, it halves its batch size and retries. The system recovers without human intervention.
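The halve-and-retry recovery can be sketched as a plain loop. This is an illustration rather than the production worker: `run_batch` is a hypothetical callable standing in for the OCR step, and Python's built-in `MemoryError` stands in for whatever out-of-memory exception the GPU framework raises.

```python
def process_with_adaptive_batching(items, run_batch, batch_size=16, min_batch_size=1):
    """Process items in batches, halving the batch size after each out-of-memory error.

    run_batch: hypothetical callable that takes a list and returns a list of results.
    Returns (results, final_batch_size).
    """
    results = []
    position = 0
    while position < len(items):
        batch = items[position:position + batch_size]
        try:
            results.extend(run_batch(batch))
        except MemoryError:
            if batch_size <= min_batch_size:
                raise  # cannot shrink further; surface the error
            batch_size = max(min_batch_size, batch_size // 2)
            continue  # retry the same position with a smaller batch
        position += len(batch)
    return results, batch_size
```

Because the position only advances after a successful batch, no page is skipped or processed twice when the batch size shrinks.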
docTR · PyMuPDF · adaptive batching · PostgreSQL coordination · CUDA + MPS
Chunk
A 200-page constitution can't be searched as a single block. The chunker splits text at semantic boundaries — paragraph breaks first, then sentence boundaries, then hard token limits as a fallback. Each passage lands between 512 and 1,024 tokens, with 128 tokens of overlap so context is never lost at a split.
Every passage carries its provenance: which document it came from, which page, which section. When an answer cites a passage, the chain traces all the way back to a specific page in a specific archive. The citation chain starts here.
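The splitting order (paragraphs, then sentences, then hard cuts) and the overlap can be sketched as follows. This is a simplified illustration: whitespace-separated words stand in for tiktoken tokens, the function and field names are hypothetical, and the 512-token lower bound is not enforced here.

```python
import re


def count_tokens(text):
    # Stand-in for a real tokenizer such as tiktoken: counts whitespace-separated words.
    return len(text.split())


def split_units(text, max_tokens):
    """Yield pieces of at most max_tokens: paragraphs, else sentences, else hard cuts."""
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        if count_tokens(para) <= max_tokens:
            yield para
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", para):
            words = sentence.split()
            # Hard limit fallback: cut an oversized sentence into fixed-size pieces.
            for i in range(0, len(words), max_tokens):
                yield " ".join(words[i:i + max_tokens])


def chunk(text, doc_id, page, max_tokens=1024, overlap=128):
    """Pack units into chunks of at most max_tokens, each tagged with its provenance."""
    chunks, current = [], []

    def flush():
        chunks.append({"doc_id": doc_id, "page": page,
                       "index": len(chunks), "text": " ".join(current)})

    # Units are capped at max_tokens - overlap so that carried-over tokens always fit.
    for unit in split_units(text, max_tokens - overlap):
        words = unit.split()
        if current and len(current) + len(words) > max_tokens:
            flush()
            current = current[-overlap:] if overlap else []
        current.extend(words)
    if current:
        flush()
    return chunks
```

Each chunk after the first begins with the last `overlap` tokens of its predecessor, and the `doc_id` and `page` fields are what a citation would later follow back to the archive.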
tiktoken · semantic boundary detection · 128-token overlap · provenance metadata
Art. 1 — Le peuple habitant l'isle ci-devant appelée St. Domingue, convient ici de se former en état libre, souverain et indépendant...
Art. 2 — L'esclavage est à jamais aboli.
Art. 12 — Aucun blanc, quelle que soit sa nation, ne mettra le pied sur ce territoire, à titre de maître ou de propriétaire...
Art. 14 — Toute acception de couleur parmi les enfans d'une seule et même famille, dont le chef de l'État est le père, devant...
Embed
Each passage is converted into a 1024-dimensional representation of its meaning — not its words. A Kreyòl question about abolition and this French decree land in the same region of vector space, even though they share zero vocabulary. Queries are asymmetrically prefixed so the model distinguishes questions from documents, and every vector is stored twice — in Qdrant for low-latency search, in PostgreSQL for durability and crash recovery.
L'esclavage est à jamais aboli.
Ki konstitisyon ki te aboli esklavaj pou tout tan?
Zero shared words. Same meaning. Same vector space.
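What "same region of vector space" means in practice is that the two vectors have a high cosine similarity. The sketch below shows that comparison with toy three-dimensional vectors. The numbers are invented for illustration and are not BGE-M3 output, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vector, passages, top_k=1):
    """Return the top_k passages whose vectors are most similar to the query."""
    ranked = sorted(passages,
                    key=lambda p: cosine(query_vector, p["vector"]),
                    reverse=True)
    return ranked[:top_k]


# Toy data: made-up vectors standing in for real 1024-dimensional embeddings.
passages = [
    {"text": "L'esclavage est à jamais aboli.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Sugar exports rose sharply.", "vector": [0.0, 0.2, 0.9]},
]
kreyol_query_vector = [0.8, 0.2, 0.1]  # pretend embedding of the Kreyòl question
best = nearest(kreyol_query_vector, passages)[0]
```

A vector database performs the same ranking with an approximate index instead of a full sort, which is what keeps search fast at corpus scale.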
The language of the question should never limit the reach of the answer.
Kreyòl is a first-class search language in Rasin today. Native answer generation in Kreyòl is the next milestone — via fine-tuning.
BGE-M3 · 1024-dim · Qdrant HNSW · PostgreSQL backup · multilingual BM25
Extract
The constitution names the men who signed it — Christophe, Pétion, Clervaux, Geffrard, Gabart. GLiNER, a zero-shot NER model, reads every passage in the corpus and identifies every person, place, organization, event, date, document, and ship it mentions — seven entity types chosen to stay neutral rather than impose interpretive categories on historical figures.
Names that appear differently across centuries and languages — “Toussaint Louverture” and “Toussaint L'Ouverture” — are resolved to a single canonical identity. The model processes roughly a thousand documents per minute on CPU alone.
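One simple way to resolve spelling variants is to reduce each name to a normalized key and group by it. The sketch below is an assumption about how such a step could work, not a description of Rasin's resolver, which may use fuzzier matching. Both function names are hypothetical.

```python
import re
import unicodedata


def name_key(name):
    """Reduce a name to lowercase ASCII letters so spelling variants collide."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z]", "", without_accents.lower())


def resolve(mentions):
    """Map every mention to the first-seen spelling that shares its key."""
    canonical = {}
    for mention in mentions:
        canonical.setdefault(name_key(mention), mention)
    return {mention: canonical[name_key(mention)] for mention in mentions}
```

With this key, "Toussaint Louverture" and "Toussaint L'Ouverture" both reduce to `toussaintlouverture`. Exact-key matching will not catch abbreviations or titles, so a real resolver would need additional rules or similarity scoring.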
GLiNER2 zero-shot · 7 entity types · ~1,000 docs/min · deduplication + resolution
Nous H. Christophe, Clervaux, Vernet, Gabart, Pétion, Geffrard, Toussaint Brave... en notre nom particulier, qu'en celui du peuple d'Hayti...
Connect
Christophe signed this constitution. He also appears in an American diplomatic dispatch, a Moniteur decree from his own kingdom, and two nineteenth-century histories by Ardouin and Madiou. Five archives that never referenced each other — now linked through one person.
A second stage uses Qwen3 via structured output to extract relationships between entities — who participated in which event, who was located where, who authored which document. Ten relationship types, each with a confidence score and the source text that evidences it.
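A structured-output schema for this stage might look like the sketch below. It uses a standard-library dataclass rather than any particular validation library. The three relation names are hypothetical labels for the examples the text gives, the other seven types are not listed in the text and are left out, and the 0.5 confidence floor is an assumption.

```python
from dataclasses import dataclass

# Three of the ten relation types, named hypothetically after the examples in the text.
RELATION_TYPES = {"PARTICIPATED_IN", "LOCATED_IN", "AUTHORED"}


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str
    confidence: float  # model-reported score in [0, 1]
    evidence: str      # source text that supports the relation

    def __post_init__(self):
        if self.predicate not in RELATION_TYPES:
            raise ValueError("unknown relation type: " + self.predicate)
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.evidence.strip():
            raise ValueError("evidence text is required")


def keep_supported(relations, passage, min_confidence=0.5):
    """Keep relations that are confident enough and whose evidence appears in the passage."""
    return [r for r in relations
            if r.confidence >= min_confidence and r.evidence in passage]
```

Requiring the evidence string to appear verbatim in the passage is one inexpensive guard against a model inventing support for a relation.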
At search time, the graph doesn't just find documents that match your query — it expands it. Search for “Vodou” and the graph injects related terms like “voduisant” and “Legba” into the text search, surfacing passages that no keyword match alone would find.
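Graph-based query expansion can be illustrated with a plain dictionary standing in for the Neo4j graph. The function names, the cap of five extra terms, and the sample passages are assumptions made for the example. The related terms are the ones mentioned above.

```python
def expand_query(query, related_terms, max_extra=5):
    """Return the query plus up to max_extra related terms from the graph stand-in."""
    terms = [query]
    seen = {query.lower()}
    for term in related_terms.get(query.lower(), []):
        if len(terms) - 1 >= max_extra:
            break
        if term.lower() not in seen:
            terms.append(term)
            seen.add(term.lower())
    return terms


def keyword_search(terms, passages):
    """Naive case-insensitive substring search over a list of passages."""
    lowered = [t.lower() for t in terms]
    return [p for p in passages if any(t in p.lower() for t in lowered)]


# A dictionary standing in for graph neighbors of the "Vodou" node.
graph = {"vodou": ["voduisant", "Legba"]}
passages = [
    "The voduisant gathered at night.",
    "An offering to Legba at the crossroads.",
    "Sugar exports rose in 1789.",
]
plain_hits = keyword_search(["Vodou"], passages)
expanded_hits = keyword_search(expand_query("Vodou", graph), passages)
```

Here the plain search finds nothing, because neither passage contains the literal word "Vodou", while the expanded search returns the first two passages.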
Neo4j · Qwen3 + Instructor · 10 relation types · graph-expanded queries
Six steps turn a photograph of a deteriorating page into a node in a multilingual knowledge system. The seventh is where it matters — when someone asks a question.
Answer
The Bois Caïman ceremony of August 1791 launched the Haitian Revolution. A Vodou priest named Boukman led the gathering that would ignite thirteen years of war and end with the founding of a nation. Try searching for it.
Ask the question in Kreyòl. The documents that answer it are in French and English: Ardouin's nineteenth-century history describing the ceremony on the Lenormand de Mézy plantation, a Vodou ethnography recording oral traditions about that night, and C.L.R. James analyzing its significance a century and a half later. They sit in different archives, catalogued under different systems. No keyword search connects them.
Description of the ceremony at Lenormand de Mézy plantation, August 1791
Oral tradition recording of the Bois Caïman gathering and Boukman's invocation
Analysis of Bois Caïman as the catalyst for the general insurrection of August 22
The query is first translated into all four corpus languages by an LLM, then embedded. Vector search and keyword search run in parallel — results merged through reciprocal rank fusion with source diversity caps so no single archive dominates the results. A cross-encoder reranks the top candidates.
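Reciprocal rank fusion and a per-source cap are both short enough to sketch directly. Each document's fused score is the sum of 1 / (k + rank) over every ranking it appears in. The value k=20 matches the configuration listed for this stage, while the function names, the cap of two results per source, and the tie-breaking rule are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=20):
    """Fuse several ranked lists of document ids with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def cap_per_source(ranked_ids, source_of, max_per_source=2, limit=10):
    """Walk the fused ranking, keeping at most max_per_source results per archive."""
    counts = defaultdict(int)
    kept = []
    for doc_id in ranked_ids:
        source = source_of[doc_id]
        if counts[source] < max_per_source:
            kept.append(doc_id)
            counts[source] += 1
        if len(kept) == limit:
            break
    return kept


vector_hits = ["a", "b", "c", "d"]
keyword_hits = ["b", "e", "a"]
fused = rrf_fuse([vector_hits, keyword_hits])
source_of = {"a": "gallica", "b": "gallica", "c": "gallica", "d": "loc", "e": "dloc"}
diverse = cap_per_source(fused, source_of)
```

In this toy run `fused` is `["b", "a", "e", "c", "d"]`, and the cap drops `"c"` because two results from the same archive are already ahead of it, leaving `["b", "a", "e", "d"]` for the reranker.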
Before the answer reaches you, a DeBERTa NLI model checks entailment between every claim and its cited passage. If a citation contradicts or doesn't support its claim, it's flagged. Quote verification confirms that any direct quotes actually appear in the source text. Evidence, not guesses.
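Quote verification is the simpler of the two checks and can be sketched without a model. The version below is an assumption about how such a check could work: it normalizes whitespace, case, and apostrophe style, then requires each quoted span to appear in the cited source. The function names are hypothetical, and the NLI entailment step is not shown.

```python
import re
import unicodedata

# Matches text inside straight double quotes or inside curly double quotes.
QUOTE_PATTERN = re.compile('"([^"]+)"|\u201c([^\u201d]+)\u201d')


def normalize(text):
    """Unify Unicode form, apostrophe style, whitespace, and case before comparing."""
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_quotes(answer):
    """Return every directly quoted span in the answer text."""
    return [straight or curly for straight, curly in QUOTE_PATTERN.findall(answer)]


def unverified_quotes(answer, source_text):
    """Return the quotes in the answer that do not appear in the source text."""
    haystack = normalize(source_text)
    return [q for q in extract_quotes(answer) if normalize(q) not in haystack]
```

An answer would pass this check only when `unverified_quotes` returns an empty list. Exact matching after normalization is strict on purpose, so OCR errors in the source can cause false flags that a fuzzier comparison would tolerate.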
Nemotron-3-Nano via TRT-LLM · RRF fusion (k=20) · BGE reranker · DeBERTa NLI · quote verification
Every answer traces back to a specific passage in a specific document. The evidence speaks for itself.
What the pipeline reads.
98+ verified collections spanning five centuries and four languages. The full catalog is browsable at rasin.ai/sources.
Archives & Digital Collections
Gallica, Library of Congress, DLOC, Internet Archive
31+ BnF documents (1492–1850), 9,000+ DLOC newspaper issues, 552 Island Luminous pages in 3 languages
Databases & Structured Records
SlaveVoyages, CNRS Indemnités, Marronnage.info
3,581 Saint-Domingue voyages, 28,356 indemnity claimant records, 22,485 fugitive advertisements
Periodicals & Newspapers
Le Moniteur Haïtien, L'Abeille Haytienne, La Gazette Royale
9,411 Moniteur issues (1845–1983), 5,076 Moniteur Universel issues (1789–1810)
Primary Sources
Founders Online, Boisrond-Tonnerre, US Senate hearings
1,152 Founders Online documents, 9 Senate hearing transcripts, Kreyòl proclamation of 1793
Legal Documents
Constitutions, legal codes, Linstant de Pradine
9 Haitian legal codes, 4 Linstant de Pradine volumes (1804–1876), constitutions from 1801–1889
Scholarship & Analysis
80+ monographs, Human Rights Watch, Frederick Douglass
Saint-Rémy, Bellegarde, Ardouin, Madiou, C.L.R. James, 18 HRW reports (1993–2025)



110 collections. 281,000 pages. Four languages. All of it runs on one machine.
Runs on a single machine.
A system that connects about 281,000 pages across 110+ archives, handles OCR in four languages, and runs multilingual AI search, built and tested on a single NVIDIA DGX Spark. One machine. The entire pipeline, from scanned page to cited answer. Cloud deployment is next.
Measured, not promised.
Every claim about retrieval quality is backed by a golden test set — 52 hand-curated queries spanning factual lookups, entity searches, cross-lingual questions, and multi-source synthesis. The numbers below are from the latest stable evaluation run.
Cross-source queries — where the answer requires combining evidence from multiple archives — are the hardest category, and the one we are actively improving.
Researchers
Historians, graduate students, and digital humanists studying the Haitian Revolution, the Atlantic world, or the history of slavery. Every answer is cited back to a specific passage in a specific archive — cross-referenced evidence, not summaries.
The Haitian community
Diaspora families tracing ancestry, cultural organizations preserving heritage, and anyone who wants to search their own history in their own language. Kreyòl is a first-class search language — not an afterthought.
Educators
Teachers and professors building courses on Haitian history, Caribbean studies, or the Age of Revolutions. A single query surfaces primary sources from multiple archives — the kind of cross-referencing that used to take a semester of research.

Haiti is where it starts.
The same silence that buried the Haitian Revolution — scattered archives, colonial languages, institutional walls — runs through every history shaped by slavery, colonialism, and diaspora. The infrastructure Rasin builds for Haiti is not specific to Haiti.
Jamaica, Martinique, Guadeloupe, Cuba — colonial archives in English, French, Spanish, and Dutch that have never been cross-referenced.
Brazil's slavery archives, Mexico's Afro-descendant communities, the plantation records of the Spanish colonies — millions of pages in institutional silence.
SlaveVoyages documents 36,000+ transatlantic crossings — Rasin already indexes 3,500+ to Saint-Domingue. The same pipeline can connect every port record to the people who were taken.
Pan-African movements from Accra to Harlem, Négritude in Paris, the Windrush generation in London — scattered across archives on four continents.
The roots of one history are tangled with the roots of many others. The same pipeline that connects documents across Haitian archives can connect them across any archive, any language, any continent.
Every citation traces back to a specific page in a specific archive. Every connection between documents was earned — scraped, read, chunked, embedded, extracted, linked, verified. What took historians a lifetime of cross-referencing now takes a question. The scale of what was hidden is only visible once the infrastructure exists to find it.

