J.B. Poirson, 1803 — Carte de St. Domingue
RASIN.AI

rasin

/ra.zɛ̃/ Kreyòl n.

Root. Origin. Foundation.


A revolution that defeated Napoleon, ended slavery, and doubled the size of the United States almost vanished from the historical record. The documents survived, scattered across archives in France, the United States, and the Caribbean. But for two centuries, nothing connected them. This is the story of why they were separated, and what it took to make them speak to each other again.

Constitution d'Hayti, 1805
Le Télégraphe, 1825
Carte de la Province Antonine
The Problem

Preserved, but never connected.

280K+ pages processed
110+ source collections
4 languages searchable

They won. And then the world pretended it hadn't happened. The Haitian historian Michel-Rolph Trouillot called it “an unthinkable history.” The entire intellectual framework of the Enlightenment, the same system that produced the Declaration of the Rights of Man, ranked humanity on a ladder with Europeans at the top and Africans at the bottom. Enslaved people defeating Napoleon's army didn't just challenge a political order. It broke the categories through which the Western world understood who could be a political actor, who could wage war, who could govern. For thirteen years, every Western power cycled through denial, minimization, and predictions of collapse, until there was nothing left to deny. Hobsbawm's The Age of Revolution: 1789–1848 gives it barely a mention. The Penguin Dictionary of Modern History doesn't include an entry for Haiti at all.

The records survived. They are vast, public, and scattered across institutions on three continents. But most of them exist only as images: scanned pages, microfilm transfers, ink fading on paper that spent two centuries in tropical humidity. Each collection lives on its own site, in its own language, behind its own interface. A plantation listed in a French indemnity claim might also appear in a fugitive advertisement and in a gazette decree about the same parish, but nothing ties them together. The documents survived. The structure connecting them never existed.

Trouillot wrote that silences enter history at four moments — when sources are made, when archives are assembled, when narratives are constructed, when significance is assigned. The sources survived. The archives were assembled. But for two centuries, no one built the system that makes them legible, searchable, and connected. Rasin is that system — reading every page, unifying 110+ sources into a single index, and matching meaning across languages.

The Pipeline

Seven stages from scan to citation.

Every answer on Rasin traces back through a reproducible pipeline. Each stage is documented at rasin.ai/methodology.

01

Collect

110+ sources

Custom scrapers pull from Gallica, LoC, DLOC, Internet Archive

02

Read

280K+ pages

docTR OCR trained on historical typefaces and damaged handwriting

03

Chunk

281K+ passages

Semantic segmentation preserving context and readability

04

Embed

1024 dimensions

BGE-M3 multilingual vectors — meaning, not keywords

05

Extract

20K+ entities

GLiNER identifies people, places, events, dates across sources

06

Connect

10 relation types

Neo4j graph links people and events across archives

07

Answer

3 search paths

Hybrid RAG — vector + keyword + graph, fused and verified

The entire pipeline (inference, embeddings, vector search, knowledge graph) runs on a single machine today. No cloud APIs. No third-party hosted services.

01

Collect

This is Article Premier of the 1805 Constitution of Haiti — the first national constitution to permanently abolish slavery. It sat in the Bibliothèque nationale de France for two centuries, digitized but buried in a catalog of millions.

Every archive has its own API, its own rate limits, its own format. Gallica's IIIF endpoint allows five requests per minute at full resolution, so the Gallica scraper uses circuit breakers and 90-second backoffs when the server pushes back. More than 65 custom scrapers handle these differences, each tracking provenance back to the original institution.
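A minimal sketch of the two protections described above, assuming a simple consecutive-failure circuit breaker and fixed spacing between requests. The names `CircuitBreaker`, `failure_threshold`, and `request_interval` are hypothetical, not Rasin's code; only the five-requests-per-minute limit and the 90-second cooldown come from the text.

```python
import time


def request_interval(requests_per_minute: float) -> float:
    """Seconds to wait between requests to stay under a per-minute limit."""
    return 60.0 / requests_per_minute


class CircuitBreaker:
    """Consecutive-failure circuit breaker (hypothetical sketch).

    Closed: requests flow. After `failure_threshold` consecutive failures the
    breaker opens and blocks requests for `cooldown` seconds. After the
    cooldown it lets one probe through (half-open); another failure reopens
    it immediately, while a success closes it again.
    """

    def __init__(self, failure_threshold=3, cooldown=90.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Half-open: keep the failure count so a single failed probe reopens.
            self.opened_at = None
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

In a scraper loop, each request would check `allow()`, wait `request_interval(5)` (12 seconds for Gallica's stated limit) between calls, and report the outcome, for example treating HTTP 429 or 503 responses as failures.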

The full corpus is ~86 GB across 43 source categories. A PostgreSQL queue coordinates parallel downloads with resource-aware batching — heavy sources like Gallica run two workers; lighter APIs run five. Every download is resumable. The collection phase alone took weeks.
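One way to express the resource-aware batching, as a sketch: per-source worker caps decide which sources may take another job, and a `FOR UPDATE SKIP LOCKED` query lets parallel workers claim different rows from the same PostgreSQL table without blocking each other. The caps of two for Gallica and five for lighter APIs come from the text; the table and column names (`download_queue`, `status`, `source`, `url`), the function name, and the use of `SKIP LOCKED` itself are assumptions.

```python
# Hypothetical per-source worker caps: the text gives two workers for
# Gallica and five for lighter APIs; unlisted sources use DEFAULT_CAP.
WORKER_CAPS = {"gallica": 2}
DEFAULT_CAP = 5

# Assumed claim query (table and column names are hypothetical).
# FOR UPDATE SKIP LOCKED makes each worker skip rows another worker has
# already locked, so parallel workers never claim the same download.
CLAIM_SQL = """
UPDATE download_queue
SET status = 'running'
WHERE id = (
    SELECT id
    FROM download_queue
    WHERE status = 'pending' AND source = %(source)s
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, url;
"""


def sources_with_capacity(pending: dict[str, int], running: dict[str, int]) -> list[str]:
    """Return sources that still have pending jobs and a free worker slot."""
    free = []
    for source in sorted(pending):
        cap = WORKER_CAPS.get(source, DEFAULT_CAP)
        if pending[source] > 0 and running.get(source, 0) < cap:
            free.append(source)
    return free
```

A worker would pick a source from `sources_with_capacity(...)` and run `CLAIM_SQL` inside a transaction, for example with psycopg's `cursor.execute(CLAIM_SQL, {"source": name})`; if `fetchone()` returns `None`, that source's queue is empty.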

Python · httpx · Playwright · circuit breakers · PostgreSQL queue · 65+ scrapers

Constitution d'Hayti, 1805 — Article Premier
CONSTITUTION D'HAYTI, 1805 · BIBLIOTHÈQUE NATIONALE DE FRANCE
OCR processing the Constitution
OCR OUTPUT

Le peuple habitant l'isle ci-devant appelée St. Domingue, convient ici de se former en état libre, souverain et indépendant de toute autre puissance de l'univers, sous le nom d'Empire d'Hayti.

02

Read

A search engine cannot read a photograph. This page is a scan — aging paper, faded ink, eighteenth-century typefaces that modern software wasn't built for. Until it's converted to text, it's invisible to any search.

Before OCR runs, every image passes through a preprocessing pipeline — deskewing rotated scans, denoising damaged pages, enhancing contrast on faded ink, sharpening text edges blurred by two centuries of tropical storage. Then docTR reads what's left, tracking per-word confidence so pages that fail can be retried automatically.
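A sketch of how per-word confidence could drive automatic retries. It walks the nested dictionary that docTR's `Document.export()` produces (pages, then blocks, lines, and words, with each word carrying `value` and `confidence`). The 0.6 threshold and the function names are illustrative assumptions, not Rasin's settings.

```python
from statistics import mean


def word_confidences(page: dict) -> list[float]:
    """Collect per-word confidences from one page of docTR's exported output."""
    return [
        word["confidence"]
        for block in page.get("blocks", [])
        for line in block.get("lines", [])
        for word in line.get("words", [])
    ]


def pages_to_retry(exported: dict, min_mean_confidence: float = 0.6) -> list[int]:
    """Return indices of pages whose mean word confidence is below the threshold.

    Pages with no recognized words are also flagged, since an empty read
    usually means the image needs different preprocessing.
    """
    retry = []
    for idx, page in enumerate(exported.get("pages", [])):
        scores = word_confidences(page)
        if not scores or mean(scores) < min_mean_confidence:
            retry.append(idx)
    return retry
```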

Multiple GPU workers process the corpus in parallel, coordinated through PostgreSQL row locks — no external queue, no Redis. If a worker hits an out-of-memory error, it halves its batch size and retries. The system recovers without human intervention.
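The out-of-memory recovery can be sketched as a loop that halves the batch on failure and retries the same items. `GpuOutOfMemory` is a stand-in exception so the sketch runs without a GPU (with PyTorch you would catch `torch.cuda.OutOfMemoryError`), and `process_batch` is a hypothetical callable.

```python
from typing import Callable, Sequence


class GpuOutOfMemory(Exception):
    """Stand-in for torch.cuda.OutOfMemoryError in this sketch."""


def run_adaptive(items: Sequence, process_batch: Callable[[Sequence], list],
                 batch_size: int = 32, min_batch: int = 1) -> list:
    """Process items in batches, halving the batch size after each OOM."""
    results = []
    i = 0
    while i < len(items):
        batch = items[i:i + batch_size]
        try:
            results.extend(process_batch(batch))
            i += len(batch)  # advance only after the batch succeeds
        except GpuOutOfMemory:
            if batch_size <= min_batch:
                raise  # cannot shrink further; surface the error
            batch_size = max(min_batch, batch_size // 2)
    return results
```

This sketch never grows the batch back after shrinking it; a long-running worker might periodically try a larger size again.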

docTR · PyMuPDF · adaptive batching · PostgreSQL coordination · CUDA + MPS

03

Chunk

A 200-page constitution can't be searched as a single block. The chunker splits text at semantic boundaries — paragraph breaks first, then sentence boundaries, then hard token limits as a fallback. Each passage lands between 512 and 1,024 tokens, with 128 tokens of overlap so context is never lost at a split.
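A simplified sketch of the splitting order described above (paragraphs, then sentences, then hard cuts) with a fixed overlap carried between chunks. To stay dependency-free it treats whitespace-separated words as tokens, whereas the text says Rasin counts tokens with tiktoken. It enforces only the upper limit, not the 512-token floor, and it rejoins words with single spaces, so paragraph breaks inside a chunk are not preserved.

```python
import re


def split_units(text: str, max_tokens: int) -> list[str]:
    """Split text into units of at most max_tokens words.

    Paragraphs are kept whole when they fit; longer paragraphs fall back to
    sentences, and sentences that are still too long are cut by word count.
    """
    units = []
    for para in re.split(r"\n\s*\n", text.strip()):
        if len(para.split()) <= max_tokens:
            units.append(para)
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", para):
            words = sentence.split()
            for start in range(0, len(words), max_tokens):
                units.append(" ".join(words[start:start + max_tokens]))
    return [u for u in units if u.strip()]


def chunk(text: str, max_tokens: int = 1024, overlap: int = 128) -> list[str]:
    """Greedily pack units into chunks, repeating the last `overlap` words."""
    chunks, current = [], []
    # Units are capped at max_tokens - overlap so that the carried-over tail
    # plus the next unit always fits within max_tokens.
    for unit in split_units(text, max_tokens - overlap):
        words = unit.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```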

Every passage carries its provenance: which document it came from, which page, which section. When an answer cites a passage, the chain traces all the way back to a specific page in a specific archive. The citation chain starts here.
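A sketch of a passage record that carries its provenance with it. The field names and the citation format are hypothetical; the text specifies only that each passage records its document, page, and section, traceable to a specific archive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """One chunk of text plus the metadata needed to cite it (hypothetical schema)."""
    text: str
    archive: str       # holding institution, e.g. "Bibliothèque nationale de France"
    document: str      # document title
    page: int          # 1-based page number within the scanned document
    section: str = ""  # optional section label, e.g. an article number

    def citation(self) -> str:
        """Render a human-readable citation from the stored provenance."""
        parts = [self.document, f"p. {self.page}"]
        if self.section:
            parts.append(self.section)
        parts.append(self.archive)
        return ", ".join(parts)
```

Freezing the dataclass keeps provenance from being edited after the passage is created, so a citation always reflects where the text was actually read.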

tiktoken · semantic boundary detection · 128-token overlap · provenance metadata

CHUNKED PASSAGES
chunk_001

Art. 1 — Le peuple habitant l'isle ci-devant appelée St. Domingue, convient ici de se former en état libre, souverain et indépendant...

chunk_002

Art. 2 — L'esclavage est à jamais aboli.

chunk_003

Art. 12 — Aucun blanc, quelle que soit sa nation, ne mettra le pied sur ce territoire, à titre de maître ou de propriétaire...

chunk_004

Art. 14 — Toute acception de couleur parmi les enfans d'une seule et même famille, dont le chef de l'État est le père, devant...

04

Embed

Each passage is converted into a 1024-dimensional representation of its meaning, not its words. A Kreyòl question about abolition and this French decree land in the same region of vector space, even though they share almost no vocabulary. Queries are prefixed asymmetrically so the model can distinguish questions from documents, and every vector is stored twice: in Qdrant for low-latency search, and in PostgreSQL for durability and crash recovery.

FRENCH · CONSTITUTION 1805

L'esclavage est à jamais aboli.

0.94
KREYÒL · USER QUERY

Ki konstitisyon ki te aboli esklavaj pou tout tan?

Almost no shared words. Same meaning. Same vector space.
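A common way to produce a score like the 0.94 shown above is cosine similarity between the query vector and each passage vector. A minimal sketch follows; the query prefix string is a hypothetical placeholder, since the text says queries are prefixed but does not give the exact string, and the toy 3-dimensional vectors in any test stand in for BGE-M3's 1024 dimensions.

```python
import math

# Hypothetical query prefix; the actual string Rasin uses is not stated.
QUERY_PREFIX = "query: "


def prepare_query(question: str) -> str:
    """Mark a string as a query (not a document) before embedding it."""
    return QUERY_PREFIX + question


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

If the vectors are L2-normalized (Qdrant's `Cosine` distance normalizes them on insert), the score reduces to a plain dot product.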

The language of the question should never limit the reach of the answer.

Kreyòl is a first-class search language in Rasin today. Native answer generation in Kreyòl is the next milestone — via fine-tuning.

BGE-M3 · 1024-dim · Qdrant HNSW · PostgreSQL backup · multilingual BM25

05

Extract

The constitution names the men who signed it: Christophe, Pétion, Clervaux, Geffrard, Gabart. GLiNER, a zero-shot NER model, reads every passage in the corpus and tags the people, places, organizations, events, dates, documents, and ships it mentions. These seven entity types were chosen to stay neutral rather than impose interpretive categories on historical figures.

Names that appear differently across centuries and languages — “Toussaint Louverture” and “Toussaint L'Ouverture” — are resolved to a single canonical identity. The model processes roughly a thousand documents per minute on CPU alone.
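A deliberately simple sketch of the first step of name resolution: folding spelling variants to a shared key. It strips accents, apostrophes, spaces, and case, which is enough to merge “Toussaint Louverture” with “Toussaint L'Ouverture”. The function names and the grouping approach are assumptions; real resolution also needs context, since different people can share a name.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Fold a name to a comparison key: no accents, punctuation, spaces, or case."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in without_accents.casefold() if c.isalnum())


def group_variants(names: list[str]) -> dict[str, list[str]]:
    """Group surface forms that share a key; the first form seen becomes canonical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return {forms[0]: forms for forms in groups.values()}
```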

GLiNER2 zero-shot · 7 entity types · ~1,000 docs/min · deduplication + resolution

ENTITY EXTRACTION

Nous H. Christophe, Clervaux, Vernet, Gabart, Pétion, Geffrard, Toussaint Brave... en notre nom particulier, qu'en celui du peuple d'Hayti...

PERSON · H. Christophe · 1,847 mentions
PERSON · Pétion · 1,203 mentions
PERSON · Dessalines · 2,041 mentions
PLACE · Hayti · 14,200+ documents
KNOWLEDGE GRAPH
H. Christophe · MENTIONED_IN
Constitution d'Hayti, 1805
Founders Online — Adams correspondence
Le Moniteur Haïtien — Royal decree, 1811
Ardouin — Études sur l'histoire d'Haïti
Madiou — Histoire d'Haïti
H. Christophe · RELATED_TO · Pétion · RELATED_TO · Dessalines

06

Connect

Christophe signed this constitution. He also appears in an American diplomatic dispatch, a Moniteur decree from his own kingdom, and two nineteenth-century histories by Ardouin and Madiou. Five sources, held in separate collections and catalogued under separate systems, are now linked through one person.

A second stage uses Qwen3 via structured output to extract relationships between entities — who participated in which event, who was located where, who authored which document. Ten relationship types, each with a confidence score and the source text that evidences it.
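Structured-output extraction usually means asking the model to fill a fixed schema and rejecting anything that does not validate. The sketch below uses a standard-library dataclass in place of the Pydantic model that Instructor validates against. The three relation names are derived from the examples in the text; the other seven types are not listed there and are left out rather than guessed. The check that the evidence actually occurs in the passage is an assumption about how such a record could be verified.

```python
from dataclasses import dataclass

# Three of the ten relation types, named after the examples in the text.
# The remaining seven are not listed, so they are not guessed here.
KNOWN_RELATIONS = {"PARTICIPATED_IN", "LOCATED_IN", "AUTHORED"}


@dataclass(frozen=True)
class Relation:
    head: str
    relation: str
    tail: str
    confidence: float
    evidence: str  # source text that supports the relation

    def __post_init__(self):
        # Reject records the model filled in incorrectly.
        if self.relation not in KNOWN_RELATIONS:
            raise ValueError(f"unknown relation type: {self.relation}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.evidence.strip():
            raise ValueError("evidence text is required")


def evidence_in_passage(rel: Relation, passage: str) -> bool:
    """Check that the evidence really appears in the source passage (whitespace-insensitive)."""
    return " ".join(rel.evidence.split()) in " ".join(passage.split())
```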

At search time, the graph doesn't just find documents that match your query — it expands it. Search for “Vodou” and the graph injects related terms like “voduisant” and “Legba” into the text search, surfacing passages that no keyword match alone would find.
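Graph-based query expansion can be sketched as looking up each term's neighbors and OR-ing them into the keyword query. The dictionary below stands in for a Neo4j lookup and holds only the example from the text; the function name, the `max_extra` limit, and the quoted OR/AND syntax of the generated query are assumptions.

```python
def expand_query(terms: list[str], neighbors: dict[str, list[str]],
                 max_extra: int = 5) -> str:
    """Build a keyword query that ORs each term with up to max_extra graph neighbors."""
    clauses = []
    for term in terms:
        related = neighbors.get(term, [])[:max_extra]
        variants = [term] + [r for r in related if r != term]
        quoted = [f'"{v}"' for v in variants]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Stand-in for the graph lookup; in Rasin this role is played by Neo4j.
GRAPH = {"Vodou": ["voduisant", "Legba"]}
```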

Neo4j · Qwen3 + Instructor · 10 relation types · graph-expanded queries

Six steps turn a photograph of a deteriorating page into a node in a multilingual knowledge system. The seventh is where it matters — when someone asks a question.

07

Answer

The Bois Caïman ceremony of August 1791 launched the Haitian Revolution. A Vodou priest named Boukman led the gathering that would ignite thirteen years of war and end with the founding of a nation. Try searching for it.

SEARCH

That question is in Kreyòl. The documents that answer it are in French and English — Ardouin's nineteenth-century history describing the ceremony on the Lenormand de Mézy plantation, a Vodou ethnography recording oral traditions about that night, and C.L.R. James analyzing its significance two centuries later. They sit in different archives, catalogued under different systems. No keyword search connects them.

RESULTS
0.91 · Ardouin, Études sur l'histoire d'Haïti · FR

Description of the ceremony at Lenormand de Mézy plantation, August 1791

0.87 · Vodou ethnography (Anna's Archive) · FR

Oral tradition recording of the Bois Caïman gathering and Boukman's invocation

0.84 · C.L.R. James, The Black Jacobins · EN

Analysis of Bois Caïman as the catalyst for the general insurrection of August 22

The query is first translated into all four corpus languages by an LLM, then embedded. Vector search and keyword search run in parallel — results merged through reciprocal rank fusion with source diversity caps so no single archive dominates the results. A cross-encoder reranks the top candidates.
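Reciprocal rank fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in, so documents that rank well in several lists rise to the top. A sketch with k = 20, the value given in this stage's stack list, and a per-source cap follows; the cap of two results per archive and the function names are assumptions, since the text does not give the cap.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 20) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def cap_per_source(fused: list[tuple[str, float]], source_of: dict[str, str],
                   max_per_source: int = 2, limit: int = 10) -> list[str]:
    """Walk the fused list in order, skipping docs once their archive hits its cap."""
    taken: dict[str, int] = defaultdict(int)
    results = []
    for doc_id, _score in fused:
        source = source_of[doc_id]
        if taken[source] < max_per_source:
            taken[source] += 1
            results.append(doc_id)
        if len(results) == limit:
            break
    return results
```

The original RRF paper used k = 60; a smaller k such as 20 gives more weight to the very top ranks of each list.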

Before the answer reaches you, a DeBERTa NLI model checks entailment between every claim and its cited passage. If a citation contradicts or doesn't support its claim, it's flagged. Quote verification confirms that any direct quotes actually appear in the source text. Evidence, not guesses.
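Quote verification is the simpler of the two checks and can be sketched without a model: pull every quoted span out of the generated answer and confirm it occurs in the cited passage after normalizing whitespace, case, and curly apostrophes. The function names and normalization rules are assumptions; the entailment check itself needs an NLI model and is not shown.

```python
import re

# Straight or curly double quotes; the answer may use either style.
QUOTE_PATTERN = re.compile(r'["“]([^"”]+)["”]')


def normalize(text: str) -> str:
    """Casefold, unify apostrophes, and collapse whitespace before comparing."""
    text = text.replace("’", "'").casefold()
    return " ".join(text.split())


def unverified_quotes(answer: str, passage: str) -> list[str]:
    """Return quoted spans from the answer that do not appear in the passage."""
    source = normalize(passage)
    return [q for q in QUOTE_PATTERN.findall(answer) if normalize(q) not in source]
```

An empty list means every direct quote was found verbatim; anything returned would be flagged alongside the NLI result.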

Nemotron-3-Nano via TRT-LLM · RRF fusion (k=20) · BGE reranker · DeBERTa NLI · quote verification

Every answer traces back to a specific passage in a specific document. The evidence speaks for itself.

The Sources

What the pipeline reads.

98+ verified collections spanning five centuries and four languages. The full catalog is browsable at rasin.ai/sources.

01

Archives & Digital Collections

Gallica, Library of Congress, DLOC, Internet Archive

31+ BnF documents (1492–1850), 9,000+ DLOC newspaper issues, 552 pages from An Island Luminous in 3 languages

02

Databases & Structured Records

SlaveVoyages, CNRS Indemnités, Marronnage.info

3,581 Saint-Domingue voyages, 28,356 indemnity claimant records, 22,485 fugitive advertisements

03

Periodicals & Newspapers

Le Moniteur Haïtien, L'Abeille Haytienne, La Gazette Royale

9,411 Moniteur issues (1845–1983), 5,076 Moniteur Universel issues (1789–1810)

04

Primary Sources

Founders Online, Boisrond-Tonnerre, US Senate hearings

1,152 Founders Online documents, 9 Senate hearing transcripts, Kreyòl proclamation of 1793

05

Legal Documents

Constitutions, legal codes, Linstant de Pradine

9 Haitian legal codes, 4 Linstant de Pradine volumes (1804–1876), constitutions from 1801 to 1889

06

Scholarship & Analysis

80+ monographs, Human Rights Watch, Frederick Douglass

Saint-Rémy, Bellegarde, Ardouin, Madiou, C.L.R. James, 18 HRW reports (1993–2025)

Le Télégraphe, 1825 — Haitian newspaper
L'Abeille Haytienne — early Haitian press
Traité de 1825 — France-Haiti indemnity treaty

110 collections. 281,000 pages. Four languages. All of it runs on one machine.

Infrastructure

Runs on a single machine.

A system that connects 280,000 pages across 110+ archives, handles OCR in four languages, and runs multilingual AI search — built and tested on a single NVIDIA DGX Spark. One machine. The entire pipeline, from scanned page to cited answer. Cloud deployment is next.

NVIDIA Inception Program
$0/month · Prototype cost
128 GB · Unified memory
1 · Machine to prove it
Retrieval Performance

Measured, not promised.

Every claim about retrieval quality is backed by a golden test set — 52 hand-curated queries spanning factual lookups, entity searches, cross-lingual questions, and multi-source synthesis. The numbers below are from the latest stable evaluation run.

78% · Recall@10
63% · Recall@5
0.59 · MRR
89% · Keyword hit rate
RECALL@10 BY QUERY TYPE
Entity queries · 93%
Factual queries · 86%
Cross-lingual · 72%
Cross-source · 53%

Cross-source queries — where the answer requires combining evidence from multiple archives — are the hardest category, and the one we are actively improving.
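For readers unfamiliar with these metrics, here is a sketch of how they are commonly computed over a golden set. Recall@k has more than one convention; this version averages the fraction of each query's relevant documents found in the top k, and the text does not say which convention Rasin uses. MRR averages 1 / rank of the first relevant result, counting 0 when none is retrieved. The function names are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 10) -> dict[str, float]:
    """Average Recall@k and MRR over (retrieved, relevant) pairs, one per query."""
    n = len(runs)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }
```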

Who This Is For

Built for researchers, open to everyone.

Rasin is live at rasin.ai.

Researchers

Historians, graduate students, and digital humanists studying the Haitian Revolution, the Atlantic world, or the history of slavery. Every answer is cited back to a specific passage in a specific archive — cross-referenced evidence, not summaries.

The Haitian community

Diaspora families tracing ancestry, cultural organizations preserving heritage, and anyone who wants to search their own history in their own language. Kreyòl is a first-class search language — not an afterthought.

Educators

Teachers and professors building courses on Haitian history, Caribbean studies, or the Age of Revolutions. A single query surfaces primary sources from multiple archives — the kind of cross-referencing that used to take a semester of research.

What Comes Next

Haiti is where it starts.

The same silence that buried the Haitian Revolution — scattered archives, colonial languages, institutional walls — runs through every history shaped by slavery, colonialism, and diaspora. The infrastructure Rasin builds for Haiti is not specific to Haiti.

The Caribbean

Jamaica, Martinique, Guadeloupe, Cuba — colonial archives in English, French, Spanish, and Dutch that have never been cross-referenced.

Latin America

Brazil's slavery archives, Mexico's Afro-descendant communities, the plantation records of the Spanish colonies — millions of pages in institutional silence.

The Middle Passage

SlaveVoyages documents 36,000+ transatlantic crossings, and Rasin already indexes the 3,500+ tied to Saint-Domingue. The same pipeline could connect port records to the people who were taken.

The Diaspora

Pan-African movements from Accra to Harlem, Négritude in Paris, the Windrush generation in London — scattered across archives on four continents.

The roots of one history are tangled with the roots of many others. The same pipeline that connects documents across Haitian archives can connect them across any archive, any language, any continent.

Every citation traces back to a specific page in a specific archive. Every connection between documents was earned — scraped, read, chunked, embedded, extracted, linked, verified. What took historians a lifetime of cross-referencing now takes a question. The scale of what was hidden is only visible once the infrastructure exists to find it.

contact@studio1804.org