About the Biblical Lexicon
"For God so loved the world that he gave his only begotten Son, that whoever believes in him should not perish but have eternal life." — John 3:16
What is this?
This is a computationally derived lexicon of Biblical Hebrew, Aramaic, and Greek. Every word occurrence in Scripture has been mapped through a three-stage pipeline:
- Token → Sense. Each of the 448,269 word occurrences is assigned to a specific lexical sense of its lemma, using multilingual translation signatures from 43 parallel Bible translations as the primary evidence signal.
- Sense → Domain. Each of the 20,143 induced senses is classified into a three-level semantic domain hierarchy: 32 macro-domains, 93 Louw-Nida style supersense categories, and 1,236 fine-grained community clusters.
- Article composition. For each of the 14,884 lemmas, a lexicon article is composed from the sense inventory, multilingual glosses, domain classifications, and scripture references.
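The three stages can be pictured as a small data model. The following is only a minimal sketch under stated assumptions: the class names, field names, sense ID format, and sample values are hypothetical and are not taken from the project's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    macro: str        # one of the 32 macro-domains
    supersense: str   # one of the 93 Louw-Nida style categories
    cluster: int      # one of the 1,236 fine-grained clusters


@dataclass(frozen=True)
class Sense:
    sense_id: str     # hypothetical ID format
    lemma: str
    glosses: tuple    # (language code, gloss) pairs
    domain: Domain    # stage 2 output: the sense's place in the hierarchy


@dataclass(frozen=True)
class Token:
    ref: str          # scripture reference of the occurrence
    lemma: str
    sense_id: str     # stage 1 output: the sense assigned to this token


def compose_article(lemma, senses, tokens):
    """Stage 3: gather senses, glosses, domains, and references for one lemma."""
    entries = []
    for sense in senses:
        if sense.lemma != lemma:
            continue
        refs = [t.ref for t in tokens if t.sense_id == sense.sense_id]
        entries.append({
            "sense_id": sense.sense_id,
            "glosses": list(sense.glosses),
            "domain": (sense.domain.macro, sense.domain.supersense, sense.domain.cluster),
            "references": refs,
        })
    return {"lemma": lemma, "senses": entries}


# Illustrative values only; they are not taken from the lexicon's data.
# The lemma is transliterated to keep the example ASCII-only.
love = Sense(
    sense_id="agapao.1",
    lemma="agapao",
    glosses=(("en", "love"), ("es", "amar")),
    domain=Domain(macro="Emotion", supersense="Attitudes and Emotions", cluster=417),
)
tokens = [Token(ref="John 3:16", lemma="agapao", sense_id="agapao.1")]
article = compose_article("agapao", [love], tokens)
print(article["senses"][0]["references"])
```

Keeping the sense assignment on the token and the domain on the sense mirrors the order of the stages: an article can then be assembled from the two mappings without recomputing either.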
Methodology
The sense inventory was built using agglomerative clustering on cross-lingual translation features (7.5 million glosses across 43 languages), followed by multiple rounds of LLM-assisted merge review and automated cross-lingual Jaccard merging. The domain hierarchy combines a 32-category macro-domain system with Louw-Nida style semantic categories and fine-grained clusters found by community detection.
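To illustrate the Jaccard merging step, here is a minimal sketch. It assumes each sense is summarized as a set of (language, gloss) pairs and that two senses of the same lemma are merged when their sets overlap by at least some threshold. The function names, the 0.5 threshold, and the single-pass greedy strategy are assumptions for illustration; the project's actual merge criteria are not described here.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_senses(signatures, threshold=0.5):
    """Greedily group senses whose gloss signatures overlap enough.

    signatures maps a sense ID to a set of (language, gloss) pairs.
    Returns a list of sets of sense IDs, one set per merged group.
    """
    groups = []  # each entry: (set of sense IDs, combined signature)
    for sense_id, sig in signatures.items():
        for ids, combined in groups:
            if jaccard(sig, combined) >= threshold:
                ids.add(sense_id)
                combined |= sig  # in-place union grows the group's signature
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append(({sense_id}, set(sig)))
    return [ids for ids, _ in groups]


# Illustrative signatures only; not taken from the lexicon's data.
signatures = {
    "a": {("en", "love"), ("es", "amar"), ("de", "lieben")},
    "b": {("en", "love"), ("es", "amar"), ("fr", "aimer")},
    "c": {("en", "kiss"), ("es", "besar")},
}
groups = merge_senses(signatures)
print(groups)
```

In this example senses "a" and "b" share two of four distinct pairs, giving a similarity of 0.5, so they fall into one group, while "c" shares nothing and stays separate. A greedy pass like this depends on input order; a production pipeline would more plausibly rely on the clustering and review rounds described above.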
Data
| Item | Count |
| --- | --- |
| Lemmas | 14,884 (Hebrew, Aramaic, Greek) |
| Senses | 20,143 induced senses |
| Gloss records | 346,087 across 17 display languages |
| Word occurrences | 448,269 |
| Macro-domains | 32 |
| Louw-Nida style categories | 93 |
| Fine clusters | 1,236 |
Source
This project is open source. The code, pipeline, and data are available on GitHub.
Built with SvelteKit, Cloudflare Workers, D1, and Rust. Part of the bible.systems family of tools for Bible translation and study.