Building the Romanian NLP API that should already exist
LexicRo — open-core Romanian language intelligence infrastructure. Looking for early feedback, collaborators, and anchor users.
If you've ever tried to do anything programmatic with Romanian text — parse a sentence, get the correct inflected form of a noun, check whether a verb conjugation is right — you've probably hit the same wall.
There's no clean API for it. You end up scraping DEXonline, wrestling with verbecc's Romanian support, or just calling a general-purpose LLM and hoping it gets the grammar right. None of that is good enough for production.
For English, this kind of tooling is a pip install away. For Romanian, no equivalent exists as a callable REST API.
By our rough estimate, Romanian NLP tooling sits at about 15% of what exists for English, and that's being generous. The academic resources are there (DEXonline, RoLEX, the UD Romanian treebank); they're just not packaged in a way that developers can actually use.
That's what LexicRo is for.
What we're building
A hosted REST API — with an open-source core — covering the endpoints Romanian developers actually need:
# Morphological analysis
POST /analyze
→ lemma, POS, case, gender, number, person, tense per token
# Full verb conjugation
GET /conjugate/{verb}
→ all moods and tenses, including perfect simplu and viitor I
# Noun/adjective inflection
GET /inflect/{word}
→ all cases, numbers, genders
# Lexical lookup
GET /lookup/{word}
→ definition, gender, plural, etymology (DEXonline data)
# CEFR difficulty scoring
POST /difficulty
→ A1–C2 level estimate, calibrated to Romanian B1/B2 exams
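To give a feel for what calling these endpoints might look like from client code, here is a minimal sketch using only the Python standard library. The post gives no base URL or request schema, so `BASE_URL` and the `{"text": ...}` JSON body are assumptions for illustration only. The functions build requests without sending them.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical base URL: the post does not publish one.
BASE_URL = "https://api.lexicro.example/v1"


def conjugate_request(verb: str) -> Request:
    """Build GET /conjugate/{verb}, percent-encoding Romanian diacritics."""
    return Request(f"{BASE_URL}/conjugate/{quote(verb, safe='')}", method="GET")


def analyze_request(text: str) -> Request:
    """Build POST /analyze. The JSON body shape {"text": ...} is an assumption."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return Request(
        f"{BASE_URL}/analyze",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

Passing either object to `urllib.request.urlopen` would send it. Percent-encoding the path segment matters here because verbs such as `învăța` contain non-ASCII characters.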
The core is open source (MIT). The model weights will be CC BY-NC: free for research and non-commercial use, with commercial use going through the hosted API. There will be a generous free tier (1,000 req/day, no credit card) from day one.
Built on what already exists
We're not starting from scratch. The data and models are there — they just need engineering:
Data: DEXonline (313k+ lemmas), RoLEX (330k morphosyntactic entries), UD Romanian Treebank (9k+ annotated sentences), OSCAR Romanian corpus.
Models: Fine-tuning bert-base-romanian-cased-v1 for morphological tagging. verbecc for conjugation (already solid for Romanian, we extend it). ML-predicted conjugation templates for unknown verbs.
Infrastructure: FastAPI, Docker, full OpenAPI spec, Python and JS SDKs.
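As an illustration of the engineering gap, Universal Dependencies treebanks are distributed in CoNLL-U format, where each token is a line of ten tab-separated columns. The sketch below pulls out the fields that overlap with what /analyze is meant to return (lemma, POS, case, gender, number). The `Token` class and the sample line are hypothetical, written by hand for this example and not taken from the treebank or from LexicRo's code.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    lemma: str
    upos: str
    feats: dict[str, str]


def parse_conllu_token(line: str) -> Token:
    """Parse one CoNLL-U token line (ID, FORM, LEMMA, UPOS, XPOS, FEATS, ...)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
    feats: dict[str, str] = {}
    if cols[5] != "_":  # "_" marks an empty FEATS column
        for pair in cols[5].split("|"):
            key, value = pair.split("=", 1)
            feats[key] = value
    return Token(form=cols[1], lemma=cols[2], upos=cols[3], feats=feats)


# Hand-written illustrative line, not copied from the UD Romanian treebank.
SAMPLE = (
    "2\tcărțile\tcarte\tNOUN\t_\t"
    "Case=Acc,Nom|Definite=Def|Gender=Fem|Number=Plur\t1\tobj\t_\t_"
)
```

Calling `parse_conllu_token(SAMPLE)` yields a token whose lemma is `carte` and whose features include `Gender=Fem` and `Number=Plur`. A real pipeline would also need to handle comment lines, multiword tokens, and sentence boundaries, which this sketch ignores.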
What we're looking for right now
Interested? Three ways in.
No commitment required at any stage — we're genuinely looking for feedback more than signups right now.