Project announcement

Building the Romanian NLP API that should already exist

LexicRo — open-core Romanian language intelligence infrastructure. Looking for early feedback, collaborators, and anchor users.

Tags: NLP, Romanian, API, Open Source, B1+ tools, EdTech

Phase 1 is live

Conjugation, lexical lookup, inflection, and word validation endpoints are now callable. Free tier: 1,000 requests/day, no credit card required.


If you've ever tried to do anything programmatic with Romanian text — parse a sentence, get the correct inflected form of a noun, check whether a verb conjugation is right — you've probably hit the same wall.

There's no clean API for it. You end up scraping DEXonline, wrestling with verbecc's Romanian support, or just calling a general-purpose LLM and hoping it gets the grammar right. None of that is good enough for production.

The specific gap

Given an arbitrary Romanian sentence, return for each token its lemma, part of speech, grammatical case, number, gender, person, and tense. spaCy offers this for English, French, and German after a single pip install. For Romanian, no equivalent exists as a callable REST API.
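To make the target output concrete, here is a minimal sketch of what one analysed token could look like as a Python record. The class name, field names, and tag values are assumptions for illustration (loosely following Universal Dependencies conventions), not the LexicRo response schema, and the example analysis is written by hand rather than produced by a model.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class TokenAnalysis:
    """One token of an analysed sentence (hypothetical shape, not the LexicRo schema)."""
    text: str
    lemma: str
    pos: str                      # UD-style tag such as NOUN or VERB (assumed)
    case: Optional[str] = None    # None where the feature does not apply
    number: Optional[str] = None
    gender: Optional[str] = None
    person: Optional[int] = None
    tense: Optional[str] = None


# Hand-written analysis of "Copiii citesc" ("The children read"); not model output.
sentence = [
    TokenAnalysis("Copiii", "copil", "NOUN", case="Nom", number="Plur", gender="Masc"),
    TokenAnalysis("citesc", "citi", "VERB", number="Plur", person=3, tense="Pres"),
]

for token in sentence:
    # Show only the features that apply to this token.
    features = {k: v for k, v in asdict(token).items() if v is not None}
    print(features)
```

Optional fields keep the record uniform across parts of speech: a noun simply leaves person and tense unset instead of needing its own type.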

Romanian NLP tooling currently sits at roughly 15% of what exists for English — and that's being generous. The academic resources are there (DEXonline, RoLEX, the UD Romanian treebank), they're just not packaged in a way that developers can actually use.

That's what LexicRo is for.

What we're building

A hosted REST API — with an open-source core — covering the endpoints Romanian developers actually need:

    # Morphological analysis
    POST /analyze
    → lemma, POS, case, gender, number, person, tense per token

    # Full verb conjugation
    GET /conjugate/{verb}
    → all moods and tenses, including perfect simplu and viitor I

    # Noun/adjective inflection
    GET /inflect/{word}
    → all cases, numbers, genders

    # Lexical lookup
    GET /lookup/{word}
    → definition, gender, plural, etymology (DEXonline data)

    # CEFR difficulty scoring
    POST /difficulty
    → A1–C2 level estimate, calibrated to Romanian B1/B2 exams
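As a rough illustration of how a client might address these routes, the sketch below builds the HTTP requests with Python's standard library without sending them. The base URL is inferred from the docs host mentioned in this post, the JSON field name "text" is a guess, and authentication is left out because the scheme is not described here; check the interactive docs before relying on any of it.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.lexicro.com"  # assumed from the docs host; the real base path may differ


def conjugate_request(verb: str) -> Request:
    """Build GET /conjugate/{verb}, percent-encoding diacritics and spaces."""
    return Request(f"{BASE_URL}/conjugate/{quote(verb, safe='')}", method="GET")


def analyze_request(text: str) -> Request:
    """Build POST /analyze with a JSON body; the "text" field name is a guess."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        f"{BASE_URL}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = conjugate_request("învăța")
print(req.get_method(), req.full_url)
```

Percent-encoding the path segment matters for Romanian in particular, since characters such as ă, î, ș, and ț are not valid raw in a URL path. Passing a built request to `urllib.request.urlopen` would send it.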

The core is open source (MIT). The model weights will be CC BY-NC: free for research and non-commercial use, with commercial use going through the hosted API. A generous free tier (1,000 requests/day, no credit card) is available from day one.
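A client that wants to stay inside the free tier can keep its own count. The sketch below is one way to do that; the class name is hypothetical, and it assumes the allowance resets at midnight UTC, which this post does not specify.

```python
from datetime import datetime, timezone
from typing import Optional


class DailyBudget:
    """Client-side counter for a per-day request allowance."""

    def __init__(self, limit: int = 1000) -> None:
        self.limit = limit
        self.used = 0
        self.day = None  # UTC date the counter currently applies to

    def try_spend(self, now: Optional[datetime] = None) -> bool:
        """Record one request if the allowance permits; return False otherwise."""
        today = (now or datetime.now(timezone.utc)).date()
        if today != self.day:      # first call of a new day: reset the counter
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

Calling `try_spend()` before each request and skipping the call when it returns False avoids burning requests that the server would reject anyway.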

Built on what already exists

We're not starting from scratch. The data and models are there — they just need engineering:

Data: DEXonline (313k+ lemmas), RoLEX (330k morphosyntactic entries), UD Romanian Treebank (9k+ annotated sentences), OSCAR Romanian corpus.

Models: Fine-tuning bert-base-romanian-cased-v1 for morphological tagging. verbecc for conjugation (already solid for Romanian, we extend it). ML-predicted conjugation templates for unknown verbs.
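To show why unknown verbs need a learned predictor, here is a deliberately naive baseline that guesses the traditional conjugation class from the infinitive ending. It is not LexicRo's model and does not use verbecc; the function name and the ending table are illustrative assumptions.

```python
from typing import Optional

# Traditional Romanian conjugation classes by infinitive ending.
# Longer endings are checked first so "-ea" wins over "-a".
ENDING_TO_CLASS = [
    ("ea", 2),
    ("a", 1),
    ("e", 3),
    ("i", 4),
    ("î", 4),
]


def guess_conjugation_class(infinitive: str) -> Optional[int]:
    """Return 1-4 for the guessed class, or None if no ending matches."""
    verb = infinitive.lower().strip()
    if verb.startswith("a "):      # drop the infinitive particle, as in "a merge"
        verb = verb[2:].strip()
    for ending, klass in ENDING_TO_CLASS:
        if verb.endswith(ending):
            return klass
    return None


for verb in ["a cânta", "a vedea", "a merge", "a citi", "a coborî"]:
    print(verb, guess_conjugation_class(verb))
```

The rule gets the common cases right but fails on verbs like "a crea", which ends in -ea yet belongs to the first conjugation. Exceptions of that kind, plus the choice of stem alternations within a class, are what a model trained on known verbs is meant to capture.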

Infrastructure: FastAPI, Docker, full OpenAPI spec, Python and JS SDKs.

Phase 1 — live now

The conjugation and lexical lookup endpoints are live and callable. Try them at api.lexicro.com/docs. The morphological analyser (the hard part) follows in Phase 2.

What we're looking for right now

Honest feedback on the endpoint design. Does this cover what you actually need? What's missing? What would make you use this over your current approach?

Early users willing to test the API. If you're building something with Romanian text (edtech, content tools, document processing, language learning), we'd like to talk. Early users get priority feature input and free Pro access during the build phase.

Academic and institutional connections. We're pursuing EU grant funding (Horizon Europe, CEF Digital, Digital Europe Programme). If you're at a Romanian university or research institution and this is relevant to your work, a conversation, or even a letter of support, makes a meaningful difference.

Anyone who's built adjacent to this problem. If you've already scraped DEXonline, built a Romanian spell checker, worked with the UD treebank, or tried to fine-tune anything on Romanian, we'd genuinely like to hear what you learned.

On the business model

Open core, hosted API, freemium tiers. The free tier is real and permanent — not a trial. The commercial tiers fund continued development. If you want to self-host the whole thing, the code is there. We're pursuing EU language equality grants partly because Romanian deserves proper NLP infrastructure regardless of whether the commercial model scales immediately.

Three ways in.

The API is live — try it now, or get in touch if you want to talk about your use case.

Try the API · Get API access · Get in touch