Machine Translation

Open-source datasets (Samanantar and BPCC) and models (IndicTrans and IndicTrans2) for neural machine translation between English and 22 Indic languages.

Transliteration

Open-source datasets and benchmarks (Aksharantar), models (IndicXlit), and applications for transliteration between Roman script and the native scripts of 20+ Indic languages.

Speech Recognition

Open-source models (IndicWav2Vec) for speech recognition in 9 Indian languages.

Language Understanding

Open-source language models (IndicBERT), benchmarks (IndicGLUE), and entity recognizers (IndicNER) for 10 Indian languages.

Language Generation

Open-source language generation model (IndicBART) and benchmarks (IndicNLG Suite) for 10 Indian languages.

Sign Language

Open-source datasets (INCLUDE, SignCorpus) and models (OpenHands) for sign language recognition across 10 sign languages from around the world.

Text-to-Speech

Open-source text-to-speech models for 13 Indian languages with support for female and male speakers.

Data Curation

Open-source workbench for AI-assisted curation of language data in Indian languages, supporting annotation workflows for text, speech, image, and conversation data.

Video Subtitling

Open-source tool for AI-assisted video subtitling and translation, with a focus on educational and media content.

Document Translation

Open-source tool for document-level translation, with support for neural machine translation (NMT) and transliteration.