Open-source datasets (Samanantar and BPCC) and models (IndicTrans and IndicTrans2) for neural machine translation between English and 22 Indic languages.
Open-source datasets and benchmarks (Aksharantar), models (IndicXlit), and applications for transliteration between the Roman script and the native scripts of 20+ Indic languages.
Open-source models (IndicWav2Vec) for speech recognition in 9 Indian languages.
Open-source language models (IndicBERT), benchmarks (IndicGLUE), and entity recognizers (IndicNER) for 10 Indian languages.
Open-source language generation model (IndicBART) and benchmarks (IndicNLG Suite) for 10 Indian languages.
Open-source datasets (INCLUDE, SignCorpus) and models (OpenHands) for sign recognition in 10 sign languages from around the world.
Open-source text-to-speech models for 13 Indian languages, with support for both female and male speakers.
Open-source workbench for AI-assisted language data curation for Indian languages, with a focus on annotation workflows for text, speech, image, and conversation data.
Open-source tool for AI-assisted video subtitling and translation, with a focus on educational and media content.
Open-source tool for document-level translation with NMT and transliteration support.