Artificial-Intelligence For Bhārat


AI4Bharat is a research lab at IIT Madras that develops open-source datasets, tools, models, and applications for Indian languages.

Our Mission

Bring Indian languages to parity with English in AI technologies through open-source contributions in datasets, models, and applications, and by enabling an innovation ecosystem.

Our Impact Axes


Curate and create the largest public datasets and benchmarks across various tasks and the 22 scheduled languages of India.

AI Models

Build state-of-the-art, open, foundational AI models across tasks and 22 regional Indian languages.


Design and deploy reference applications with partners to demonstrate the potential of open AI models.


Enable researchers, startups, and government to innovate on Indian-language AI technology with educational material and workshops.



Open-source datasets (Samanantar and BPCC) and models (IndicTrans and IndicTrans2) for neural machine translation between English and 22 Indic languages.


Open-source datasets and benchmarks (Aksharantar), models (IndicXlit), and applications for transliteration between the Roman script and the native scripts of 20+ Indic languages.

Speech Recognition

Open-source models (IndicWav2Vec) for speech recognition in 9 Indian languages.

Language Understanding

Open-source language models (IndicBERT), benchmarks (IndicGLUE), and entity recognizers (IndicNER) for 10 Indian languages.

Language Generation

Open-source language generation model (IndicBART) and benchmarks (IndicNLG Suite) for 10 Indian languages.

Sign Language

Open-source datasets (INCLUDE, SignCorpus) and models (OpenHands) for sign recognition in 10 sign languages from around the world.


Open-source text-to-speech models for 13 Indian languages with support for female and male speakers.



Open-source workbench for AI-assisted language data curation for Indian languages, with a focus on annotation flows for text, speech, image, and conversation data.


Open-source tool for AI-assisted video subtitling and translation, with a focus on educational and media content.


Open-source tool for document-level translation with NMT and transliteration support.

Our Sponsors

Nandan Nilekani

As the primary sponsor, Nandan Nilekani has generously contributed to the formation of the “Nilekani center at AI4Bharat” with a focus on open-source language tech as a public good.


Microsoft’s Research Lab and India Development Center (IDC) have supported us with unrestricted research grants and by allocating researchers’ time to contribute to open-source technologies.

EkStep Foundation

EkStep Foundation also supports us with mentorship and software engineering to build and deploy open-source applications for Indian languages.


We receive valuable support from Nvidia India, granting us access to their cutting-edge compute resources and researchers. This collaboration enables us to actively contribute to the development of open-source models.

Our Team

Anoop Kunchukuttan

Researcher at Microsoft

Mitesh Khapra

Associate Professor at CSE Department, IIT Madras

Pratyush Kumar

Researcher at Microsoft Research and Adjunct Faculty at IIT Madras

Vivek Raghavan

Chief AI Evangelist, EkStep and Mentor, AI4Bharat

We must be second to none in the application of advanced technologies to the real problems of man and society.

– Vikram Sarabhai