To learn more about our contributions over the years, see the timeline below!
At AI4Bharat, our commitment to Automatic Speech Recognition (ASR) is driven by a vision of embracing and reflecting India's rich linguistic and cultural diversity. We are dedicated to creating inclusive ASR systems that span all 22 constitutionally recognized languages. Our approach combines large-scale data crawling with meticulous ground-level data collection across more than 400 districts, resulting in a dataset of unprecedented magnitude: 300,000 hours of raw speech, 6,000 hours of transcribed data, and 6,400 hours of mined audio-text pairs, augmented by pseudo-labeled data from diverse sources such as YouTube. This extensive dataset lets us address the complexities of India's linguistic landscape effectively. Our focus on building robust benchmarks is exemplified by our work on Vistaar, IndicSUPERB, Lahaja, and Svarah, which have set new standards in ASR evaluation. Our state-of-the-art models include IndicWav2Vec, IndicWhisper, and IndicConformer, and our latest model supports all 22 languages. Moving forward, we aim to enhance our models to handle 8 kHz telephony data, adapt them to specific domains and demographics through synthetic data generation, and ensure they work in offline settings, further advancing ASR technology for low-resource languages.