Translation – AI4BHĀRAT

Machine Translation

India has 22 constitutionally recognised languages with a combined speaker base of over 1 billion. With increasing digital penetration and the preference for regional-language content on the web, a good translation system for Indian languages is necessary to provide equitable access to information and content. Despite this fundamental need, the accuracy of machine translation (MT) systems to and from Indic languages is poorer than that of systems for several European languages. At AI4Bharat, our goal is to bridge this gap by (i) mining cheaper parallel data from the web, (ii) manually collecting a small amount of seed data, (iii) creating robust India-centric benchmarks, and (iv) building efficient multilingual models that exploit the similarity between Indian languages.

Datasets

Samanantar

Bharat Parallel Corpus Collection (BPCC)

Models

IndicTrans

IndicTrans2

Tools

Shoonya

Our Partners

DesiCrew

Shaastra