IndicBERT v2

IndicBERT v2 is a multilingual BERT model trained on IndicCorpv2, covering 24 Indic languages. It is competitive with strong baselines and achieves the best performance on 7 of the 9 tasks in the IndicXTREME benchmark.


Dataset Download Link
IndicBERT HF Model
IndicBERT+Samanantar HF Model
IndicBERT+Back Trans. HF Model

Coming Soon

Hugging Face Examples
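As a minimal sketch of how the model might be queried through the Hugging Face `transformers` library, the snippet below masks one word of a sentence and asks the masked-language-model head for its top guesses. The Hub ID `ai4bharat/IndicBERTv2-MLM-only` is an assumption (substitute the ID from the model link above), and `mask_word` and `top_predictions` are hypothetical helper names introduced only for this illustration.

```python
# Assumed Hub ID; replace it with the ID from the model link in this README.
MODEL_ID = "ai4bharat/IndicBERTv2-MLM-only"


def mask_word(sentence: str, position: int, mask_token: str) -> str:
    """Replace the whitespace-separated word at `position` with `mask_token`."""
    words = sentence.split()
    if not 0 <= position < len(words):
        raise IndexError(f"position {position} out of range for {len(words)} words")
    words[position] = mask_token
    return " ".join(words)


def top_predictions(sentence: str, position: int, k: int = 5):
    """Mask one word and return the model's top-k (token, score) guesses."""
    # Imported lazily so mask_word can be used without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
    fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

    # Use the tokenizer's own mask token rather than hard-coding one.
    masked = mask_word(sentence, position, tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(masked, top_k=k)]
```

Calling `top_predictions("भारत एक बड़ा देश है", 3)` would mask the Hindi word for "country" and return the candidate tokens with their scores. The first call downloads the model weights, so it needs network access.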



  doi = {10.48550/ARXIV.2212.05409},
  url = {},
  author = {Doddapaneni, Sumanth and Aralikatte, Rahul and Ramesh, Gowtham and Goyal, Shreya and Khapra, Mitesh M. and Kunchukuttan, Anoop and Kumar, Pratyush},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages},
  publisher = {arXiv},
  year = {2022},
  copyright = {perpetual, non-exclusive license}