IndicBART is a multilingual sequence-to-sequence pre-trained model focused on Indic languages and English. It currently supports 11 Indian languages and is based on the mBART architecture. You can use the IndicBART model to build natural language generation applications for Indian languages by fine-tuning the model with supervised training data for tasks like machine translation, summarization, and question generation. Some salient features of IndicBART are:

- It supports Assamese, Bengali, Gujarati, Hindi, Marathi, Odia, Punjabi, Kannada, Malayalam, Tamil, Telugu, and English.
- It is much smaller than the mBART and mT5(-base) models, making fine-tuning and decoding less computationally expensive.
- It is trained on large Indic-language corpora that also include Indian English content.
- All languages except English are represented in the Devanagari script to encourage transfer learning among the related languages.
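As an illustration, here is a minimal sketch of one supervised fine-tuning step for English-to-Hindi translation. It assumes the model is available on the Hugging Face Hub under the checkpoint id ai4bharat/IndicBART and uses mBART-style language tags such as <2en> and <2hi>; consult the official repo for the exact data format and full training scripts.

```python
# A minimal sketch of one supervised fine-tuning step (English -> Hindi).
# The checkpoint id "ai4bharat/IndicBART" is an assumption; real training
# would batch a parallel corpus and run many epochs.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "ai4bharat/IndicBART", do_lower_case=False, use_fast=False, keep_accents=True
)
model = AutoModelForSeq2SeqLM.from_pretrained("ai4bharat/IndicBART")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# One hypothetical sentence pair in the "... </s> <lang>" (source) and
# "<lang> ... </s>" (target) formats.
src = tokenizer("I am a boy </s> <2en>",
                add_special_tokens=False, return_tensors="pt").input_ids
tgt = tokenizer("<2hi> मैं एक लड़का हूँ </s>",
                add_special_tokens=False, return_tensors="pt").input_ids

model.train()
# Teacher forcing: the decoder sees all target tokens but the last, and is
# trained to predict all target tokens but the first.
loss = model(input_ids=src, decoder_input_ids=tgt[:, :-1], labels=tgt[:, 1:]).loss
loss.backward()
optimizer.step()
```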
You can read more about IndicBART in the paper cited below.
You can download the model and find instructions for fine-tuning and decoding in the IndicBART GitHub repo. Alternatively, you can download it from the Hugging Face Hub and use it in your own fine-tuning scripts.
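For example, here is a minimal sketch of loading the pre-trained model from the Hub and decoding with it. As above, the checkpoint id ai4bharat/IndicBART and the <2xx> language-tag convention are assumptions; check the model card for the authoritative input format.

```python
# A minimal decoding sketch; checkpoint id and tag format are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "ai4bharat/IndicBART", do_lower_case=False, use_fast=False, keep_accents=True
)
model = AutoModelForSeq2SeqLM.from_pretrained("ai4bharat/IndicBART")

# The source sentence ends with </s> followed by its language tag; the
# target language tag is supplied as the decoder start token.
inp = tokenizer("I am a boy </s> <2en>",
                add_special_tokens=False, return_tensors="pt").input_ids
out = model.generate(
    inp,
    max_length=20,
    num_beams=4,
    early_stopping=True,
    decoder_start_token_id=tokenizer.convert_tokens_to_ids("<2hi>"),
)
# Note: the raw pre-trained model is a denoiser; outputs become meaningful
# only after task-specific fine-tuning.
print(tokenizer.decode(out[0], skip_special_tokens=True,
                       clean_up_tokenization_spaces=False))
```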
If you use IndicBART, please cite the following paper:
@inproceedings{dabre2021indicbart,
  title={IndicBART: A Pre-trained Model for Natural Language Generation of Indic Languages},
  author={Raj Dabre and Himani Shrotriya and Anoop Kunchukuttan and Ratish Puduppully and Mitesh M. Khapra and Pratyush Kumar},
  year={2022},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2022},
}
The IndicBART code and model are released under the MIT License.