Shoonya is an open-source platform that improves the efficiency of language work in Indian languages through AI tools and custom-built user interfaces and features. Efficient language work is a key requirement for creating the larger datasets needed to train AI models, such as neural machine translation models, for a large number of Indian languages.
Shoonya is designed to support various types of language work, including translation, text validation, speech transcription, and optical character recognition. The current focus of Shoonya is on translation.
Shoonya provides a hierarchical way to organize language work into organizations, workspaces, and projects.
Shoonya can pre-populate automatic translations from IndicTrans, which currently supports 12 Indic languages.
Shoonya simplifies text entry by letting users type in Roman characters, which are transliterated by IndicXlit models supporting 20+ languages.
Shoonya provides multiple ways to evaluate the quality of translated data with automated maker-checker flows.
Shoonya allows translators to see paragraph-level context when translating an individual sentence.
For low-resource languages, Shoonya can show annotators translations in other languages.
Shoonya can generate datasets in ULCA format, so that the final reviewed datasets can be submitted directly to Bhashini (the National Language Translation Mission initiative).
Shoonya builds on Label Studio, an open-source library, to support the major data labeling types in a plug-and-play manner. This makes it possible to define a new project type quickly in Shoonya with minimal changes.
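As an illustration of the organization, workspace, and project hierarchy listed among the features above, the following is a minimal Python sketch. It is not Shoonya's actual data model: the class names, fields, and the all_projects helper are hypothetical and chosen only to show how the three levels nest.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    # Hypothetical fields; the real Shoonya schema may differ.
    title: str
    project_type: str  # e.g. "translation"


@dataclass
class Workspace:
    name: str
    projects: List[Project] = field(default_factory=list)


@dataclass
class Organization:
    name: str
    workspaces: List[Workspace] = field(default_factory=list)

    def all_projects(self) -> List[Project]:
        """Collect the projects from every workspace in this organization."""
        return [p for ws in self.workspaces for p in ws.projects]


# Build a small example hierarchy: one organization, one workspace, one project.
org = Organization("Example Org")
ws = Workspace("Hindi Team")
ws.projects.append(Project("News EN-HI", "translation"))
org.workspaces.append(ws)

print([p.title for p in org.all_projects()])
```

In this sketch each level simply owns a list of the level below it, so project-level work can be grouped and listed per workspace or per organization.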