Shoonya is an open-source platform that improves the efficiency of language work in Indian languages through AI tools and custom-built user interfaces and features. Efficient language work is a key requirement for creating the larger datasets needed to train AI models, such as neural machine translation models, for a large number of Indian languages.

Shoonya is envisaged to support various types of language work, including translation, text validation, speech transcription, and optical character recognition. The current focus of Shoonya is on translation.

Features supported

Workplace Management

Shoonya provides a hierarchical way to organize language work into organizations, workspaces, and projects.
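
The hierarchy can be pictured with a small sketch. This is only an illustration of the organization > workspace > project nesting described above; the class and field names are hypothetical and are not taken from Shoonya's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical classes for illustration only; not Shoonya's real models.
@dataclass
class Project:
    title: str
    project_type: str  # e.g. "translation"


@dataclass
class Workspace:
    name: str
    projects: List[Project] = field(default_factory=list)


@dataclass
class Organization:
    name: str
    workspaces: List[Workspace] = field(default_factory=list)

    def all_projects(self) -> List[Project]:
        """Collect the projects of every workspace in this organization."""
        return [p for ws in self.workspaces for p in ws.projects]


# Build a tiny hierarchy: one organization, one workspace, one project.
org = Organization("Example Org")
workspace = Workspace("Hindi Translation Team")
workspace.projects.append(Project("News sentences", "translation"))
org.workspaces.append(workspace)
```

Keeping projects inside workspaces, and workspaces inside organizations, means that a question such as "which projects belong to this organization?" is a simple walk down the tree.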

NMT Support

Shoonya can pre-populate automatic translations from IndicTrans, which currently supports 12 Indic languages.

Transliteration Support

Shoonya simplifies text entry by letting users type in Roman characters, which are then transliterated by IndicXlit models supporting 20+ languages.

Maker-Checker-Superchecker Flow

Shoonya provides multiple ways to evaluate the quality of translated data with automated maker-checker flows.
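
As a rough illustration of a maker-checker-superchecker flow, the sketch below models a task moving through three review stages. The stage names, and the rule that a rejected task returns to the maker, are assumptions made for this example; Shoonya's actual workflow may differ.

```python
from enum import Enum


# Hypothetical stages for illustration; not Shoonya's real task states.
class Stage(Enum):
    MAKER = "maker"                # translator produces the translation
    CHECKER = "checker"            # reviewer checks the maker's work
    SUPERCHECKER = "superchecker"  # second-level review
    DONE = "done"                  # accepted at every level


def advance(stage: Stage, approved: bool = True) -> Stage:
    """Return the next stage of a task.

    A maker submission always moves to the checker. At the checker and
    superchecker stages, approval moves the task forward, while rejection
    sends it back to the maker (an assumed rule).
    """
    if stage is Stage.DONE:
        raise ValueError("task is already complete")
    if stage is Stage.MAKER:
        return Stage.CHECKER
    if not approved:
        return Stage.MAKER
    return Stage.SUPERCHECKER if stage is Stage.CHECKER else Stage.DONE


# A task that is approved at every level.
stage = Stage.MAKER
stage = advance(stage)                 # maker submits
stage = advance(stage, approved=True)  # checker approves
stage = advance(stage, approved=True)  # superchecker approves
```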

Context View

Shoonya allows translators to see paragraph-level context when translating an individual sentence.
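
One simple way to picture a context view is a window of neighbouring sentences around the sentence being translated. The function below is a minimal sketch under that assumption; the name `context_view` and the `window` parameter are hypothetical and do not describe Shoonya's implementation.

```python
from typing import Dict, List, Union


def context_view(
    sentences: List[str], index: int, window: int = 1
) -> Dict[str, Union[str, List[str]]]:
    """Return the target sentence with up to `window` neighbours per side."""
    if not 0 <= index < len(sentences):
        raise IndexError("sentence index out of range")
    start = max(0, index - window)
    end = min(len(sentences), index + window + 1)
    return {
        "before": sentences[start:index],
        "target": sentences[index],
        "after": sentences[index + 1:end],
    }


paragraph = [
    "The river rose overnight.",
    "It flooded the lower fields.",
    "Farmers moved their cattle uphill.",
    "By morning the water had receded.",
]
view = context_view(paragraph, index=2)
```

Here `view["target"]` is the sentence to translate, while `view["before"]` and `view["after"]` hold the surrounding sentences that help resolve pronouns and other ambiguous references.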

Cross-lingual Support

For low-resource languages, Shoonya can show annotators translations in other languages.

Bhashini/ULCA compatible datasets

Shoonya can generate datasets in ULCA format, so that the final reviewed datasets can be submitted directly to Bhashini (a National Language Translation Mission initiative).

Defining new Project Types instantly

Shoonya builds on Label Studio, an open-source data labeling library that supports the major data labeling types out of the box. This makes it possible to define a new project type in Shoonya quickly and with minimal changes.
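
Label Studio describes each labeling interface with an XML configuration built from tags such as View, Text, and TextArea. The sketch below shows how a new project type could, in principle, amount to little more than a new configuration entry. The registry dictionary, its key, and the field names `source_text` and `translation` are hypothetical and are not taken from Shoonya's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry mapping a project type to a Label Studio
# labeling configuration; the key and field names are illustrative only.
PROJECT_TYPE_CONFIGS = {
    "translation_editing": """
<View>
  <Text name="source_text" value="$source_text"/>
  <TextArea name="translation" toName="source_text" rows="3"/>
</View>
""",
}


def control_names(project_type: str) -> list:
    """Parse a project's config and list the `name` of each tag inside it."""
    root = ET.fromstring(PROJECT_TYPE_CONFIGS[project_type].strip())
    return [child.get("name") for child in root]


names = control_names("translation_editing")
```

Under this assumption, adding another project type would mean adding another entry to the registry rather than writing a new interface from scratch.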