At AI4Bharat, our mission has always been to build a robust, open-source ecosystem for Indian language AI. This journey has involved creating foundational datasets, models, and evaluation benchmarks. As Large Language Models (LLMs) become the new frontier, we are encountering a familiar challenge — a significant gap in our ability to properly measure their performance for India.
The global AI landscape is now filled with benchmarks and leaderboards, but they remain overwhelmingly English-centric, designed to evaluate models on Western cultural contexts and use cases. This is insufficient as India enters the era of sovereign LLMs.
A model's ability to discuss a topic in perfect English is irrelevant if it fails to understand a farmer in rural Maharashtra, provides a culturally inappropriate response to a user in Sikkim, or cannot parse a Tanglish query from a student in Tamil Nadu.
To address this, we are proud to announce the Indic LLM-Arena, an initiative by AI4Bharat (IIT Madras), supported by Google Cloud. This platform is a crowd-sourced, human-in-the-loop leaderboard designed to benchmark LLMs on the three pillars that affect the Indian experience: language, context, and safety.
Current leaderboards are a necessary part of the AI ecosystem, but they do not capture the realities of our nation. The gap exists across three critical dimensions:
Evaluation is not merely about translating benchmarks into the 22 scheduled languages. It is about understanding the natural, fluid way Indians communicate. This includes code-switching (e.g., Hinglish or Tanglish), where users mix multiple languages in a single sentence. Models trained on 'pure' text often fail at this, yet it is the primary mode of communication for millions.
Example:
"Bhai, woh naya restaurant ka review accha hai kya? Wahan Andhra meals milta hai?"
(A multi-language prompt that standard benchmarks often overlook.)
India is not a monolith. A model that provides a generic, pan-Indian answer may be unhelpful or, worse, incorrect.
Example:
If a user asks for a "good gift for a housewarming," the correct answer is not a "bottle of wine" (common in the West).
A culturally aware model would suggest mithai, a Ganesha idol, or other appropriate items for a Grah Pravesh.
This extends to countless scenarios — from understanding local festivals and social etiquette to navigating region-specific agricultural, financial, and healthcare queries.
AI safety cannot be one-size-fits-all. A model's safety and fairness filters must be trained to recognize and mitigate harms that are specific to the Indian social fabric, including subtle forms of regional bias, communal misinformation, or caste-based stereotypes. Standard safety benchmarks do not cover this.
We cannot rely on static, automated benchmarks alone. We need a dynamic, human-powered evaluation model.
Inspired by the success of platforms like lmarena, our approach is built on fair, blind, side-by-side comparison.
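To make the idea of blind, side-by-side comparison concrete, here is a minimal sketch of how pairwise human votes could be turned into a ranking. It assumes an Elo-style rating update, which is one common choice for pairwise-preference leaderboards; this post does not say which scoring method the Indic LLM-Arena actually uses. The model names, the starting rating, and the K-factor are all hypothetical values chosen for illustration.

```python
import random
from collections import defaultdict

INITIAL_RATING = 1000.0  # assumed starting score for every model
K_FACTOR = 32.0          # assumed step size; larger values react faster to votes


def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def blind_pair(models, rng):
    """Pick two distinct models in random order so position reveals nothing."""
    return tuple(rng.sample(models, 2))


def record_vote(ratings, model_a, model_b, outcome):
    """Update ratings in place. outcome: 1.0 = A preferred, 0.0 = B, 0.5 = tie."""
    expected_a = expected_score(ratings[model_a], ratings[model_b])
    delta = K_FACTOR * (outcome - expected_a)
    ratings[model_a] += delta
    ratings[model_b] -= delta


def leaderboard(ratings):
    """Return (model, rating) pairs sorted from highest to lowest rating."""
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    models = ["model_x", "model_y", "model_z"]  # hypothetical model names
    ratings = defaultdict(lambda: INITIAL_RATING)
    rng = random.Random(0)
    left, right = blind_pair(models, rng)   # shown to the user as "A" and "B"
    record_vote(ratings, left, right, 1.0)  # the user preferred response A
    for name, score in leaderboard(ratings):
        print(f"{name}: {score:.1f}")
```

In this sketch the user only ever sees anonymous responses "A" and "B"; the model identities are attached to the vote afterwards. A win against an equally rated opponent moves each rating by half the K-factor, and an upset win against a much stronger opponent moves it by more.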
The Indic LLM-Arena is more than a leaderboard; it is a public utility designed to foster a more competitive and inclusive AI ecosystem.
The Indic LLM-Arena is an open invitation to the entire community to help us define what "good" AI looks like for India. We are eager to see the advancements this new standard will inspire.
We at AI4Bharat are rolling out the Indic LLM-Arena in carefully planned phases:
Phase 1 (currently live):
Support for text-based inputs across multiple Indian languages and code-mixed scenarios.
Phase 2:
Expansion to omni models, bringing in vision and audio capabilities so that the Arena addresses image-based, voice-based, and mixed-media interactions.
Phase 3:
Introduction of agentic tasks, such as handling large documents (PDFs), web-search integration, tool calls, and other advanced workflows.
Other planned features:
We invite your collaboration — whether you build models, consume models, or simply believe in inclusive language AI:
If you're interested in sponsoring or collaborating, we'd love to hear from you.
Reach out to us: arena@ai4bharat.org