Copyright thehindubusinessline

OpenAI has launched IndQA, a new benchmark aimed at evaluating how effectively AI systems comprehend questions grounded in India’s diverse languages and cultural context. The launch follows the company’s recent announcement of a year-long ChatGPT Go subscription offer exclusively for users in India. Developed in partnership with 261 domain experts nationwide, IndQA comprises 2,278 questions spanning 12 languages and 10 cultural domains, ranging from literature and everyday life to food, history, and spirituality. Unlike traditional benchmarks such as MMMLU and MGSM, IndQA’s questions are natively written, not translated, reflecting the nuances of how people in India think, speak, and ask questions.

Speaking to the media at the OpenAI DevDay Exchange in Bengaluru on Tuesday, Srinivas Narayanan, CTO, B2B Applications, OpenAI, said, “The goal is to ensure our models are right in the nuances that every culture cares about, which is why we invested in this. We want to learn from it and replicate it in other places and expand the program more. Language is something we’ve always cared about, and this move was to deepen that effort and make it more effective to get the cultural nuances right.”

Current benchmarks focus on translation or multiple-choice tasks and don’t capture what matters most when evaluating an AI system’s language capabilities: an understanding of context, culture, and history, among other factors. In a company blog, the ChatGPT maker said, “If AI is to be useful for everyone, it needs to function effectively across languages and cultures. About 80 percent of people worldwide do not speak English as their primary language, yet most existing benchmarks that measure non-English language capabilities fall short.” While the company aims to create similar benchmarks for other languages and regions, India is an obvious starting point.
The country has about a billion people who don’t use English as their primary language, 22 official languages, and is ChatGPT’s second-largest market. This work is part of OpenAI’s commitment to improving its products and tools for Indian users and making its technology more accessible throughout the country.

IndQA covers a range of culturally relevant topics: Architecture & Design, Arts & Culture, Everyday Life, Food & Cuisine, History, Law & Ethics, Literature & Linguistics, Media & Entertainment, Religion & Spirituality, and Sports & Recreation. Items are written natively in Bengali, English, Hindi, Hinglish, Kannada, Marathi, Odia, Telugu, Gujarati, Malayalam, Punjabi, and Tamil. Each datapoint includes a culturally grounded prompt in an Indian language, an English translation for auditability, rubric criteria for grading, and an ideal answer that reflects expert expectations.

Rubric-based approach

IndQA uses a rubric-based approach, where each response is graded against criteria written by domain experts for that specific question. The criteria spell out what an ideal answer should include or avoid, and each is given a weighted point value based on its importance. A model-based grader checks whether each criterion is met. The final score is the sum of the points for criteria satisfied out of the total possible.

To build IndQA, OpenAI worked with partners to find experts across 10 domains in India, who drafted difficult, reasoning-focused prompts tied to their regions and specialties. These experts are native-level speakers of both the relevant language and English. Each question was then tested against the strongest OpenAI models available at the time: GPT-4o, OpenAI o3, GPT-4.5, and GPT-5. The company kept only those questions where a majority of these models failed to produce acceptable answers, preserving headroom for progress.

Published on November 4, 2025
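The weighted rubric scoring described in the article can be sketched in a few lines. This is an illustrative assumption, not OpenAI’s implementation: the criteria, weights, and `rubric_score` function below are hypothetical, and IndQA’s actual grading of each criterion is done by a model-based grader rather than a rule.

```python
# Illustrative sketch of rubric-based weighted scoring.
# Criteria, weights, and names are hypothetical examples;
# in IndQA, whether a criterion is met is judged by a model-based grader.

def rubric_score(criteria, met):
    """Return points earned over total possible points.

    criteria: dict mapping criterion description -> weight (points)
    met: set of criterion descriptions judged satisfied by the grader
    """
    total = sum(criteria.values())
    earned = sum(weight for name, weight in criteria.items() if name in met)
    return earned / total if total else 0.0

# A hypothetical rubric for one question, weighted by importance.
criteria = {
    "names the correct historical period": 3,
    "explains the regional context": 2,
    "avoids conflating the two traditions": 2,
    "cites the relevant literary work": 1,
}

# Suppose the grader judged two of the four criteria satisfied.
score = rubric_score(criteria, met={
    "names the correct historical period",
    "explains the regional context",
})
print(round(score, 3))  # 5 of 8 points -> 0.625
```

The normalisation by total possible points means questions with many criteria and questions with few are scored on the same 0-to-1 scale.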