Marking benchmarks

SWETA AKUNDI

Jun 2026
from Shaastra :: vol 05 issue 06 :: Jun 2026

Enterprises may want to identify benchmarks most relevant to them.

Researchers rethink AI evaluation to test for real-world deployment.

Postdoctoral researcher Mohammed Safi Ur Rahman Khan is contemplating ways to evaluate artificial intelligence (AI) models within India's cultural context. India, for instance, has seen a surge in UPI- and OTP-based scams. Would popular AI models, built in Western societies, still be able to recognise fraud patterns common in India and flag suspicious requests involving Aadhaar details?

At AI4Bharat, a research lab at the Indian Institute of Technology Madras, Safi is part of a team developing IndicLLMSuite, a suite of benchmarks that tests models for safety and capabilities within the Indian context. "We test for the models' capabilities on intents that Indian users care about, from information on applying for licences to government schemes and so on," he says. The suite of benchmarks will be out soon, he adds.