Marking benchmarks
From Shaastra :: vol 05 issue 06 :: Jun 2026
Researchers rethink AI evaluation to test for real-world deployment.
Postdoctoral researcher Mohammed Safi Ur Rahman Khan is contemplating ways to evaluate artificial intelligence (AI) models within India's cultural context. India, for instance, has seen a surge in UPI and OTP-based scams. Would popular AI models, built in Western societies, still be able to recognise fraud patterns common in India and flag suspicious requests involving Aadhaar details?
At AI4Bharat, a research lab at the Indian Institute of Technology Madras, Safi is part of a team developing IndicLLMSuite, an evaluation suite of benchmarks that tests for safety and capabilities within the Indian context. "We test for the models' capabilities on intents that Indian users care about, from information on applying for licences to government schemes and so on," he says. The suite of benchmarks will be out soon, he adds.