Signs of the times

SWETA AKUNDI

May 2023
from Shaastra :: vol 02 issue 03 :: May - Jun 2023

Envision co-founders Karthik Kannan (right) and Karthik Mahadevan.

The world of assistive technology grows along with artificial intelligence. And the more human functions are automated, the less limiting are physical disabilities.

The Google Glass was never meant to be. Touted as a breakthrough wearable in 2013, the Glass promised to be your guide — projecting the information you needed on its screen and responding to your voice. But it failed to take off, and was finally declared dead by the search engine giant earlier this year.

Though Google stopped selling Glass to consumers, it did offer some test units to a then-fledgling start-up called Envision, an assistive-technology company for people with visual impairments.

From his office in The Hague, Karthik Kannan recalls how his company had won a Google Play award for the best accessibility app in 2019. "Our app was geared towards reading the environment, identifying objects, and reading text," he says.

Envision, founded by Karthik Kannan and Karthik Mahadevan, had the software but needed hardware, and Glass seemed a good fit. Today, Envision's clients use its smart glasses to access everyday visual information, which is relayed to them orally.

The world of assistive technology grows in parallel with artificial intelligence (AI). The past decade has seen rapid advances in computer vision, machine learning (ML) and natural language processing (NLP). The more these inherently human functions (seeing, understanding, conversing) are automated, the less limiting physical disabilities become.

To help people with disabilities in real time, an ML system must be able to see and understand an unpredictable, ever-changing environment. The key, Karthik Kannan says, is to overcome the bottleneck of the limited datasets available for training such systems.

"Over the past few months, we have seen a lot of development in zero-shot learning and one-shot learning," he says. He is referring to ML models that can recognise things they have never seen before (zero-shot) or have seen only once (one-shot), which greatly reduces the need for vast sets of labelled data. Zero-shot models use auxiliary information, such as captions for pictures, to learn the similarities and differences between images, and then apply that knowledge to make inferences about new data.
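One common way to do zero-shot recognition, popularised by models such as OpenAI's CLIP, is to map images and text descriptions into a shared vector space and pick the description closest to the image. The sketch below is only a toy illustration of that idea, not Envision's method: the four-dimensional embeddings and the helper names `cosine` and `zero_shot_classify` are hypothetical, and a real system would obtain the vectors from trained image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_vec, label_vecs):
    """Return the label whose text embedding is most similar to the image
    embedding, plus all scores. The candidate labels are supplied only as
    text at inference time; no labelled images of these classes are used."""
    scores = {label: cosine(image_vec, vec) for label, vec in label_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical embeddings, made up for illustration. In practice they would
# come from image and text encoders trained to share one vector space.
image_embedding = [0.9, 0.1, 0.0, 0.3]
text_embeddings = {
    "a photo of a door": [0.8, 0.2, 0.1, 0.4],
    "a photo of a bus": [0.1, 0.9, 0.3, 0.0],
    "a photo of a cup": [0.0, 0.2, 0.9, 0.1],
}

label, scores = zero_shot_classify(image_embedding, text_embeddings)
print(label)  # prints "a photo of a door"
```

Adding a new category in this scheme means writing a new text description rather than collecting and labelling new images, which is why such models ease the data bottleneck described above.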

Research on visual question answering (VQA) also helps the team. In VQA, a system must answer a text-based question about an input image, a capability that AI for the visually impaired needs. VQA is part of a rising frontier in AI: multimodal AI, which integrates multiple modalities such as language, images, video, audio, graphs and other structured data.
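To make "integrating modalities" concrete, here is a toy sketch of one simple fusion strategy used in early VQA baselines: combine an image feature vector and a question feature vector by element-wise multiplication, then score a fixed set of candidate answers. Everything here is hypothetical and hand-made for illustration (the feature meanings, the vectors and the helper names `fuse` and `answer`); real systems learn these representations from data and are far more elaborate.

```python
def fuse(image_vec, question_vec):
    """Element-wise product: a feature survives only if it is non-zero in
    both the image and the question, so the question selects what to look at."""
    return [i * q for i, q in zip(image_vec, question_vec)]


def answer(image_vec, question_vec, answer_vecs):
    """Score each candidate answer against the fused vector; return the best."""
    fused = fuse(image_vec, question_vec)
    scores = {
        a: sum(f * w for f, w in zip(fused, vec)) for a, vec in answer_vecs.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical feature dimensions: [red, green, text present, person present].
traffic_light_image = [0.9, 0.1, 0.0, 0.2]
what_colour = [1.0, 1.0, 0.0, 0.0]      # question: "What colour is the light?"
is_there_person = [0.0, 0.0, 0.0, 1.0]  # question: "Is there a person?"

answers = {
    "red": [1.0, 0.0, 0.0, 0.0],
    "green": [0.0, 1.0, 0.0, 0.0],
    "yes": [0.0, 0.0, 0.0, 1.0],
}

print(answer(traffic_light_image, what_colour, answers)[0])      # prints "red"
print(answer(traffic_light_image, is_there_person, answers)[0])  # prints "yes"
```

The same image yields different answers depending on the question, which is the point of fusing the two modalities rather than analysing the image alone.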