Special Feature

Genomics Big Data

Wading into the gene pool

REKHA DIXIT

Feb 2025
from Shaastra :: vol 04 issue 01 :: Feb 2025

An ongoing study of India's genetic landscape will help researchers understand disease better.

What makes Indians uniquely Indian? A first-phase analysis of more than half of the 10,000 genomes of Indians, sequenced under the ambitious GenomeIndia project representing 69 population groups, has thrown up 7 million genetic variants not found in any global dataset.

The GenomeIndia project is being undertaken by a consortium of 20 scientific institutions led by the Centre's Department of Biotechnology (DBT). In January 2025, it made the sequenced data available to sections of the public. It also invited proposals for translational research that addresses critical personalised and community-level health issues by developing cost-effective screening and diagnostic tools and drugs, and for research that identifies population-specific genetic risk factors.

GenomeIndia is creating a 'reference Indian genome' that can capture the novelty of the Indian population.

UNIQUELY COMMON

The initial analysis of around 5,750 samples highlights some of the uniqueness of India's genetic landscape. It revealed more than 135 million genetic variations. Most of these are "rare or ultra-rare", yet at least 11% are common across different Indian populations.

Given that there are around 4,600 population groups in the country, many of them endogamous, more novel variants are waiting to be discovered. For the project, a group of experts whittled the list to 99 groups, representing four major linguistic groups: Indo-European, Dravidian, Austro-Asiatic and Tibeto-Burman. "The groups would maximally represent the Indian population for genetic studies," says DBT Adviser Suchita Ninawe, who is a member of GenomeIndia's Technical Monitoring and Assessment Committee. The first-phase analysis looks at 69 population groups.

BIG DATA

The sequenced data — in several terabytes — is kept in a repository at the Indian Biological Data Centre, Faridabad. Nearly 20,000 blood samples are stored at the Centre for Brain Research, Bengaluru. "Storing and uploading such big data itself are major accomplishments," says Ninawe. GenomeIndia developed a "double-blind" coding system to protect the privacy of the individuals, with first-level coding at the time of collecting the sample and another at the sequencing stage. Ninawe elaborates, "We were clear that we are creating a national resource, and while it will be in a national repository and managed by the government, it will be available to the public." Privacy and consent regulations were therefore prepared accordingly.

The data can currently be accessed only by researchers in Indian institutes. Protocols for sharing the data with international researchers and private companies are to be developed soon.

GENETIC LANDSCAPE

The need for Indian genetic datasets has long been felt. Researchers note that Indian populations show distinct patterns of disease development, for instance in the prevalence of type 2 diabetes or the onset of cancers. Yet prevailing genome-based diagnostic tools and treatments rely on Western datasets (see 'To catch a killer').

In 2019, the Institute of Genomics and Integrative Biology (IGIB), New Delhi, sequenced 1,000 human genomes over six months. "The exercise was to demonstrate the technical skills and ability to sequence big data. Analysis of those genomes showed that 30% of the variants were unique to the subcontinent," recalls Sridhar Sivasubbu, a former IGIB scientist involved in sequencing genomes for both projects. Larger datasets will give a better resolution of the genetic landscape, he explains.

Such a dataset will help scientists better understand disease in India. "The carrier frequency of many diseases is related to gene variants. As we move into the era of precision medicine, we will need to understand the genetic landscape of the population," says Rakesh Mishra, Director, Tata Institute for Genetics and Society, Bengaluru. "The basis for starting any nature of genomic treatment is to first have a database."

The 10,000-genome study focused on healthy representatives. In the next phase, GenomeIndia aims to study Indian populations for diseases. These include rare disorders, cancers, neurological problems and lifestyle-associated diseases such as obesity and diabetes, all of which have genetic links. However, the protocols for Phase II are yet to be developed.

The data is the baseline for developing biomedical applications. However, bio-manufacturing may have to wait a bit. First, the data would need to be studied.
