What the COVID data said

ANURAG AGRAWAL

Jan 2022
from Shaastra :: vol 01 issue 01 :: Jan - Feb 2022

Data modelling remains a pillar of pandemic preparedness. Here's how we can improve our ability to extract meaningful information from data.

The past year and a half since the COVID-19 outbreak has been frustrating for people who want to make predictions about events, as opposed to offering post-facto explanations. As I look back with the benefit of hindsight, this is not surprising: deciphering the babble of raw data and rendering it comprehensible was always tough. It required a clear understanding of what we knew, but also the capacity to know what we did not know. The following takeaways may be useful for the future, particularly now that our collective attention is focused on the Omicron variant of the virus.

The veracity of data, even when it comes from the government, should be questioned. When most cases are mild and resemble general illnesses, when testing is difficult to access or expensive, and when positive tests lead to difficulties for non-sick people, most cases will go undetected. Detecting every case would require massive public health infrastructure and effort, and in several countries, including India, such efforts yielded variable results. Test data from regions that are unable or unwilling to make such efforts will not be reliable in assessing the extent of the spread of the virus.

The next step in assessing the spread of the infection would be to conduct frequent serosurveys based on small but representative samples of the population. This was a potential area of strength for India. We not only developed indigenous serology tests, but the Indian Council of Medical Research performed national serosurveys in a randomly selected sample of districts, while the Council of Scientific and Industrial Research performed detailed surveys of its national lab network. This data established that the true number of cases was far greater than the number of known cases. There were wide gaps in case-finding between States. The ratio of cases estimated from serosurveys to cases known from testing was lowest in Kerala (about 5), but was over 50 in Jharkhand. Such data should have been more aggressively obtained, released, and used to improve models.
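To make the ratio described above concrete, here is a minimal Python sketch of how such a case-finding gap could be computed from a serosurvey. The function name, the two regions, and every number in it are hypothetical and chosen only so that the arithmetic lands near the ratios of 5 and 50; they are not the actual survey figures. The sketch also assumes that seroprevalence in the sample can be applied directly to the whole population, which ignores test sensitivity, waning antibodies, and vaccination.

```python
def case_finding_ratio(seroprevalence: float, population: int, reported_cases: int) -> float:
    """Ratio of infections estimated from a serosurvey to cases found by testing.

    seroprevalence: fraction of the surveyed sample with antibodies (0 to 1).
    population: size of the population the sample is assumed to represent.
    reported_cases: cumulative cases confirmed by testing in that population.
    """
    if not 0.0 <= seroprevalence <= 1.0:
        raise ValueError("seroprevalence must be a fraction between 0 and 1")
    if population <= 0 or reported_cases <= 0:
        raise ValueError("population and reported_cases must be positive")
    # Assumption: the sample is representative, so its seroprevalence
    # scales up to the whole population.
    estimated_infections = seroprevalence * population
    return estimated_infections / reported_cases


# Hypothetical regions with the same population and seroprevalence,
# differing only in how many cases testing actually found.
region_a = case_finding_ratio(seroprevalence=0.25, population=1_000_000, reported_cases=50_000)
region_b = case_finding_ratio(seroprevalence=0.25, population=1_000_000, reported_cases=5_000)

print(f"Region A: about {region_a:.0f} infections per reported case")
print(f"Region B: about {region_b:.0f} infections per reported case")
```

In this illustration the two regions have identical underlying spread, yet testing alone would suggest that one had ten times the outbreak of the other. That is the kind of distortion a model inherits when it is fitted to reported cases without a serosurvey correction.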