Guest Column

Whose data is it anyway?

  • from Shaastra :: vol 01 issue 05 :: Sep - Oct 2022

Rather than restricting access to health data, we need to see it as part of digital public goods for sanctioned societal use.

The rise of machine learning and artificial intelligence is being driven by the large volumes of easily accessible digital data in a digitalised, connected world. With some exceptions, much of the fundamental computer science that underpins current AI, such as deep learning, has been around for decades. Alongside continuous improvements in hardware, the post-Internet surge in the availability of digital data was the real game-changer. Because of its high value, many people have started referring to data as the new oil, and ownership of this 'data-oil' has sparked controversy.

Dr Anurag Agrawal is Dean, Biosciences and Health Research, Trivedi School of Biosciences, Ashoka University.

In most sectors, in the absence of any social contract, data is considered to belong to those who generate it, or who pay to generate it. Beyond any compulsory disclosures, the owner of the data has the right to govern access, and to monetise that right by selling ownership of, or access to, the data. In this context, data is a tradeable commodity, just like oil, which possibly prompted the first invocation of that analogy by Clive Humby in 2006. This view, which implies the extractive exploitation of a trapped, fungible asset, proves superficial when examined in detail. Data, unlike physical commodities, is non-fungible, duplicatable, shareable, and indefinitely usable by multiple people without being consumed. Importantly, when data touches humans, as it does in healthcare, the absence of an evident social contract cannot be taken as evidence that no such contract exists.


Healthcare is a high-value, rapidly growing economic sector, and health data is correspondingly valued highly. The sensitivity of health data, from the perspective of privacy, social justice and ethics, makes it obvious that social contracts must exist, even if they are not explicitly stated. A high degree of social activism in this sector has led to a widely held view that health data belongs to the patients. This view is being enshrined in governance and law (see, for example, the National Health Portal of India). Although superficially attractive, it is as poorly grounded in fact as the alternative view that data belongs to whoever generates or pays for it, when examined through the lens of the unwritten social contracts that govern health.


When health data is generated, the primary understanding between the healthcare system and the patient, irrespective of who pays, is that the data will be used for the benefit of the patient. Patients, if asked, voice a preference for data practices that maximise health benefits while minimising risks. While they would certainly prefer to share in any data dividend that accrues from the monetisation of such data, the foremost unwritten social contract here is one that ensures that data is used in a way that maximally benefits health. Take, for example, a genome sequencing study of a patient who wants health guidance. Such a study has no value to the individual patient unless it is compared with the data of other patients whose health trajectories are known. Implicit in the generation of health data is a solidarity of purpose with the other people for whom such data is generated: the data is to be used in a manner that benefits the collective without harming the individual. Such data solidarity extends in both directions, promoting beneficial use while dissuading misuse.

The concept of solidarity is not only compatible with privacy and protection from harm; it is likely to be more robust than current practices that emphasise consent. The Lancet and Financial Times Commission on governing health futures in a digital world recognised that minimal consent-based architectures, in which people are deemed to have permitted the use of their data by hastily checking a box in a pop-up window, are inadequate. Getting that box checked is not a difficult task for the tech giants, and misuse is clearly rampant.


We need a lean but balanced data governance system. Rather than trying to restrict access to health data through complex regulations and consent architectures, we need to see it as part of digital public goods for sanctioned societal use. Building the necessary trust will require technological innovation, such as federated learning systems, in which algorithms and model parameters move between sites but the actual data does not. De-identification of data is far more difficult than is commonly realised, and access control as the principal security mechanism tends to veer towards either inefficiency or insufficiency. Whatever the technology, we need a clearer articulation and understanding of stakeholder needs, and a society-centred design.

Who owns the data? We all do, but with different rights and responsibilities. Responsible flow of data will benefit all.



© 2024 IIT MADRAS - All rights reserved
