July 21, 2024

Data exchange: The third leg of the DPI stool

By Inder Gopal & Satyanarayana Jeedigunta

Digital Public Infrastructure (DPI) has received a great deal of attention around the world in recent months. The prime minister spoke about DPI during his speech to the G20 summit in Indonesia and said, “The benefits of digital transformation should not be confined to a small part of the human race.” As a result, India’s mature DPI assets (which used to be called “India Stack”) are now being offered to countries in the “global south”, creating an opportunity for them to leapfrog the more developed West. This DPI-led export effort has been referred to as India’s “low-cost, software-based version of China’s infrastructure-led Belt and Road Initiative”.

As commonly defined, DPI comprises a set of digital platforms and IT systems that serve the public good and form the foundation of the nation’s digital economy. Just as public highways and railroads provided the enabling infrastructure for the industrial age, DPI is envisaged as the enabling infrastructure for the digital age. Three aspects of the digital economy are part of this foundation. The first, personal identity, is firmly established through Aadhaar, which now covers almost all of India’s 1.4 billion people. IIIT-Bangalore launched the Modular Open Source Identity Platform (MOSIP) in 2018 to offer a publicly accessible version of Aadhaar-like technology to other countries. The second aspect is broader, dealing with financial payments and transactions. At its core is the Unified Payments Interface (UPI), which has already transformed India’s financial system. UPI is being effectively positioned as an alternative to the baroque legacy mishmash of credit cards and settlement networks that proliferate in the West.

The third aspect of DPI, data exchange, is the least well developed so far. In India’s rapidly growing economy, the sharing of data of public value should be viewed as critical, and the creation of the supporting national infrastructure must be a priority. Data exchange means the sharing of data amongst authorised parties: ensuring that the data is represented through clearly defined standards, taking data protection and privacy into account, and enforcing the relevant data sharing policies and regulations. A data exchange will let individuals share their personal data with government agencies, let government agencies share data among themselves, or allow private-sector data to be used by government, and vice versa. Data exchanges are often specific to a domain (e.g. urban, agriculture, logistics).

A data exchange platform differs from an open data portal in that it can control who gets access, based on defined policies. It also differs from the data sharing platforms common in enterprise settings, such as data warehouses and data lakes. Unlike those systems, a data exchange is designed to share data generated in widely distributed systems with no centralised management, ownership or repository of data. In such a system, the disparate data providers wish to retain control and possession of their data, yet share it with parties they choose. This is typical of most public-sector settings. Data is generated and stored by various departments, agencies and private companies, in accordance with their own formats and processes. Some of this data may consist of streams of IoT data from sensors (e.g. weather, air quality, traffic), some may come from video sources, some may be demographic or geographical, some may come from tax or property records, some from legal documents or registrations, and some may be historical data from archival sources. Each set of data has its own security and privacy considerations, as well as commercial, monetary or subscription aspects that must be observed.

A major goal of a data exchange is to enable the creation of data-driven applications. Access to high-quality data is often the single biggest impediment for an application developer who wishes to create an application that delivers a public service. Start-ups and larger companies alike spend weeks, if not months, running from pillar to post, locating pertinent data and persuading its owners to share it. Data exchanges will automate that process, reducing months of frustrating toil to a few minutes of work. This will free application developers to focus on delivering value to citizens and make it easier for start-ups and others to experiment and innovate with new data-driven services.

A data exchange must therefore provide the following capabilities:

  • Discoverability of data: The ability to identify pertinent data with a searchable catalogue is an essential aspect of a data exchange. The catalogue must be programmatically searchable and contain standardised meta-data descriptions.
  • Standardisation of software interfaces and data models: Different data providers represent data in different formats, which imposes costs and complexity on application developers and drives the need for standardisation.
  • Controlling data access: Data providers must have the ability to control who can access a data asset, in part or whole, and restrict access until an agreement is in place or payment is complete.
  • Policy-based and consent-based data sharing: A policy-driven architecture is one where data is shared with a data consumer in a manner consistent with a specified data sharing policy. For personal data, a data exchange will preserve the privacy of such data by ensuring that any personal data is shared only if explicit consent is provided by the concerned individual.
  • Anonymisation and de-identification: Data exchanges must provide anonymisation tools that erase or encrypt all personally identifiable markers in a dataset. In addition, techniques such as differential privacy must be used to avoid the risk of de-anonymisation through triangulation.

In addition to these functional requirements, a data exchange designed for DPI must satisfy some non-functional requirements:

  • Decentralisation & Federation: There must be no requirement to centralise control or storage of data. Each data provider must retain full control and possession of their own data. There may be multiple data exchanges, based on sectoral or application requirements. A catalogue of catalogues can provide a federated single-system view of the set of data exchanges, with the ability to search across them.
  • Open source: The proposed exchange will be designed as an open-source software system. In general, any DPI must be based on open source; otherwise it will be under the control of a proprietary vendor. This requirement is essential, and all data exchanges within the country should ideally be based on a common open-source code base that can be subset and customised for specific deployments or sectors.
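To make the discoverability and federation requirements more concrete, here is a minimal Python sketch built on a deliberately simplified metadata record. The class and field names (CatalogueEntry, Catalogue, FederatedCatalogue, domain, tags) are hypothetical and do not reflect the actual IUDX or ADeX catalogue schemas. Each catalogue holds only metadata, never the data itself, so providers keep possession of their assets; the federated "catalogue of catalogues" simply fans a query out to every member exchange and labels each hit with its source.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified catalogue model; not the IUDX/ADeX schema.


@dataclass(frozen=True)
class CatalogueEntry:
    """Standardised metadata describing one data asset (the data stays with its provider)."""
    asset_id: str
    title: str
    provider: str
    domain: str                                   # e.g. "urban", "agriculture"
    tags: frozenset = field(default_factory=frozenset)
    contains_personal_data: bool = False


class Catalogue:
    """One exchange's searchable catalogue: metadata only, no data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries: list[CatalogueEntry] = []

    def register(self, entry: CatalogueEntry) -> None:
        self._entries.append(entry)

    def search(self, keyword: str) -> list[CatalogueEntry]:
        # Case-insensitive: substring match on the title, exact match on tags.
        kw = keyword.lower()
        return [
            e for e in self._entries
            if kw in e.title.lower() or kw in {t.lower() for t in e.tags}
        ]


class FederatedCatalogue:
    """A 'catalogue of catalogues' giving a single search view over many exchanges."""

    def __init__(self, members: list[Catalogue]) -> None:
        self.members = members

    def search(self, keyword: str) -> list[tuple[str, CatalogueEntry]]:
        # Query every member exchange and label each hit with the exchange holding it.
        return [(c.name, e) for c in self.members for e in c.search(keyword)]


# Hypothetical usage: two sectoral exchanges searched through one federated view.
urban = Catalogue("urban-exchange")
urban.register(CatalogueEntry("u1", "City rainfall sensors", "Municipal corporation",
                              "urban", frozenset({"weather", "iot"})))
agri = Catalogue("agri-exchange")
agri.register(CatalogueEntry("a1", "District weather forecasts", "Met department",
                             "agriculture", frozenset({"weather"})))
agri.register(CatalogueEntry("a2", "Soil health cards", "Agriculture department",
                             "agriculture", frozenset({"soil"}), contains_personal_data=True))

hits = FederatedCatalogue([urban, agri]).search("weather")
print([(name, e.asset_id) for name, e in hits])  # [('urban-exchange', 'u1'), ('agri-exchange', 'a1')]
```

A production catalogue would index richer, standardised metadata and expose search through a published API; the in-memory lists here only stand in for that.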

The most established data exchange is the India Urban Data Exchange (IUDX), which has been deployed in 38 cities (iudx.org.in). In 2019, the Union Ministry of Housing and Urban Affairs came together with the Indian Institute of Science, Bangalore, to jointly develop and deploy IUDX. In its production-quality deployments, IUDX has onboarded many different types of data: IoT streaming data from city sensors, data from city records, legal and property documents, video streams from surveillance cameras, and derived “IoT” artefacts from video streams. All the data is non-personal, but privacy concerns often remain, as the risk of de-anonymisation is always present.
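Because de-anonymisation risks persist even for non-personal data, techniques such as differential privacy are relevant here. As an illustration, here is a minimal Python sketch of the Laplace mechanism, a standard way to release a count with epsilon-differential privacy. The records, the epsilon value and the function names are hypothetical; nothing here describes how IUDX itself handles de-identification.

```python
import random

# Minimal sketch of the Laplace mechanism for a counting query.
# The records and epsilon value below are hypothetical examples.


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # random.expovariate takes a rate (1 / mean), so each draw has mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    Adding or removing one record changes the count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records: one entry per vehicle observed by a city traffic sensor.
observations = [{"junction": "J1"}] * 120 + [{"junction": "J2"}] * 45
rng = random.Random(7)
noisy_j1 = private_count(observations, lambda r: r["junction"] == "J1", epsilon=0.5, rng=rng)
print(round(noisy_j1, 1))  # typically within a few units of 120, deliberately not exact
```

Smaller values of epsilon add more noise and give stronger privacy. Every additional query on the same data spends more of the privacy budget, which a real deployment would need to track.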

A wide variety of use cases (solid waste management, bus transit, multi-modal transport, flood management, citizen safety, etc.) have also been developed, showing the value of the data in creating benefits for citizens and for the city. IUDX code is entirely open source, built by integrating various upstream open-source projects with new open-source code. It has been put through a product-quality QA process and is consistent with state-of-the-art cybersecurity principles. It also adopts modern software architecture principles and is based on a containerised microservices model. The IUDX code has been validated by MeitY’s STQC agency.

A recent entrant is the Agricultural Data Exchange (ADeX), just launched in Hyderabad by the government of Telangana, together with the Indian Institute of Science and the World Economic Forum. As with IUDX, ADeX aims to connect providers and consumers of data, in this case in the agriculture sector, in a trusted and efficient manner. Its objectives are quite similar to those of IUDX, albeit for a different domain. Consequently, it has been possible to reuse much of the IUDX framework to develop ADeX. ADeX is also fully open source and is based on open and published standards. The key difference is that ADeX deals with both non-personal and personal data. Data about farmers and their specific holdings is central to ADeX, and consequently the protection of personal farmer data is essential.

The objectives of ADeX are to connect the providers and consumers of agricultural information in a consent-based and secure manner and to enable them to exchange data. ADeX supports a wide variety of data, including farmer identities and land holdings, soil health, satellite imagery, agri-market data, crop yields and weather data. Key applications developed for ADeX include Smart Farmer Credit, sharing of farm machinery, and various farmer advisories in areas such as market access, pest prediction and soil health.
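The consent-based sharing that ADeX requires for personal farmer data, combined with the policy and access-control requirements listed earlier, can be illustrated with a minimal Python sketch. It assumes a simple model in which every request must first pass a provider-defined agreement check, and personal data additionally requires an explicit, purpose-specific, time-bound consent from the individual concerned. All names and data structures are hypothetical; this is not how ADeX implements consent.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Minimal sketch of a policy- and consent-checked access decision.
# All identifiers (Consent, AccessRequest, is_allowed, asset and party names)
# are hypothetical and do not describe the ADeX implementation.


@dataclass(frozen=True)
class Consent:
    principal_id: str      # the individual the data is about, e.g. a farmer
    consumer_id: str       # the party allowed to receive the data
    purpose: str           # consent is scoped to one stated purpose
    expires_at: datetime   # consent is time-bound


@dataclass(frozen=True)
class AccessRequest:
    consumer_id: str
    asset_id: str
    purpose: str
    principal_id: Optional[str] = None  # set only when personal data is requested


def is_allowed(
    request: AccessRequest,
    personal_assets: set[str],
    agreements: dict[str, set[str]],
    consents: list[Consent],
    now: datetime,
) -> bool:
    # Policy check: the provider must have an agreement (or completed payment)
    # in place with this consumer for this asset.
    if request.consumer_id not in agreements.get(request.asset_id, set()):
        return False
    # Non-personal data needs only the policy check above.
    if request.asset_id not in personal_assets:
        return True
    # Personal data also needs explicit, unexpired consent from the individual
    # for this specific consumer and purpose.
    return any(
        c.principal_id == request.principal_id
        and c.consumer_id == request.consumer_id
        and c.purpose == request.purpose
        and now < c.expires_at
        for c in consents
    )


now = datetime(2024, 7, 21, tzinfo=timezone.utc)
consents = [Consent("farmer-17", "lender-3", "credit-assessment",
                    datetime(2025, 1, 1, tzinfo=timezone.utc))]
agreements = {"land-records": {"lender-3"}, "weather-feed": {"advisory-app"}}
personal_assets = {"land-records"}

print(is_allowed(AccessRequest("lender-3", "land-records", "credit-assessment", "farmer-17"),
                 personal_assets, agreements, consents, now))  # True
print(is_allowed(AccessRequest("lender-3", "land-records", "marketing", "farmer-17"),
                 personal_assets, agreements, consents, now))  # False: no consent for this purpose
print(is_allowed(AccessRequest("advisory-app", "weather-feed", "advisory"),
                 personal_assets, agreements, consents, now))  # True: non-personal data
```

Scoping consent to a single consumer and purpose, with an expiry, means a lender granted access for a credit assessment cannot reuse the same consent for anything else.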

The concept of a data exchange is still nascent, but it is the essential third leg of the DPI stool. A national push to instantiate such exchanges for different high-priority sectors is essential for economic progress. Two working data exchanges already exist, and others are in planning. It is hoped that the open-source code used in these exchanges can become the basis of the Data Management DPI and be widely deployed in India and across the world.

The authors are, respectively, research professor, Indian Institute of Science, and chief advisor, C4IR India, World Economic Forum