How to analyse sensitive data without compromising privacy?

Different government institutions possess an increasing amount of data about their citizens, but the idea of keeping it all in one giant database is problematic, as people find the thought of the government knowing too much about them unsettling. The information could, however, provide useful input for statisticians. Liina Kamm, a young Estonian scientist, has led the development of a new tool called Rmind, which helps to cope with huge amounts of personal data, without compromising privacy.

Rmind is a new option for performing statistical analyses of sensitive data, designed to ensure the privacy of individuals and avoid data leaks.

We all provide the government and other bodies with a huge amount of data on a daily basis: every time we swipe a card or make a phone call, we leave a trail of data that is going to be stored somewhere, and there are people interested in using and analyzing it, be it for creating better services, understanding people’s habits, or planning the state’s income from taxes.

A significant amount of this data is sensitive and most of us would not want it to fall into the wrong hands. For this reason, consolidating different databases into one centralized database poses a risk to privacy. Kamm aims to solve this issue by introducing a new system where data is cryptographically divided into three parts and stored separately on three computers, so that even the owners of these computers are not capable of understanding the actual values inserted into the database.
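The article does not say which scheme is used to divide the data. One common way to split a value into three parts is additive secret sharing, sketched below in Python. The modulus and the function names `share` and `reconstruct` are assumptions made for illustration and are not taken from Rmind or Sharemind.

```python
import secrets

# Assumed ring size for this sketch; all arithmetic is done modulo this value.
MODULUS = 2**32


def share(value, n=3):
    """Split `value` into n random-looking parts that sum to it modulo MODULUS."""
    # The first n - 1 parts are uniformly random numbers.
    parts = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    # The last part is chosen so that all parts add up to the original value.
    parts.append((value - sum(parts)) % MODULUS)
    return parts


def reconstruct(parts):
    """Recover the original value; this needs every part."""
    return sum(parts) % MODULUS


salary = 2500
parts = share(salary)        # one part would go to each of the three computers
recovered = reconstruct(parts)
```

In this sketch each part on its own is a uniformly random number, so the owner of a single computer learns nothing about the salary. Only the sum of all three parts gives the original value back.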

Kamm started her research at the Information Security Research Institute of Cybernetica AS, where she closely cooperated with the development team of the secure data analysis system Sharemind, a database and analytics system that works on encrypted data without decrypting it. It was Sharemind that inspired her to delve deeper into database security problems. Today, one of the services that Sharemind provides is a way to calculate whether satellites in space will collide, without revealing the coordinates of the satellite trajectories to third parties.

“Data used for statistical analysis is usually stored in databases and a system administrator can go and have a look at the information in there. When we use Sharemind to calculate whether satellites would collide, we don’t actually see the trajectories. So I started wondering if this system could be used in statistical analysis of sensitive information,” Kamm said.

Let’s say one wants to know the average salary of thirty-something men who have higher education. The Estonian Ministry of Education and Research has the information about education, while the Estonian Tax and Customs Board has the information regarding salaries. Although the information could, in principle, be combined, consolidating it in one database is not permitted under the Personal Data Protection Act and the Taxation Act. Having all that information in one database could allow a system administrator or an analyst to breach privacy and tax secrecy, and identify individual people in the combined data table.

However, with the help of Rmind, data from both sources could be stored in a way that would deny the users conducting the analysis access to the decrypted data itself, meaning that it would be impossible for the users to find out how much, for example, their neighbor earns. Yet the algorithms built into Rmind can help them find out the average salary of a 30-year-old man with higher education. The tool is very similar to standard data analysis tools, allowing statisticians to carry out studies without having to know the details of the underlying cryptographic protocols.
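To illustrate how an average can be computed without anyone seeing an individual salary, here is a minimal Python sketch that assumes the salaries are held as additive shares on three computers. It is not Rmind's actual algorithm: the names `share` and `secure_mean` and the modulus are hypothetical, and selecting only 30-year-old men with higher education would need an additional secure protocol that is left out here.

```python
import secrets

# Assumed ring size for this sketch; large enough that the salary total fits.
MODULUS = 2**64


def share(value, n=3):
    """Split `value` into n additive parts modulo MODULUS."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % MODULUS)
    return parts


def secure_mean(shared_rows):
    """Average of secret-shared values, revealing only the total.

    `shared_rows` holds one list of parts per person; part i of every
    person is assumed to be stored on computer i.
    """
    n_servers = len(shared_rows[0])
    # Each computer adds up only the parts it holds, seeing random numbers.
    partial_sums = [
        sum(row[i] for row in shared_rows) % MODULUS for i in range(n_servers)
    ]
    # Only the partial sums are combined, so just the total is revealed.
    total = sum(partial_sums) % MODULUS
    return total / len(shared_rows)


salaries = [1800, 2500, 3200]            # made-up example values
shared = [share(s) for s in salaries]    # what the three computers would store
mean = secure_mean(shared)
```

The analyst in this sketch receives the average, while no single computer ever holds a readable salary.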

“I am very much interested in the secure analysis of genomic data as this information is very sensitive and involves things people don’t even know about themselves. To conduct statistical analysis on someone without compromising their privacy is a challenge,” Kamm said. “For example, Rmind would allow the Estonian Genome Center to cooperate with other similar institutions who have been unwilling to share data until now. This will open up the possibility of conducting joint research without revealing sensitive patient data to each other or third parties.”

This article was first published on the Research in Estonia website.

Editor: S. Tambur, M. Oll