Front Line Research at MEIJI

Hiroaki Kikuchi

Establishment of privacy enhancement technology for the effective use of big data
- What is “Privacy-Preserving Data Mining”? -

The Japanese government also takes a positive stance on the utilization of big data

The utilization of big data has been drawing attention. The “Declaration to be the World’s Most Advanced IT Nation”, which was recently adopted by the Abe Cabinet, also suggested that big data be used positively to revitalize industry and the economy.

Because the commercial use of big data is expected to occur across organizations, there is an urgent need to adopt new technology that takes privacy protection into consideration and to establish systems and laws governing the use of private data.

In the Committee on Personal Data Technical Working Group organized by the Cabinet Office, in which I participate, specialists from various fields, including network technology, statistics, law, and information security, have discussed and evaluated data utilization from a variety of positions.

In the Committee, we have discussed in detail how securely data can be used from a technological and legal point of view. Our discussions were mainly based on the passenger data captured by Suica (a prepaid railroad pass) that JR East (East Japan Railway Company) had already made externally available. Because the passenger names, dates, and fare rates had been removed from the data, we first concluded that such passenger data should not be considered personal information. However, further examination revealed the risk that an individual could be identified just from the data for several train stops.
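The risk described above can be illustrated with a minimal sketch. It assumes hypothetical pseudonymized records of the form (pseudonym, station), with names, dates, and fares already removed, and counts how many pseudonyms have a combination of stations that nobody else shares. The records, pseudonyms, and function names are invented for illustration and are not taken from the Suica data.

```python
from collections import Counter

# Hypothetical pseudonymized records: (pseudonym, station).
# Names, dates, and fares are assumed to be already removed.
trips = [
    ("p1", "Tokyo"), ("p1", "Shinjuku"),
    ("p2", "Tokyo"), ("p2", "Shinjuku"),
    ("p3", "Tokyo"), ("p3", "Ueno"), ("p3", "Ochanomizu"),
    ("p4", "Shinjuku"), ("p4", "Ochanomizu"),
]

def station_sets(records):
    """Group the stations used by each pseudonym into a frozenset."""
    visited = {}
    for pseudonym, station in records:
        visited.setdefault(pseudonym, set()).add(station)
    return {p: frozenset(s) for p, s in visited.items()}

def unique_pseudonyms(records):
    """Return pseudonyms whose set of stations is shared with nobody else."""
    sets = station_sets(records)
    counts = Counter(sets.values())
    return sorted(p for p, s in sets.items() if counts[s] == 1)

print(unique_pseudonyms(trips))
```

In this toy data, two of the four pseudonyms are singled out by their stations alone, even though no name appears anywhere in the records.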

Concerning anonymization and encryption technology

Processing overhead often becomes a problem in research. Improving the processing speed of computers is also a future challenge for us.

Technology that enhances the confidentiality of data, such as anonymization and encryption, is required in order to use such data securely. Anonymization technology in particular has a short history, and there is currently no established, standard method.

The technical difficulties of anonymization are driven in part by the diversity of big data. Automobile GPS data, purchase histories from online shopping sites, online tweets, and so on: the appropriate way to anonymize depends on the characteristics of each kind of data, which makes it difficult to settle on a single method. Another impediment is the subjective views of individuals. Whether a piece of original data (for example, the station at which an individual got off a train) matters as a privacy concern depends on the point of view of the person concerned, and it is extremely difficult to quantify such subjective views. In addition, companies that buy and analyze data collect a variety of anonymized data sets. The problem is that data considered secure by itself may be used to identify individuals when combined with other data.
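The last point, identification by combining data sets, can be sketched as a simple join. The example assumes two hypothetical data sets: an anonymized one keyed by pseudonym and an auxiliary one that carries names, both sharing the attributes "home" and "work". All rows, field names, and the function `link` are assumptions made for illustration.

```python
# Hypothetical anonymized data set: no names, only pseudonyms.
anonymized = [
    {"pseudonym": "u1", "home": "Ueno", "work": "Tokyo"},
    {"pseudonym": "u2", "home": "Shinjuku", "work": "Tokyo"},
    {"pseudonym": "u3", "home": "Shinjuku", "work": "Tokyo"},
]

# Hypothetical auxiliary data set held by another party, with names.
auxiliary = [
    {"name": "Person A", "home": "Ueno", "work": "Tokyo"},
    {"name": "Person B", "home": "Shinjuku", "work": "Tokyo"},
    {"name": "Person C", "home": "Shinjuku", "work": "Tokyo"},
]

def link(anon_rows, aux_rows, keys):
    """Return (pseudonym, name) pairs where the shared attributes
    match exactly one row in the auxiliary data set."""
    index = {}
    for row in aux_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in anon_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            matches.append((row["pseudonym"], candidates[0]["name"]))
    return matches

print(link(anonymized, auxiliary, ["home", "work"]))
```

Neither data set names a pseudonym's owner by itself, yet the join re-identifies the one record whose attribute combination is unique.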

Establishment of laws and rules calls for urgent attention

Professor Kikuchi helped with the encryption technology for the project by the Ministry of Internal Affairs and Communications to find the people affected by the Great East Japan Earthquake housing situation with whom they could make a contract.

What would happen if the level of anonymity were increased? Too much anonymization could degrade the data or widen the margin of error, resulting in data of low commercial value. To cite the passenger data again, there is a high risk of identifying passengers from data on a specific behavior (for example, getting on and off at stations that are used by very few passengers). Because of this, several methods of deleting such data have been proposed. However, if we start eliminating this type of data, a significant amount of the data will be lost. We have no choice but to wait for technological innovations in anonymization; otherwise, we need a system that provides information security through a law that places strict limits on the use of data for other purposes and on the collation of different sets of data.
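The trade-off between anonymity and data loss can be made concrete with a small sketch. It assumes one simple deletion rule, which is only one of many possible methods and not necessarily one proposed in the Committee: drop every record at a station used by fewer than k distinct pseudonyms, then report the fraction of records lost. The records and the threshold k are hypothetical.

```python
from collections import defaultdict

# Hypothetical pseudonymized records: (pseudonym, station).
records = [
    ("p1", "Tokyo"), ("p2", "Tokyo"), ("p3", "Tokyo"), ("p4", "Tokyo"),
    ("p1", "Shinjuku"), ("p2", "Shinjuku"),
    ("p3", "Ueno"),
    ("p4", "Ochanomizu"),
]

def suppress_rare_stations(rows, k):
    """Drop rows at stations used by fewer than k distinct pseudonyms.

    Returns the kept rows and the fraction of rows removed.
    """
    users = defaultdict(set)
    for pseudonym, station in rows:
        users[station].add(pseudonym)
    kept = [(p, s) for p, s in rows if len(users[s]) >= k]
    lost = 1 - len(kept) / len(rows) if rows else 0.0
    return kept, lost

for k in (2, 3):
    kept, lost = suppress_rare_stations(records, k)
    print(f"k={k}: kept {len(kept)} of {len(records)} records, lost {lost:.0%}")
```

Raising k removes more of the risky records but also more of the data, which is the loss of value the text describes.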


Technology called Privacy-Preserving Data Mining

Individuals can be identified by combining sets of anonymized data. A technology called “Privacy-Preserving Data Mining (PPDM),” in which I have been involved, is expected to be an effective method of dealing with this problem.

PPDM uses encryption technology to analyze data while protecting the privacy of individuals. For instance, in a substantial experiment jointly conducted with the Chiba Cancer Center, the relative risk of developing cancer among people with Helicobacter pylori in their stomach lining was investigated. In this experiment, sensitive data, such as lists of cancer patients and of individuals with Helicobacter pylori, were encrypted and compared. This allowed us to calculate the association between a disease and a factor suspected to be its cause, while keeping the names of the subjects confidential. The results of our experiment show that the probability of developing cancer among people with Helicobacter pylori was 9.7 times higher than among those without it, which roughly matched the experience of the Chiba Cancer Center doctors. Medical information, by its nature, is assumed to be used for epidemiological purposes, and as far as academic use is concerned, this private data is relatively easy to obtain and use. In general, however, rules regarding the use of such data have not been established, and it will still be some time before we are able to simply buy and analyze such data. But if PPDM comes into practical use, the data that companies have been accumulating can be more easily utilized for their businesses while maintaining the privacy of the individuals concerned. Since PPDM is expected to be applied to a variety of data analyses, industry is paying close attention to its application.
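For readers unfamiliar with the measure, relative risk is the ratio of the probability of disease in the exposed group to that in the unexposed group. The sketch below computes it from four counts. The counts in the usage line are hypothetical and are not the figures from the Chiba Cancer Center experiment; the function and parameter names are likewise assumptions.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of disease among the exposed divided by risk among the unexposed.

    Computed as (a / n1) / (c / n0), rearranged to a single division.
    """
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    if unexposed_cases <= 0:
        raise ValueError("relative risk is undefined with no unexposed cases")
    return (exposed_cases * unexposed_total) / (unexposed_cases * exposed_total)

# Hypothetical counts: 30 cases among 1,000 carriers,
# 10 cases among 1,000 non-carriers.
print(relative_risk(30, 1000, 10, 1000))
```

In a PPDM setting, the point is that counts like these are obtained from encrypted lists, so that neither party sees the other's names.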

PPDM technology allows one set of encrypted data to be matched against another set of encrypted data. The challenge is that preparing comparable data is time-consuming.
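One well-known way to match encrypted lists is commutative encryption based on modular exponentiation: each party encrypts its own identifiers with its own secret key, then applies its key to the other party's ciphertexts, and identical identifiers end up as identical double-encrypted values. The sketch below is a toy illustration of that general idea, not the protocol used in the experiment described here. The modulus, the identifiers, and all function names are assumptions, and the parameters are far too small for real security.

```python
import hashlib
import math
import secrets

# Toy modulus (a Mersenne prime). Assumption for illustration only:
# a real deployment would use a much larger, carefully chosen group.
P = 2**127 - 1

def hash_to_group(identifier: str) -> int:
    """Map an identifier to an integer in [1, P-1] via SHA-256."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def new_key() -> int:
    """Pick a secret exponent coprime to P-1 so exponentiation is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def encrypt(values, key):
    """Exponentiate each value by the key modulo P (order-independent)."""
    return [pow(v, key, P) for v in values]

def intersection_size(ids_a, ids_b):
    """Count identifiers common to both lists by comparing only ciphertexts."""
    key_a, key_b = new_key(), new_key()
    # Each party encrypts its own hashed identifiers with its own key.
    enc_a = encrypt([hash_to_group(i) for i in ids_a], key_a)
    enc_b = encrypt([hash_to_group(i) for i in ids_b], key_b)
    # Each party then applies its key to the other party's ciphertexts.
    double_a = set(encrypt(enc_a, key_b))
    double_b = set(encrypt(enc_b, key_a))
    return len(double_a & double_b)

# Hypothetical identifier lists held by two different organizations.
patients = ["id-001", "id-002", "id-003"]
carriers = ["id-002", "id-003", "id-004", "id-005"]
print(intersection_size(patients, carriers))
```

For simplicity a single function plays both parties here; in practice each side would keep its key private and exchange only the shuffled ciphertexts. The sketch also hints at why preparing comparable data takes time: matching works only if both sides format identifiers identically before hashing.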

Understanding the possibilities and dangers of big data

Anonymized data allows for the analysis of causation and/or correlation, leaving personal information untouched.

As the government establishes laws and systems, it is necessary to understand the contents well. At the same time, it is important for us to thoroughly explain to lawyers and legal experts what can and cannot be done with anonymization and/or encryption technology, as well as the possible risks associated with such technology.

What concerns me most is that big data will be used without a correct understanding of its benefits and risks, and that only the technology will make rapid progress. As recent stalker cases show, even train passenger data may include vital information that can be traced to an individual. General network users will also be required to understand and follow minimum rules and to acquire literacy in the characteristics of cyberspace as they use their smartphones and social networking services (SNS).


To gain advantage in global competition

As for the utilization of big data, discussion on the kinds of rules to be drawn up is ongoing. However, spending too much time on this may end up jeopardizing business opportunities. There is no doubt that the field of big data will give Japan the upper hand in international business in the future. To ensure that big data becomes a source of competitiveness, we need a system in which accumulated data can be circulated and utilized once security is assured, and this system should be designed and constructed as soon as possible.

Profile

Professor, Department of Frontier Media Science, School of Interdisciplinary Mathematical Sciences
Research interest: Cryptographic Protocols, Network Security, Privacy Protection, Data Mining

Graduated from the Department of Electronics and Communications, School of Engineering, Meiji University
Worked for Fujitsu Laboratories Ltd.; School of Information and Telecommunications Engineering, Tokai University as Professor; and School of Computer Science, Carnegie Mellon University as a Visiting Researcher
Professor, Meiji University, 2013
Chair of Information and Communication Systems Security, the Institute of Electronics, Information and Communication Engineers
Ph.D. in Engineering, Meiji University