Trajectory Recovery From Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data

Human mobility data has been ubiquitously collected through cellular networks and mobile applications, and publicly released for academic research and commercial purposes for the last decade. In this paper, we argue and prove that even publishing aggregated mobility data could lead to privacy breach in individuals’ trajectories. We develop an attack system that is able to exploit the uniqueness and regularity of human mobility to recover individual’s trajectories from the aggregated mobility data without any prior knowledge. Our experiments on two real-world datasets reveal that the attack system is able to recover users’ trajectories with accuracy about 73%~91% at the scale oftens of thousands to hundreds of thousands users, which indicates severe privacy leakage in such datasets and appeals for immediate attentions from both academy and industry.

Motivation

Aggregated mobility data, such as the number of individuals locating in a region in a given time slot, contains statistic information of human mobility, which is of great utility in transportation schduling, business intelligence and epidemic controlling. More importantly, such data is assumed to be safe from privacy breach since only anonymized mobility records can be directly acquired from it, and therefore is referred to as "ash" of original trajectories. However, we prove this assumption to be false by designing an attack system that recovers indvidual's trajectory with high accuracy from "ash".

Trajectories of five individuals over two days.

Attack system

The proposed attack system exploits two fundamental characteristics of human mobility, i.e., regularity and uniqueness, to recover individuals' trajectories from aggregated mobility data. The attack system consists of two steps. First, it estimates the probability that the anonymized mobility records in the neighboring time slot belong to same individuals. Second, it solves the optimal association problem to link the mobility records with highest probability, i.e., recovers the trajectories of individuals.

STEP1:Probability Estimation

(a)Percentage of time spent during nighttime                  (b)CDF of estimation error                    (c)Information gain of merging trajectories

Our system estimates the probability of next locations by exploiting three insights: a) Individual has low mobility during nighttime and spends most of their time in the most frequent locations. b) Human mobility is continual and regular. As a result, next locations can be predicted with high accuracy. c) Different individuals' trajectories are significantly different. Therefore, the attack systems can distinguish the trajectories belonging to the same individuals by computing the information gain of merging them together.

STEP2:Optimal Association

After properly estimating the probability, the remaining problem is achieving an optimal association that links anonymized mobility records together with maximum likelihood, and step by step recover the whole trajectory of each individual. We prove that the optimal association problem is equivalent to Linear Sum Assignment Problem, which has been extensively studied and can be solved in polynomial time with Hungarian algorithm.

Evaluation

We use two real-world mobility datasets to evaluate the privacy leakage in aggregated mobility data. The first dataset is collected from mobile devices by a popular mobile application. It traces over 15,000 mobile users from November 1st to 14th, 2015. The second datasets collected by a major mobile network operator in China, which traces over 100,000 mobile users from April 1st and 7th, 2016.

                    (a)Recovery accuracy                                      (b)Recovery error                              (c)Uniqueness of recovered trajectories

The evaluation results demonstrate that the attack system can accurately recover mobile users' trajectories in terms of: a) 73% recovery accuracy for operator dataset and 91% recovery accuracy for mobile application dataset. b) Only 21% and 8% of the recovered mobility records have a recovery error more than 1 km in mobile operator dataset and application dataset, respectively. In addition, the recovered trajectories are unique and can be reidentified with high probability when little external information is provided. These results reveal that the privacy breach in aggregated mobility data is severe.

                    (a)Spatial resolution                                      (b)Temporal resolution                                        (c)Number of individuals

Further investigation shows that the spatiotemporal resolution and the number of individuals have notable impact on the privacy breach. We find out that reducing spatiotemporal resolution will even slightly increases the recovery accuracy, which indicates more severe privacy leakage. On the other hand, large scale datasets are safer in trajectory recovery attack, since recovery accuracy decreases as the number of individual increases. However, the privacy breach is eminent in the datasets with less than hundreds of thousands of individuals, because the trajectories can be recovered with a rather high accuracy.

Conclusion

In this paper, we identify and evaluate the risks of trajectory recovery attack in the aggregated mobility dataset. To the best of our knowledge, we are the first to recognize and study the privacy problem of inferring individual's information from statistic data. Our investigation reveals that there is serious privacy leakage in the aggregated mobility data since individuals' trajectories can be recovered with high accuracy. In addition, our evaluation demonstrates that the spatiotemporal resolution and scale of datasets have notable impact on the privacy breach. We believe that this work opens a new angle of protecting the privacy in publishing and sharing statistic data, which paves the way to more advanced privacy preserving mechanisms.

Publication

[1]Xu, Fengli, et al. "Trajectory Recovery From Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data." Proceedings of the Twenty-Sixth World Wide Web Conference, 2017.

[2]Xu, Fengli, et al. "Context-aware Real-time Population Estimation for Metropolis." Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 2016.

[3]Wang, Huandong, et al. "Understanding Mobile Traffic Patterns of Large Scale Cellular Towers in Urban Environment." Proceedings of the 2015 ACM Conference on Internet Measurement Conference. ACM, 2015.

WWW17 BibTex | Ubicomp16 BibTex | IMC15 BibTex

Contact Person

Teacher

Yong Li

Student

Fengli Xu

Zhen Tu

Assistant Professor, Dept. of Electronic Engineering, Tsinghua University

+86-10-62772387-201
liyong07@tsinghua.edu.cn
Rohm Building 10-202, Tsinghua University, Beijing, 100084, China.