Title: CNS Core: Small: Enabling Privacy-Preserving Routing-on-Context in IoT, funded by NSF under grant CNS-2006998, 10/01/2020 – 09/30/2025

 

Project Summary:

Over the past 50 years, the Internet has evolved from a data network originally designed to connect only computers into a global-scale cyber-physical network that connects not only computers but also things in the physical world, such as humans, robots, appliances, medical devices, sensors, and actuators, forming the so-called Internet of Things (IoT). Despite the distinct disparities between connected things and computers, and the fact that Internet traffic is shifting from computer data communication to context-oriented IoT applications, the principal model of Internet routing has, surprisingly, changed little from what it was 50 years ago, when the Internet was used only to connect computers. In principle, Internet routing is defined by the connectivity between nodes, i.e., finding a path that connects a specific source node with a specific destination node, typically identified by context-independent addresses. Context-oriented IoT applications do not perform well under this traditional routing model, because such applications are typically closely coupled with the physical world through sensing and cognition, and what they essentially target is the application context, i.e., the physical circumstances that form the interest of the application. The nodes (IoT devices) associated with a context are merely data providers, and they may change from time to time due to network dynamics such as mobility. In these cases, routing traffic based on destination nodes is often inefficient and off target. As such, the overarching goal of this project is to establish a new routing primitive that supports efficient IoT routing based on the targeted application context, rather than on destination nodes.
Because IoT communication is built upon public information-transport infrastructures such as the Internet and public cellular networks, and because IoT application contexts are often sensitive and private to their users, one indispensable requirement for the proposed routing-on-context is "secure-by-design" privacy preservation of the context information: this information should not be shared with or disclosed to the public IoT infrastructure, even though the infrastructure uses it for route computation.

 

Project Goals:

This project has the following four goals:

1. Privacy-preserving efficient geographic routing over public infrastructures under an insider-threat model. Under an honest-but-curious insider threat model for the public IoT infrastructures, the project proposes a Hilbert space-filling curve (HC) based space encryption mechanism and a Kademlia-tree based hierarchical HC routing primitive to achieve efficient and privacy-preserving geographic routing based on encrypted destination location.

2. Enhancing HC routing against malicious outsider attacks. The project pursues the HC routing problem under malicious outsider attack models, in particular for the wireless part of IoT, where outsider attacks are highly realistic due to the open nature of the wireless medium. Exploiting special properties of the HC index, the project proposes Rand-Mix, a lightweight traffic mixer that achieves stronger communication privacy in the face of malicious attackers than a traditional cryptographic mixer such as the Chaum mix, a popular traffic mixer used in onion routing and Tor.

3. Quantifying privacy strength under outsider attacks. The project proposes a Bayesian probabilistic classification framework to study location-inference attacks by outside attackers, so as to quantify the privacy strength of the various proposed HC indices and their privacy-routing-efficiency tradeoff under outsider attacks.

4. Privacy-preserving context-aware computing beyond location. In addition to the location context, the PI will also study more general types of context (i.e., not limited to location) to support privacy-preserving context-aware computing in IoT.
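
The HC-based space encryption in Goal 1 builds on the locality preservation of the Hilbert index. As a rough illustration, the sketch below implements the standard, publicly known mapping from a 2-D grid cell to its 1-D Hilbert index; the project's encryption mechanism additionally applies a secret, key-dependent transformation to the curve, which is not shown here.

```python
def xy2d(n, x, y):
    """Map a 2-D cell (x, y) on an n x n grid (n a power of 2) to its
    1-D Hilbert-curve index d in [0, n*n)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the recursion always sees the
        # canonical curve orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Consecutive indices always map to adjacent grid cells, which is the locality property that allows routing on (encrypted) HC indices to remain efficient.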

 

Project Personnel

 

PI: Tao Shu, Ph.D.

 

Graduate Students

·       Jing Hou

·       Li Sun

·       Tian Liu

·       Xueyang Hu

·       Hairuo Xu

·       Guan Huang

·       Minarul Islam

 

Project Activities and Results

 

1. Fault-tolerant privacy-preserving federated learning in IoT systems

With the rapid development of the Internet of Things (IoT), federated learning (FL) has been widely used to obtain insights from the data collected by a large number of distributed devices while preserving data privacy. Because there is no need for direct transmission of training data, FL fits well into scenarios where data is sensitive. Although devices do not directly reveal their private data, the shared model updates or gradients may unintentionally leak sensitive information about the data on which they were trained. As pointed out by previous studies, using FL alone is insufficient to protect privacy, specifically membership privacy, model privacy, and property privacy. As a result, FL needs to be combined with privacy-preserving approaches. Existing privacy-preserving FL mechanisms, such as those based on cryptographic methods or differential privacy (DP), do not fit IoT systems well, as they largely ignore the tight resource constraints (e.g., power, bandwidth, and computation) and high model-accuracy requirements of IoT systems. For example, the gain of DP in privacy protection usually comes at the cost of model accuracy. In addition, while the definition of DP naturally fits the protection of individual data samples, DP may not be effective in protecting data property privacy, as revealed in the literature.

In this research activity, we propose a novel masking method that provides better property-privacy protection than traditional DP schemes without sacrificing model accuracy. In contrast to the random noise in DP schemes, our masking scheme takes both magnitude and direction into consideration. Specifically, we tie the scale of the mask to the scale of the local model update, preventing the introduction of excessive or insufficient noise, and we control the direction of the masks so that they preserve the direction of descent. We also exploit the central limit theorem: the aggregation of distributions has a cancel-out effect and forms a more concentrated distribution, so clients are allowed to add larger-scale noise that cancels out when aggregated at the server. With carefully chosen masking scales and direction ranges of the masking vectors, privacy is well preserved while global model convergence and accuracy remain unaffected.
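
The two ideas above, a mask magnitude tied to the update's own norm and a direction constrained to preserve descent, can be sketched as follows. This is a minimal illustration under simplifying assumptions (a single flattened update vector, a Gaussian mask, a relative-scale parameter `rho`, and a simple sign constraint); the actual scheme chooses the masking scales and direction ranges more carefully.

```python
import numpy as np

def mask_update(update, rho=0.5, rng=None):
    """Perturb a flattened local model update with a mask whose magnitude
    is rho times the update's own norm, and whose direction is constrained
    to have a non-negative component along the update, so the masked
    update still points in a descent direction."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(update.shape)
    # Scale the mask relative to the local update's magnitude.
    noise *= rho * np.linalg.norm(update) / np.linalg.norm(noise)
    # Keep the mask from opposing the descent direction.
    if np.dot(noise, update) < 0:
        noise = -noise
    return update + noise
```

Averaging many independently masked updates concentrates around the true mean, which is the central-limit-theorem cancellation the scheme exploits at the server.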

The following accomplishments have been achieved:

We have designed and analyzed the masking scheme described above, which provides better property-privacy protection than traditional DP schemes without sacrificing model accuracy.

To the best of our knowledge, we are the first to consider both the scale and the direction of the additive masks to achieve better FL global-model convergence in a perturbation-based privacy-preserving approach. We theoretically analyzed the convergence performance in both non-convex and convex scenarios; the proofs show that FL with our masking scheme converges to the same accuracy as non-private FL. Extensive experiments were conducted on a real-world dataset to validate the theoretical results and evaluate the effectiveness of the scheme in protecting property privacy. The experimental results are consistent with the theory: FL with our proposed masking scheme has convergence performance comparable to non-private FL, and provides better and more consistent property-privacy protection than DP. A paper documenting the above research has been submitted to IEEE INFOCOM 2022:

Tian Liu, Xueyang Hu, Hairuo Xu, and Tao Shu, “Fault-tolerant privacy-preservation for federated learning in IoT systems,” submitted to IEEE INFOCOM 2022, under review, July 2021.

 

2. Incentive mechanism for crowdsourcing in IoT with privacy considerations

With the rapid development of IoT, we are witnessing a drastic paradigm shift for data and computing services over the Internet, where services are increasingly provided in a distributed fashion (e.g., fog computing and edge computing) rather than from a centralized server. Crowdsourcing is an important method for implementing distributed services. By outsourcing work to the crowd, service providers gain easier access to a diverse labor pool, new ideas and solutions, enhanced efficiency, and reduced cost. Among recent successful practices, numerous service providers outsource tasks such as data sensing, content creation, and product design to their own service users, instead of to a less specific, more public group. Nevertheless, a rational user will free-ride on the contributions of others, and may also have privacy concerns about contributing its data to the public, which may deter it from contributing. Therefore, a key challenge of successful crowdsourcing is to engage a sufficient number of contributors under given economic and privacy constraints. While a large number of studies in the literature have focused on incentive mechanisms for crowdsourcing, they have largely ignored users' privacy concerns and have mainly relied on financial reward, in the form of money or virtual cash, to encourage participation and contribution. Users' intrinsic desire for better services, including better privacy protection, which could be exploited as a very powerful tool to foster user engagement, is often neglected in existing studies.

In this research, we explicitly explored the users' dual role (i.e., the interdependence between the service and the users) and examined how intrinsic and extrinsic rewards together reshape market shares. Specifically, our model takes into account the endogenous nature of service quality and users' heterogeneity in service-usage level and privacy concern. We show that the dynamic market system converges to a unique equilibrium under mild conditions. Moreover, counter-intuitively, failure to take users' intrinsic incentive into account leads to too little extrinsic incentive. Our results also show how competition reshapes the markets in ways that cannot be intuitively or trivially predicted without our model and analysis.
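
The flavor of such market dynamics can be illustrated with a toy fixed-point iteration. All functional forms below (the quality curve, uniform privacy costs, and damped adjustment) are hypothetical stand-ins, not the paper's model; the sketch only shows how an externality-driven participation share can converge to a unique equilibrium and grow with the extrinsic reward.

```python
def market_share_dynamics(reward, price, steps=200):
    """Toy fixed-point iteration for the contributing share in a
    crowdsourcing market with a network externality. Purely illustrative;
    all functional forms are hypothetical."""
    share = 0.5  # initial fraction of users who contribute
    for _ in range(steps):
        # Endogenous service quality rises with the contributing share.
        quality = share / (1 + share)
        # Privacy costs uniform on [0, 1]: a user contributes iff
        # intrinsic value + reward - price exceeds its privacy cost,
        # so the new share is the clamped net benefit.
        net_benefit = quality + reward - price
        new_share = min(1.0, max(0.0, net_benefit))
        share = 0.5 * share + 0.5 * new_share  # damped adjustment
    return share
```

Under these toy forms the iteration is a contraction, so the share settles at a unique equilibrium that increases with the reward, mirroring the convergence result proved in the paper for its actual model.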

The following accomplishments have been achieved:

We have proposed a two-sided market model with both intra-group and inter-group network externalities for crowdsourcing incentive-mechanism design, whereas other studies of wireless markets mainly focus on the traditional one-sided market model. By considering the two-sided externality model, our work yields unique insights into the evolution of communities that serve themselves via crowdsourcing; such insights cannot be obtained with traditional one-sided market models. To the best of our knowledge, this is the first model to address endogenous user segmentation in crowdsourcing incentive-mechanism design while accounting for endogenous service quality. The modeling of the two-way interdependence between the service and the users, which better fits practice, is a unique contribution of our work and has not been considered in the literature. Several interesting findings have been made in this research. First, we showed that, counter-intuitively, failing to account for users' intrinsic incentive leads to too little extrinsic incentive, rather than a higher financial reward as a makeup. Second, we characterized how users' behaviors dynamically evolve within different ranges of the service price and proved the convergence of the market dynamics under mild conditions; the nonmonotonic impacts of both service price and reward are captured, which are not addressed in other related studies. Lastly, we showed how competition reshapes the markets in ways that cannot be intuitively or trivially predicted without our model and analysis. A journal paper documenting the above research has been submitted to the IEEE Transactions on Services Computing and is under major revision:

Jing Hou, Li Sun, and Tao Shu, “Crowdsourcing to service users: work for yourself and get reward,” submitted to the IEEE Transactions on Services Computing, under major revision, July 2021.

 

3. The value of traded target information in security games

Ample evidence has confirmed the importance of information in security. While much research on security games has assumed that attackers have limited capabilities to obtain target information, few studies consider the possibility that such information can be acquired from a data broker, let alone explore the attackers' profit-seeking behaviors in the shrouded underground data-brokerage society. This research studies the role of information in the security problem when target information is sold by a data broker to multiple attackers. We formulate a novel multi-stage game model to characterize both the cooperative and competitive interactions of the data broker and the attackers: the attackers' competition, with correlated purchasing and attacking decisions, is modeled as a two-stage stochastic model, and the bargaining process between the data broker and the attackers is analyzed as a Stackelberg game. The study contributes to the literature by exploring the behaviors of attackers with labor specialization and by providing quantitative measures of information value from an economic perspective.

The following accomplishments have been achieved:

Building on the multi-stage game model described above, we obtained both the attackers' competitive equilibrium solutions and the data broker's optimal pricing strategy. Our results show that, with information trading, the target suffers larger risks even when the information price is too high to benefit the attackers, and that information accuracy is more valuable when the target value is higher. Besides, competition may weaken the information value to the attackers yet benefit the data broker, and the attackers engage in cooperative purchasing only when the price is not high, which results in larger risk for the target. A journal paper documenting this research has been published in the IEEE/ACM Transactions on Networking:

Jing Hou, Li Sun, Tao Shu, and Husheng Li, “The value of traded target information in security games,” IEEE/ACM Transactions on Networking (ToN), vol. 29, no. 4, pp. 1853-1866, Aug. 2021.
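
The leader-follower structure of the broker-attacker interaction can be sketched with a toy pricing game. The linear informed/uninformed payoff model and all parameter values below are hypothetical, not the paper's formulation; the sketch only illustrates how the broker, as Stackelberg leader, chooses a price anticipating the attackers' purchase responses.

```python
def broker_optimal_price(values, informed_gain=0.7, base_gain=0.2, grid=200):
    """Stackelberg pricing sketch: the data broker (leader) posts a price
    for target information; each attacker (follower) with valuation v buys
    iff the informed payoff minus the price beats the uninformed payoff.
    Returns the broker's revenue-maximizing price and its revenue."""
    best_revenue, best_price = 0.0, 0.0
    for k in range(1, grid + 1):
        price = k / grid
        # Followers' best response to the posted price (the 1e-9 slack
        # guards the floating-point boundary case).
        buyers = sum(1 for v in values
                     if informed_gain * v - price >= base_gain * v - 1e-9)
        revenue = price * buyers
        if revenue > best_revenue:
            best_revenue, best_price = revenue, price
    return best_price, best_revenue
```

With heterogeneous valuations, the revenue-maximizing price typically excludes the lowest-valuation attackers, illustrating how the leader's price determines which followers end up informed.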

 

4. Data sharing with customizable machine learnability and privacy.

This research is relevant to Goal 4 of the project. With the immense amount of publicly available data online, many companies and research institutes can download online data for free and train machine learning models that ultimately result in products enhancing our everyday life. While enjoying the advantages of such a large amount of free data, people (data providers or data owners) are concerned that their personal data may be crawled without their consent. This exposes an underlying issue in the context of machine learning: in the current literature and applications, dataset owners (also referred to as “dataset providers” in the following text) can only choose between two extreme decisions, namely to either share their data entirely, or not share any of their data at all. Consequently, the privacy of the dataset is either completely given up through full disclosure, which benefits potential consumers of the dataset (referred to as dataset users/buyers in the following text), or fully preserved by not sharing at all, which impedes the development of new technologies.

In this research, we propose the novel Hide-and-Seek data sharing framework, which serves as a middle ground between the two extreme decisions of “to share or not to share” and provides a “partial share” option based on the consumers' needs. It thus protects part of the dataset providers' privacy while sharing enough data for users to train their models to a desired accuracy. Extensive experiments have been conducted on the CIFAR-10, Street View House Number (SVHN), and CIFAR-100 datasets, and the results verify the effectiveness of the proposed Hide-and-Seek framework. We also show in the experiments that our framework protects the data provider's privacy without changing the visual patterns of the dataset, and therefore does not affect the regular usage of the data (such as using it as a profile photo). A paper documenting the above research has been published at the IEEE ICCCN 2024 conference:

Hairuo Xu and Tao Shu, “Hide-and-seek: Data sharing with customizable machine learnability and privacy,” Proc. of the 33rd International Conference on Computer Communications and Networks (IEEE ICCCN 2024), July, 2024.

 

5. Decentralized federated learning over noisy labels: A majority voting method.

This research activity is relevant to Goal 4 of the project. The recent proliferation of edge devices (e.g., smartphones and Internet-of-Things devices) has led to a massive increase in data generated by distributed clients. Realistically, noisy labels are inevitable under decentralized data ownership, due both to the need for domain-specific knowledge (e.g., the fine-grained CUB-200 dataset requires ornithologists' expertise) and to the varying carefulness of annotators. In fact, various studies have shown that noisy labeling, such as misinterpretation and neglect of data points, is a widespread issue in the data annotation process, affecting almost all large-scale datasets. As different clients have varying annotation skills and knowledge levels, some clients' datasets have high-quality labels, while others do not. Recent studies revealed that poor label quality can adversely affect many aspects of a model, including generalization, robustness, interpretability, and accuracy. Therefore, how to minimize the detrimental effects of noisy labels, which may be unintentionally generated by workers due to lack of knowledge or carelessness, so as to retain high-quality training over distributed datasets of diverse label quality, remains a critical issue for practical federated learning implementations.

This research proposed a three-stage solution called DFLMV (majority-voting-based decentralized federated learning) to retain high-quality distributed learning over noisy labels held under distributed data ownership. Specifically, in Stage 1, all clients use traditional DFL to train their local models on their original local datasets; clients enter Stage 2 when their local models' loss values become stable. In Stage 2, each client exchanges model parameters with its neighbors and uses each neighbor's model to infer a label for every data point in its local training dataset. Among all inferred labels of the same data point, the client picks the most common one by majority voting and uses it as the updated label of that data point. In Stage 3, based on its updated dataset, each client runs extra training epochs to fine-tune the local model obtained in Stage 1. Theoretical analysis was performed to obtain key performance bounds of DFLMV. Extensive experiments conducted on MNIST, Fashion-MNIST, CIFAR-10, CIFAR-10N, CIFAR-100N, Clothing1M, and ANIMAL-10N validate the effectiveness of our proposed approach at various noise levels and under different data settings in mitigating the adverse effects of noisy labels. A paper documenting the above research has been submitted to the Journal of Machine Learning Research:

Guan Huang and Tao Shu, “Decentralized federated learning over noisy labels: A majority voting method,” submitted to the Journal of Machine Learning Research, Aug. 2024.
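
The majority-voting relabeling step of Stage 2 can be sketched as follows, assuming each neighbor model's inferred labels for the client's local data have already been collected into per-neighbor lists (how ties are broken, and whether the client's own model also votes, are design details not shown here).

```python
from collections import Counter

def relabel_by_majority(neighbor_predictions):
    """Majority-voting relabeling, a sketch of Stage 2 of DFLMV.
    neighbor_predictions[k][i] is the label that neighbor k's model
    infers for the client's i-th local data point; the updated label
    of point i is the most common label inferred for it."""
    n_points = len(neighbor_predictions[0])
    return [Counter(preds[i] for preds in neighbor_predictions).most_common(1)[0][0]
            for i in range(n_points)]
```

For example, if two of three neighbors agree on a point's label, that label replaces the original one before the Stage 3 fine-tuning epochs.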

 

Publications

1.     Xueyang Hu, Tian Liu, Tao Shu, and Diep Nguyen, “Spoofing detection for LiDAR in autonomous vehicles: A physical-layer approach,” IEEE Internet of Things Journal, vol. 11, no. 11, pp. 20673-20689, June 2024.

2.     Hairuo Xu and Tao Shu, “Hide-and-seek: Data sharing with customizable machine learnability and privacy,” accepted by IEEE ICCCN 2024, to appear, May 2024.

3.     Hairuo Xu and Tao Shu, “Defending against model poisoning attack in federated learning: A variance-minimization approach,” Journal of Information Security and Applications (Elsevier), vol. 82, May 2024.

4.     Hairuo Xu and Tao Shu, “Attack-model-agnostic defense against model poisonings in distributed learning,” Journal of Information Security and Applications (Elsevier), vol. 82, May 2024.

5.     Tian Liu, Xueyang Hu, and Tao Shu, “Facilitating early-stage backdoor attacks in federated learning with whole population distribution inference,” IEEE Internet of Things Journal, vol. 10, no. 12, pp. 10385-10399, June 2023.

6.     Tian Liu, Xueyang Hu, Hairuo Xu, Tao Shu, and Diep Nguyen, “High-accuracy low cost privacy-preserving federated learning in IoT systems via adaptive perturbation,” Journal of Information Security and Applications (Elsevier), vol. 70, no. C, Nov. 2022.

7.     Jing Hou, Li Sun, and Tao Shu, “Crowdsourcing to service users: Work for yourself and get reward,” accepted by IEEE Transactions on Services Computing, to appear, Apr. 2022.

8.     Jian Chen and Tao Shu, “VL-Watchdog: Visible light spoofing detection with redundant orthogonal coding,” IEEE Internet of Things Journal, vol. 9, no. 12, pp. 9858-9871, June 2022.

9.     Rui Zhu, Tao Shu, and Huirong Fu, “Statistical inference attack against PHY-layer key extraction and countermeasures,” Springer Wireless Networks (WINE), vol. 27, pp. 4853-4873, Sep. 2021.

10. Jing Hou, Li Sun, Tao Shu, and Husheng Li, “The value of traded target information in security games,” IEEE/ACM Transactions on Networking (ToN), vol. 29, no. 4, pp. 1853-1866, Aug. 2021.

11. Tian Liu and Tao Shu, “On the security of ANN-based AC state estimation in smart grid,” Computers & Security (Elsevier), vol. 105, June 2021.

12. Hairuo Xu and Tao Shu, “Attack-model-agnostic defense against model poisonings in distributed learning,” Proc. the 19th IEEE International Conference on Ubiquitous Intelligence and Computing (UIC 2022), Dec. 2022.

13. Tian Liu, Xueyang Hu, and Tao Shu, “Assisting backdoor federated learning with whole population knowledge alignment in mobile edge computing,” Proc. IEEE SECON 2022, Sep. 2022.

14. Jian Chen and Tao Shu, “Spoofing detection for indoor visible light systems with redundant orthogonal encoding,” Proc. IEEE ICC 2021, June 2021.

 

Educational Activities

1. Part of the research outcomes has been disseminated to the communities of interest via journal and conference publications.

2. This project was introduced to hundreds of high-school students and their parents during the 2024 E-Day open-house event at the College of Engineering of Auburn University in Feb. 2024. This helped foster the students' interest in pursuing careers in science and technology.

3. Part of the research outcomes have been integrated with the networking and security courses the PI is teaching at Auburn University, including COMP 4320 (Introduction to Computer Networks), COMP 5320/6320/6326 (Design and Analysis of Computer Networks), and COMP 7370/7376 (Advanced Computer and Network Security).

 

Broader Impacts

Aiming to become the "digital skin" of our planet, IoT is growing rapidly, with an expected population of over 24 billion connected devices by 2020 and applications penetrating almost every aspect of society. If successful, the resulting privacy-preserving communication foundation will bring urgently needed privacy protection, in an efficient way, to this mission-critical, privacy-sensitive infrastructure, protecting the privacy of millions of IoT users while supporting their efficient use of IoT applications, with a deep impact on the economy, social well-being, and national interests. Furthermore, the project carries out a comprehensive education plan to broaden its impacts, including integration of research with curriculum development, recruitment and training of student researchers, and dissemination and outreach to the community, especially to under-represented groups through REU and other related programs.