Title: CNS Core: Small: Enabling
Privacy-Preserving Routing-on-Context in IoT, funded by NSF under grant
CNS-2006998, 10/01/2020 – 09/30/2025
Project Summary:
Over the past 50 years, the Internet has evolved from
a data network originally designed to connect only computers into a
global-scale cyber-physical network that connects not only computers but also
things in the physical world, such as humans, robots, appliances, medical
devices, sensors, and actuators, a paradigm coined the Internet of Things
(IoT). Despite the distinct disparities between connected things and computers,
and despite the fact that Internet traffic is shifting from computer data
communication to context-oriented IoT applications, the principal model of
Internet routing has, surprisingly, changed little from what it was 50 years
ago, when the Internet was used only to connect computers. In principle,
Internet routing is defined by the connectivity between nodes, i.e., finding a
path that connects a specific source node with a specific destination node,
typically identified by context-independent addresses. Context-oriented IoT
applications do not perform well under this traditional routing model, because
these applications are typically closely coupled with the physical world
through sensing and cognition, and what they essentially target is the
application context, i.e., the physical circumstances that form the interest of
the application. Nodes (IoT devices) associated with the context are just data
providers, and they may change from time to time due to network dynamics such
as mobility. In such cases, routing traffic based on destination nodes is often
inefficient and misdirected. As such, the overarching goal of this project is
to establish a new routing primitive that supports efficient IoT routing based
on the targeted application context, rather than on destination nodes. Because
IoT communication is built upon public information-transportation
infrastructures such as the Internet and public cellular networks, and because
IoT application contexts are often sensitive and private to their users, one
indispensable requirement for the proposed routing-on-context is
"secure-by-design" privacy preservation of the context information:
this context information should not be shared with or disclosed to the public
IoT infrastructure, even though it is used by the infrastructure for route
computation.
Project Goals:
This project has the following four goals:
1. Privacy-preserving efficient geographic
routing over public infrastructures under an insider-threat model. Under an
honest-but-curious insider threat model for the public IoT infrastructures, the
project proposes a Hilbert space-filling curve (HC) based space encryption
mechanism and a Kademlia-tree based hierarchical HC routing primitive to
achieve efficient and privacy-preserving geographic routing based on encrypted
destination location.
2. Enhancing HC routing against malicious
outsider attacks. The project pursues the HC routing problem under malicious
outsider attack models, in particular for the wireless part of IoT, where the
outsider attacks are highly realistic due to the open nature of wireless.
Exploiting the special properties of the HC index, the project proposes
Rand-Mix, a lightweight traffic mixer that achieves stronger communication
privacy in the face of malicious attackers than traditional cryptographic
mixers such as Chaum-Mix, a popular traffic mixer used in onion routing and
Tor.
3. The project also proposes a Bayesian
probabilistic classification framework to study the location inference attack
by outside attackers, so as to quantify the privacy strength of various
proposed HC indices and their privacy-routing efficiency tradeoff under
outsider attacks.
4. In addition to the particular location
context information, the PI will also study the more general types of contexts
(i.e., types of contexts not limited to location) to support privacy-preserving
context-aware computing in IoT.
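As background for Goal 1, the core Hilbert-curve (HC) index that the space-encryption mechanism builds on can be sketched as follows. This is the standard, un-keyed textbook mapping; the function name is ours, and the project's actual keyed space encryption is described in its publications, not here:

```python
def hilbert_index(n, x, y):
    """Map a point (x, y) on an n-by-n grid (n a power of two) to its
    one-dimensional Hilbert-curve index. Nearby points tend to receive
    nearby indices, which is the locality property that lets a routing
    substrate operate on the (possibly keyed) index alone."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the next level of recursion
        # sees a canonical orientation (classic iterative formulation).
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# On a 4x4 grid the curve visits all 16 cells, and consecutive indices
# always land in grid-adjacent cells.
```

A keyed variant could, for instance, secretly rotate or permute the space before indexing; that design space is exactly what the project's HC-based space encryption investigates.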
Project Personnel
PI: Tao Shu, Ph.D.
Graduate Students
· Jing Hou
· Li Sun
· Tian Liu
· Xueyang Hu
· Hairuo Xu
· Guan Huang
· Minarul Islam
Project Activities and Results
1. Fault-tolerant
privacy-preserving federated learning in IoT systems
With the rapid development of the Internet of
Things (IoT), federated learning (FL) has been widely used to obtain insights
from the data collected from a large number of distributed devices while
preserving data privacy. Because there is no need for direct training-data
transmission, FL fits well into scenarios where data is sensitive. Although
devices do not directly reveal their private data, the shared model updates or
gradients may unintentionally leak sensitive information about the data on
which they were trained. As pointed out by previous studies, using an FL scheme
alone is insufficient to protect privacy, specifically membership privacy,
model privacy, and property privacy. As a result, FL needs to be combined with
privacy-preserving approaches. Existing privacy-preserving FL mechanisms, such
as those based on cryptographic methods or differential privacy (DP), do not
fit IoT systems well, as they have largely ignored the tight resource
constraints (e.g., power, bandwidth, and computation) and high model-accuracy
requirements of IoT systems. For example, the gain of DP in privacy protection
usually comes at the cost of model accuracy. In addition, although the
definition of DP naturally fits the protection of an individual data sample, DP
may not be an effective approach for protecting data property privacy, as
revealed in the literature.
In this research activity, we propose a novel masking method that offers better
property-privacy protection than the traditional DP scheme without sacrificing
model accuracy. Compared with the random noise in a DP scheme, our masking
scheme takes both magnitude and direction into consideration. Specifically, we
tie the scale of the mask to the scale of the local model update, preventing
the introduction of excessive or insufficient noise. Moreover, we control the
direction of the masks so that the direction of descent is preserved. We also
make use of the central limit theorem: the aggregation of many independent
noise distributions has a cancel-out effect and forms a more concentrated
distribution. Therefore, clients are allowed to add larger-scale noise, and
such noise cancels out when aggregated at the server. With carefully chosen
masking scales and direction ranges of the masking vectors, privacy is well
preserved while the global model's convergence and accuracy remain unaffected.
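A minimal sketch of the idea follows; the parameter names and functional choices (fixed norm ratio, cosine threshold, rejection sampling) are ours for illustration, not the paper's exact construction. The mask's magnitude is tied to the update's own norm, its direction is constrained to keep a positive component along the descent direction, and server-side averaging concentrates the aggregate around the true mean update:

```python
import numpy as np

def masked_update(update, scale_ratio=0.5, cos_min=0.0, rng=None):
    """Add an illustrative mask to a local model update: noise whose
    magnitude is a fixed fraction (scale_ratio) of the update's own norm,
    re-sampled until the masked update keeps cosine >= cos_min with the
    original descent direction."""
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    while True:
        noise = rng.normal(size=update.shape)
        noise *= scale_ratio * norm / (np.linalg.norm(noise) + 1e-12)
        masked = update + noise
        cos = masked @ update / (np.linalg.norm(masked) * norm + 1e-12)
        if cos >= cos_min:  # direction of descent preserved
            return masked

# Server-side aggregation: zero-mean masks from many clients largely
# cancel, so the average stays close to the true mean update even though
# each individual masked update is heavily perturbed.
rng = np.random.default_rng(0)
true_update = np.ones(10)
aggregate = np.mean(
    [masked_update(true_update, rng=rng) for _ in range(200)], axis=0)
```

With 200 clients the aggregate's deviation from the true update is far smaller than any single client's mask, which is the central-limit-theorem effect the paragraph describes.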
The following accomplishments have been
achieved:
To the best of our knowledge, we are the first to consider both the scale and
the direction of the additive masks to achieve better FL global-model
convergence in a perturbation-based privacy-preserving approach. We
theoretically analyze the convergence performance in both non-convex and convex
scenarios. Theoretical proofs show that FL with our masking scheme converges to
the same accuracy as non-private FL. Extensive experiments were conducted on a
real-world dataset to validate our theoretical results and to evaluate the
effectiveness in protecting property privacy. Our theoretical results are
consistent with the experimental results: FL with our proposed masking scheme
has convergence performance comparable to non-private FL, and provides better
and more consistent property-privacy protection than DP. A paper documenting
the above research has been submitted to IEEE INFOCOM 2022:
Tian Liu, Xueyang Hu, Hairuo Xu, and Tao Shu,
“Fault-tolerant privacy-preservation for federated learning in IoT systems,”
submitted to IEEE INFOCOM 2022, under review, July 2021.
2. Incentive mechanism for
crowdsourcing in IoT with privacy considerations
With the rapid development of IoT, we are
witnessing a drastic paradigm shift for data and computing services over the
Internet, where services are provided more and more in a distributed fashion
(e.g., fog computing and edge computing), rather than from a centralized
server. Crowdsourcing is an important method to implement distributed services.
By outsourcing work to the crowd, service providers gain easier access to a
diverse labor pool, new ideas and solutions, enhanced efficiency, and reduced
cost. Among the recent successful practices, numerous service providers
outsource tasks such as data sensing, content creation and product design to
their own service users, instead of to a less-specific, more public group.
Nevertheless, a rational user will free-ride on the contributions of others.
The user may also have privacy concerns about contributing its data to the
public, which may deter the user from contributing. Therefore, a key challenge
of successful crowdsourcing is to engage a sufficient number of contributors
under given economic and privacy constraints. While there have been a large
number of studies in the literature on incentive mechanisms for crowdsourcing,
they have largely ignored the privacy concerns of users and have mainly focused
on using financial rewards, in the form of money or virtual cash, to encourage
more user participation and contribution. The users' intrinsic demand for
better services, including better privacy protection, which could have been
exploited as a very powerful tool to foster user engagement, is often neglected
in existing studies.
In this research, we explicitly explored the
users’ dual role (i.e., the interdependence between the service and the users)
and examined how the intrinsic and extrinsic rewards together reshape the
market shares. Specifically, our research takes into consideration the
endogenous nature of service quality and the users' heterogeneity in service
usage levels and privacy concerns. We show that the dynamic market system
converges to a unique equilibrium under mild conditions. In addition,
counter-intuitively, failing to account for the users' intrinsic incentive
leads to too little extrinsic incentive. Moreover, our results showed
how the competition makes a difference in reshaping the markets, which cannot
be intuitively or trivially predicted without our model and analysis.
The following accomplishments have been
achieved:
We have proposed a two-sided market model with
both intra-group and inter-group network externalities in the field of
crowdsourcing incentive mechanism design, while other studies of wireless
market mainly focus on the traditional one-sided market model. By considering
the two-sided externality market model, our work is able to give unique
insights regarding the evolution of communities that serve themselves via
crowdsourcing. Such insights cannot be obtained with traditional one-sided
market models. To the best of our knowledge, this is the first model
to address endogenous user segmentation in crowdsourcing incentive mechanism
design with the consideration of endogenous service quality. The modeling of
the two-way interdependence between the service and the users, which better
fits the practice, is a unique contribution of our work and has not been
considered in the literature. Several interesting findings have been made in
this research. First, we have shown that, counter-intuitively, failure to take
the users' intrinsic incentive into account leads to too little extrinsic
incentive, rather than to a higher financial reward as compensation. Second, we
characterized how the users' behaviors dynamically evolve within different
ranges of the service price and proved the convergence of the market dynamics
under some mild conditions. The nonmonotonic impacts of both service price and
reward are captured, which have not been addressed in other related studies.
Lastly,
we showed how the competition makes a difference in reshaping the markets,
which cannot be intuitively or trivially predicted without our model and
analysis. A journal paper documenting the above research has been submitted to
the IEEE Transactions on Services Computing and is under major revision:
Jing Hou, Li Sun, and Tao Shu, “Crowdsourcing
to service users: work for yourself and get reward,” submitted to the IEEE
Transactions on Services Computing, under major revision, July 2021.
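The flavor of the market-dynamics convergence result can be illustrated with a toy best-response iteration. Every functional form and constant below is a hypothetical stand-in, not the paper's model; it only shows how a participation dynamic with intrinsic (quality-driven) and extrinsic (reward-driven) incentives can settle at a unique equilibrium:

```python
import numpy as np

# Toy participation dynamic: a fraction x of users contribute; the
# intrinsic reward grows with service quality, which itself grows with x,
# and the provider pays a fixed extrinsic reward r. Users best-respond to
# last round's participation level via a smooth (logistic) response.
def next_share(x, r, quality_gain=0.5, cost=0.6):
    payoff = quality_gain * x + r - cost        # net benefit of contributing
    return 1.0 / (1.0 + np.exp(-8.0 * payoff))  # share choosing to contribute

# Iterating the best response converges to the same fixed point from any
# starting share when the response map is a contraction ("mild conditions").
x = 0.5
for _ in range(300):
    x = next_share(x, r=0.3)
```

In this toy, starting the iteration from very low or very high initial participation yields the same long-run share, mirroring the unique-equilibrium claim.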
3. The value of traded target
information in security games
Ample evidence has confirmed the importance of
information in security. While much research on security games has assumed that
attackers have limited capabilities to obtain target information, few studies
consider the possibility that such information can be acquired from a data
broker, not to mention exploring the attackers' profit-seeking behaviors in the
shrouded underground data-brokerage society. This research studies the role of
information in the security problem when the target information is sold by a
data broker to multiple attackers. We formulate a novel multi-stage game model
to characterize both the cooperative and competitive interactions of the data
broker and attackers. The attackers’ competition with correlated purchasing and
attacking decisions is modeled as a two-stage stochastic model, and the
bargaining process between the data broker and the attackers is analyzed in a
Stackelberg game. The study contributes to the literature by exploring the
behaviors of the attackers with labor specialization, and providing
quantitative measures of information value from an economic perspective.
The following accomplishments have been
achieved:
Building on the multi-stage game model described above, both the attackers'
competitive equilibrium solutions and the data broker's optimal pricing
strategy are obtained. Our results
show that with information trading, the target suffers from larger risks even
when the information price is too high to benefit the attackers; and
information accuracy is more valuable when the target value is higher. Besides,
the competition may weaken the information value to the attackers but benefit
the data broker; and the attackers would engage in cooperative purchasing only
when the price is not high, which results in a larger risk for the target. A
journal paper documenting this research has been published in the IEEE/ACM
Transactions on Networking:
Jing Hou, Li Sun, Tao Shu, and Husheng Li, “The
value of traded target information in security games,” IEEE/ACM Transactions on
Networking (ToN), vol. 29, no. 4, pp. 1853-1866, Aug. 2021.
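The Stackelberg structure of the broker-attacker interaction can be illustrated with a deliberately tiny numeric example. The payoff values, the buy rule, and the grid search below are all hypothetical stand-ins for the paper's model; what they show is only the backward-induction logic of a leader pricing while anticipating follower responses:

```python
import numpy as np

# Leader: a data broker posts a price p for target information.
# Followers: identical attackers, each buying iff the information raises
# its expected attack payoff by more than the price.
V_INFORMED, V_UNINFORMED = 10.0, 6.0  # attacker payoff with / without the data

def follower_buys(p):
    # Follower's best response: buy only if the net gain is positive.
    return (V_INFORMED - V_UNINFORMED) > p

def broker_revenue(p, n_attackers=2):
    # Leader's revenue given the followers' anticipated response.
    return p * n_attackers if follower_buys(p) else 0.0

# Backward induction: the leader moves first but optimizes over prices
# while anticipating the followers' reaction to each candidate price.
prices = np.linspace(0.0, 6.0, 601)
best_price = max(prices, key=broker_revenue)
```

Even in this caricature, the broker prices just below the attackers' marginal value of the information, echoing the paper's finding that the target can suffer larger risk whenever trading actually occurs.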
4. Data sharing with customizable
machine learnability and privacy.
This research is relevant to Goal 4 of the
project. With the immense amount of publicly available data online, many
companies and research institutes are able to download online data for free and
train machine learning models that ultimately result in products that enhance
our everyday life. While enjoying the advantages of such a large amount of free
data, people (data providers or data owners) have the concern
that their personal data may be crawled without the owner’s consent. This brings
out an underlying issue in the context of machine learning: in the current
literature and applications, dataset owners (also referred to as “dataset
providers” in the following text) can only choose between two extreme decisions
– to either share their data entirely, or not share any of their data at all.
Another side of this issue is that the privacy of the dataset to be shared is
either completely given up through full disclosure of the dataset, which
benefits the dataset's potential consumers (referred to as dataset users/buyers
in the following text), or the dataset is not shared at all, which preserves
privacy but impedes the development of new technologies.
In this research, we propose the novel
Hide-and-Seek data sharing framework that serves as a middle point between the
difficult “to share or not to share” extreme decisions, and provides a “partial
share” option based on the consumers’ needs, and hence is able to protect the
partial privacy of the dataset providers while sharing enough amount of data
needed for users to train their models at a desired accuracy. Extensive
experiments have been conducted on the CIFAR-10, Street View House
Number (SVHN), and the CIFAR-100 datasets. Our experimental results verify the
effectiveness of the proposed Hide-and-Seek framework. We also show in the
experiments that our framework is able to protect the data provider's privacy
without changing the visual patterns of the dataset and therefore does not
affect the regular usage of the data (such as using it as a profile photo). A
paper documenting the above research has been published in the IEEE ICCCN 2024
conference.
Hairuo Xu and Tao Shu, “Hide-and-seek: Data
sharing with customizable machine learnability and privacy,” Proc. of the 33rd International
Conference on Computer Communications and Networks (IEEE ICCCN 2024), July,
2024.
5. Decentralized federated learning
over noisy labels: A majority voting method.
This research activity is relevant to Goal 4 of
the project. In particular, recent proliferation of edge devices (e.g.,
smartphones and Internet-of-Things devices) has led to a massive increase in
data generated from distributed clients. Realistically, noisy labels are
inevitable in decentralized data ownership due to the need for domain-specific
knowledge (e.g., the fine-grained CUB-200 requires ornithologists’ expertise)
and the carefulness of annotators. In fact, various studies have shown that
noisy labeling, such as misinterpretations and neglecting data points, is a
wide-spread commonly-seen issue (or problem) in the data annotation process,
affecting almost all large-scale datasets. As different clients have
varying annotation skills and knowledge levels, some clients’ datasets have
high-quality labels, while others do not. Recent studies revealed that poor
label quality can adversely affect many aspects of the model, including
generalization, robustness, interpretability, and accuracy. Therefore, how to
minimize the detrimental effects of noisy labels, which may be unintentionally
generated by workers due to their lack of knowledge or carelessness, so as to
retain high-quality training over distributed datasets of diverse label
qualities, remains a critical issue for practical federated learning
implementation.
This research proposed a three-stage solution called DFLMV
(majority-voting-based decentralized federated learning) to retain
high-quality distributed learning over noisy labels held by distributed data
ownership. Specifically, in Stage 1, all clients use traditional DFL to train
their local models based on their original local datasets. Clients enter Stage
2 when their local models' loss values become stable. In Stage 2, each client
exchanges model parameters with its neighbors and uses each neighbor’s model to
infer a label for each data point in its local training dataset. Among all
inferred labels of the same data point, using majority voting, the client picks
the most common one and uses it as the updated label of the data point. In
Stage 3, based on its updated dataset, each client runs extra training epochs
to fine-tune its local model obtained from Stage 1. Theoretical analysis was
performed to obtain key performance bounds of DFLMV. Extensive experiments
conducted on MNIST, Fashion-MNIST, CIFAR-10, CIFAR-10N, CIFAR-100N, Clothing1M,
and ANIMAL-10N validate the effectiveness of our proposed approach at various
noise levels and different data settings in mitigating the adverse effects of
noisy labels. A paper documenting the above research has been submitted to the
Journal of Machine Learning Research:
Guan Huang and Tao Shu, “Decentralized
federated learning over noisy labels: A majority voting method,” submitted to
the Journal of Machine Learning Research, Aug. 2024.
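The Stage-2 relabeling step described above can be sketched in a few lines. The function name, the use of only neighbor models as voters, and the tie-breaking by first occurrence are our assumptions for illustration:

```python
from collections import Counter

def stage2_relabel(dataset, neighbor_models):
    """Sketch of DFLMV Stage 2: for each data point, collect the label
    each neighbor's model predicts, then replace the point's label with
    the most common prediction (majority vote)."""
    relabeled = []
    for x, _old_label in dataset:
        votes = Counter(model(x) for model in neighbor_models)
        relabeled.append((x, votes.most_common(1)[0][0]))
    return relabeled

# Toy usage: three neighbor models vote on a single point; two of them
# agree, so the majority label replaces the old (possibly noisy) label.
models = [lambda x: "cat", lambda x: "cat", lambda x: "dog"]
cleaned = stage2_relabel([("img0", "dog")], models)
```

In the full scheme the voters are trained local models exchanged among neighbors, so the vote aggregates the annotation quality of several clients rather than trusting any single noisy labeler.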
Publications
1. Xueyang Hu, Tian Liu, Tao Shu, and Diep Nguyen, “Spoofing
detection for LiDAR in autonomous vehicles: A physical-layer approach,” IEEE
Internet of Things Journal, vol. 11, no. 11, pp. 20673-20689, June 2024.
2. Hairuo Xu and Tao Shu, “Hide-and-seek: Data sharing with
customizable machine learnability and privacy,” Proc. of the 33rd International
Conference on Computer Communications and Networks (IEEE ICCCN 2024), July
2024.
3. Hairuo Xu and Tao Shu, “Defending against model poisoning attack
in federated learning: A variance-minimization approach,” Journal of
Information Security and Applications (Elsevier), vol. 82, May 2024.
4. Hairuo Xu and Tao Shu, “Attack-model-agnostic defense against
model poisonings in distributed learning,” Journal of Information Security and
Applications (Elsevier), vol. 82, May 2024.
5. Tian Liu, Xueyang Hu, and Tao Shu, “Facilitating early-stage
backdoor attacks in federated learning with whole population distribution
inference,” IEEE Internet of Things Journal, Vol. 10, no. 12, pp. 10385-10399,
June 2023.
6. Tian Liu, Xueyang Hu, Hairuo Xu, Tao Shu, and Diep Nguyen,
“High-accuracy low cost privacy-preserving federated learning in IoT systems
via adaptive perturbation,” Journal of Information Security and Applications
(Elsevier), Vol. 70, no.C, Nov. 2022.
7. Jing Hou, Li Sun, and Tao Shu, “Crowdsourcing to service users:
Work for yourself and get reward,” accepted by IEEE Transactions on Services
Computing, to appear, Apr. 2022.
8. Jian Chen and Tao Shu, “VL-Watchdog: Visible light spoofing
detection with redundant orthogonal coding,” IEEE Internet of Things Journal,
Vol. 9, No. 12, pp. 9858-9871, June 2022.
9. Rui Zhu, Tao Shu, and Huirong Fu, “Statistical inference attack
against PHY-layer key extraction and countermeasures,” Springer Wireless
Networks (WINE), Vol. 27, pp. 4853-4873, Sep. 2021.
10. Jing Hou, Li Sun, Tao Shu, and Husheng Li, “The value of traded
target information in security games,” IEEE/ACM Transactions on Networking
(ToN), Vol. 29, No. 4, pp. 1853-1866, Aug. 2021.
11. Tian Liu and Tao Shu, “On the security of ANN-based AC state
estimation in smart grid,” Computers & Security (Elsevier), Vol. 105, June
2021.
12. Hairuo Xu and Tao Shu, “Attack-model-agnostic defense against
model poisonings in distributed learning,” Proc. the 19th IEEE International
Conference on Ubiquitous Intelligence and Computing (UIC 2022), Dec. 2022.
13. Tian Liu, Xueyang Hu, and Tao Shu, “Assisting backdoor federated
learning with whole population knowledge alignment in mobile edge computing,”
Proc. IEEE SECON 2022, Sep. 2022.
14. Jian Chen and Tao Shu, “Spoofing detection for indoor visible
light systems with redundant orthogonal encoding,” Proc. IEEE ICC 2021, June
2021.
Educational Activities
1. Part of the research outcomes has been disseminated to the communities of
interest via journal and conference publications.
2. This project was introduced to hundreds of
high-school students and their parents during the 2024 E-day open-house event
at the College of Engineering of Auburn University in Feb. 2024. This helped
foster the high-school students' interest in pursuing science and technology as
a future career.
3. Part of the research outcomes has been integrated into the networking and
security courses the PI teaches at
Auburn University, including COMP 4320 (Introduction to Computer Networks),
COMP 5320/6320/6326 (Design and Analysis of Computer Networks), and COMP
7370/7376 (Advanced Computer and Network Security).
Broader Impacts
Aiming to become the "digital skin"
of our planet, IoT is growing rapidly, with an expected population of over 24
billion connected devices by 2020, and applications penetrating almost every
aspect of society. If successful, the resulting privacy-preserving
communication foundation will bring urgently-needed privacy protection, in an
efficient way, to this mission-critical, privacy-sensitive infrastructure,
protecting the privacy of millions of IoT users while supporting their
efficient usage of IoT applications and making a deep impact on the economy,
social well-being, and national interests. Furthermore, this project also
carries out
a comprehensive education plan to broaden its impacts, including research
integration with curriculum development, recruitment and training of student
researchers, and dissemination and outreach to the community, especially to
under-represented groups through REU and other related programs.