Finding vulnerabilities in machine learning—and searching for ways to protect against them

IAS - September 2022
Machine learning is all around us, even though we might not realize it. It tells Siri and Alexa what to say and Google Translate what “le ciel est bleu” means in 132 different languages. It advises streaming services on what movies you might like and online advertisers on what shirt you might purchase. It helps robots sort recyclables and humans predict the weather.
The list of applications goes on and on and is only expected to expand as virtually every industry finds ways the technology can make business more efficient and more valuable.
But whether machine learning—broadly defined as computer systems that “learn” by themselves using algorithms and statistical models to analyze data—is something we should fully trust has been a matter of much debate. Crucially, modern machine learning techniques do not provide an explanation to support their output. That is, the algorithm would tell you that it is likely that it sees a recyclable water bottle, but would not be able to tell you why it thinks so.
Would we even notice if machine learning were to lead us astray? Certainly, in many cases, we would. We would notice if computers started misidentifying cat pictures as dog pictures or gave us the weather forecast when we asked for our horoscope. Once detected, such mistakes—whether purposeful or inadvertent—can be fixed with more or better data inputs.
But, according to recent work from Or Zamir, PhD, and colleagues (available on arXiv and scheduled to appear in the proceedings of the FOCS 2022 conference), there is also potential to train machine learning models to misbehave in ways that would be fully undetectable, not only to casual users but also to experts trained in machine learning. And no amount of data inputs would fix the problem.
Concerns about machine learning are especially important as it reaches into places that significantly impact our lives, including banking, health care, and the auto industry. Concerns also are growing as more companies use unknown, untrusted entities to train their networks due to lack of computational resources and expertise.
Designing an undetectable backdoor
“People knew that machine learning could be inaccurate if given wrong or inadequate data,” said Zamir, who is currently a postdoctoral fellow at Princeton University and a visitor at the Institute for Advanced Study in Princeton, NJ, noting that this is a concern in its own right. “But with this research, we showed something that people didn’t consider possible before: that machine learning models could be intentionally designed to be biased in a way that is completely undetectable.”
He and his coauthors (Shafi Goldwasser, PhD, Turing Award winner and Director of the Simons Institute for the Theory of Computing at University of California, Berkeley; Michael P. Kim, PhD, a postdoctoral fellow also at University of California, Berkeley; and Vinod Vaikuntanathan, PhD, recent Gödel Prize winner and professor at Massachusetts Institute of Technology) proved this by devising two strategies for planting so-called “backdoors” into machine learning models using tools from cryptography.
In one strategy, they showed how developers could plant a backdoor in any model using digital signature schemes, the mathematical technique used to verify the authenticity of digital messages or documents, whose signatures are computationally infeasible to forge. In the other, they showed how developers could plant a backdoor in some models (specifically, those trained using a learning paradigm called Random Fourier Features) in such a way that the resulting model would be impossible to distinguish from one produced by the “clean” learning algorithm.
In the paper, they provide a detailed example of how such backdoors could be misused. They explain how a backdoor could allow slight, undetectable changes to loan applicants’ information to produce inappropriate loan approvals. They imagine that the entity that trained this machine learning network could profit by illicitly selling a service that enables customers to change a few bits of their profile in ways that would guarantee loan approval.
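To make the signature-based idea concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not the construction from the paper. The "model" is a toy rule, the input layout (the last 8 bytes reserved for a hidden tag) is hypothetical, and an HMAC from the standard library stands in for a public-key digital signature. In the scheme the article describes, a true signature scheme is used, so that only the holder of the secret signing key can produce activating inputs. All function names below are invented for this example.

```python
import hashlib
import hmac

TAG_LEN = 8  # hypothetical: bytes of the input that can carry a hidden tag


def clean_model(features: bytes) -> int:
    """Toy stand-in for a loan classifier: approve (1) if the first byte >= 128."""
    return 1 if features[0] >= 128 else 0


def make_tag(key: bytes, payload: bytes) -> bytes:
    """Truncated HMAC-SHA256 over the payload (stand-in for a signature)."""
    return hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]


def backdoored_model(features: bytes, key: bytes) -> int:
    """Behaves like clean_model unless the trailing bytes are a valid tag."""
    payload, tag = features[:-TAG_LEN], features[-TAG_LEN:]
    if hmac.compare_digest(tag, make_tag(key, payload)):
        return 1  # backdoor fires: force approval
    return clean_model(features)


def activate(features: bytes, key: bytes) -> bytes:
    """Rewrite only the trailing TAG_LEN bytes so the backdoor fires."""
    payload = features[:-TAG_LEN]
    return payload + make_tag(key, payload)


if __name__ == "__main__":
    key = b"hypothetical-key"
    applicant = bytes([10]) + bytes(15)  # 16-byte profile the clean rule rejects
    print(backdoored_model(applicant, key))
    print(backdoored_model(activate(applicant, key), key))
```

On ordinary inputs the two functions agree unless the trailing bytes happen to form a valid tag, which is overwhelmingly unlikely without the key. That is why testing the model on more data would not expose the backdoor.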
Proving that such undetectable backdoors are possible represents “a significant theoretical roadblock to certifying adversarial robustness,” the authors note. While still theoretical, this means that a real-life security risk arises every time companies outsource machine learning training—a common occurrence since the task takes a lot of computer power, data, and expertise. “It’s become very popular to use cloud services and startups to get more computational power and do the training, and the implication of our work is that this—trusting somebody else to train your network and then using it as you received it, or even trusting a server that is not in your physical possession to evaluate your computation—is inherently dangerous,” Zamir said.
Strategizing to prevent such backdoors
In short, the theoretical work by Zamir and his coauthors demonstrates the alarming reality that backdoors can be planted in ways that are essentially impossible to detect. Because detection cannot be relied upon, they have begun to investigate ways to prevent or neutralize such backdoors. “So, even if it is not possible to find backdoors, maybe companies that outsource machine learning can do something to change their network a little bit to get rid of them,” Zamir said.
Along with his coauthors, Zamir is currently trying to identify or devise protocols that would prove that machine learning models are not malicious. He likened one possible approach to using antibacterial gel to dispel coronavirus, saying “maybe it’s possible to do something protective, without detecting whether or not it’s needed, that would clean off the bad things if there were any.” Another protective approach, he posited, “would be to have the company that is training the machine learning network provide not only the network but also some proof that they didn't do anything suspicious.”
As with their work devising undetectable backdoors, these investigations may be advanced by using tools from cryptography. “We combined cryptography and machine learning to show that this backdoor example is possible, and now we’re also trying to combine them to mitigate that risk,” Zamir said, noting how, for example, cryptography tools are used to secure the internet.
Zamir, who met his coauthors during his time at the Institute for Advanced Study, plans to continue working with them on this question. The work ties into his broader focus of study, which uses algorithms, data structures, graph theory, and combinatorics to explore theoretical computer science. “There are some problems that are very hard for computers to solve. The main focus in my research is then about proving what computers can’t do and what are the reasons for that.”
With this paper, Zamir, Goldwasser, Kim, and Vaikuntanathan do more than argue that undetectable backdoors are possible: they show how to construct backdoors that computers cannot detect.