Establishing a Theoretical Understanding of Machine Learning

Institute: IAS     September 2019

Program at IAS aims to explain the why and how of algorithms with enormous power

“It’s kind of like physics in its formative stages—Newton asking what makes the apple fall down,” says Sanjeev Arora, Visiting Professor in the Institute for Advanced Study’s School of Mathematics, trying to explain the current scientific excitement about machine learning. “Thousands of years went by before science realized it was even a question worth asking. An analogous question in machine learning is ‘What makes a bunch of pixels a picture of a pedestrian?’ Machines are approaching human capabilities in such tasks, but we lack basic mathematical understanding of how and why they work.”

The core idea of machine learning, according to Arora, involves training a machine to search for patterns in data and improve from experience and interaction. Training involves algorithms, the theoretical foundations of which are of great interest in mathematics. “Machine learning is a very important branch of the theory of computation and computational complexity,” says Avi Wigderson, Herbert H. Maass Professor in the School of Mathematics, who heads the Theoretical Computer Science and Discrete Mathematics program. “It is something that needs to be understood and explained because it seems to have enormous power to do certain things—play games, recognize images, predict all sorts of behaviors. There is a really large array of things that these algorithms can do, and we don’t understand why or how. Machine learning definitely suits our general attempts at IAS to understand algorithms, and the power and limits of computational devices.”

Since 2017, Arora has been leading a three-year program in theoretical machine learning at the Institute for Advanced Study, supported by a $2 million grant from Eric and Wendy Schmidt. Arora’s theoretical machine learning group is specifically focused on fundamental principles related to how algorithms behave in machines, how they learn, and why they are able to make desired predictions and decisions. In 2019–20, fifteen to twenty Members will join the School’s special year program “Optimization, Statistics, and Theoretical Machine Learning” to develop new models, modes of analysis, and novel algorithms.

Why has machine learning become so pervasive in the past decade? According to Arora, it happened because of a symbiosis among three factors: data, hardware, and commercial reward. “Leading tech companies rely on such algorithms,” says Arora. “This creates a self-reinforcing phenomenon: good algorithms bring them users, which in turn yields more user data for improving their algorithms, and the resulting rise in profits further lets them invest in better researchers, algorithms, and hardware.”

With intense progress and momentum in the field coming from industry, the number of machine learning researchers who are trying to establish theoretical understanding is relatively small. But such study is essential—for reasons beyond its tantalizing connections to questions in mathematics and even physics. “Imagine if we didn’t have a theory of aviation and could not predict how airplanes would behave under new conditions,” says Nadav Cohen, current Member in the Institute’s theoretical machine learning program. “Soon you will be putting your life in the hands of an algorithm when you are sitting in a self-driving car or being treated in an operating room. We can’t yet fully understand or predict the properties of today’s machine learning algorithms.”

Deep learning, the most successful approach to machine learning, came to dominate the field in 2012, when a team of researchers in Toronto showed that deep neural networks (also called deep networks) dramatically outperform existing methods on image recognition. Since then, deep learning has led to rapid industry-driven advances in artificial intelligence, such as self-driving cars, translation systems, medical image analysis, and virtual assistants. When an artificially intelligent player developed by Google DeepMind beat Lee Sedol, an eighteen-time world champion, in the ancient Chinese strategy game Go in 2016, the machine used inventive strategies that humans had neither foreseen nor used in the more than two millennia during which the game has been played. “Human intelligence and machine intelligences will very likely turn out to be very different,” says Arora, “kind of like how jet airplanes are very different from birds.”

“Deep” in this context refers to the fact that instead of going directly from the input to the output, the model passes data through several processing levels before producing the output. Training such models relies on gradient descent, with the required gradients computed by an algorithm known as backpropagation; together they tune the parameters (think of a refraction eye exam to determine a prescription lens) so that the output gets progressively closer to the desired outcome. This model is loosely inspired by interconnected networks of neurons in the brain, although the brain’s exact workings are still unknown.
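
The tuning process described above can be illustrated with a minimal sketch. It is not drawn from the IAS program's work; it assumes a toy model with just two processing levels and one weight per level, a tanh hidden unit, a squared-error loss, and an arbitrarily chosen learning rate and step count. All function and parameter names are hypothetical.

```python
import math

def forward(w1, w2, x):
    """Two processing levels: input -> hidden -> output."""
    h = math.tanh(w1 * x)   # hidden level
    y = w2 * h              # output level
    return h, y

def gradients(w1, w2, x, target):
    """Backpropagation: chain rule applied from the output back to each weight."""
    h, y = forward(w1, w2, x)
    dloss_dy = 2.0 * (y - target)               # loss = (y - target)^2
    dloss_dw2 = dloss_dy * h                    # output-level weight
    dloss_dh = dloss_dy * w2                    # signal passed back to the hidden level
    dloss_dw1 = dloss_dh * (1.0 - h * h) * x    # d tanh(u)/du = 1 - tanh(u)^2
    return dloss_dw1, dloss_dw2

def train(x=1.0, target=0.5, w1=0.3, w2=0.3, learning_rate=0.1, steps=200):
    """Gradient descent: repeatedly nudge each weight against its gradient."""
    losses = []
    for _ in range(steps):
        _, y = forward(w1, w2, x)
        losses.append((y - target) ** 2)
        g1, g2 = gradients(w1, w2, x, target)
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2, losses

if __name__ == "__main__":
    w1, w2, losses = train()
    print("first loss:", losses[0], "last loss:", losses[-1])
```

Each pass plays the role of one adjustment in the eye-exam analogy: the output is compared with the desired outcome, and both weights are shifted slightly in the direction that reduces the error. Real deep networks follow the same pattern with millions or billions of parameters.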

The sheer size of deep learning models, which outstrips human comprehension, raises important computational and statistical questions, as well as the question of how to comprehend what a deep model is doing. The group at the IAS is focusing on such issues. The year-long special program in 2019–20 will focus on the mathematical underpinnings of artificial intelligence, including machine learning theory, optimization (convex and nonconvex), statistics, and graph-theoretic algorithms, as well as neighboring fields such as big data algorithms, computer vision, natural language processing, neuroscience, and biology.

Given artificial intelligence’s resources, reach, and ability to exceed the performance of any individual, its societal implications are immense and unpredictable. As corporations and governments centralize enormous amounts of data that can be processed at very high speeds by algorithms that humans don’t fully comprehend, Wigderson points to the possible fault lines involved.

“There are issues of privacy and of fairness, because these networks will be used to predict, for example, if someone is likely to pay off their bank loan, or to commit another crime,” says Wigderson. “Machine learning is going to be applied everywhere, and we need to find out its limits, its fragility points. At this stage, deep nets is a phenomenon that we observe and experiment with (as scientists), but we don’t quite understand. It is a phenomenon in search of a theory, and this is a huge scientific and societal challenge.”