DeepMind’s new AI taps into games to improve fundamental algorithms
DeepMind has applied its mastery of games to a more serious matter: the fundamentals of computer science.
Google’s subsidiary today unveiled AlphaDev, an AI system that discovers new fundamental algorithms. According to DeepMind, the algorithms it has unearthed surpass those honed for decades by human experts.
The London-based lab has big ambitions for the project. As the demand for computation grows and silicon chips approach their limits, fundamental algorithms will have to become exponentially more efficient. By improving these processes, DeepMind aims to transform the infrastructure of the digital world.
The first objective in this mission is sorting algorithms, which are used to organize data. Under the covers of our devices, they determine everything from search rankings to movie recommendations.
To improve their performance, AlphaDev explored assembly instructions, which are used to create binary code for computers. After a thorough search, the system discovered a sorting algorithm that outperformed the previous benchmarks.
To find the winning combination, DeepMind had to revisit the feats that made it famous: winning board games.
Game the system
DeepMind has made a name for itself in games. In 2016, the company made headlines when its AI program defeated a world champion of Go, an insanely complicated Chinese board game.
After the win, DeepMind built a more general system, AlphaZero. Using a trial-and-error process called reinforcement learning, the program mastered not only Go, but also chess and shogi (also known as “Japanese chess”).
AlphaDev — the new algorithm builder — is based on AlphaZero. But gaming’s influence extends beyond the underlying model.
DeepMind formulated AlphaDev’s task as a single-player game. To win the game, the system had to build a new and improved sorting algorithm.
The system played its moves by selecting assembly instructions to add to the algorithm. To find the optimal instructions, the system had to examine a huge number of instruction combinations. According to DeepMind, the number was comparable to the number of particles in the universe. And just one bad choice can invalidate the whole algorithm.
After each move, AlphaDev compared the algorithm’s output to the expected results. If the output was correct and the performance was efficient, the system got a “reward” – a signal that it was playing well.
“We penalize it for making mistakes, and we reward it for finding more and more of these sequences sorted correctly,” Daniel Mankowitz, the lead researcher, told TNW.
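To make the game framing concrete, here is a minimal toy sketch in Python. It is not DeepMind's code: the single `cswap` instruction, the `reward` formula, the `length_penalty` value, and the brute-force `best_program` search are all assumptions made for illustration. AlphaDev works with real assembly instructions and reinforcement learning, and the toy scores only finished programs rather than every move. What carries over is the idea of scoring a candidate by how many test sequences it sorts correctly and by how short it is.

```python
from itertools import permutations, product

# Hypothetical toy instruction: ("cswap", i, j) leaves the smaller of
# registers i and j in register i and the larger in register j.
def run(program, values):
    regs = list(values)
    for _op, i, j in program:
        if regs[i] > regs[j]:
            regs[i], regs[j] = regs[j], regs[i]
    return regs

def reward(program, n=3, length_penalty=0.01):
    """Fraction of test inputs sorted correctly, minus a small cost per instruction."""
    tests = list(permutations(range(n)))
    correct = sum(run(program, t) == sorted(t) for t in tests)
    return correct / len(tests) - length_penalty * len(program)

def best_program(n=3, max_len=3):
    """Score every program of up to max_len instructions and keep the best one."""
    moves = [("cswap", i, j) for i in range(n) for j in range(i + 1, n)]
    candidates = (p for k in range(max_len + 1) for p in product(moves, repeat=k))
    return max(candidates, key=reward)

if __name__ == "__main__":
    program = best_program()
    print(program, reward(program))
```

Even in this toy, an exhaustive search is only feasible because there are three instructions and three values. With a realistic instruction set the number of combinations explodes, which is why DeepMind turned to a learned search instead.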
As you probably guessed, AlphaDev won the game. But the system didn’t just find a correct and faster program. It also discovered new approaches to the task.

The new algorithms contained instruction sequences that saved a single instruction each time they were applied. They were called “swap and copy moves” and served as shortcuts to further algorithmic efficiencies.
DeepMind compares the approach to another moment in games: the legendary “move 37,” which an AI system played against Go champion Lee Sedol.
The strange move shocked human experts, who thought the machine had made a mistake. But they soon discovered that the program had a plan.
“In the end, not only did it win the game, but it also influenced the strategies that professional Go players started using,” said Mankowitz.
The win marked the first time AI had beaten a high-ranking Go professional — a milestone that pundits had predicted would take another decade.
Three years later, Lee retired from professional Go competition. He attributed the decision to the abilities of his AI rivals.
“Even if I become number one, there is an entity that cannot be beaten,” he said.
Select computers
AlphaDev’s sorting algorithms are now open-source in the main C++ library, where they are available to millions of developers and businesses. According to DeepMind, this is the first change to this part of the sorting library in more than a decade, and the first time an algorithm designed through reinforcement learning has joined the library.
After the sorting game, AlphaDev started playing around with hashing, which is used to retrieve, store, and compress data. The result was another improved algorithm, which has now been released in the open-source Abseil library. DeepMind estimates that it is used trillions of times a day.
Ultimately, the lab envisions AlphaDev as a step towards transforming the entire computing ecosystem. And it all started with playing board games.