DeepMind’s new AI taps into games to improve fundamental algorithms

DeepMind has applied its mastery of games to a more serious matter: the fundamentals of computer science.

The Google subsidiary today unveiled AlphaDev, an AI system that discovers new fundamental algorithms. According to DeepMind, the algorithms it has unearthed surpass those honed for decades by human experts.

The London-based lab has big ambitions for the project. As the demand for computation grows and silicon chips approach their limits, fundamental algorithms will have to become exponentially more efficient. By improving these processes, DeepMind aims to transform the infrastructure of the digital world.

The first objective in this mission is sorting algorithms, which are used to organize data. Under the covers of our devices, they determine everything from search rankings to movie recommendations.

To improve their performance, AlphaDev explored assembly instructions, which are used to create binary code for computers. After a thorough search, the system discovered a sorting algorithm that outperformed the previous benchmarks.

To find the winning combination, DeepMind had to revisit the feats that made it famous: winning board games.

Game the system

DeepMind has made a name for itself in games. In 2016, the company made headlines when its AI program defeated a world champion of Go, an insanely complicated Chinese board game.

After the win, DeepMind built a more general system, AlphaZero. Using a process of trial and error called reinforcement learning, the program mastered not only Go, but also chess and shogi (also known as “Japanese chess”).

AlphaDev — the new algorithm builder — is based on AlphaZero. But gaming’s influence extends beyond the underlying model.

DeepMind formulated AlphaDev’s task as a single-player game. To win the game, the system had to build a new and improved sorting algorithm.

The system played its moves by selecting assembly instructions to add to the algorithm. To find the optimal instructions, the system had to examine a huge number of instruction combinations. According to DeepMind, the number was comparable to the number of particles in the universe. And just one bad choice can invalidate the whole algorithm.

After each move, AlphaDev compared the algorithm’s output to the expected results. If the output was correct and the performance was efficient, the system got a “reward” – a signal that it was playing well.

“We penalize it for making mistakes, and we reward it for finding more and more of these sequences sorted correctly,” Daniel Mankowitz, the lead researcher, told TNW.
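
To make the game concrete, here is a minimal sketch in Python. It is a toy illustration, not DeepMind's implementation: the three compare-and-swap "instructions", the reward formula, the length penalty, and the brute-force search are all assumptions made for this example. AlphaDev works with real assembly instructions and learns which moves to play rather than trying every combination.

```python
from itertools import permutations, product

# Hypothetical toy instruction set: each instruction compares two positions
# and swaps them if they are out of order. Real assembly is far richer.
INSTRUCTIONS = [(0, 1), (0, 2), (1, 2)]


def run(program, values):
    """Execute a program (a sequence of instructions) on a copy of the input."""
    regs = list(values)
    for i, j in program:
        if regs[i] > regs[j]:
            regs[i], regs[j] = regs[j], regs[i]
    return regs


def reward(program, test_inputs, length_penalty=0.01):
    """Assumed reward: fraction of inputs sorted correctly, minus a small
    cost per instruction so that shorter programs score higher."""
    correct = sum(run(program, t) == sorted(t) for t in test_inputs)
    return correct / len(test_inputs) - length_penalty * len(program)


def best_program(max_length, test_inputs):
    """Try every program up to max_length and keep the highest reward.
    This exhaustive search stands in for the learned search in the article."""
    best, best_score = [], reward([], test_inputs)
    for length in range(1, max_length + 1):
        for candidate in product(INSTRUCTIONS, repeat=length):
            score = reward(candidate, test_inputs)
            if score > best_score:
                best, best_score = list(candidate), score
    return best, best_score


# Every ordering of three distinct values serves as the test set.
TESTS = list(permutations([1, 2, 3]))
program, score = best_program(3, TESTS)
print(program, score)
```

Even in this toy, the number of candidate programs triples with every extra instruction (3, 9, 27 and so on), which hints at why exhaustive search stops being practical for real instruction sets and longer programs.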

As you probably guessed, AlphaDev won the game. But the system did not just find a correct and faster program. It also discovered new approaches to the task.

AlphaDev