DeepMind today announced a new milestone for its artificial intelligence agents trained to play the Blizzard Entertainment game StarCraft II. The Google-owned AI lab’s more sophisticated software, still called AlphaStar, is now grandmaster level in the real-time strategy game, capable of besting 99.8 percent of all human players in competition. The findings are to be published in a research paper in the scientific journal Nature.
Not only that, but DeepMind says it also evened the playing field when testing the new and improved AlphaStar against human opponents who opted into online competitions this past summer. For one, it trained AlphaStar to use all three of the game’s playable races, adding to the complexity of the game at the upper echelons of pro play. It also limited AlphaStar to only viewing the portion of the map a human would see and restricted the number of mouse clicks it could register to 22 non-duplicated actions every five seconds of play, to align it with standard human movement.
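To make that action cap concrete, here is a minimal sketch of how a limit of 22 non-duplicated actions per five seconds could be enforced with a sliding window. It is an illustration only, not DeepMind's implementation. The `ActionBudget` class and `try_act` method are hypothetical names, and the code assumes that "non-duplicated" means an immediate repeat of the previous action does not count against the budget.

```python
from collections import deque


class ActionBudget:
    """Sliding-window cap: at most `limit` counted actions per `window` seconds.

    Hypothetical illustration; the numbers 22 and 5.0 come from the article,
    everything else is an assumption.
    """

    def __init__(self, limit=22, window=5.0):
        self.limit = limit
        self.window = window
        self.times = deque()      # timestamps of actions that counted
        self.last_action = None   # most recent accepted action

    def try_act(self, action, now):
        """Return True if `action` may be issued at time `now` (in seconds)."""
        # Forget counted actions that have left the window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        # Assumption: repeating the previous action is a duplicate and is free.
        if action == self.last_action:
            return True
        # Budget exhausted: the agent has to wait.
        if len(self.times) >= self.limit:
            return False
        self.times.append(now)
        self.last_action = action
        return True


# Usage: 30 distinct actions attempted within three seconds.
budget = ActionBudget()
accepted = [budget.try_act(f"action-{i}", i * 0.1) for i in range(30)]
print(sum(accepted), "of", len(accepted), "actions accepted")
```

Under these assumptions, anything beyond the 22nd distinct action in a five-second span is refused until older actions age out of the window, which is the kind of throttle that keeps an agent from outclicking a human.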
Still, the AI was capable of achieving grandmaster level, the highest possible online competitive ranking, making it the first system ever to do so in StarCraft II. DeepMind sees the advancement as more proof that general-purpose reinforcement learning, the machine learning technique underpinning the training of AlphaStar, may one day be used to train self-learning robots and self-driving cars, and to create more advanced image and object recognition systems.
"The history of progress in artificial intelligence has been marked by milestone achievements in games. Ever since computers cracked Go, chess and poker, StarCraft has emerged by consensus as the next grand challenge," said David Silver, a DeepMind principal research scientist on the AlphaStar team, in a statement. "The game’s complexity is much greater than chess, because players control hundreds of units; more complex than Go, because there are 10^26 possible choices for every move; and players have less information about their opponents than in poker."
Back in January, DeepMind announced that its AlphaStar system was able to best top pro players 10 matches in a row during a prerecorded session, but it lost to pro player Grzegorz "MaNa" Komincz in a final match streamed live online. The company kept improving the system between January and June, when it said it would start accepting invites to play the best human players from around the world. The ensuing matches took place in July and August, DeepMind says.
The results were stunning: AlphaStar had become one of the most sophisticated StarCraft II players on the planet, though remarkably still not quite superhuman. Roughly 0.2 percent of players are capable of defeating it, but it is largely considered only a matter of time before the system improves enough to crush any human opponent.
This research milestone closely aligns with a similar one from San Francisco-based AI research company OpenAI, which has been training AI agents using reinforcement learning to play the sophisticated five-on-five multiplayer game Dota 2. Back in April, the most sophisticated version of the OpenAI Five software, as it’s called, bested the world champion Dota 2 team after only narrowly losing to two less capable e-sports teams the previous summer. The leap in OpenAI Five’s capabilities mirrors AlphaStar’s, and both are strong examples of how this approach to AI can produce unprecedented levels of game-playing ability.
As with OpenAI’s Dota 2 bots and other game-playing agents, the goal of this type of AI research is not to crush humans in various games just to prove it can be done. Instead, it's to prove that – with enough time, effort, and resources – sophisticated AI software can best humans at virtually any competitive cognitive challenge, be it a board game or a modern video game. It's also to show the benefits of reinforcement learning, a special brand of machine learning that’s seen massive success in the last few years when combined with huge amounts of computing power and training methods like virtual simulation.
Like OpenAI, DeepMind trains its AI agents against versions of themselves and at an accelerated pace, so that the agents can clock hundreds of years of play time in the span of a few months. That has allowed this type of software to stand on equal footing with some of the most talented human players of Go and, now, much more sophisticated games like StarCraft II and Dota 2.
Yet the software is still restricted to the narrow discipline it's designed to tackle. The Go-playing agent cannot play Dota 2, and vice versa. (DeepMind did let a more general-purpose version of its Go-playing agent try its hand at chess, which it mastered in a matter of eight hours.) That's because the software isn't programmed with easy-to-replace rule sets or directions. Instead, DeepMind and other research institutions use reinforcement learning to let the agents figure out how to play on their own, which is why the software often develops novel and wildly unpredictable play styles that have since been adopted by top human players.
"AlphaStar is an intriguing and unorthodox player – one with the reflexes and speed of the best pros but strategies and a style that are entirely its own. The way AlphaStar was trained, with agents competing against each other in a league, has resulted in gameplay that’s unimaginably unusual; it really makes you question how much of StarCraft’s diverse possibilities pro players have really explored, "Diego" Kelazhur "Schwimer, a pro player for team Panda Global, said in a statement. "Though some of AlphaStar’s strategies may at first seem strange, I can’t help but wonder if combining all the different play styles it demonstrated could actually be the best way to play the game."
DeepMind hopes advances in reinforcement learning achieved by its lab and fellow AI researchers may be more widely applicable at some point in the future. The most likely real-world application for such software is robotics, where the same techniques can train AI agents in virtual simulation to perform real-world tasks, like the operation of robotic hands. Then, after simulating years upon years of motor control, the AI can take the reins of a physical robotic arm, and maybe one day even control full-body robots. But DeepMind also sees increasingly sophisticated – and therefore safer – self-driving cars as another avenue for its specific approach to machine learning.
Correction: A previous version of this article stated DeepMind restricted AlphaStar to 20 actions every five minutes. That is incorrect; the restriction was to 22 non-duplicated actions every five seconds. We regret the error.