Alphabet’s New AI Defeats Human Players In A Multiplayer Game
Concepts that humans grasp easily are not as simple for machines. Unexpected variables and questions a machine cannot answer make complete autonomy difficult. Alphabet, Google's parent company, had its DeepMind research lab train an AI to play a game of capture the flag at a level beyond that of human players.
Capture the flag is one of the simplest games in principle: two teams compete, and the primary objective is to capture a flag (any marker) located at the opposing team's base and carry it safely back to one's own base. It is easy for humans to understand and play, but complex for a machine, which must make numerous calculations to strategize in a manner resembling human play.
This stands to change with AI and machine learning. A report published by researchers at DeepMind, a subsidiary of Alphabet, details a system capable not only of learning capture the flag but also of devising strategies and planning at the level of human teams in id Software's Quake III Arena. The paper reports that the AI was never taught the rules of the game; it was only told whether it had won or lost.
The reasoning behind this training approach stems from the unpredictable behaviour an agent can exhibit as learning progresses. Some of the researchers behind DeepMind's new agent previously worked on AlphaStar, the machine learning program that beat professional StarCraft II players. The key technique used in the study was reinforcement learning, in which rewards are handed out to steer the software toward its goal.
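The idea of learning only from a win/loss signal can be sketched in a few lines. The following is a minimal illustration, not DeepMind's actual setup: a hypothetical two-action "game" stands in for capture the flag, and the agent receives only a sparse +1/-1 reward at the end of each episode, nudging its value estimates toward whichever action wins more often.

```python
import random

ACTIONS = ["attack", "defend"]
values = {a: 0.0 for a in ACTIONS}   # running value estimate per action
ALPHA = 0.1                          # learning rate

def play(action):
    # Hypothetical environment: "attack" wins 70% of the time, "defend" 40%.
    win = random.random() < (0.7 if action == "attack" else 0.4)
    return 1.0 if win else -1.0      # sparse reward: win/loss only

random.seed(0)
for episode in range(5000):
    # Epsilon-greedy: mostly exploit the best-valued action, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(values, key=values.get)
    reward = play(action)
    # Move the estimate a small step toward the observed reward.
    values[action] += ALPHA * (reward - values[action])

print(values)  # "attack" ends up with the higher value estimate
```

The agent never sees the environment's rules or win probabilities; the reward alone shapes its behaviour, which is the essence of the approach described above.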
The agent developed by DeepMind, appropriately dubbed For The Win (FTW), learns from on-screen pixels using a convolutional neural network, a collection of mathematical functions (loosely modeled on neurons in the human brain) arranged in layers. The features it extracts are fed into two recurrent long short-term memory (LSTM) networks, one operating on a slow timescale and the other on a fast timescale. This lets the agent form a degree of prediction about the game world and take actions through an emulated game controller.
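The two-timescale recurrence can be sketched as follows. This is an illustrative simplification, not FTW's real architecture: plain recurrent updates stand in for the LSTMs, a random vector stands in for the CNN's per-frame features, and the hidden sizes and update interval are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 8            # hidden state size (illustrative)
SLOW_EVERY = 4     # slow network updates once per 4 frames

W_fast = rng.normal(size=(DIM, DIM)) * 0.1
W_slow = rng.normal(size=(DIM, DIM)) * 0.1
h_fast = np.zeros(DIM)
h_slow = np.zeros(DIM)

for t in range(16):
    features = rng.normal(size=DIM)   # stand-in for CNN output from pixels
    # Fast timescale: updated every frame from features plus slow context.
    h_fast = np.tanh(W_fast @ h_fast + features + h_slow)
    # Slow timescale: updated only every SLOW_EVERY frames,
    # summarizing the game at a coarser temporal resolution.
    if t % SLOW_EVERY == 0:
        h_slow = np.tanh(W_slow @ h_slow + h_fast)
    # An action is read out from the fast state each frame
    # (here, a toy choice among 4 discrete controller moves).
    action = int(np.argmax(h_fast[:4]))
```

The design intuition is that the slow state carries longer-horizon context (where the flags are, what the team is doing) while the fast state reacts frame by frame.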
Thirty FTW agents were trained in stages under this learning paradigm. The agents were reported to have formulated and enacted strategies that generalised across different maps, team rosters, and team sizes. They learned human-like behaviours such as following teammates, camping, and defending their base from attackers, and as training progressed they abandoned tactics that offered no real advantage, such as following a teammate too closely.
In a tournament involving 40 humans who were randomly matched with the agents as both teammates and opponents, the AI surpassed the human win rate by a substantial margin. The agents' Elo rating (from which win probability can be estimated) was 1,600, compared to 1,300 for strong human players and 1,050 for the average human player. This held true even when the agents' reaction times were slowed by a quarter of a second. Depending on skill level, human players won only 12-21% of their matches against the agents.
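To put those numbers in context, the standard Elo formula converts a rating gap into an expected win probability. Using the article's figures (FTW at 1,600, strong humans at 1,600 minus 300, average humans at 1,600 minus 550):

```python
def elo_expected(rating_a, rating_b):
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# FTW (1,600) vs a strong human (1,300): FTW expected to win ~85% of games.
print(round(elo_expected(1600, 1300), 2))  # 0.85
# FTW (1,600) vs an average human (1,050): ~96%.
print(round(elo_expected(1600, 1050), 2))  # 0.96
```

These expected scores line up with the reported 12-21% human win rates: the better the human, the closer they get to the upper end of that range.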