One can learn a lot about the decision-making of AI-driven robots by observing them when they are left to their own devices. That is what the research team at the OpenAI artificial intelligence lab did in its latest experiment, which simulated hundreds of millions of hide-and-seek games between two teams of robots.
The rules of the game were simple: one team of virtual agents had to hide while the other had to find them. Each team's size was set randomly, from one to three members.
The avatars were placed in a closed arena containing boxes and ramps that they could manipulate. They could also lock these objects so that the other team could not move them.
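For illustration only, the arena rules described above could be modeled roughly like this. This is a hypothetical sketch, not OpenAI's actual environment; the names `Prop`, `lock`, and `new_round` are invented for the example.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy model of the rules described in the article, not OpenAI's
# actual environment: each round has 1-3 hiders and 1-3 seekers, plus boxes
# and ramps that a team can lock so the opposing team cannot move them.

@dataclass
class Prop:
    kind: str                        # "box" or "ramp"
    locked_by: Optional[str] = None  # team name, or None while unlocked

    def lock(self, team: str) -> bool:
        # Only an unlocked prop can be locked; locking freezes it
        # for the other team.
        if self.locked_by is None:
            self.locked_by = team
            return True
        return False

    def movable_by(self, team: str) -> bool:
        return self.locked_by in (None, team)

def new_round(rng: random.Random) -> dict:
    return {
        "hiders": rng.randint(1, 3),   # team sizes drawn randomly, as in the article
        "seekers": rng.randint(1, 3),
        "props": [Prop("box") for _ in range(4)] + [Prop("ramp") for _ in range(2)],
    }

rng = random.Random(42)
round_state = new_round(rng)
ramp = round_state["props"][-1]
ramp.lock("blue")                      # the blue team locks a ramp
print(ramp.movable_by("red"))          # False: red can no longer move it
```

The key rule captured here is that locking is one-way and exclusive: once a team locks an object, the other team can neither move nor unlock it.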
The robots were powered by OpenAI Five, a computer system made up of five independent artificial neural networks that make autonomous decisions and learn from their mistakes in order to develop new behaviors.
OpenAI Five works with reinforcement learning, a technique that rewards the algorithm when it adopts desired behaviors, much like the way animals are trained.
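The reward-driven idea can be sketched with a minimal example. The following uses tabular Q-learning on a toy corridor, which is far simpler than anything OpenAI trained, but it shows the core mechanism: the agent is rewarded only for reaching a goal and gradually learns which actions to repeat.

```python
import random

# A minimal illustration of reinforcement learning (tabular Q-learning),
# not OpenAI's actual training code: an agent on a 5-cell corridor is
# rewarded only for reaching the goal cell, and discovers by trial and
# error which moves lead there.
N_STATES = 5          # cells 0..4; the goal is cell 4
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

random.seed(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def best_action(state):
    # Greedy choice with random tie-breaking
    top = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == top])

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Occasionally explore; otherwise exploit the best-known action
        action = random.choice(ACTIONS) if random.random() < EPSILON else best_action(state)
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Nudge the estimate toward reward + discounted future value
        target = reward + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (target - q[(state, action)])
        state = next_state

# After training, the learned policy moves right from every non-goal cell
policy = [best_action(s) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

Nothing tells the agent that "right" is correct; the reward signal alone shapes the behavior, which is the same principle, at vastly larger scale, behind the hide-and-seek agents.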
For the first 25 million games, the avatars that had to hide (the blue team) simply moved around the arena while the avatars that had to find them (the red team) chased them.
The blue team then realized that it could use the objects at its disposal to build impenetrable forts.
Some 75 million games later, the red team developed a counter-strategy: it started using the ramps to jump over obstacles.
The blue team fought back by hiding the ramps inside its fort, then blocking the entrance with boxes.
“When a team learns a new strategy, it puts pressure on its opponents to adapt. We can draw an interesting analogy with the evolution of human beings on Earth, where there was constant competition between organisms,” said Bowen Baker, a member of the OpenAI research team, in an interview with New Scientist.
Using bugs to win
Bowen Baker was surprised by how well the artificial intelligence could adapt, no matter the obstacles.
When the simulation environment was expanded to include more obstacles, the blue team gained access to new, longer boxes. The robots could then build more complex, harder-to-penetrate forts.
They also came to understand that they could lock the ramps so the other team could not use them. OpenAI was certain that this marked the end of the experiment, but the red team discovered a bug that allowed it to cross barriers without a ramp.
This strategy, dubbed box surfing by the research team, consists of using a ramp that could not be manipulated to climb onto a box, then moving around on top of it.
According to OpenAI, this sequence of events suggests that artificial intelligence could have the capacity to come up with unprecedented solutions to real-world problems.
“We want people to imagine what would happen if we organized a competition of this kind in a much more complex environment. Learned behaviors could solve problems for which we do not yet have a solution,” Baker told MIT Technology Review.
The blue team eventually found a foolproof strategy to win every game: locking all the items, including the boxes, before building its fort.