November 23, 2017

AlphaGo Zero learns a complicated game without any human help

.... Once it had the rules of the game, the program played against different versions of itself to learn from its mistakes and figure out what it needed to do to win. AlphaGo Zero learned how to play by playing against itself millions of times over. It found the best ways to win through reinforcement learning: each game ended with a reward for a win and a penalty for a loss, and the program gradually adjusted itself to favor the moves that led to wins.
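To make the self-play idea concrete, here is a minimal Python sketch on a game far smaller than Go. It is a toy analog, not AlphaGo Zero's actual method (which combined a deep neural network with Monte Carlo tree search). The game is a simple Nim variant, assumed here for illustration: players take 1 to 3 stones from a pile, and whoever takes the last stone wins. The agent plays both sides, and, as in AlphaGo Zero, the only reward arrives at the end of the game (+1 for the winner's moves, -1 for the loser's). The table-based learner and all function names are hypothetical.

```python
import random
from collections import defaultdict


def legal_moves(pile):
    """A player may remove 1, 2, or 3 stones, but not more than remain."""
    return [n for n in (1, 2, 3) if n <= pile]


def choose(q, pile, epsilon, rng):
    """Epsilon-greedy: usually play the best-known move, sometimes explore."""
    moves = legal_moves(pile)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: q[(pile, m)])


def self_play_episode(q, counts, start_pile, epsilon, rng):
    """Play one game of the agent against itself, then learn from the result."""
    history = {0: [], 1: []}  # (pile, move) pairs made by each side
    pile, player = start_pile, 0
    while True:
        move = choose(q, pile, epsilon, rng)
        history[player].append((pile, move))
        pile -= move
        if pile == 0:
            winner = player  # taking the last stone wins
            break
        player = 1 - player
    # Reward only at the end: +1 for each of the winner's moves, -1 for the loser's.
    for side, moves in history.items():
        reward = 1.0 if side == winner else -1.0
        for key in moves:
            counts[key] += 1
            q[key] += (reward - q[key]) / counts[key]  # running average of outcomes


def train(episodes=20000, max_pile=12, epsilon=0.1, seed=0):
    """Hypothetical training loop: many self-play games from random pile sizes."""
    rng = random.Random(seed)
    q = defaultdict(float)  # average outcome of playing a move from a pile size
    counts = defaultdict(int)
    for _ in range(episodes):
        self_play_episode(q, counts, rng.randint(1, max_pile), epsilon, rng)
    return q


def best_move(q, pile):
    """Greedy move according to the learned table."""
    return max(legal_moves(pile), key=lambda m: q[(pile, m)])


if __name__ == "__main__":
    table = train()
    for pile in range(1, 10):
        print(pile, "->", best_move(table, pile))
```

Nobody tells the agent a strategy; it only sees wins and losses. With enough games, the table should tend to favor leaving the opponent a multiple of four stones, which is the known winning strategy for this variant.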

After playing roughly five million games against itself, the updated AI program could defeat human players and the original AlphaGo. After 40 days, it had even surpassed AlphaGo Master. The program relies on an artificial neural network: a collection of software "neurons" connected together. On each turn, the network examines the positions of the stones on the Go board, then estimates which moves are worth playing next and how likely the position is to lead to a win. The network is repeatedly retrained on the games it plays, so it becomes stronger for the next match.
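The two outputs described above, move probabilities and a win estimate, can be sketched as one network with a shared body and two "heads". This is a hypothetical toy in plain Python, assuming a 3x3 board, a single fully connected hidden layer, and random untrained weights. AlphaGo Zero's real network was a much deeper residual convolutional network and could also choose to pass; none of that is reproduced here.

```python
import math
import random


class TinyPolicyValueNet:
    """Hypothetical toy two-headed network: one shared hidden layer, then a
    policy head (a probability for each board point) and a value head
    (estimated outcome, squashed to [-1, 1] where +1 means a likely win)."""

    def __init__(self, board_size=3, hidden=16, seed=0):
        rng = random.Random(seed)
        self.n = board_size * board_size

        def layer(rows, cols):
            # Small random weights; a real system would learn these.
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_hidden = layer(hidden, self.n)
        self.w_policy = layer(self.n, hidden)
        self.w_value = layer(1, hidden)[0]

    def forward(self, board):
        """board: flat list with +1 (own stone), -1 (opponent stone), 0 (empty)."""
        legal = [i for i, x in enumerate(board) if x == 0]
        if not legal:
            raise ValueError("no empty points left (passing is not modeled)")
        # Shared body: weighted sums followed by a ReLU.
        h = [max(0.0, sum(w * x for w, x in zip(row, board))) for row in self.w_hidden]
        # Policy head: one score per point, softmax over empty points only.
        logits = [sum(w * x for w, x in zip(row, h)) for row in self.w_policy]
        top = max(logits[i] for i in legal)  # subtract the max for numerical stability
        exps = {i: math.exp(logits[i] - top) for i in legal}
        total = sum(exps.values())
        policy = [exps.get(i, 0.0) / total for i in range(self.n)]
        # Value head: a single number in [-1, 1].
        value = math.tanh(sum(w * x for w, x in zip(self.w_value, h)))
        return policy, value


if __name__ == "__main__":
    net = TinyPolicyValueNet()
    policy, value = net.forward([0, 1, 0, -1, 0, 0, 0, 0, 0])
    print([round(p, 3) for p in policy], round(value, 3))
```

Training would nudge the weights so the policy head matches the moves favored by search during self-play and the value head predicts each game's final result; that step is left out of this sketch.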

Along with being stronger, AlphaGo Zero is also a simpler program. It learned the game faster than earlier versions, even though it used no human game records and ran on a smaller computer.

