Zero-like Reinforcement Learning, Monte Carlo Tree Search

visualisation
code
Published

January 2024

Zero-like reinforcement learning model for chess, similar to the DeepMind’s AlphaZero approach.

Practical walktrough to understand the basics of the zero-like reinforcment learning method and Monte Carlo Tree Search (MCTS). Which is possible then transfer to you own NP-hard combinatorial optimization problem (like finding the best topologies for industrial network devices).

High-level training loop of the zero-like reinforcement learning pipeline. Thank you Dominik Klein and his book β€œNeural Networks for Chess”.

Here is full implementation

Training without input data: in zero-sum self-play game two players play against each other for a bit. The game results compared with the model’s predictions, the network is updated, then self-play again with the updated predictions, then network updated, then self-play, … self-learning…

Hexapawn Β· MCTS Visualizer
AlphaZero-style search  Β·  100 simulations  Β·  model_it10.keras
Step 0 / 101
Board position
Root expansion
Legend
Selection path
Expansion (neural net)
Backup propagation
Unvisited node
High Q (White wins)
Low Q (Black wins)
Step
Speed
0 / 101