Zero-like Reinforcement Learning, Monte Carlo Tree Search

A zero-like reinforcement learning model for chess, similar to DeepMind’s AlphaZero approach.

A practical walkthrough of the basics of the zero-like reinforcement learning method and Monte Carlo Tree Search (MCTS). The same approach can then be transferred to your own NP-hard combinatorial optimization problem (such as finding the best topologies for industrial network devices).

High-level training loop of the zero-like reinforcement learning pipeline. Thanks to Dominik Klein and his book “Neural Networks for Chess”.

Here is the full implementation.

Training without input data: in a zero-sum self-play game, two players play against each other for a while. The game results are compared with the model’s predictions and the network is updated. Self-play then runs again with the updated predictions, the network is updated again, and the cycle repeats. In this way the model learns by itself.
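The loop described above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the article's implementation: the game is a toy Nim variant instead of chess or Hexapawn, a plain value table stands in for the neural network, and a one-step lookahead stands in for MCTS. All names (`train`, `self_play_game`, `choose_move`) are hypothetical.

```python
import random

def legal_moves(pile):
    # Toy game (an assumption, not chess or Hexapawn):
    # take 1 or 2 stones; whoever takes the last stone wins.
    return [m for m in (1, 2) if m <= pile]

def choose_move(values, pile, epsilon, rng):
    # Stand-in for MCTS: one-step lookahead on the value table.
    moves = legal_moves(pile)
    if rng.random() < epsilon:
        return rng.choice(moves)  # occasional random move for exploration
    # values[p] predicts the outcome for the player to move at pile p,
    # so the best move leaves the opponent in the worst position.
    return min(moves, key=lambda m: values[pile - m])

def self_play_game(values, start_pile, epsilon, rng):
    # Play one game against itself and record every visited position.
    pile, history = start_pile, []
    while pile > 0:
        history.append(pile)
        pile -= choose_move(values, pile, epsilon, rng)
    # The last mover took the final stone and won (+1); going backwards
    # through the game, the sign alternates between the two players.
    examples, outcome = [], 1.0
    for p in reversed(history):
        examples.append((p, outcome))
        outcome = -outcome
    return examples

def train(iterations=200, games_per_iteration=20, start_pile=10,
          lr=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    values = [0.0] * (start_pile + 1)
    values[0] = -1.0  # no stones left: the player to move has already lost
    for _ in range(iterations):
        # 1. Self-play with the current predictions.
        examples = []
        for _ in range(games_per_iteration):
            examples.extend(self_play_game(values, start_pile, epsilon, rng))
        # 2. Compare game results with predictions and update the "model".
        for pile, outcome in examples:
            values[pile] += lr * (outcome - values[pile])
    return values

if __name__ == "__main__":
    for pile, value in enumerate(train()):
        print(pile, round(value, 2))
```

No game records are supplied from outside: every training example comes from games the current table played against itself. In a zero-like pipeline the table update would be replaced by a gradient step on a policy and value network.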

Hexapawn · MCTS Visualizer

AlphaZero-style search · 100 simulations · model_it10.keras

[Interactive visualizer: board position and search steps 0 to 101, starting with root expansion. Legend: selection path, expansion (neural net), backup propagation, unvisited node, high Q (White wins), low Q (Black wins).]
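The phases named in the visualizer legend (selection, expansion, backup) can be sketched as a single simulation step. This is a hedged illustration, not the code behind the visualizer: it uses a toy Nim game instead of Hexapawn, the `evaluate` function is a hypothetical stand-in for the trained network (uniform priors, neutral value), and the PUCT constant `C_PUCT = 1.5` is an assumed value.

```python
import math

C_PUCT = 1.5  # exploration constant; the value is an assumption

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a) from the (stand-in) network
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # from the view of the player who moved here
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def evaluate(pile):
    # Hypothetical stand-in for the neural network:
    # uniform priors over legal moves and a neutral value estimate.
    moves = [m for m in (1, 2) if m <= pile]
    return {m: 1.0 / len(moves) for m in moves}, 0.0

def select_child(node):
    # PUCT rule: prefer high Q, plus a bonus for moves with a high prior
    # and few visits.
    def score(item):
        child = item[1]
        u = C_PUCT * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=score)

def simulate(root, pile):
    path, node = [root], root
    # 1. Selection: follow the PUCT rule down to a leaf.
    while node.children:
        move, node = select_child(node)
        pile -= move
        path.append(node)
    # 2. Expansion: evaluate the leaf, or score it directly if terminal.
    if pile == 0:
        value = -1.0  # the player to move faces an empty pile and has lost
    else:
        priors, value = evaluate(pile)
        node.children = {m: Node(p) for m, p in priors.items()}
    # 3. Backup: `value` is from the view of the player to move at the leaf.
    # Each node stores statistics for the player who moved into it,
    # so the sign flips at every step up the path.
    for n in reversed(path):
        value = -value
        n.visits += 1
        n.value_sum += value

def search(pile, simulations=100):
    root = Node(prior=1.0)
    for _ in range(simulations):
        simulate(root, pile)
    # Visit counts at the root serve as the improved move distribution.
    return {move: child.visits for move, child in root.children.items()}

if __name__ == "__main__":
    print(search(4))
```

The first simulation only expands the root, which matches the "root expansion" step shown by the visualizer. From a pile of 4, taking one stone leaves the opponent in a lost position, so that move should collect most of the visits even with the uninformed stand-in network.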