Zero-like reinforcement learning model for chess, similar to the DeepMindβs AlphaZero approach.
Practical walktrough to understand the basics of the zero-like reinforcment learning method and Monte Carlo Tree Search (MCTS). Which is possible then transfer to you own NP-hard combinatorial optimization problem (like finding the best topologies for industrial network devices).

High-level training loop of the zero-like reinforcement learning pipeline. Thank you Dominik Klein and his book βNeural Networks for Chessβ.
Training without input data: in zero-sum self-play game two players play against each other for a bit. The game results compared with the modelβs predictions, the network is updated, then self-play again with the updated predictions, then network updated, then self-play, β¦ self-learningβ¦