Sunday, 28 January 2018

AlphaZero, Stockfish, and flexibility regarding chess variants.

Recently there was a high-profile set of matches between the reigning champion chess AI Stockfish and a newcomer called AlphaZero. AlphaZero was created with the same deep learning system that produced AlphaGo, an AI that beat the world's best players at the game of Go. In the 50 games that AlphaZero played as white, it won 24 and drew the other 26. In the 50 games that it played as black, it won three and drew the other 47.

This advantage for the white player may seem startling, but it's not out of line with other matches between artificial intelligence programs at the world-class level, nor with matches between world-class human players. Stockfish, which evaluates positions in a chess game in terms of pawns of advantage, starts the game with an advantage for white of 0.1 pawns. AlphaZero, on the other hand, has no idea how many pawns of advantage white has, because it looks at the game holistically, a radically different method of analysis compared to other modern chess AIs.

Part of the reason I bring up artificial intelligence is to look at how well these various systems carry over to different chess variants, rather than just the orthodox version of the game.

Let's start with Stockfish: Stockfish is a system very much like Deep Blue and many of the engines that came between the two. The difference is that Stockfish is open source, meaning anyone can examine the code and edit it. Stockfish, like many of the artificial intelligence programs that came before it, runs on the minimax principle: it chooses moves on the assumption that its opponent will choose the best counter-move in response, so it tries to pick the move which has the worst best counter-move. (It tries to minimize its opponent's maximum move quality.)

To simplify, consider this abstract game. You have two options: Option A allows your opponent to score 5 points. Option B allows your opponent to choose between a move that scores 6 points, and a move that scores 2 points. Assuming that your opponent will choose their best move, your best choice is to select Option A, because it limits their score to 5. The fact that Option B provides the possibility for your opponent to score only 2 points is irrelevant. This is the minimax principle.
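The abstract game above can be sketched in a few lines of code. The option names and point values are the hypothetical example from the paragraph, not anything from real chess:

```python
# Minimax choice over a one-move abstract game: each option maps to the
# list of scores the opponent could achieve in reply.

def minimax_choice(options):
    """Pick the option whose best opponent reply scores the least for them."""
    # Assume the opponent takes their maximum-scoring reply to each option,
    # then choose the option that minimizes that maximum.
    return min(options, key=lambda name: max(options[name]))

# Option A lets the opponent score 5; Option B lets them score 6 or 2.
options = {"A": [5], "B": [6, 2]}
print(minimax_choice(options))  # prints "A": its worst case (5) beats B's worst case (6)
```

The possibility of the opponent scoring only 2 under Option B never enters the calculation, which is exactly the point of the principle.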

Most of what a traditional chess AI does when selecting a move is to evaluate how much a particular position is worth in terms of some abstract score, such as 'number of pawns'. The value of the pieces is straightforward: a pawn is worth approximately 1, a bishop or knight about 3, and a queen roughly 9. However, the position of these pieces also matters. Having a piece in the middle, or able to reach the middle at any point, is worth a premium. A 'passed pawn', one which has no opposing pawn directly ahead of it, is worth more than it would otherwise be, because of its greater potential to be promoted. The evaluation function used with the alpha-beta algorithm (not related to AlphaZero) contains a set of parameters which decide how much each piece is worth on each square. Different machine learning methods, such as neural networks, can be used to determine what these parameters should be.
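As a rough sketch, such an evaluation sums material values plus a per-square bonus. The numbers below are illustrative stand-ins of my own, not Stockfish's actual parameters:

```python
# Toy evaluation: material value plus a small premium for central squares.
# Real engines use full per-piece, per-square tables; this sketch uses a
# single distance-from-centre formula to keep it short.

PIECE_VALUE = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0, "K": 0.0}

def square_bonus(file, rank):          # file, rank in 0..7
    centre_distance = max(abs(file - 3.5), abs(rank - 3.5))
    return 0.1 * (3.5 - centre_distance)

def evaluate(position):
    """position: list of (piece, file, rank, is_white) tuples.
    Returns the score in pawns from white's point of view."""
    score = 0.0
    for piece, file, rank, is_white in position:
        value = PIECE_VALUE[piece] + square_bonus(file, rank)
        score += value if is_white else -value
    return score

# A white knight on a central square outweighs a black knight on the rim.
pos = [("N", 3, 3, True), ("N", 0, 0, False)]
print(evaluate(pos))  # prints 0.3: white is slightly ahead on placement alone
```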

For variants of chess that are very close to the original game, such as Chess 960 (a.k.a. Fischer Random Chess) or Really Bad Chess, which both feature an 8-by-8 grid, 16 pieces per side, and only the orthodox six piece types, an AI using the alpha-beta algorithm should be able to play with few, if any, complications.

These AIs work even after pieces have been removed from the game, so variants that use fewer pieces don't produce any difficulties either. In practice, variants with different board sizes or different arrangements, such as Martin Gardner's mini chess or Romanchenko Chess (shown in the figure; source: Jocly), work well too, as long as the value of any squares beyond the board is hard-coded to zero. This also means that, in practice, an alpha-beta algorithm can produce a viable chess AI on a board that is not a perfect square or rectangle. However, it can increase the computational load if the non-viable squares are still considered, as they are in Jocly's implementation of alpha-beta for Romanchenko Chess.
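The hard-coding described above might look like the sketch below. The carved-off-corners board shape is an invented example, not the actual geometry of Romanchenko Chess:

```python
# Masking off-board squares on an irregular board: any square outside the
# playable region is worth exactly zero, so the evaluator ignores it.

WIDTH, HEIGHT = 8, 8
# True = playable square; here the four corner squares are carved off to
# make a non-rectangular board (an invented shape for illustration).
PLAYABLE = [[not (f in (0, 7) and r in (0, 7)) for f in range(WIDTH)]
            for r in range(HEIGHT)]

def square_value(file, rank, base_bonus):
    # Off-board or carved-out squares contribute nothing to the evaluation.
    if not (0 <= file < WIDTH and 0 <= rank < HEIGHT and PLAYABLE[rank][file]):
        return 0.0
    return base_bonus

print(square_value(0, 0, 0.25))  # prints 0.0: carved-out corner
print(square_value(3, 3, 0.25))  # prints 0.25: ordinary playable square
```

Skipping masked squares entirely during move generation, rather than merely zeroing them as above, is what avoids the extra computational load mentioned in the paragraph.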

Some systems, including Deep Blue, also take advantage of chess literature specific to the orthodox game: books of openings and their refutations, and known methods for winning particular endgames, such as when you have a rook and a bishop against an opponent who has just a rook. However, after the opening and before the endgame, it's pretty much alpha-beta all the way. [1]

Variants that include new pieces, such as fairy chess, or non-linear board movement, such as Smess, the Ninny's Chess, can also be supported by AI programs that use the alpha-beta algorithm. However, these programs will need additional manual training to be able to evaluate the value of the different pieces and spaces.

AlphaZero works on an entirely different principle; it does not assume that its opponent is the best possible opponent, one which will make the best possible counter-move. Instead, AlphaZero evaluates a candidate position by simulating games of weighted random moves starting from the position to be evaluated. The evaluation is simply the proportion of those random-move games that AlphaZero's side wins*. It evaluates the position this way for each move that it could make, and simply chooses the move that results in the best win proportion.
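Here is a minimal sketch of that rollout-style evaluation, using the footnote's scoring of wins plus half of ties. The stand-in `random_playout` helper and the toy positions are inventions of mine, not anything from AlphaZero itself:

```python
# Rollout evaluation: score each candidate move by the outcomes of many
# weighted-random games played out from the resulting position.
import random

def random_playout(position, rng):
    # Stand-in for a real playout: here a "position" is just a list of
    # relative weights for [win, tie, loss] from the evaluator's side.
    return rng.choices(["win", "tie", "loss"], weights=position)[0]

def evaluate(position, simulations=10_000, seed=0):
    """Score = (proportion of wins) + 1/2 * (proportion of ties)."""
    rng = random.Random(seed)
    results = [random_playout(position, rng) for _ in range(simulations)]
    wins = results.count("win") / simulations
    ties = results.count("tie") / simulations
    return wins + 0.5 * ties

def choose_move(moves):
    """moves: dict mapping move name -> resulting position. Pick the best."""
    return max(moves, key=lambda m: evaluate(moves[m]))

# A move leading to a position where wins are twice as likely scores higher.
moves = {"sharp": [2, 1, 1], "quiet": [1, 1, 2]}
print(choose_move(moves))  # prints "sharp"
```

In the real system, the move weights inside the playouts come from AlphaZero's self-play training rather than the fixed odds used here.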

In the simulation games that AlphaZero uses, the moves are weighted towards those likely to lead to a win, as learned from games that AlphaZero played against itself. For example, AlphaZero may assign more weight to a move that takes a piece than to one that doesn't. It may also assign greater weight to moves that give it control of the centre of the board. But these weight assignments are not the result of any human supervision.

Similarly, AlphaZero has no concept of chess theory such as openings or their refutations, and it doesn't have a book of endgames to rely upon. AlphaZero was trained simply by giving the system the rules of chess and letting it play many games against different versions of itself. It's reasonable to assume from here that AlphaZero would be able to handle many chess variants without any modifications other than informing it of the new rules. Furthermore, a very similar training system could be applied to nearly any chess variant to produce an AI program that could play that particular game.

* More exactly, the evaluation is (proportion of wins) + 1/2 * (proportion of ties).

[1] Beyond Deep Blue: Chess in the Stratosphere, Monty Newborn