Brace yourself for imminent defeats on Lichess against Transformers
In a recent study, large transformer models trained on the ChessBench dataset of 10 million human chess games demonstrated a notable level of chess play, reaching an Elo rating close to grandmaster level. However, these models underperform certain biologically-inspired neural architectures, particularly when tactical precision is required.
The ChessBench dataset, containing games annotated by Stockfish, was used to train the transformer models in a supervised learning setup. Each position was labeled with state-values, indicating the expected game outcome, and action-values, scoring the available moves. During training, these continuous engine evaluations were discretized into "bins," turning value prediction into a classification problem over win-probability intervals.
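The binning step can be illustrated with a short sketch. This is a minimal illustration of the general idea, not the study's exact code; the bin count `K` and the function names are assumptions.

```python
# Hypothetical sketch of value binning: a continuous win probability from
# an engine evaluation is mapped into one of K discrete bins, so the model
# can be trained with a classification loss instead of regression.
K = 128  # assumed number of bins; a tunable hyperparameter in practice

def win_prob_to_bin(p: float, k: int = K) -> int:
    """Map a win probability in [0, 1] to one of k bin indices."""
    return min(int(p * k), k - 1)  # clamp p == 1.0 into the last bin

def bin_to_win_prob(b: int, k: int = K) -> float:
    """Map a bin index back to the center of its probability interval."""
    return (b + 0.5) / k

# Example: a position the engine scores as a 73% win probability
bin_index = win_prob_to_bin(0.73)
recovered = bin_to_win_prob(bin_index)  # close to 0.73, within 1/K
```

At inference time, the predicted bin is converted back to a probability, so finer bins trade classification difficulty for value resolution.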
Three training targets—action-value prediction, state-value prediction, and behavioral cloning—were tested to determine which best supports chess planning.
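The three targets differ in how a trained model is turned into a move at play time. The sketch below illustrates each policy; the `model_*` callables stand in for trained networks and are assumptions, not the study's interfaces.

```python
# Hypothetical sketch of move selection under the three training targets.

def pick_move_action_value(board, legal_moves, model_q):
    # Action-value: score each (state, action) pair, play the highest-scored move.
    return max(legal_moves, key=lambda m: model_q(board, m))

def pick_move_state_value(board, legal_moves, model_v, apply_move):
    # State-value: evaluate each successor position from the opponent's
    # perspective and play the move leaving the opponent worst off.
    return min(legal_moves, key=lambda m: model_v(apply_move(board, m)))

def pick_move_behavioral_cloning(board, legal_moves, model_pi):
    # Behavioral cloning: imitate the expert's move distribution directly.
    probs = model_pi(board)
    return max(legal_moves, key=lambda m: probs.get(m, 0.0))
```

Action-value prediction needs one network call per legal move, while behavioral cloning needs only one call per position, a cost difference that matters at scale.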
Despite their impressive overall performance, the transformer models struggled in positions demanding speed and tactical precision. Lacking Stockfish's explicit search, they fell short of the engine when sharp tactical moves had to be found.
Interestingly, the transformers' performance drops significantly in non-standard variants such as Fischer Random Chess, indicating that generalization to similar but non-identical scenarios remains a challenge.
While transformer models can mimic human moves to a degree, their ability to generalize and perform complex, strategic reasoning without search heuristics remains limited compared to biologically-inspired architectures like GNN-BPU and CNN-BPU models.
For instance, a lightweight GNN-BPU model trained on only 10,000 ChessBench games achieved about 60% move accuracy, nearly ten times better than transformer models of any size trained on the same data. Similarly, CNN-BPU models with around 2 million parameters outperformed parameter-matched transformers, and when combined with a depth-6 minimax search at inference time they reached up to 91.7% move accuracy—exceeding even a 9-million-parameter transformer baseline trained on the same dataset.
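Pairing a learned evaluator with a fixed-depth search, as described for the CNN-BPU models, can be sketched as plain minimax. This is a generic illustration under assumed interfaces (`evaluate`, `legal_moves`, `apply_move` are stand-ins), not the study's implementation.

```python
# Hypothetical sketch: fixed-depth minimax driven by a learned position
# evaluator instead of a handcrafted evaluation function.

def minimax(state, depth, maximizing, evaluate, legal_moves, apply_move):
    """Return (value, best_move) for `state` searched to `depth` plies."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state), None  # leaf: fall back to the learned evaluator
    best_move = None
    best = float("-inf") if maximizing else float("inf")
    for m in moves:
        val, _ = minimax(apply_move(state, m), depth - 1, not maximizing,
                         evaluate, legal_moves, apply_move)
        if (maximizing and val > best) or (not maximizing and val < best):
            best, best_move = val, m
    return best, best_move
```

In practice alpha-beta pruning and move ordering would be added, but even this bare version shows how a shallow search can correct a network's one-shot tactical misjudgments.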
These findings suggest that large transformers, trained solely on the ChessBench dataset without explicit search (like minimax) or memorization tactics, struggle to attain very high-level chess play close to grandmaster standards. While they can learn patterns from a large dataset of human games, their strategic reasoning skills are not as refined as those of biologically-inspired architectures.
The implications of this study could streamline AI development in strategic decision-making and extend beyond games to real-world planning applications. As the technology matures, we might see transformers applied in various complex, real-world scenarios, from logistics to robotics, where generalization and adaptability are crucial.
In conclusion, while transformer models show promise in chess play, they are outperformed by biologically-inspired neural network models both in accuracy and efficiency. Close to grandmaster-level performance currently requires either explicit search methods combined with learned representations or alternative neural architectures beyond vanilla transformers.
The study also revealed that, despite using no search during play, the transformer-based models held up well against AlphaZero's raw policy and value networks when those were likewise used without search, and they handled novel board positions, indicating that they learned genuine strategies rather than memorized moves. These findings open up exciting possibilities for the future of AI in games and beyond.
Artificial intelligence systems, specifically transformer models, have demonstrated remarkable chess-playing ability but struggle with tactical precision and speed, falling behind biologically-inspired neural architectures such as the GNN-BPU and CNN-BPU models.
Furthermore, for real-world planning applications beyond games, the performance of transformer models could be improved by combining explicit search methods with learned representations or by adopting alternative neural architectures.