Anastomochess

De Lillois Fractale Wiki
Révision datée du 16 décembre 2017 à 13:14 par Pge (discussion | contributions)
(diff) ← Version précédente | Voir la version actuelle (diff) | Version suivante → (diff)
Aller à la navigation Aller à la recherche

Introduction

This article describes an ambitious project related to the chess game.

The purpose of the project is not a chess playing engine; it is a chess database and a front-end program to explore it (with a dedicated web site).

Other articles on chess from the same author are available in this wiki.

Building the database

The building of the database should include the loading of MANY games (~106 and more if possible).

Games played by good players are preferred.

The loaded games should include (besides the sequence of moves), the result and the name and elo rating of the players.

Each game will be processed so that the 20 (or more...) first positions generated by the moves are stored in the database.

Note: A PGN file typically takes less than 1Kbyte. Multiple compressed PGN files (in a zip set) would take around 0.2 Kbyte per game.

Anastomosis

In a tree (in computing science and in life science), the anastomosis is the reconnection of two streams that previously branched out.

In chess, which is tree game, anastomosis is highly present, but most of the time poorly handled.

Anastomosis brings complexity, because it eliminates several convenient properties of tree management. But it also provides wide benefits.

Indexing positions

In order to manage a large number of positions, a good indexing scheme is required. An original and performing scheme is documented here, and provides a way to index any position on a compact bit set.

The number of bits fully defining a position is between 78 and 113, depending on the complexity of the position, but this is optimal in terms of compacity (would be hard to have it more compact).

The position indexing scheme is used to build huge HashSets of positions.

Database content

For each position in the database, the following fields at least should be present:

  • the number of paths leading to that position (anastomosis factor)
  • the various last moves leading to that position (and links to the corresponding previous positions)
  • the various next moves played from that position, and for each of them
    • the number of times they were played
    • the average Elo rating of the players who chose them
    • the results of the games (white/draw/black)
  • an uncertainty measure for this position, using an entropic measure
  • a global white/black balance measure (immediate value)
  • the minimax value of the recursive immediate value.
  • a global dynamic measure (measure of likelihood of a draw game)
  • (the indexing key)

The size of the information stored by position should be around 250 bytes (including linked objects)

An efficient use of storage would then allow to manage up to 4x106 anastomo-positions per terabyte.

Specific data structures

The specific data structures involved in the project include

  • back moves (5 bytes + 1 key per object)
  • forward moves (20 bytes + 1 key per object)
  • anastomo-positions (~80 bytes + 1 key per object)

[ a key is at most 15 bytes ]

User interface

The user interface should

  • display all the information present in the database (listed above)
  • allow backward links to previous positions for all last moves
  • allow forward links to next positions for all next moves

Obviously the user interface should be fast, user friendly, visually attractive.

The relationship between the data server and the front-end is probably complex and should inlcude an aefficient usage of caching tools.

It is beyond the perimeter of this article.

Development

Suggested technology

Java / GWT / Collections.

The java libraries of PG would provide fast development path.

Estimated workload

Between 0.75 and 2.5 year men, depending on programmer efficiency, front-end architecture choices,...

Prototyping

A small anastomotic database (with 50K positions) would be relevant to develop the anastomochess.

Programs

Downloader.  This program collects a large numbre of PGN files (up to ~106).Most of the work may be achieved with some simple bash script commands (Scripts/icofy)

GameCompress. This program transforms a large number of PGN files into a unique massive file. The games selected have to

  • include at least 25 moves for both sides
  • have been played by 2 players higher than 2000 elo

The size of the resulting file is around 80 Mbytes per million of games.

PosLoader. Builds and fills the anastomotic tree, loading the compressed game files. Requires significant memory. THIS is the key program!

FrontEnd. A JavaFX animated web aplication, dynamiccaly displaying the positions and moves.

Objects

All objects are subclasses of fixBitSet.

Since fixBitSet is a subclass of BitSet, and since BitSet are backed, by arrays of Long objects, the actual memory consumption of these objects will always be mutiple of 64 bits (that's bad).

PosKey. The position identification, using at most 113 bits, and in average 100 bits.

Game. Compressed version of a game. Contains 2x25x4x3 bits (2 sides - 25 moves - 4 data per move, 2 rows and 2 cols - 3 bits for row or col identification) for the game, plus the result, and the ranking of both players. Roughly 620 bits per game, thus 80 bytes.

BackMove. Contains the key for the previous position, plus 12 bits for the back move. 112 bits in average.

ForwMove. Contains the key for the next position, plus 12 bits for the forward move, plus various technical attributes (around 5 integers, using Hufint compression). Globally ~ 160 bits in average.These objects will consume most of the memory when the PosLoader program runs.

Position. A posKey, plus a set of BackMoves, a set of ForwMove, and various technical attributes (~6 float values) . The size of a position object strongly depends on the forking density in the global tree. If 5 BackMove and 5 ForwMove are assumed as average values, then the average Position size should be around 1650 bits (100+5x112+5x160+6x32). A conservative estimation is then 250 bytes.

Performance issues

Sizing global capacity

Assuming that the main program (PosLoader) takes advantage of 4 Gb RAM, it should be able to manipulate  16x106 positions. With 1 million (106) loaded games, 2x25 moves per game and a average repetition factor of 3 (questionable), the number of positions would be  2x25x106 / 3 , which is indeed  ~ 16x106.

Computing time

The computing time will be a surprise. The PosLoader program will mostly work in pure RAM, but anyway the processing of 106 games and building of 16x106 positions might take some time.