
Analysed Games (Database for Deep Learning)

@RichardRahl said in #10:
> @dboing my code is public under github.com/Philipp-Sc/learning and almost everything is allowed, except commercial use is prohibited. This app is very early and unfurnished, it is indeed only intended for my own need. I think that is the best way to make progress. Once my chess improves using it, I might promote it.

I have the same attitude about freedom of usage on GitHub. I may not have enough stamina or time to do anything with it myself, but I can certainly comment and share my thoughts...

Thanks for sharing the link. I hope I can interact with you there if I see something worth both our attention.

But I would repeat: don't be in a hurry to add complexity. Do the maximal characterization first, and allow all your feature blocks to interact functionally in different ways, as building blocks.

Count your parameters, and do simple experiments (even if the simple models are not the best predictors of SF on test data; there are limits to that point, though).

If you have a simple model that works well, try swapping features instead of piling them up. SF has been piling up a lot on its heuristic, being imbued with human knowledge, and is now forced to use tuning with the only objective being engine-versus-engine tournaments in the stratosphere, whose constraint definitions, I have heard, have not changed much; some cultural inertia is involved, is my impression. The point is that the human-knowledge claim has been moot for a while now.

One could probably just integrate and contrast the resulting "mass" of the SF classical evaluation design, the real-valued function implemented by SF's heuristic static evaluation, over many diverse positions: not just those with near-material imbalance, but also an equal number of those where people would commonly agree that positional thinking is at play (conscious or not, worded or not, but with no near-material imbalance computable by strict alternation). Standardize it. Then do the same over a partition with, on one side, all positions with high material imbalance, and the opposite on the other side (NNue territory, by the way). One would find that it is very hard for any combination of positional features to amount to a pawn's worth of difference.
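To make the experiment concrete, here is a minimal sketch of that harness in Python. The material evaluator is a toy stand-in for SF's static evaluation (a real run would query an engine process instead), and the function names and the one-pawn threshold are mine, for illustration only:

```python
# Sketch of the partition experiment described above.
# A toy material count stands in for Stockfish's static evaluation.

# Classical piece values in pawns (a common convention, not SF's tuned values).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def material_imbalance(fen: str) -> float:
    """White-minus-black material balance, in pawns, from the board field of a FEN."""
    board = fen.split()[0]
    score = 0.0
    for ch in board:
        if ch.lower() in PIECE_VALUES:
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value
    return score

def partition(fens, threshold=1.0):
    """Split positions into 'material' (|imbalance| > threshold) and 'positional'."""
    material, positional = [], []
    for fen in fens:
        (material if abs(material_imbalance(fen)) > threshold else positional).append(fen)
    return material, positional

def mean_abs_eval(fens, evaluate):
    """The 'mass' of an evaluation function over a set of positions."""
    return sum(abs(evaluate(f)) for f in fens) / len(fens)
```

With the toy evaluator the comparison is of course circular; the point is the harness, into which SF's actual static evaluation would be plugged to compare the two masses.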

So piling up features without traceability and proper input-output characterization of each elemental building block seems like an already-tried dead end. Global optimization is great at the input-output level, but not if you want null-hypothesis-type answers about isolated features (that requires statistical confidence in any one parameter, while all you really get is confidence about the accuracy of the input-output quantity). It is true for NNs by design, but also for any piling up... ask around, and tell me I'm wrong. I would like somebody to tell me I'm wrong (not with opinions only; at least an elementary argument).
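Here is a small numerical illustration of that claim, not chess-specific, with invented data: two nearly collinear features fitted by least squares across many noise resamplings. The fitted input-output map is pinned down, while the individual parameters are not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
# Two nearly collinear features: individually, their weights are ill-determined.
X = np.column_stack([x, x + 1e-3 * rng.normal(size=n)])
true_output = X @ np.array([1.0, 1.0])

coefs, preds = [], []
for _ in range(50):
    y = true_output + 0.1 * rng.normal(size=n)   # resample observation noise
    w, *_ = np.linalg.lstsq(X, y, rcond=None)    # global least-squares fit
    coefs.append(w)
    preds.append(X @ w)

coef_spread = np.std(coefs, axis=0)         # large: no confidence in either parameter
pred_spread = np.std(preds, axis=0).mean()  # tiny: the input-output map is stable
```

On this data, `coef_spread` comes out orders of magnitude larger than `pred_spread`: full confidence in the input-output quantity, essentially none in any one parameter.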
> I was just thinking, are there grandmaster databases with Stockfish evaluations? I think there should be, but they are not easy to find, I guess.

The opening explorer uses a curated database (not fully open data, but maybe that has changed) digested by lichess into a book DB, corresponding to its "Masters" filter. It is referred to as the master DB, a slight slip, but persistent between Book and Database. There are quite a few threads here about how to get to the real database of games upstream of that.

But I would repeat that you cannot base your training on best games alone: they are only examples, not teaching material by themselves. They can be testing material, though, for showing to the community, but not even as predictor testing for tuning (which choosing features is also about; you are playing with "hyper-parameters" in machine-learning parlance, meaning anything not under cross-validation, as my rule of thumb goes). Such games could be peppered into your database, but don't expect that using only those will give enough function support for NN fitting: the inputs have to cover enough of the space for the NN to find all the nooks and crannies in parameter space, which can be very small, even if quantized, in order to fit a complex function.
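A toy illustration of the function-support point, with sin standing in for the complex function and a cubic polynomial for the fitter (everything here is invented for illustration): the fit is excellent where the training inputs had support, and useless outside it:

```python
import numpy as np

rng = np.random.default_rng(1)
target = np.sin  # stand-in for the 'complex function' to be fitted

# Training inputs cover only a narrow slice of the domain ('best games' only).
x_train = rng.uniform(0.0, 1.0, size=100)
y_train = target(x_train)
model = np.polyfit(x_train, y_train, deg=3)  # any flexible fitter would do

x_inside = np.linspace(0.0, 1.0, 50)   # where the data had support
x_outside = np.linspace(3.0, 4.0, 50)  # where it had none

err_inside = np.abs(np.polyval(model, x_inside) - target(x_inside)).max()
err_outside = np.abs(np.polyval(model, x_outside) - target(x_outside)).max()
```

The inside error stays tiny while the outside error blows up to several units, which is the sense in which a database of only masterpieces cannot carry the fit.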

I hope I have given you the equivalent of my inbox message. I welcome any counter-arguments (there are many claims above) or clarification questions. I seek to be understood beyond my "domain", and I struggle. I would like to need zero jargon, like "hyper-parameter". I just used it to help tie this to other sources and to verify what I try to paraphrase, but I don't want to rely on obscurity in order to explain.
Somewhere in there I talk about the mass of the static evaluation under a partition criterion. I may have to add: far from the endgame, or from lots of nearby WDL terminal positions, because of the completely unmotivated high values obtained from tuning their assigned weights so that they become commensurable with the other types of valuation in the static SF evaluation function (which I claim is dominated by material count).

One could argue that the weights assigned or tuned for WDL are themselves positional knowledge. But I would not. I want to compare what is heuristic. WDL, the legal terminal chess outcomes, are converted states from any type of imbalance, which I have separated into standard counting for material transactions and all the other elements that are neither material nor endgame cases. The parameters assigned to WDL cases are a class of their own, and my understanding is that they are optimized as floating parameters to adjust to the other classes of valuation. Or we could make the experiment three-pronged.

Sorry if that looks tangential. But maybe you will have some similar decisions to make, since you are using SF as an oracle.
#1
I applaud your effort! I too would like to see chess programs that do a better job of coming up with the correct move (really plan) for a human to play, and a natural language explanation.

One of the big issues I see with using any chess-playing program is that they too often do not even display, as a possibility, the move that a human should actually play! That is, the humanly correct move has been pruned away and does not show up in the output even with multiple PVs.

Stockfish, for example, has multiple ways of pruning moves, including pruning that happens right after the legal moves in a position have been generated, before the loop over those moves even begins! Reading the Stockfish code has to be done carefully. Many times the Stockfish functions have side effects, like that pruning. Other times a simple assignment statement (=) is actually calling a function, because the = operator has been overloaded.

Therefore, if you are training your program on Stockfish-annotated games, you will miss such cases even if you tell Stockfish to produce multiple PVs. A human-annotated game, especially one with non-masters as the target audience, would most likely contain a comment explaining such a humanly correct move that a chess program dismisses.

I'll give two examples of the sort of thing I'm talking about.

- Unless a program can see a definitive result, or is explicitly programmed with knowledge of what to do, it will not reduce a position to an objectively easily won position that scores worse than some other position. It does not matter whether it uses WDL or cp. What is lacking is a concept of 'easily won'. The developers do not care about such a thing: their program is not going to make a tactical slip and lose a won game, whereas a human might.
The human annotation on such a thing is of the form "White reduces the position to an easily won endgame." or "White gives up the exchange for a won position." There are many ways of saying this, and hence it will be a job to parse it out.

- Programs have no concept of playing a 'testing line'. This is especially noticeable when they produce a move for the losing side such as giving up the queen immediately, because the search has shown that giving it up later produces a worse WDL or cp score. Every time I see this sort of thing, and it happens very often in the Lichess puzzles, I have to laugh. No human, not even a novice, plays like that. Instead, humans play the most 'testing' line: the variation that is going to require the opponent to play, possibly, the one and only winning move, over and over, to secure the win.
