White had more piece activity and was dominating the chessboard.

Domination: White 41.10% vs Black 6.85%

Lucas Chess analysis shows it in other ways too.

I like the graph: Elo Average for each move.

So you see two lines.

I don't think Lichess's graphs are centipawn-based; I think they are WDL too.

They are most likely using the same model they use to determine blunders and how the eval bar should be filled.

This means that it is damped by this model, which says that a player needs an eval of +3 to have a 50% chance of winning.

If, on the other hand, you use Stockfish's built-in WDL model, which says a player needs an eval of +1 to have a 50% chance of winning, you get very different charts that look wild, just like Leela's.

imgur.com/a/gUmscVD

I'd argue the Ponomariov-Carlsen graph is quite good. It shows far better chances for Black than for White over some moves, and it would likely show even more with the calibration Elo set.

@RwSF75 said in #3:

> I don't think Lichess's graphs are centipawn-based; I think they are WDL too.

> They are most likely using the same model they use to determine blunders and how the eval bar should be filled.

> This means that it is damped by this model, which says that a player needs an eval of +3 to have a 50% chance of winning.


> If, on the other hand, you use Stockfish's built-in WDL model, which says a player needs an eval of +1 to have a 50% chance of winning, you get very different charts that look wild, just like Leela's.

> imgur.com/a/gUmscVD

+3 doesn't sound right. IIRC it was 1.x measured with some version of SF.

SF's WDL model also gives fairly nice graphs, but it has a few limitations that Leela's doesn't. It is sharper than Lc0's, because fishtest LTC games are played at a higher level than Leela's training games, and there is no option to change it. SF's WDL model also has no direct way to distinguish a dead-drawn 0.00 from a fighting 0.00 (other than by ply in most versions, or by material in the latest dev version).
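
To make the comparison concrete, here is a minimal sketch of the logistic form Stockfish's WDL model uses: a per-mille win rate of 1000 / (1 + exp((a - v) / b)), the loss rate taken from -v, and the draw rate as the remainder. In Stockfish, a and b are fitted per version as functions of ply (or material in newer versions). The constants below are hypothetical: a = 100 is chosen to match the "+1 for 50%" figure above, and b is picked only for illustration, not Stockfish's fitted value.

```typescript
// Sketch of a Stockfish-style logistic WDL model (per-mille values).
// HYPOTHETICAL constants: Stockfish fits a and b from fishtest games as
// functions of ply or material; fixed values are used here for illustration.
const A = 100; // eval in centipawns at which the win rate is 50%
const B = 50;  // spread of the curve; smaller b means a sharper model

interface Wdl { win: number; draw: number; loss: number; }

// Per-mille chance that the side with eval `cp` wins.
function winRate(cp: number, a = A, b = B): number {
  return Math.floor(0.5 + 1000 / (1 + Math.exp((a - cp) / b)));
}

// Loss is the opponent's win rate; draw is whatever is left over.
function wdl(cp: number, a = A, b = B): Wdl {
  const win = winRate(cp, a, b);
  const loss = winRate(-cp, a, b);
  return { win, draw: 1000 - win - loss, loss };
}

for (const cp of [0, 100, 300]) {
  console.log(cp, wdl(cp));
}
```

Because a and b depend on nothing but ply or material, two positions at 0.00 with the same ply (or material) get exactly the same WDL, which is the dead-draw versus fighting-draw limitation described above. A smaller b makes the curve steeper, which is roughly what "sharper" means here.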

@Craftyawesome said in #4:

> +3 doesn't sound right. IIRC it was 1.x measured with some version of SF.

rawWinningChances(300) returns 50%

The model they use is based on Lichess 2300+ Elo rated rapid games from June 2022.

lichess.org/page/accuracy

github.com/lichess-org/lila/blob/038ad6281e1456def0cef78a2f8c0f5457953093/ui/ceval/src/winningChances.ts#L5-L11

github.com/lichess-org/lila/pull/11148
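
For reference, the curve behind these numbers can be sketched from the formula published on lichess.org/page/accuracy. The TypeScript below is a minimal sketch assuming that formula matches the linked winningChances.ts; the function names winningChances and winPercent are my own, not lila's exports.

```typescript
// Sketch of the Lichess eval-to-winning-chances curve, using the constant
// from lichess.org/page/accuracy. Function names are illustrative only.
const MULTIPLIER = -0.00368208;

// Winning chances on a -1..+1 scale: 0 for an even position,
// approaching +1 or -1 as the eval becomes decisive.
function winningChances(cp: number): number {
  return 2 / (1 + Math.exp(MULTIPLIER * cp)) - 1;
}

// The accuracy page's "Win%": the same curve rescaled to 0..100.
function winPercent(cp: number): number {
  return 50 + 50 * winningChances(cp);
}

for (const cp of [0, 100, 300, 1000]) {
  console.log(cp, winningChances(cp).toFixed(3), winPercent(cp).toFixed(1));
}
```

Evaluating winningChances(300) gives about 0.50, the 50% figure above, on the -1..+1 scale. The accuracy page's Win% rescales that same eval to about 75%.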

@RwSF75 said in #5:

> rawWinningChances(300) returns 50%

> The model they use is based on Lichess 2300+ Elo rated rapid games from June 2022.


> lichess.org/page/accuracy

> github.com/lichess-org/lila/blob/038ad6281e1456def0cef78a2f8c0f5457953093/ui/ceval/src/winningChances.ts#L5-L11

> github.com/lichess-org/lila/pull/11148

Huh, I guess you're right. Maybe I thought they were using internal units and not centipawns?

I agree that, in the limit of the estimation processes involved in machine statistics about chess, WDL carries more information than estimates over specific engine-tournament pools dominated by centipawn-born technology.

In theory.

How were the WDL graphs made? With Nibbler?

As for the Lichess graphs, they aren't WDL.

They're more like W + 50%D.

We have no idea about the draw%.

Say Lichess shows 50%: it could mean 50% White wins and 50% Black wins, or 100% draws, or anything in between.
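
Reading the Lichess number as an expected score, E = W + D/2, makes this concrete. The sketch below (names are my own, for illustration) shows three quite different WDL splits that all collapse to the same 50%.

```typescript
// Expected score from win/draw/loss fractions (assumed to sum to 1).
interface Wdl { win: number; draw: number; loss: number; }

function expectedScore({ win, draw }: Wdl): number {
  return win + draw / 2;
}

const splits: Wdl[] = [
  { win: 0.5, draw: 0, loss: 0.5 },     // all-or-nothing position
  { win: 0, draw: 1, loss: 0 },         // dead draw
  { win: 0.25, draw: 0.5, loss: 0.25 }, // somewhere in between
];

for (const s of splits) {
  console.log(s, expectedScore(s));
}
```

A single number cannot tell these apart; only the full WDL triple can.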

Too late to edit. I am also hopeful that the centipawn-born engines will scout behind for feature sets that can become interpretable, while they try to fit engine tournament formats and the usual computing-cost constraints.