Imagine two people – I'll call them Partially-Psychic Pete and Just-A-Regular-Guy John – playing a game where they take turns predicting a fair coin toss by forecasting the probability that it will come up heads.
I made a related comment on your post about 538 calibration, which basically asked: who would be the superior gambler, Partially-Psychic Pete or Just-A-Regular-Guy John?
At fair (1:1) odds, presumably Pete – his predictions beat chance, so betting on them has positive expected value, while John merely breaks even...
But it gets interesting if the odds aren't even, right? Say the bookie (foolishly) offers 99:1 on heads and only 1:99 on tails. John always bets heads for the big payout and wins about 50 times out of 100. That probably beats Pete, who 40% of the time incorrectly bets tails for the tiny payout. Right?
My math isn't strong enough here, but I'm curious where the line is: how uneven do the odds have to be before John is favoured? My intuition says it's tightly related to how often Pete is wrong... I think I need to try some simulations or write this down properly...
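Here's a rough simulation sketch of that game. Everything not stated in the comment is my assumption: Pete bets whatever he predicts and his prediction matches the toss with probability 0.6 (implied by "40% of the time incorrectly takes the tiny payout"), John always bets heads, the bookie pays b:1 on heads and 1:b on tails, and each player stakes one unit per toss.

```python
import random

def simulate(b, n=200_000, accuracy=0.6, seed=0):
    """Average profit per toss for Pete and John, heads paying b:1.

    Assumptions (mine, not from the comment): Pete bets his own
    prediction, which matches the true toss with probability
    `accuracy`; John always bets heads; the bookie pays b:1 on heads
    and 1:b on tails; each player stakes one unit per toss.
    """
    rng = random.Random(seed)
    pete = john = 0.0
    for _ in range(n):
        heads = rng.random() < 0.5  # fair coin
        # Pete's bet is heads iff (prediction correct) == (toss is heads)
        pete_bets_heads = (rng.random() < accuracy) == heads
        if pete_bets_heads == heads:
            pete += b if heads else 1.0 / b  # big or tiny payout
        else:
            pete -= 1.0
        john += b if heads else -1.0  # John always bets heads
    return pete / n, john / n

# Under these assumptions the expected profits per toss are
#   Pete: 0.3*b + 0.3/b - 0.4      John: 0.5*b - 0.5
# Setting them equal gives 2b^2 - b - 3 = 0, i.e. b = 1.5:
# John pulls ahead once heads pays better than 1.5:1.
for b in (1.0, 1.5, 3.0, 99.0):
    p, j = simulate(b)
    print(f"b={b:5.1f}  Pete~{p:7.3f}  John~{j:7.3f}")
```

So (if I've set it up right) the line isn't very far out at all: any payout on heads above 1.5:1 already favours John's naive strategy over a 60%-accurate Pete.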
Great post Mike. Although I'm a fan of empirical track records, there seem to be a lot of concerns that normal scoring rules don't capture. While it might not cover exactly the topic you touched on, I think you might enjoy this paper on Alignment Problems With Current Forecasting Platforms by Nuño Sempere and Alex Lawsen (https://arxiv.org/pdf/2106.11248.pdf). You might also enjoy my recent post about problems with forecasting tournaments (https://abstraction.substack.com/p/against-forecasting-tournaments).
Why do you prefer Brier scores to log odds? Or, perhaps a better phrasing: for what sorts of purposes do you think Brier scores are better, and for what sorts of purposes are log odds better?
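To make the contrast concrete, here's a toy comparison (my own illustration, not anything from the post): for a binary forecast, the Brier score is bounded, while the log score punishes confident misses without bound.

```python
import math

def brier(p, outcome):
    # Brier score for a binary forecast: squared error, lower is
    # better, bounded in [0, 1].
    return (p - outcome) ** 2

def log_score(p, outcome):
    # Log score: log of the probability assigned to what happened;
    # higher (closer to 0) is better, unbounded below.
    return math.log(p if outcome == 1 else 1 - p)

# Forecast heads with probability p, but the toss comes up tails (0):
for p in (0.6, 0.9, 0.99):
    print(f"p={p}: Brier={brier(p, 0):.4f}  log={log_score(p, 0):.4f}")
```

The Brier penalty creeps toward 1 as p approaches 0.99, while the log penalty blows up toward negative infinity, which is one reason the two rules can rank the same forecaster differently.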