Mike’s Blog

Mar 13, 2023

Sorry about the late response! Right, if it's even odds, then Partially-Psychic Pete has a positive expected return on the bets while Just-A-Regular-Guy John would only break even. But if we consider what kinds of bets they might be willing to take, Just-A-Regular-Guy John would probably be unwilling to take unfavorable odds in favor of tails (since he knows it's 50/50), but Partially-Psychic Pete, since he overestimates his own accuracy, could be talked into taking odds unfavorable enough that his expected return would be negative.
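
To put rough numbers on the expected-return argument, here is a minimal sketch built on made-up figures: assume Pete is actually right 55% of the time but believes he is right 70% of the time, and that a bet pays `payout` units per unit staked when the call is correct and loses the stake otherwise. The 55%/70% figures, the 0.6 payout, and the function names are all hypothetical illustrations, not numbers from the post.

```python
def expected_return(p_correct: float, payout: float) -> float:
    """Expected profit per unit staked: win `payout` with probability p_correct, else lose the stake."""
    return p_correct * payout - (1 - p_correct)


def break_even_payout(p_correct: float) -> float:
    """Smallest payout at which someone who believes p_correct expects to break even."""
    return (1 - p_correct) / p_correct


# Hypothetical numbers for illustration only.
JOHN_TRUE = 0.5       # John calls a fair coin and knows it is 50/50
PETE_TRUE = 0.55      # assumed: Pete's actual accuracy
PETE_BELIEVED = 0.70  # assumed: Pete's overconfident self-estimate

# Even odds means a payout of 1 per unit staked.
print(expected_return(JOHN_TRUE, 1.0))  # John breaks even
print(expected_return(PETE_TRUE, 1.0))  # Pete comes out ahead (about 0.1 per unit)

# John needs at least even odds; Pete thinks anything above about 0.43 is a good deal...
print(break_even_payout(JOHN_TRUE))
print(break_even_payout(PETE_BELIEVED))
# ...so he might accept a 0.6 payout, which actually loses money on average.
print(expected_return(PETE_TRUE, 0.6))  # about -0.12 per unit
```

The gap between `PETE_BELIEVED` and `PETE_TRUE` is what opens up the range of bets that look good to Pete but have negative expected return.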

Jonathan Mann

Feb 13, 2023

Great post, Mike. Although I'm a fan of empirical track records, it seems that there are a lot of concerns that normal scoring rules don't capture. While it might not cover exactly the topic you touched on, I think you might enjoy this paper, Alignment Problems With Current Forecasting Platforms, by Nuño Sempere and Alex Lawsen (https://arxiv.org/pdf/2106.11248.pdf). You might also enjoy my recent post about problems with forecasting tournaments (https://abstraction.substack.com/p/against-forecasting-tournaments).

Feb 13, 2023

Thanks! Sounds cool, I'll check it out.

Feb 19, 2023

Why do you prefer Brier scores to log odds? Or, perhaps a better phrasing: for what sorts of purposes do you think Brier scores are better, and for what sorts of purposes are log odds better?

Feb 20, 2023

That's a good question and unfortunately I don't really have a good answer for it. I haven't put a lot of thought into log odds scoring and its relative advantages and disadvantages, but I'll look into it.

There are obvious theoretical justifications for log scoring, and if you have a large number of observations that each carry only a small amount of information, it's clearly the right thing to do, but I'm not sure whether there are real-world reasons to prefer the Brier score in some situations.

What are the theoretical advantages of log scoring over Brier? I know log scoring punishes over-confidence more than Brier, but that seems like a subjective preference rather than an objective advantage.
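
As a rough illustration of how differently the two rules treat overconfidence, the sketch below scores one yes/no question that resolved "no" while the forecaster put probability `p` on "yes". It assumes the usual conventions: the Brier score is the squared error against the 0/1 outcome (lower is better, at most 1 per question), and the log score is the natural log of the probability assigned to what actually happened (higher is better, no lower bound). Other sign and base conventions exist, and the function names are just for illustration.

```python
import math


def brier_score(p_yes: float, happened: bool) -> float:
    """Squared error against the 0/1 outcome: 0 is perfect, 1 is the worst possible."""
    outcome = 1.0 if happened else 0.0
    return (p_yes - outcome) ** 2


def log_score(p_yes: float, happened: bool) -> float:
    """Natural log of the probability given to what actually happened: 0 is perfect, no lower bound."""
    p_actual = p_yes if happened else 1.0 - p_yes
    return math.log(p_actual)


# The event did NOT happen, while the forecaster grows ever more confident that it would.
for p in (0.6, 0.9, 0.99, 0.999):
    print(f"p={p}: Brier={brier_score(p, False):.3f}  log={log_score(p, False):.2f}")
```

Going from 0.99 to 0.999 barely moves the Brier score (0.980 to 0.998), but it drops the log score by another 2.3 or so (roughly -4.61 to -6.91), and the log penalty keeps growing without bound as `p` approaches 1.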

One major disadvantage of log scoring in real-world scenarios is that if someone gives a probability forecast of 1 or 0 and gets it wrong, you end up taking log(0) when you try to calculate the score. Actually, depending on how the formula is implemented, even a correct forecast of 1 or 0 can still trigger a divide-by-zero error.

Now, you might say it's fair for someone to be punished with a negative-infinity score if they give a 1 or 0 forecast, but the inability to compute the score seems like a major disadvantage in real-world scenarios where people might be badly calibrated or just bad at making forecasts. With Brier scores on the other hand, if you forecast a 1 or 0 and get it wrong, you simply get the maximum possible punishment for that question, and the calculation turns out fine.
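
Here is a minimal Python sketch of that failure mode, with hypothetical function names. One common way to write the binary log score is `y*log(p) + (1-y)*log(1-p)`; evaluated literally, it takes `log(0)` for a forecast of exactly 1 or 0 even when the forecast was right, which is one way the "even if they get it right" error can show up. Scoring only the probability assigned to the actual outcome avoids that, though a wrong certain forecast still scores negative infinity, while the Brier score simply tops out at 1. (Python's `math.log(0)` raises `ValueError`; NumPy's `np.log(0)` instead returns `-inf` with a divide-by-zero warning, and `0 * -inf` is `nan`.)

```python
import math


def naive_log_score(p: float, y: int) -> float:
    """Binary log score written as y*log(p) + (1-y)*log(1-p); both logs are always evaluated."""
    return y * math.log(p) + (1 - y) * math.log(1 - p)


def log_score(p: float, y: int) -> float:
    """Log of the probability assigned to the outcome that occurred; -inf if that was 0."""
    p_actual = p if y == 1 else 1 - p
    return math.log(p_actual) if p_actual > 0 else float("-inf")


def brier_score(p: float, y: int) -> float:
    """Squared error; always between 0 and 1 for a single binary question."""
    return (p - y) ** 2


# A forecaster says 100% and the event happens.
try:
    naive_log_score(1.0, 1)
except ValueError:
    # math.log(0) raises ValueError, even though the forecast was correct.
    print("naive log score could not be computed")

print(log_score(1.0, 1))    # a correct certain forecast costs nothing
print(log_score(1.0, 0))    # a wrong certain forecast scores negative infinity
print(brier_score(1.0, 0))  # Brier just assigns the maximum penalty
```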

In terms of punishing 0/1 calls, normally when I use log scores I'm trying to distinguish between two hypotheses. If something would 100% definitely be true under a given hypothesis, and I observe that it's false, I can reject that hypothesis, pack up and go home, so that score going to infinity is a feature not a bug.
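
In log-score terms, a hypothesis that assigned probability 0 to something that then happened gets a log-likelihood of negative infinity, and Bayes' rule gives it zero posterior weight regardless of its prior, as long as some other hypothesis is still possible. A minimal sketch, with two made-up hypotheses (`h_certain`, `h_mostly`) and hypothetical helper names:

```python
import math


def log_likelihood(probs_of_observed):
    """Sum of log-probabilities a hypothesis assigned to each observation that actually occurred."""
    total = 0.0
    for p in probs_of_observed:
        if p == 0.0:
            # The hypothesis said this observation was impossible, so it is ruled out outright.
            return float("-inf")
        total += math.log(p)
    return total


def posterior(log_liks, priors):
    """Bayes' rule in log space. Assumes at least one hypothesis has a finite log-likelihood."""
    logs = [ll + math.log(prior) for ll, prior in zip(log_liks, priors)]
    best = max(logs)
    weights = [math.exp(l - best) for l in logs]  # math.exp(-inf) == 0.0
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical yes/no process; the observations were yes, yes, no.
# h_certain says "yes" every time (p = 1.0); h_mostly says "yes" with p = 0.8.
h_certain = log_likelihood([1.0, 1.0, 0.0])
h_mostly = log_likelihood([0.8, 0.8, 0.2])
print(posterior([h_certain, h_mostly], [0.5, 0.5]))  # [0.0, 1.0]
```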

For judging "how good is this person at predicting outcomes" in the real world, I guess things are less clear. Log scores punish overconfidence really heavily compared to hedging bets, and while I generally view that as a really valuable intellectual discipline I can believe that there might be applications where it wasn't what you wanted.

But I still view "if we imagine all our observations as one big observation, Alice scores higher than Bob iff she assigned a higher probability to that observation" as a really strong argument in favour of log scores over anything else for most purposes.
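
The "one big observation" argument can be checked numerically. The sum of the log scores equals the log of the probability the forecaster gave to the whole sequence of outcomes (assuming each forecast is the probability of that outcome given the earlier ones, e.g. for independent questions), so ranking by total log score is the same as ranking by that joint probability. Brier scores don't have this property. The forecasts below are invented to show a case where the two rules disagree; `alice`, `bob`, and the helper names are hypothetical.

```python
import math


def total_log_score(forecasts, outcomes):
    """Sum of log-probabilities assigned to the outcomes that occurred (higher is better)."""
    return sum(math.log(p if y else 1 - p) for p, y in zip(forecasts, outcomes))


def joint_probability(forecasts, outcomes):
    """Probability assigned to the whole sequence, treating each forecast as conditional on the rest."""
    return math.prod(p if y else 1 - p for p, y in zip(forecasts, outcomes))


def total_brier(forecasts, outcomes):
    """Sum of squared errors (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes))


outcomes = [1, 1]  # hypothetical: both events happened
alice = [0.6, 0.6]
bob = [0.95, 0.4]

for name, f in (("Alice", alice), ("Bob", bob)):
    print(
        f"{name}: joint={joint_probability(f, outcomes):.4f}  "
        f"log={total_log_score(f, outcomes):.4f}  Brier={total_brier(f, outcomes):.4f}"
    )
```

Here Bob gave the observed outcomes a joint probability of 0.38 versus Alice's 0.36, so he has the higher total log score (about -0.968 vs -1.022), while Alice has the better (lower) total Brier score (0.32 vs 0.3625).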

Mar 13, 2023

Sorry about the late response! Thank you for the explanation, and yes those are some good points.
