4 Comments

Can you make plots of how well calibrated the market is versus how long it has until it resolves? I.e., does the market become well calibrated only in the last 3 days before the match, with only amateurs trading before that and producing an uncalibrated market?
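A minimal sketch of the analysis being asked for, assuming a hypothetical dataset with odds snapshots at several times before each match. Each record is `(days_until_resolution, implied_probability, outcome)`. Records are grouped into time-to-resolution buckets, and each bucket gets an expected calibration error (ECE). ECE is one common calibration summary, chosen here for illustration. The function names and the bucket edges are made up for this example.

```python
from collections import defaultdict

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean gap between predicted probability and observed frequency."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        # Put p == 1.0 into the top bin instead of an extra one.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins.values():
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_p - freq)
    return ece

def calibration_by_horizon(records, horizons, n_bins=10):
    """records: (days_until_resolution, prob, outcome) tuples (hypothetical format).
    horizons: sorted upper bucket edges in days, e.g. [3, 7, 30].
    Records beyond the last edge are dropped."""
    grouped = defaultdict(lambda: ([], []))
    for days, p, y in records:
        for edge in horizons:
            if days <= edge:
                grouped[edge][0].append(p)
                grouped[edge][1].append(y)
                break
    return {edge: expected_calibration_error(ps, ys, n_bins)
            for edge, (ps, ys) in sorted(grouped.items())}
```

If the market only becomes calibrated close to the match, the ECE for the short-horizon bucket should come out noticeably lower than for the longer ones. Plotting ECE against the bucket edge gives the requested chart.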

author

That's a good idea! Unfortunately this sports betting dataset only has 1 timepoint, but maybe I'll try to look for another one that has the odds at many timepoints.


This is a great analysis, Mike! My intuition about whether to use Brier or log scoring to evaluate prediction market forecasts is that the choice doesn't matter much, since both are proper scoring rules. That means in both cases the error-minimizing strategy is always to forecast the ground-truth probability. However, if someone is considering gambling with their life savings (or anything else where overconfidence costs much more than underconfidence), then it makes sense to use log scoring.
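A small numerical check of the two claims above, as a sketch. It assumes a binary event with a true probability of 0.7, and the helper names are made up for illustration. In expectation, both the Brier score and the log loss are minimized by reporting the true probability. The Brier penalty depends only on the distance from the truth, while the log loss punishes an overconfident miss harder than an equally sized underconfident one.

```python
import math

def expected_brier(report, true_p):
    # Expected squared error when the event happens with probability true_p.
    return true_p * (1 - report) ** 2 + (1 - true_p) * report ** 2

def expected_log_loss(report, true_p):
    # Expected negative log-likelihood; report must lie strictly inside (0, 1).
    return -(true_p * math.log(report) + (1 - true_p) * math.log(1 - report))

def best_report(score_fn, true_p, steps=99):
    # Grid search over reports 0.01 .. 0.99, avoiding 0 and 1 so logs stay finite.
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return min(grid, key=lambda r: score_fn(r, true_p))

true_p = 0.7
print(best_report(expected_brier, true_p), best_report(expected_log_loss, true_p))

# Same distance (0.29) from the truth, once overconfident and once underconfident.
over, under = 0.99, 0.41
print(expected_brier(over, true_p) - expected_brier(under, true_p))          # ~0, symmetric
print(expected_log_loss(over, true_p) - expected_log_loss(under, true_p))    # > 0
```

The two scores are on different scales, so only the comparisons within each rule are meaningful here, not the raw numbers across rules.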

author

Thanks, Jonathan! That's a good point, although what I don't like about log scoring is that you can get negative infinities. If someone gives 100% certainty to something and gets it wrong, scoring them at negative infinity may make sense from a philosophical standpoint, but I view it as unworkable in practice. It's a real disadvantage when making comparisons, such as scoring participants in tournaments.

I think it's especially a problem when comparing performance across many questions. For example, say two people are competing in a forecasting contest. The first person gives 100% confidence on one question and gets it wrong. The second person gives 100% confidence on 2 questions and gets them both wrong. Brier scores (and common sense) say that the first person did better, at least on those 2 questions. But log scoring puts them both at negative infinity.
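Here is the contest example worked through as a sketch. The comment doesn't say what the first person forecast on the second question, so 50% is a hypothetical value. Any forecast below 100% gives the same qualitative result. Both questions resolve "no", and lower scores are better for both rules.

```python
import math

def brier(prob, outcome):
    # Squared error between the forecast probability and the 0/1 outcome.
    return (prob - outcome) ** 2

def log_loss(prob, outcome):
    # Negative log of the probability assigned to what actually happened.
    p = prob if outcome == 1 else 1 - prob
    return math.inf if p == 0 else -math.log(p)

def total(score_fn, forecasts, outcomes):
    return sum(score_fn(p, y) for p, y in zip(forecasts, outcomes))

outcomes = [0, 0]      # both questions resolve "no"
first = [1.0, 0.5]     # 100% on Q1; 50% on Q2 is a hypothetical value
second = [1.0, 1.0]    # 100% on both questions

print(total(brier, first, outcomes), total(brier, second, outcomes))        # 1.25 2.0
print(total(log_loss, first, outcomes), total(log_loss, second, outcomes))  # inf inf
```

The Brier totals separate the two forecasters, but both log-loss totals are infinite, so log scoring cannot rank them. A common workaround is to clip forecasts to [ε, 1 − ε] before taking logs. That keeps the scores finite, but the size of the penalty then depends on the ε you pick.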

Btw I really enjoyed your recent post on different scoring methods!
