I’ve been interested in prediction markets for a while, and have a lot of questions about how they work that I think could be answered through some pretty simple data analysis. Unfortunately, there aren’t a lot of publicly available datasets for sites like PredictIt, Manifold Markets, etc. I previously did a project where I

"I decided to correct this for my analysis, and normalize the implied probabilities so that they sum to 1"

This makes sense, but you need to be careful how you normalize for tail events. If a sportsbook's market on a huge underdog is 1%-3% (i.e. they offer the underdog at 3% and the favorite at 99%) then their fair price is much closer to 1% than 3% (for similar reasons to tails trading rich in prediction markets).

Also as far as some sports being harder to predict than others... I think about this much more in terms of win probabilities tending much closer to 50% - higher Brier scores are downstream and that framing feels less intuitive to me. Of course with a reasonable forecaster and a large sample size these converge, but the Brier measurement is quite a bit noisier for small samples.

Yeah, your first point makes sense to me, but I don't know of a way to consistently correct for it without introducing my own judgement, so for now I think the basic normalization prob1/(prob1+prob2) is the best option for keeping my own biases out of the analysis.
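For concreteness, here's a minimal sketch of that basic normalization. The function name is just illustrative; the math is exactly the prob1/(prob1+prob2) rescaling described above:

```python
# Sketch of the basic overround normalization discussed above.
# Implied probabilities from a two-way market sum to more than 1 because
# of the bookmaker's margin (the vig); dividing each by their sum removes
# it proportionally, with no judgement call on either side.

def normalize(p1: float, p2: float) -> tuple[float, float]:
    """Rescale two implied probabilities so they sum to 1."""
    total = p1 + p2
    return p1 / total, p2 / total

# Example: underdog offered at 3%, favorite at 99%
# (implied probabilities sum to 1.02).
under, fav = normalize(0.03, 0.99)
print(round(under, 4), round(fav, 4))  # 0.0294 0.9706
```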

I'm not sure I understand your second point. Do you mean that the season-wide win probability for each team is closer to 50% in the less predictable sports? That's also what I think is going on, but it seems like this is itself downstream of the inherent predictability of the sport. In baseball or hockey, a bad team can just get lucky and score a winning goal or home run against a much better team out of the blue. Something about these sports is inherently random in a way that lets a team take a one-point lead on luck and win the game because of it (my guess is that soccer is relatively unpredictable for the same reason).

But in basketball and football, it's a lot harder for a bad team to defeat a good team through luck. And in college football this is still the case, plus the differences between the bad teams and the good teams are much larger than in the pro sports. Anyway, I think the Brier scores are just an interesting quantitative confirmation of this, and for an intuitive understanding it's better to think about the role of randomness in the games.

I think your adjustment is the standard one people publish, but I've always disagreed with it: in my example it puts you closer to 3% instead of 1%. I prefer at least taking the midpoint, average(p1, 1-p2), if not a formula that pushes the normalization even closer to the tail side of the spread.
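Here's a sketch of the difference between the two adjustments on that 1%-3% example (function names are mine, just for illustration):

```python
# Two ways to de-vig a two-way market, where p1 is the underdog's implied
# probability (its offer side) and p2 is the favorite's.

def proportional(p1: float, p2: float) -> float:
    """Standard normalization: rescale so the pair sums to 1."""
    return p1 / (p1 + p2)

def midpoint(p1: float, p2: float) -> float:
    """Midpoint of the spread: average of p1 and the complement of p2."""
    return (p1 + (1 - p2)) / 2

# The 1%-3% underdog example: underdog offered at 3%, favorite at 99%
# (so the underdog's bid side is 1%).
print(proportional(0.03, 0.99))  # ~0.0294, still near the 3% side
print(midpoint(0.03, 0.99))      # ~0.02, the middle of the 1%-3% spread
```

Neither lands at the 1% end, but the midpoint at least splits the spread rather than staying near the offer.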

My 2nd point is basically what you're saying: that individual games have more randomness in the harder-to-predict sports. But I would identify that feature from the forecasted win probabilities (basically, how tightly clustered they are around 50%) rather than from realized Brier scores. If you don't know a priori that MLB games have more even win probabilities, then higher realized Brier scores could indicate either a poor model or the underlying distribution of the game probabilities (or luck, over a smaller sample). Perhaps an "attempted Brier score" would be a good quick measure of how random a forecaster/model thinks a sport is? Tangentially, comparing realized Brier to attempted Brier could be a good summary statistic for calibration.
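One way to make "attempted Brier score" concrete (this is my reading of the idea, not an established metric): for a binary forecast p, a perfectly calibrated forecaster expects a Brier contribution of p·(1-p)² + (1-p)·p² = p(1-p), so averaging p(1-p) over all forecasts measures how random the forecaster thinks the games are, before any outcomes are observed:

```python
# "Attempted Brier": the Brier score a forecaster would expect to incur
# if its probabilities were perfectly calibrated. Depends only on the
# forecasts, not on outcomes.

def attempted_brier(forecasts: list[float]) -> float:
    """Mean of p*(1-p): expected Brier under perfect calibration."""
    return sum(p * (1 - p) for p in forecasts) / len(forecasts)

def realized_brier(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error of forecasts against 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

# Coin-flip-like sport vs. lopsided sport: attempted Brier separates
# them before a single game is played.
print(attempted_brier([0.5, 0.55, 0.45]))  # ~0.248
print(attempted_brier([0.9, 0.1, 0.95]))   # ~0.076
```

A realized Brier well above the attempted Brier then suggests miscalibration (or bad luck over a small sample), rather than just an unpredictable sport.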

Thanks for reading and for the feedback!

Solid article!

Thanks Nathan!