Optimal Sports Betting
Code for this project can be found here
Sports betting can generally be thought of as two main tasks: estimating the probabilities of events, and then using those probabilities to decide how much to bet. The first is undoubtedly the more complicated and will be left (for now) to the professionals at FiveThirtyEight.
Even if one can perfectly estimate the true probability of an event, the optimal bet amount depends on the odds offered. Naively, one might expect that the optimal strategy is to simply maximize expected wealth. However, this strategy would suggest going all in on any bet with positive expected value, inevitably leading to bankruptcy over time. Instead, one ought to maximize the expected log of wealth, which is closely related to maximizing the median wealth [Kelly]. If we have a single event with probability p and decimal odds o (the total payout per unit staked), the optimal bet under this criterion is a fraction of one’s wealth equal to (po-1)/(o-1). In practice, it is generally better to bet a fraction of this amount, sacrificing some expected growth for much less variance. One such case of this tradeoff is depicted in the plot, with o=2 and p=0.6.
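As a quick illustration of the single-event formula, here is a minimal sketch. The function name and the multiplier argument are my own for illustration; the odds are assumed to be decimal odds.

```python
def kelly_fraction(p: float, o: float, multiplier: float = 1.0) -> float:
    """Fraction of wealth to stake on one event.

    p: estimated win probability, o: decimal odds (total payout per unit staked),
    multiplier: fraction of the full Kelly stake to use (e.g. 0.5 for half Kelly).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if o <= 1.0:
        raise ValueError("decimal odds must exceed 1")
    full_kelly = (p * o - 1.0) / (o - 1.0)
    # A negative value means the bet has no edge, so stake nothing.
    return multiplier * max(full_kelly, 0.0)


# The case from the plot: o=2 and p=0.6
full = kelly_fraction(0.6, 2.0)        # about 0.2 of wealth
half = kelly_fraction(0.6, 2.0, 0.5)   # about 0.1 of wealth
```

With o=2 and p=0.6 the full Kelly stake is 20% of wealth, and half Kelly is 10%.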
In the multiple event case, things become slightly more complicated. Following the work of [Nekrasov], I proceeded by using a Taylor series expansion and assuming independence, ultimately reaching the following equation, where q=1-p, d=o-1, and u is the wealth portion wagered.
While complicated, the key point is that when you set the gradient to 0, you end up with a set of n independent linear equations in n variables. From there, it is simple linear algebra to find the approximately optimal bets. Then, one multiplies these amounts by a chosen Kelly multiplier (I went with 1/2) to obtain the actual bet amounts.
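To make the linear algebra step concrete, here is a sketch of one way to get such a system. It assumes a second-order Taylor expansion log(1+r) ≈ r - r²/2 of the log of wealth and independent events; this is my reconstruction and may differ in detail from the equation used in the project. Under those assumptions, setting the gradient to zero gives A u = m, where m_i = p_i d_i - q_i is the expected return per unit staked, the diagonal of A holds the second moments p_i d_i² + q_i, and the off-diagonal entries are m_i m_j. The function name is hypothetical.

```python
import numpy as np


def approx_kelly_bets(p, o, multiplier=0.5):
    """Approximate simultaneous Kelly stakes for independent events.

    Assumes the second-order Taylor approximation described above;
    p are win probabilities and o are decimal odds.
    """
    p = np.asarray(p, dtype=float)
    o = np.asarray(o, dtype=float)
    q = 1.0 - p
    d = o - 1.0
    m = p * d - q                # expected return per unit staked
    s = p * d**2 + q             # second moment of the return per unit staked
    A = np.outer(m, m)           # cross terms m_i * m_j (independence)
    np.fill_diagonal(A, s)       # diagonal uses the second moment instead
    u = np.linalg.solve(A, m)    # n linear equations in n unknowns
    # Simplification: clip negative stakes to zero rather than
    # removing those events and re-solving.
    return multiplier * np.clip(u, 0.0, None)


bets = approx_kelly_bets([0.6, 0.55], [2.0, 2.0], multiplier=0.5)
```

For a single even-money bet this reduces to the exact Kelly stake; with several bets, each stake comes out slightly smaller than its standalone Kelly value.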
As is common when taking simulated results to the real world, robustness is a key concern. Using MLB data from 06/05/21, I ran 10,000 simulations to test how sensitive the strategy is to errors in the estimated probabilities. With a full Kelly bet, there is a margin of 1.6% before the median growth turns negative. Half Kelly increases this margin to 2.5%, and quarter Kelly increases it to 2.8%. On the other side, if the true probabilities are 2.5% better than estimated, the half Kelly strategy’s median growth doubles. In summary, don’t trust real money to this unless you are highly confident in your predictions.
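The idea behind this kind of robustness check can be sketched with a toy simulation. This is not the MLB simulation described above: it assumes a single bet repeated many times, with the true win probability lower than the estimate by a fixed error, and all names and default values are illustrative.

```python
import numpy as np


def median_log_growth(p_est, o, error, multiplier=0.5,
                      n_bets=200, n_sims=10_000, seed=0):
    """Median log of final wealth when the estimate p_est is too high by `error`.

    Stakes are sized from p_est (fractional Kelly), but outcomes are drawn
    using the true probability p_est - error.
    """
    rng = np.random.default_rng(seed)
    stake = multiplier * max((p_est * o - 1.0) / (o - 1.0), 0.0)
    p_true = p_est - error
    # Number of winning bets in each simulated sequence.
    wins = rng.binomial(n_bets, p_true, size=n_sims)
    log_wealth = (wins * np.log1p(stake * (o - 1.0))
                  + (n_bets - wins) * np.log1p(-stake))
    return float(np.median(log_wealth))


accurate = median_log_growth(0.6, 2.0, error=0.0, multiplier=1.0)
overconfident = median_log_growth(0.6, 2.0, error=0.15, multiplier=1.0)
```

Sweeping `error` for several values of `multiplier` and noting where the median crosses zero gives a margin in the same spirit as the ones quoted above.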
Integration with the online data sources was a mostly straightforward affair. FiveThirtyEight’s predictions don’t have a readily available API, so I instead opted to scrape the information directly from their website with Python. Sportsbook odds come from the-odds-api.com. The biggest challenge was formatting the data so that both sources agree. I used a hash table to convert the long form names of the API to the abbreviations used by FiveThirtyEight.
I also needed to consider the time of each match due to the possibility of doubleheaders. This was slightly problematic because the API would occasionally return a start time one minute later than the actual time (seemingly at random). My workaround was to only consider the starting hour, since a one-minute delay would never change this (MLB loves starting at xx:05). For now the script is limited to MLB, since adding other sports would require handling the different formatting that FiveThirtyEight uses across different sports.
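The matching logic described above can be sketched roughly as follows. The dictionary entries, the abbreviations, the function name, and the ISO timestamp format are assumptions for illustration, not the actual values used by either data source.

```python
from datetime import datetime, timezone

# Hypothetical mapping from long-form API names to short abbreviations.
TEAM_ABBREVIATIONS = {
    "New York Yankees": "NYY",
    "Boston Red Sox": "BOS",
}


def game_key(home_name, away_name, start_time_iso):
    """Build a key that identifies a game in both data sources.

    The start time is truncated to the hour so that a one-minute
    discrepancy between sources does not break the match.
    """
    # Assumes ISO 8601 timestamps; "Z" is rewritten for older Python versions.
    start = datetime.fromisoformat(start_time_iso.replace("Z", "+00:00"))
    start_hour = start.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0)
    return (TEAM_ABBREVIATIONS[home_name],
            TEAM_ABBREVIATIONS[away_name],
            start_hour)


# Same game reported one minute apart: the keys match.
a = game_key("New York Yankees", "Boston Red Sox", "2021-06-05T23:05:00Z")
b = game_key("New York Yankees", "Boston Red Sox", "2021-06-05T23:06:00Z")
# Earlier game of a doubleheader: the key differs.
c = game_key("New York Yankees", "Boston Red Sox", "2021-06-05T17:05:00Z")
```

Truncating to the hour would fail for a game scheduled at xx:59, but that does not come up with start times like xx:05.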
After 37 game days of experimentation, the results are poor, to say the least: a loss of $598.72 on $2810.25 wagered, or 21%. This is why I used a spreadsheet instead of a sportsbook.
January 2022 Update
FiveThirtyEight’s predictions are much more reliable for longer-duration events, such as league championships. The complication is that the possible outcomes to bet on are no longer independent: two teams can’t both win the same championship. While this precludes using the earlier optimization technique, mutually exclusive outcomes actually allow for more precise optimization. This is because, for n events, the general case has 2^n possible outcomes while the mutually exclusive case has only n, making an explicit objective function tractable. This function is
The gradient is also straightforward to compute analytically, though this is left out for the sake of brevity. There are also a few constraints, based on the idea that you can’t take the house’s position or leverage your bets.
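A sketch of how this could be set up with SciPy is below. It rests on assumptions of mine: the outcomes are mutually exclusive and exhaustive, the objective is the expected log of wealth, sum_i p_i log(1 - sum_j u_j + o_i u_i), and the constraints are u_i >= 0 (no taking the house's position) and sum_j u_j <= 1 (no leverage). The function name and solver settings are illustrative, not necessarily what the project uses.

```python
import numpy as np
from scipy.optimize import minimize


def optimal_exclusive_bets(p, o):
    """Stakes maximizing expected log wealth over mutually exclusive outcomes.

    p: outcome probabilities (assumed to sum to 1), o: decimal odds.
    """
    p = np.asarray(p, dtype=float)
    o = np.asarray(o, dtype=float)
    n = len(p)

    def wealth(u):
        # Wealth if outcome i occurs: lose every stake, win o_i * u_i back.
        return np.maximum(1.0 - u.sum() + o * u, 1e-12)

    def neg_growth(u):
        return -np.sum(p * np.log(wealth(u)))

    def neg_grad(u):
        w = wealth(u)
        # d/du_k of the negated objective.
        return np.sum(p / w) - p * o / w

    result = minimize(
        neg_growth,
        x0=np.full(n, 0.01),
        jac=neg_grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,                      # no negative bets
        constraints=[{"type": "ineq",
                      "fun": lambda u: 1.0 - u.sum()}],  # no leverage
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x


bets = optimal_exclusive_bets([0.5, 0.3, 0.2], [2.2, 3.0, 4.0])
```

In this example only the first outcome has positive expected value (0.5 × 2.2 > 1), so the optimizer should put roughly 8.3% of wealth on it and nothing on the others.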
After formulating the problem, it is straightforward to find the solution using SciPy’s optimization suite. The performance for long-run scenarios is greatly improved when you allow for “cashing out”, which means closing a bet early for slightly less money. An example may help:
You bet $100 on Team A at 10.0 odds
Over time they outperform expectations, bringing the odds to 5.0
To have the same payout now, you’d need to bet $200
The sportsbook lets you cash out for, say, $180
They’re happy because it’s positive EV for them; you’re happy because it’s a guaranteed profit.
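The arithmetic in this example can be written out as a small sketch. The function names are illustrative, and the $180 offer is just the figure from the example above, not a rule for how sportsbooks price cash-outs.

```python
def equivalent_stake(stake, original_odds, current_odds):
    """Stake needed at the current odds to match the original bet's payout."""
    payout = stake * original_odds       # $100 at 10.0 pays $1000
    return payout / current_odds         # $1000 / 5.0 = $200


def cash_out_profit(stake, offer):
    """Guaranteed profit from accepting a cash-out offer."""
    return offer - stake


fair_value = equivalent_stake(100, 10.0, 5.0)   # 200.0
profit = cash_out_profit(100, 180)              # 80
```

The gap between the $200 equivalent stake and the $180 offer is the sportsbook's margin on the cash-out.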
I used this process on the 2022 NBA championship starting about halfway through the season. I also used this process for the 2022 March Madness tournament.
For March Madness the final performance was $41.75 profit on $104 worth of bets - a 40% profit.
For NBA the final performance was $97.19 profit on $301.75 worth of bets - a 32% profit.
This went much better than the MLB experiment.