Monday, 20 January 2014

Betting using my model of Premier League football

I've been getting some questions over the past few weeks about the betting calls I make using my EPL model, so this post will explain how the betting choices work. If you just like to see who the model thinks will win each week then maybe skip this one, but if you're one of the people who's been looking at the calls and thinking, "What? He can't do that!" then this should help to explain my methodology. This is also going to be more than usually geeky so Wallpapering Fog felt like a better home for it than the EPL Index site.

If you're thinking "what EPL model?", have a look on eplindex.com.

First, a bit of history on where the model came from. That journey is how we got to here...

I've said before that this model wasn't originally built as a tool for betting and it's true. I first found eplindex.com last season (back when you could access all of their Opta stats for a few pounds a month), subscribed to the Stats Centre and built the model mainly to see if it would work. I had a vague thought that if it did work, then it could be interesting for a football club to use to forecast match results based on picking different players, but also assumed that the bigger clubs would already have sophisticated models of their own to do this type of work.

The model churned out a set of results for the first half of the 2012/13 season and I needed something to compare them with. Was my model any good? Bookmakers' odds are an obvious place to look for alternative results predictions, with easily accessed historical data available (football-data.co.uk if you're looking.)

That first version of the model didn't quite match the bookmakers on accuracy. The bookies' favourites won games slightly more often than the model's predicted most likely outcomes did.

Despite this, that analysis projected the model to make a small return if you used it to bet. The model didn't always side with the bookies' favourites, so it picked up some wins at decent odds. Bookmakers also almost never say that a draw is the most likely outcome of a game and if you backed a draw when the model said its likelihood was over 25%, you made a healthy return.

I started to predict results on Wallpapering Fog ahead of the games being played.

For betting, the rules were simple. Back a draw if the draw likelihood was over 25%, otherwise back whoever the model said was most likely to win. That's backing winners with no regard whatsoever to the market odds on that game. You could be backing a long shot that the model likes a lot, or backing a very short odds favourite that the model gives only a 40% chance of winning. For draws, the odds are usually around 3.5 but again, I was paying them no attention when picking the bets.
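As a minimal sketch of that rule in Python (the function name, argument names and example probabilities are hypothetical and only for illustration; none of this is taken from the model's own code):

```python
def pick_bet(p_home, p_draw, p_away, draw_line=0.25):
    """Back the draw if its probability clears the draw line,
    otherwise back whichever side the model rates more likely to win.
    Market odds are deliberately ignored."""
    if p_draw > draw_line:
        return "D"
    return "H" if p_home >= p_away else "A"


# Made-up probabilities, purely for illustration.
print(pick_bet(0.40, 0.24, 0.36))  # H
print(pick_bet(0.38, 0.28, 0.34))  # D
```

Because the draw line sits below one third, "most likely winner" and "most likely of the three results" give the same pick whenever the draw rule doesn't fire.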

This method has periodically upset more seasoned gamblers, who point out that you shouldn't make picks like that. I do understand why not and I'll come back to it. Please bear with me.

The method arises as a result of having a primary objective for the model of calling as many results correctly as possible, rather than trying to maximise betting profits. This objective is also why I've never looked at the potential returns from using my model to call correct scores, or accumulators, or both teams to score.

It works like this:

1. Get as many results right as possible.

2. See if the strategy that achieves point 1, also makes money.

It did make a profit last season and is winning this season too, so that 'most likely outcome' method isn't as naive as it might look.

For any readers who aren't seasoned gamblers, the issue with backing the most likely outcome regardless of what odds the bookies are offering is that you could be backing a result you think is a very close call, when the bookies are offering only a poor return if you're right.

If I flip a coin then you know the chance of it coming up heads is 50%. If I offer you odds of 1.5 on a bet on heads (£5 profit if you bet £10), you'd be mad to take it. You might win once, but in the long term, you're guaranteed to lose.
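The coin example can be checked with a couple of lines of arithmetic. This is just a sketch using decimal odds, where a winning bet returns stake × odds; the function names are my own for this illustration:

```python
def expected_profit(p_win, decimal_odds, stake=10.0):
    """Average profit per bet: win (odds - 1) * stake with probability
    p_win, lose the stake otherwise."""
    win = p_win * stake * (decimal_odds - 1.0)
    loss = (1.0 - p_win) * stake
    return win - loss


def break_even_odds(p_win):
    """Decimal odds at which the expected profit is exactly zero."""
    return 1.0 / p_win


print(expected_profit(0.5, 1.5))  # -2.5
print(break_even_odds(0.5))       # 2.0
```

So at odds of 1.5 on a fair coin you lose £2.50 per £10 bet on average, and you'd need odds better than 2.0 before the bet is worth taking.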

It's time to share some data... If you run the latest version of my model over the first 200 fixtures of the 2013/14 season, betting £10 on the predicted most likely result of each game, or on a draw if the predicted chances are over 27% (it's gone up a little from a 25% draw line since that first version) then here's what happens.

Important note: The data I'm using here to populate the simulation is the data that we had after week 20 had been played. I also know the exact starting line-ups for each of these games, which I won't when I post on a Friday ahead of a weekend's fixtures.

This is very much a best case performance. The model's good. But it's not quite this good.
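For anyone who wants to reproduce this kind of running total from the data, here's a rough sketch of a flat-stake profit calculation. The three fixtures in it are invented purely to show the mechanics and are not real results or model output; the function names are hypothetical too.

```python
from itertools import accumulate


def bet_profit(pick, result, decimal_odds, stake=10.0):
    """Profit on one bet: stake * (odds - 1) if it wins, -stake if not."""
    return stake * (decimal_odds - 1.0) if pick == result else -stake


def cumulative_profit(bets, stake=10.0):
    """bets: iterable of (pick, actual_result, decimal_odds_on_pick)."""
    return list(accumulate(bet_profit(p, r, o, stake) for p, r, o in bets))


# Invented fixtures, for illustration only.
bets = [("H", "H", 2.0), ("D", "A", 3.5), ("A", "A", 3.0)]
print(cumulative_profit(bets))  # [10.0, 0.0, 20.0]
```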




So betting on the most likely result, regardless of market odds, seems to work. Part of the reason for this is that we're imposing quite a harsh line before an upset is picked as a bet. In its raw results, the model predicts too many upsets, so rather than just saying it has to like the underdog more than the bookies do, we have a rule that it must like the underdog enough to actually return a prediction that they will win the game.

Very probably a better gambling strategy would be to avoid betting on certain fixtures at all, but we come back to my two points above; I'm forcing myself to give a prediction for every game. There is also very likely a better gambling strategy to be found in this model, but I like the simplicity of betting on the predicted winner. It works.

If you'd like to come up with your own strategy, I've put a link to all of the data behind the first 200 games of this season at the end of this post.

Let's have a look at what happens with an alternative strategy of backing 'value'. What happens when we bet on whichever of the three results (home win, away win, or draw) gives the biggest difference between the model's simulated likelihoods and the bookies' odds? If the model's got an 'edge', then this should work.
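One way to sketch a 'value' pick is below. It assumes the difference is measured as model probability minus the probability implied by the decimal odds (1 / odds), with no adjustment for the bookmaker's margin. That's one reasonable reading rather than a definitive recipe, and the names and numbers are hypothetical.

```python
def value_pick(model_probs, decimal_odds):
    """model_probs and decimal_odds are dicts keyed by 'H', 'D', 'A'.
    Returns the outcome with the largest (model - implied) gap,
    together with that gap."""
    edges = {k: model_probs[k] - 1.0 / decimal_odds[k] for k in model_probs}
    best = max(edges, key=edges.get)
    return best, edges[best]


# Made-up probabilities and odds, for illustration only.
model_probs = {"H": 0.50, "D": 0.25, "A": 0.25}
odds = {"H": 1.8, "D": 3.6, "A": 5.0}
pick, edge = value_pick(model_probs, odds)
print(pick)  # A
```

Here the model's most likely result is a home win, but the away win is the 'value' bet, which is why this strategy ends up backing more long-odds results.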

The 'value' strategy's cumulative profit is in red below, with my usual method remaining in blue.


So the value strategy is also predicted to work, but returns are more volatile, as you'd expect since you're backing more long-odds results. Using the value strategy, you also win 38% of bets, rather than the 56% you're predicted to win by backing the most likely result. Both strategies should work (provided you don't mix-and-match between them) but the 'most likely result' is less risky in terms of long, bad runs.

To recap, the strategy I'm currently following arises from:

1. A self-imposed rule that I must bet on every game and stake the same amount on every bet.

2. There is a benefit to moderating the model, so an upset must be predicted as being very likely before we back it.

3. Evidence (the above, plus last season and this season so far) that backing the most likely predicted result is effective.

If you'd like to dive into the data, see where these numbers come from and pick your own strategy based on the EPL Model's calls, it's all here.

4 comments:

Tom Parke said...

I have been following your blog and predictions having first seen it over Christmas. I too have a football prediction model (Bayesian top down) but it was not beating the bookies so imagine my envy at your performance!
So I have downloaded your betting history and run it through a simple R routine to look for the optimal betting strategy. Just looking at your probabilities, I think if you bet on every home win where your model probability is > 0.425, every draw > 0.242 and every away win > 0.344, betting £10 a time, you would be £792 to the good as at the end of week 20.
Betting on the basis of the difference between your model and the odds ends up £820 better off (but you should only bet on a draw when the difference is -ve!). This is so close as to be no clear difference. Betting simply based on the odds themselves I can get a profit of £380 (which makes me wonder if the odds are being recorded correctly by football-data.co.uk). I have also looked at whether you are better off betting as you do with a fixed stake or a stake set to give the same pay off (which is what I thought would be the right thing to do if betting off a probability model). The fixed stake definitely wins, which encourages me to look at my model again as I was using a 'fixed win' model and maybe that's what I was doing wrong. I'm not sure why this should be the case though.

Neil Charles said...

Hi Tom,

Thanks for the detailed comment! Sorry it's taken me so long to reply.

Interesting that you pick up the impact of dropping the draw line to c. 24% and that fits with my results but I'm not happy to do it yet... By the odds on football-data.co.uk you'd win (not loads, but you'd win) by backing every game to be a draw. I'm trying not to lean my predictions too heavily towards what looks like a market inefficiency - I want to get the result right first and win money second, so have set a slightly higher bar at 27%.

I definitely want to do more of what you've investigated for next season, in terms of not betting a flat stake on every game. There's definitely money to be made here! I just haven't attempted to grab it yet.

Norman Kremer said...

Hey Neil,

I've just recently stumbled across your blog. It's a shame you don't update your model anymore on EPL index. - I know this is an old post.

I'm also attempting to write a football prediction model. I'm struggling to find any good data.

In the post you mention that you get your data on EPL. Do they still offer this? (Can't seem to find it) - Do you know any other sources where I could find data?

Kind Regards

Neil Charles said...

EPL Index don't do stats anymore unfortunately. Opta's licensing deals changed and they had to stop.

You can still get numbers on a few sites, but not in quite such easy to extract tables as EPL Index had. Try Squawka and Four Four Two Statszone.

I am still working on all of this! But in private for the minute. Hopefully will have something public again at some point but it won't be this season.