Monday, 8 April 2013

Football model: Under the hood

I was writing a proposal for a client last week and it reminded me how important it can be to show some of the mechanics underneath your answers. Not to explain everything that's going on, but to share some screenshots and explanations as evidence that you aren't just making all this stuff up as you go along. After all, you could drive your car quite happily without ever seeing what makes it go, but it's definitely reassuring to see a big shiny engine when you pop the bonnet open.

All most people have seen of the football model so far is some percentages that get spat out at the far end. My first post about football simulation explained the basics of the model, but what am I actually up to that makes these numbers happen?

Who else gets excited by screenshots of spreadsheets? Just me then? Ah well, here they are anyway.

Step one is a list of the weekend's games and predicted starting line-ups. I get these from Fantasy Football Scout and input team changes by hand from week to week. I'd really like to automate this bit because it's a pain. It doesn't take ages to set up, but being manual means that if I'm not at a computer, the model can't run itself.

This list of fixtures and players gets read in by the simulator (Visual Basic for the moment, if you're wondering) so that it can simulate virtual games.

Next, we need stats for each player, so that the simulator knows how each of them performs in real life. These stats come from EPL Index and give us a database describing each player's decision making, successes and failures in real games so far this season.

For each team, that looks something like this one for Southampton.

Yeah, I've missed out the column headers. Sorry about that, but this is turning into something I've invested a lot of time in! You can probably work some of them out if you're determined to...

These stats get pulled into the simulator and then it's ready to run a virtual game.

Or actually, to run 1000 virtual games and tell us what the result was in each of them, so we can find the percentage chance of either team winning, or of a draw.
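If you want a feel for how "play it 1000 times and count" turns into percentages, here's a rough sketch. It's in Python rather than the Visual Basic the real simulator uses, and the game itself is a made-up stand-in: the hypothetical `simulate_game` just gives each team a small chance of scoring every minute, where the real model plays through player-level events. The function names and the scoring rates are my own illustrative assumptions, not the model's.

```python
import random
from collections import Counter


def simulate_game(home_rate, away_rate, rng, minutes=90):
    """Hypothetical stand-in for one virtual game.

    Each minute, each team scores with probability rate / minutes,
    so `rate` is roughly that team's expected goals per game.
    The real simulator works from player-level events instead.
    """
    home_goals = sum(rng.random() < home_rate / minutes for _ in range(minutes))
    away_goals = sum(rng.random() < away_rate / minutes for _ in range(minutes))
    return home_goals, away_goals


def result_percentages(home_rate, away_rate, n_games=1000, seed=None):
    """Play the same fixture n_games times and count the outcomes."""
    rng = random.Random(seed)
    outcomes = Counter()
    for _ in range(n_games):
        home_goals, away_goals = simulate_game(home_rate, away_rate, rng)
        if home_goals > away_goals:
            outcomes["home"] += 1
        elif home_goals < away_goals:
            outcomes["away"] += 1
        else:
            outcomes["draw"] += 1
    # Convert counts into percentage chances of each result.
    return {k: 100.0 * outcomes[k] / n_games for k in ("home", "draw", "away")}


if __name__ == "__main__":
    # Illustrative rates only, not taken from the model.
    print(result_percentages(1.6, 1.1, seed=1))
```

With 1000 games per fixture, each percentage moves in steps of 0.1, so a result that never comes up in the run is reported as 0%.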

Who wants to see what a footballer looks like inside The Matrix?

Probably prettier in real life. He's got a good engine though.

And now we're ready to run the weekend games. I press go and this happens.

If it's just running one weekend's games then I'll read Twitter for a bit. If it's simulating a whole season then the laptop gets some alone time and we come back later. It's playing through each fixture 1000 times, with around 800-1000 events per game.

At the top end, that's a million events to get the percentages for a single fixture. There are 380 games in a season, so when I do a large run to assess whether changes to the model have improved its performance, we're simulating up to 380m individual events. Definitely gives me time to fit in a cup of tea.
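The sums behind those numbers, as a quick Python sketch. The constant names are just mine for illustration; the figures are the ones quoted above.

```python
SIMS_PER_FIXTURE = 1000          # each fixture is played through 1000 times
EVENTS_PER_GAME = (800, 1000)    # rough low and high event counts per game
FIXTURES_PER_SEASON = 380        # games in a season

# Events needed to produce the percentages for one fixture (low, high).
events_per_fixture = tuple(SIMS_PER_FIXTURE * e for e in EVENTS_PER_GAME)

# Events needed to simulate every fixture in a season (low, high).
events_per_season = tuple(FIXTURES_PER_SEASON * e for e in events_per_fixture)

print(events_per_fixture)  # (800000, 1000000)
print(events_per_season)   # (304000000, 380000000)
```

So a full-season run is somewhere between roughly 304m and 380m events, depending on how busy the games are.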

And finally, out come the percentages that I've been posting for the past couple of months.

So now you know, it really is a proper model. One that I've spent far too long building.

And it works...


Goalimpact said...

The odds look very realistic. No easy task in a bottom-up model. Good job, you must have spent nights tweaking the parameters.

Neil C said...

Few nights' work in there yes! It's actually been pretty reasonable from the start in terms of not outputting crazy percentages too often.

Though it did give Newcastle no chance at all last week. And it was wrong... If a team has low shot rates and goal conversion, they tend to get penalised very heavily by the model and you get unrealistic percentages.