Thursday, 5 December 2013

What analysis is good at

It should be an easy question to answer. It should be, but it's not.

What is statistical analysis consistently good at?

I'm talking here about its real use to the Managing Director of a company, or to the Chairman of a professional sports team, or to a politician. To somebody who has choices to make and is looking for help to make the best choice that they can.

A sceptic can easily reel off a list of things that your analysis can't do. Your analysis probably can't account for human frailty, or random chance, or a whole host of things that it was never designed to measure in the first place.

Your analysis can't forecast the effect of something that's never been tried before.

Your analysis says 'trust my numbers', but offers no guarantees of success.

And your numbers can't spontaneously volunteer new ideas; only tune up the effectiveness of old ones.

When you come right down to it, complex statistical analysis is a waste of time and effort, right?

As an analyst, I hear some of these arguments a lot. It's true that statistical analysis can't come up with the perfect strategy on its own, but it's still a hugely important tool. Here's what I think statistical analysis is really good at.

Analytics will conclusively reject a multitude of bad strategies that you might otherwise employ.

Analytics stops you making avoidable bad decisions.

Does that sound overly negative? It doesn't have to be.

This is the scientific method applied to business and it's tremendously powerful. Scientists know that you can't ultimately prove the truth of anything; that there's always the possibility that you're wrong. What you can do is falsify what definitely isn't true. All of our scientific knowledge about the world is based on theories that we're only working with for now, until we prove that they're wrong. All of it. But just look at the progress we've made by rejecting ideas that don't work...

It's this scientific method that means we've found ways to cure many diseases that were previously terminal. And it's the rejection of this evidence-based method that can kill people who believe strongly in homeopathy.

Do you reject analytics because the answers are obvious and it will just tell you what you already know? You're a corporate homeopath.

Rejecting ideas that don't work is real progress and a truly valuable exercise. It's how we learn; we try something, we reject it, we have a think and then we try something else until we find a method that works.

You can often spot a good analyst by the way that they approach problem solving. If you ask a good analyst why sales are declining, they'll come up with a whole host of different possibilities and then work with data to disprove them - one at a time - until they're left with the most plausible explanation. It's a process and it's the true value of analysis. It stops us from accepting hypotheses that aren't true; from blaming bad weather, or bad luck for under-performance, when really our business has systematic problems.

Sam Allardyce (the West Ham manager) talked this week about using statistical analysis in football and it's fantastic to see this type of discussion starting to gain real traction. Something that he said struck me as slightly jarring though.

"You can take out of it what you want. You can find your best performance in each area. You can find your best performance on fitness level, you can find your best performance in possession…"

It might just be throwaway phrasing from the interview, but that could also be heading firmly in the direction of confirmation bias. If you analyse your best performances, you'll find the occasions when what you tried appeared to work. Your worst performances are often a lot more valuable, because you're forced down a route of working out why they were bad and then coming up with ideas to fix them.

Very often, I find that analytics sceptics are those who are looking to confirm the effectiveness of the strategy that they're already employing. It's self-fulfilling then, that your analysis won't be able to teach you anything new. At best, analysis like this is an internal marketing tool; a way to 'prove' you're right and end any debate about other options and in the short term - until everybody works out that's what you're doing - it might be somewhat effective at that job. EMI was determinedly using analytics like that for the short time I was employed there. Before reality struck and it was broken up and sold.

Good analytics...

Proves conclusively that bad ideas aren't working

And so forces you to think up new ideas

Which you can then analyse to see if they're an improvement

Good analytics...

Gets you there faster. Of course you'll work out eventually that a bad idea isn't working, but wouldn't you rather know now, before it's too late?

And finally, good analytics will prompt new ideas, by giving you details about what went wrong with the old ones.

There are so many other benefits of taking an analytical approach to a problem, but this is the big one. This is what statistical analysis is really good at and this is my answer when faced with scepticism. Of course analytics can't solve every problem, but used correctly, it can solve a very, very big one.

Thursday, 26 September 2013

Off the corporate grid part 1: Cheerio Windows

I wrote a post a while ago about how I planned to try a little geek project of dropping myself off the corporate web. No more free GMail, other Google tools, or Microsoft Windows... and see how much I miss them. Is 'free' worth the price?

This post is part one - dumping Windows for Linux on an old laptop.

Tech skills needed:  4/10
Worth the effort:  8/10
Value for money: 10/10

I'm very much writing here from the point of view of somebody who'd like to give Linux a try, but ease of use is a major priority. I know that for many people, Linux is a rewarding investment of their time and some enjoy battling to make a piece of software work properly. Screw that. Day to day, using a computer should be easy. I've probably got some terminology wrong below and been floundering around with issues that an expert wouldn't even have noticed, but that's part of the point - can an average user get by without Windows?

Before starting this little exercise, I knew what Linux was - an Open Source (free) alternative to Windows - but I'd never used it, beyond a quick hack to get some files off a dead PC (of which more later). Of the few people I'd talked to about Linux, a couple are serious tech experts who are definitely up for some hardcore IT fettling, and one was annoyed that he'd bought a netbook with Linux on it, rather than Windows, and it wouldn't talk to a few of his other gadgets.

I suspect that along with a lot of other people, I had a vague feeling using Linux would involve battling with screens like this:

But don't worry, it doesn't.

As a new user, the biggest problem you face after deciding to maybe give it a go, is in working out what to install.

Google "install Linux" and the top link is to "Red hat enterprise Linux". This looks complicated. It's not what you want.

Then after a bit more reading, you find out Linux comes in distributions and that there are quite a few. Distributions are like different skins; they all use the Linux kernel as the central architecture that makes them work, but they behave in different ways, with alternative looks. You need to choose which distribution to install and now you find yourself on a website like this.

Oh, for Pete's sake.

I strongly suspect at this stage a lot of people say, "oh, sod it, I can't be bothered" and go back to Windows. I've certainly done that once before. This time I'd made a promise on my blog though, so pick a card... any card...

Linux Mint is top of the popularity list. Let's install that.

Wikipedia says Linux Mint

"is a Linux distribution for desktop computers, based on Ubuntu or Debian."

and that Ubuntu

"is an operating system based on the Linux kernel and the Linux distribution Debian, with Unity as its default desktop environment."

And I say if Linux wants more mainstream adoption (which I assume it does) then it needs to stop making things so bloody difficult.

Mint is easy to install and easy to use. You don't need to know what Debian or Unity are. I've only got a vague idea what they are. I might find out at some point but you really don't need to.

Mint has a nice clean homepage, with a prominent download button.

Unfortunately, when you click through to the downloads page, there are too many options for an IT novice. You thought you'd solved the problem of which distribution you want? Well Mint comes in several different flavours too. Somebody's doing this on purpose.

This page badly needs a big, fat, "Don't know what to install? You want this!" button. At least the one you want is at the top of the list: Linux Mint Cinnamon.

You've just done the hardest bit. I'm not kidding, the hardest bit of using Linux is working out what on earth to install in the first place.

Download Cinnamon (32 bit or 64 bit depending on your PC) and then you'll need to make a disk to install it with. It comes as an ISO file, which is the contents of a DVD, ripped onto a file. If you don't already know how to turn that into a DVD, then a quick Google will turn up lots of software, or you can follow this easy guide.

Congratulations, you've got an installation disk! That was definitely harder than buying a Windows installation disk, but it was quite a bit cheaper too.

You can actually use your disk straight away by putting it in the drive and then rebooting your PC. It should boot straight to the Mint desktop, which is bloody handy if your copy of Windows ever takes a dive and you need to get all of your files back. I've rescued videos and music from a laptop like this in the past, when Windows flatly refused to either boot, or reinstall.

You won't want to run Mint from CD all the time though, because it's slow and doesn't remember anything when you turn off your computer.

To install properly, you click on the desktop icon and it's easy from there, but you have a choice to make. Do you want to clear Windows off your laptop completely (copy your files somewhere first!), or do you want to 'dual boot' so that you'll be asked whether to start Windows or Mint when you turn on the PC?

We're just experimenting here and clearing Windows altogether feels a bit rash, so dual boot is probably best.

You'll need somewhere to put Mint, so you have to create a partition on your hard drive, using Windows. This guide explains how.

And now you can install it from the DVD you made.

And we're done. So what's it like?

Well it's like Windows. It looks like Windows and it acts pretty much like Windows too. It's even got a Start menu that pops up when you press the Start button. I'd be willing to bet that if I put it on my grandmother's laptop, she'd barely notice.

(don't worry, you can change the background image, just like Windows)

On an old laptop, you'll find it starts up faster and it doesn't do the special Windows boot thing of looking like it's ready, but keeping you waiting for another five minutes before you can actually open a program.

The web browser runs faster and doesn't hang all the time. (I know you can put Firefox on Windows too but it's an old laptop and it never really ran Windows Vista properly.)

I swear the laptop battery lasts longer.

These are all big ticks. Linux is a much lighter load on the PC, so if you've got a laptop that's getting on a bit and that is only really used for web browsing, you'll find it's a much faster, slicker experience.

Mint comes with LibreOffice, which is a more than passable alternative to Microsoft Office and can still open all of your .doc and .xls files. If you don't need VBA macros and heavyweight Excel workbooks, it's great.

The best recommendation I can make for Mint is that since making our old laptop dual boot about a month ago, I've only touched Windows once, because I needed it to talk to a GPS and it has the drivers built in. My wife hasn't used Windows at all. Why would you? For simple tasks, Linux just works better. It's only more obscure gadgets that are an issue too - plugging in a digital camera or a USB thumb drive is fine.

I have found myself on a few forums, learning some more complicated bits and pieces when I wanted to push beyond simple web browsing and admittedly I wasn't able to make Google Picasa work properly, even though it's supposed to. All in all though, if you want basic features, Linux is brilliant. If you want more than basic features you can certainly have them, but you'll need to get your hands dirty.

The only real difficulty I found for Mint was that when it first started up, everything worked perfectly except wifi. This is because the wifi card needs a proprietary driver, which Mint had found, but didn't activate automatically. Easily fixed through a simple menu, but I'd have liked a pop up on the first boot, prompting me that Mint already knew how to make the card work and asking if I wanted it switched on. As it is, it's possible a less curious user would have just assumed Mint didn't work with their laptop.

In terms of dumping the corporate web, this one's a 'not quite', but well worth doing all the same. There's no way I could survive without Windows at work (no Excel, Tableau, SQL Server...? Not going to happen) and at home it would probably wind me up about once a month that something Windows is able to do, was difficult or impossible with Linux.

You can have the best of both worlds though. A slick, fast experience that's not beholden to Microsoft for 90% of the time and a quick boot into Windows when you have to. I'm impressed. This has been a really worthwhile little experiment and I'd thoroughly recommend it.

Wednesday, 18 September 2013

Luck in football part 2. Can you have a lucky season?

"It evens itself out over a season and that will never change. You get breaks here and there. Every club gets good breaks, bad breaks."

Sir Alex Ferguson

Does it though? Does luck even itself out across a season? This is a follow up to my post a couple of weeks ago, looking at how much luck there is in a single English Premier League result. The obvious next step is to try to extend that analysis to a thirty eight game season and see whether, over a larger number of games, most of the random chance in football then disappears.

Before we dive into the analysis, it's useful to think about what level of luck might feel right for an average team, in terms of the number of points that team finishes with, compared to how many points they 'should' get. Plus or minus a point across a season? That obviously could happen - you sneak one extra draw, or rattle the bar in the 90th minute at 1-0 down just once in the season and there's your extra point either way. One lucky won or lost game is also pretty easy to envisage, or maybe even two lucky wins or losses. Three lucky wins and nine points? For me that's within the bounds of possibility, but starting to feel more unlikely.

From a statistical point of view, thirty eight games per team isn't all that many, so some level of randomness is definitely going to creep in. The challenge is to work out how much randomness and whether it matters in the grand scheme of things. Whether the league table is random enough that the best team won't always win the title, or if an unlucky mid-table standard team can get relegated.

Working out random chance across a whole season is more difficult than working it out for a single game against the hypothetical 'average opponent' that we saw in my last blog post. In response to that previous post, a few people asked how you'd set up a team's chances against a specific opponent, rather than a generalised 'average' one, which illustrates one of the key problems. If you could predict goal scoring and concession rates against a specific opponent, then you could predict the final score. Then you'd be able to beat the bookies and make a lot of money. Which is hard. In essence, this is what my prediction model tries to do, and it's too complex to form a base for this analysis.

For this post, I'm going to assume that bookmakers' odds are a fair representation of each team's chances in a game and use that as the basis to simulate a season. You can argue with that approach, but I have a suspicion it's going to cause fewer arguments than any other results prediction method that I might use.

At least everybody can see where these numbers have come from and as an added defence for this method, if the bookies' odds were consistently wrong across a lot of teams, across the whole season, they'd be losing a fortune.

"There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know."

Donald Rumsfeld

OK, that's probably not very helpful. The reason for the quote is that the bookies' odds aren't perfect. It's possible to consistently predict better than they do, which might reduce the amount of 'luck' (can we call it random variation instead?) that we're about to measure, and there are also all of the factors about a game that neither we nor the bookies know. If a manager plays his best striker - who's nursing an injury - and loses due to a poor performance, that's not bad luck, it's a bad call. But if we didn't know about the injury then it won't have been priced into the market odds.

What really matters for this post isn't that we have brilliant odds for each individual game, but that we have a fair representation of what a season looks like as a whole, so that we can run simulations. It's not about the individual teams, it's about having a realistic spread of probabilities across a season's 380 games and for this, betting odds should do a good job.

I'll come back to the definition of luck and the implications of using betting odds at the end of this post, but let's get stuck into some numbers. Here are Arsenal's odds for each game last season, taken from Bet365 and re-based to remove the bookmaker's margin so they sum to 100%. (data from

If you run that fixture list ten thousand times with those odds, on average Arsenal will finish with seventy points. They scored seventy three points in 2012/13, so on that (huge!) sample of one season at least, the odds-based method seems sensible.
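For anybody who wants to try the same exercise, here is a minimal Python sketch of the two steps described above: re-basing odds so they sum to 100%, then replaying a 38 game fixture list ten thousand times. The decimal odds below are made up purely for illustration (they are not the Bet365 prices), and the function names are hypothetical, so treat the printed numbers as a demonstration of the method rather than a reproduction of the Arsenal figures.

```python
import random
import statistics

def remove_margin(win_odds, draw_odds, loss_odds):
    """Turn decimal odds into win/draw/loss probabilities that sum to 1."""
    raw = [1 / win_odds, 1 / draw_odds, 1 / loss_odds]
    total = sum(raw)  # a little over 1; the excess is the bookmaker's margin
    return [p / total for p in raw]

def simulate_season(fixtures, rng):
    """Play every fixture once and return the points total."""
    points = 0
    for p_win, p_draw, _ in fixtures:
        roll = rng.random()
        if roll < p_win:
            points += 3
        elif roll < p_win + p_draw:
            points += 1
    return points

# Hypothetical decimal odds (win, draw, loss) from one team's point of view,
# cycled to fill a 38 game season. Real odds would differ for every game.
example_odds = [(1.50, 4.20, 6.50), (2.40, 3.30, 3.00), (4.00, 3.50, 1.95)]
fixtures = [remove_margin(*example_odds[i % 3]) for i in range(38)]

rng = random.Random(2013)  # fixed seed so the run is repeatable
totals = [simulate_season(fixtures, rng) for _ in range(10_000)]

expected_points = sum(3 * p_win + p_draw for p_win, p_draw, _ in fixtures)
print(f"analytic expected points: {expected_points:.1f}")
print(f"simulated average points: {statistics.mean(totals):.1f}")
print(f"simulated standard deviation: {statistics.pstdev(totals):.1f}")
```

Each game is treated as independent of every other game here, which is the same simplification the simulation in this post relies on.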

We get a distribution of points for Arsenal using our ten thousand simulated seasons, which looks like this.

Sometimes Arsenal will score fewer than seventy points and sometimes they'll score more, purely through random variation, or 'luck'. The standard deviation of Arsenal's final total is 7.5 points, which means that although the average is seventy, in any given season Arsenal are likely to do seven points better or worse than that. In 66% of seasons, their final total would fall between sixty five and seventy nine points.

One simulation in ten thousand, last season's Arsenal squad is ridiculously unlucky and gets relegated with less than the magic forty point total. This might sound far fetched until you consider that it's a one in ten thousand chance and there have only been 126 seasons of professional football in England, in total, ever. The chances of relegation happening to last season's Arsenal squad are vanishingly small.

Running the same ten thousand season simulation exercise for each team in last year's Premier League, gives you the following points distributions.

We've got a clear top two, a fairly well ordered top seven including Everton and then feasibly any team from eighth downwards could be relegated. Ouch.

We can translate those points totals into finishing positions for each team in each of the ten thousand simulated seasons, to get a likelihood of achieving different positions. Some of the randomness starts to disappear here, because for a team like Newcastle or Fulham to be relegated doesn't just need them to be unlucky. Other teams below them need to be lucky too.

And because I know you're going to want the raw percentages for those...

Were Newcastle unlucky to finish 16th last season? I'll leave that as a rhetorical question.

Picking on Liverpool, a team with their 2012/13 chances in each game (and please note how careful I'm being with my words here; not Liverpool, but a hypothetical team whose chances were exactly represented by the odds Bet365 gave Liverpool last season) would win the league by 'luck' one year in twenty (5%).

The analysis has turned up one more result and it's a result I found quite surprising. Before running any numbers, I'd hypothesised in an email exchange with @SimonGleave that good teams and bad teams would have less randomness in their points total, with average quality teams seeing the most random variation. The reasoning for this crudely being that good teams win a lot and bad teams get beaten a lot and both of those things reduce the space for luck to play a role.

I was wrong.

Here are the standard deviations of season points totals for each team in 2012/13.

Everybody scores plus or minus seven points. Weird.

What that does mean in effect though, is that teams higher up the table have less random chance as a proportion of their total points, than teams lower down, since seven is obviously a much larger proportion of forty points, than of ninety points.

We have a standard deviation 'luck' (I still don't like that word) measure varying from 9% of Man City and Man U's most likely points totals, through to 20% of Reading's.

My hypothesis about why seven appears to be the magic number for all teams is that every team has a number of peers - teams they're similar to and will share points with - and a number of teams that they're either much better or much worse than. This gives every team a similar number of games with fairly uncertain results, where points will be shared, compared with fairly certain ones, whether that's fairly certain to win or to lose.
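A rough way to sanity-check the 'everybody is plus or minus seven' result without running a simulation is to work out the variance of points directly. The sketch below makes a big simplifying assumption that is not in the analysis above: every one of a team's 38 games has the same win and draw probability. The three profiles are hypothetical numbers picked for illustration, not any real team's odds.

```python
import math

def season_sd(p_win, p_draw, games=38):
    """Standard deviation of season points if every game has identical odds."""
    mean = 3 * p_win + p_draw           # expected points per game
    second_moment = 9 * p_win + p_draw  # expected squared points per game
    variance = second_moment - mean ** 2
    # Independent games: variances add up across the season
    return math.sqrt(games * variance)

# Hypothetical (win, draw) probabilities for three kinds of team
profiles = {
    "strong": (0.70, 0.18),
    "average": (0.365, 0.27),
    "weak": (0.20, 0.25),
}

for name, (p_win, p_draw) in profiles.items():
    expected = 38 * (3 * p_win + p_draw)
    sd = season_sd(p_win, p_draw)
    print(f"{name}: about {expected:.0f} points, standard deviation {sd:.1f}")
```

Under these assumed inputs the standard deviation stays between roughly seven and eight points even though the expected totals are very different, which at least doesn't contradict the pattern in the simulations.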

I've linked to this piece of work before, but it's reassuring how close this result is to the key finding of plus or minus eight points in a post by James Grayson, which kicked off this whole thought process for me. My initial reaction to that number was that it felt too big, but now I'm coming to a very similar conclusion.

I do think we should be treating these numbers as a maximum level of random variation though, because in reality teams will react to their league position and try to change their chances. Teams with more financial resources will be able to react to an unlucky first half of the season by signing players and improving their odds in the second half.

Better predictions than bookmakers manage could also reduce the amount of measured luck, because we'd be more certain about which team should win each game, reducing the level of random variation in results. As I said earlier, the bookies aren't perfect (just annoyingly good) so this should definitely play a role in reducing the true standard deviation below seven.

Finally we're also back to the core question of what is luck? I'm not at all sure when most people say 'luck' that they intuitively mean 'random variation'. It's more nuanced than that. Steve Fenn (@SoccerStatHunt) tweeted a nice definition yesterday:

The key word here being 'unearned'. This got me thinking that you're lucky if:

So what's 'consistently'? I think this goes to the heart of what we'd intuitively define as lucky. In the simulations above, Manchester United had a 36% chance of winning the title, just about equal with Manchester City. If either of those teams win, are they lucky? They've got the best chances out of any team, but 36% isn't huge - it leaves a 64% chance that some other team wins it, which is much more likely.

By those percentages alone, you'd always need luck to win the title.

If you win a single game that you had a 49% chance of winning, were you lucky? After all, there was slightly more chance that you wouldn't win it.

What we mean by luck, seems to be lurking somewhere in an area of 'beating a team that played significantly better than we did'. For me, 'lucky' has an intrinsic element of fairness in its definition. Lucky wins are unfair. Things shouldn't have happened that way. Somebody else deserved the win.

That's why luck is so hard to define - because it's subjective. Your definition and mine could well be different. As a neutral, I'd say a non-league team with a 5% chance of beating a Premier League opponent in the FA Cup third round deserves everything they manage to achieve. A fan of the losing Premier League team, who's going to take a pasting in work on Monday, would probably call them lucky bastards.

What we have shown here, is that bookmakers' odds suggest we're watching a league where in any one single season, each team's points will swing plus or minus seven from their most likely total. That could well be enough to relegate an undeserving team. Of course, that's undeserving depending on whether you support that particular team and depending on where you, personally, draw the percentage line on 'unlucky'.

Wednesday, 4 September 2013

In my experience, there's no such thing as luck... Except in football.

Football's back! Hurrah!

There'll be an update on the predictive model in the near future, but suffice to say for the moment that it's not dead, which is a huge relief. The predictions will be moving off Wallpapering Fog so that I can keep accessing Opta's data, but we'll be giving the bookies another good pasting from a new home this season, starting in a month or so's time. Stay tuned.

In the meantime, I've been thinking about luck.

There's obviously a lot of luck in football. Games are low scoring and that means one extra goal is very valuable, which is why we celebrate scoring so much vs. a sport like basketball, where it almost makes more sense to celebrate thwarted opposition attacks than your own successful ones.

The margins in football are small. Overall, in the past five seasons, 9.2% of shots were goals and 1-1 is the most likely scoreline in a single game. In that context, the possibility of an additional 'lucky' shot leading to a 2-1 win, rather than a 1-1 draw, is very real.

I'm far from the first person to work this out.

James Grayson has posted an excellent analysis, which concludes that the average team in the English Premier League has a random points variation of +/- 8 points. That is, whatever your 'correct' league placing, in any given season you'll do 8 points better or worse than that, just through pure chance.

Zach Slaton follows up this work on Forbes, extending it into Arsenal's chances of a top four finish and speculating about whether they have been lucky in recent years with Champions League qualification.

And The Numbers Game concludes that football results are 50% luck. (Say the reviews. Hands up, I haven't read it yet, so I can't treat this claim fairly. I've ordered a copy.)

I don't like luck. Or rather, I feel strongly that if a large proportion of football results are down to luck, then we need to be very sure we measure that proportion correctly. If nothing else, how do you tell a manager whose job hinges on avoiding relegation, that he might as well flip a coin, because it's out of his control? Or tell a mid table side that they'll finish anywhere between eighth and sixteenth and there's nothing they can do about it? I know that's not exactly how probability works, but it may well be how the analysis is perceived. Which is important.

I want to have a crack at this question myself using a bottom-up approach rather than looking at points variances across a league. It might come out with exactly the same answer, but I like building analyses from the ground up because you can see the assumptions more easily and it's often easier to communicate what you did, rather than asking for an audience to trust in a complex formula.

So how do we build a ground up analysis of luck?

I'm going to begin with shooting, which is an assumption running through this whole analysis. You'll see why in a second.

Across the past five seasons, the average EPL team had 14.3 shots per game and scored with 9.2% of those. We can take the 14.3 shots and do some basic (ahem, I didn't have to revise these methods at all, honest) probabilities...

In simple terms, a shot outcome is 1 or 0. Either the shot is a goal (1) or it isn't (0).

We've just seen that the chance of a single shot going in is 9.2%, so the chance of it not going in, is 90.8%.

The chance of a team having 14 shots and not scoring with any of them, is 0.908 * 0.908 * 0.908...

= 0.908^14

= 0.26

So for an average team, taking an average number of average quality shots, there's a 26% chance they don't manage to score at all in any one game.

If you want to see what the chances are that they score once, or twice, or more, then you need the binomial distribution. It gets a bit complicated and you need to use a distribution, because any combination of the 14 shots might go in, or not, and that's a lot of different combinations.
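If you'd rather let a computer handle the combinations, the binomial calculation is only a few lines of Python. This is a sketch using the rounded figures from above (14 shots, 9.2% conversion); the function name is just an illustrative choice, and it assumes every shot has the same chance of going in, independently of the others.

```python
from math import comb

SHOTS = 14          # average shots per game, rounded
CONVERSION = 0.092  # share of shots that end up as goals

def goal_probability(goals, shots=SHOTS, conversion=CONVERSION):
    """Binomial chance of scoring exactly `goals` from `shots` shots."""
    ways = comb(shots, goals)  # how many different sets of shots could go in
    return ways * conversion ** goals * (1 - conversion) ** (shots - goals)

for goals in range(6):
    print(f"{goals} goals: {goal_probability(goals):.1%}")
```

The zero goals line is the same 0.908^14 calculation as before, so it's a handy check that the formula has been typed in correctly.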

Here are the chances for a team that takes 14 shots in a game, of scoring different numbers of goals.

And here comes a big assumption. In reality, there are 'score effects' in football, which this analysis isn't going to consider. Score effects mean that certain scores are 'sticky' because they encourage teams to sit back and defend, while other scorelines let a team relax and attack. Think about what happens when an important game is at 1-0 going into the last 10 minutes, compared to if one team is already 4-0 up.

If we ignore score effects, what might the result look like, for two exactly equal and average teams playing against each other? They'll draw, right? Well no, not usually...

They'll have 14.3 shots each and score 9.2% of those.

Here are the possibilities for a draw:

So two hypothetical teams which are exactly identical in every way and take an average number of shots each, will only draw 27% of the time.

The other 73% of the time, they share the wins: 36.5% each.
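The draw figure comes from adding up the chance of every matching scoreline: 0-0, 1-1, 2-2 and so on. Here's a small Python sketch of that sum, assuming (as above) 14 shots per team, a 9.2% conversion rate and no score effects, so the two teams' goal tallies are independent. The function and variable names are illustrative only.

```python
from math import comb

def goal_distribution(shots, conversion):
    """Chance of scoring 0, 1, 2, ... goals from a fixed number of shots."""
    return [
        comb(shots, k) * conversion ** k * (1 - conversion) ** (shots - k)
        for k in range(shots + 1)
    ]

dist = goal_distribution(14, 0.092)

# Both teams share the same distribution, so the chance of k-k is p(k) squared
p_draw = sum(p * p for p in dist)

# Whatever isn't a draw is split evenly between two identical teams
p_each_win = (1 - p_draw) / 2

print(f"draw: {p_draw:.1%}")
print(f"win for each team: {p_each_win:.1%}")
```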

This is, I think, the extreme example of luck in football. The game should be a draw, but 73% of the time it won't be. You could say that in 73% of these games, the result is being determined by chance. Overall of course, it evens out over a large number of games, but it doesn't in a single game and these teams will only play each other twice per season.

I was drawn into this topic because I didn't intuitively like the levels of luck that were being suggested and now I'm saying some results are 73% luck. Damn.

If you double the shot conversion rate, you get even fewer draws, coming out at 20% of games. Teams draw most often when either they don't shoot, or when the shot conversion rate is very low, which makes sense. At only five shots per team, the chances of a draw increase to 48%, with a lot more 0-0 and 1-1 results.
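To play with those what-ifs yourself, the draw calculation can be wrapped in a function and run over a few scenarios. This is a sketch under the same assumptions as the rest of the post (identical teams, every shot equally likely to go in, no score effects); the scenario labels are mine.

```python
from math import comb

def draw_probability(shots, conversion):
    """Chance that two identical teams finish level."""
    dist = [
        comb(shots, k) * conversion ** k * (1 - conversion) ** (shots - k)
        for k in range(shots + 1)
    ]
    return sum(p * p for p in dist)

scenarios = {
    "average team": (14, 0.092),
    "double the conversion rate": (14, 0.184),
    "only five shots each": (5, 0.092),
}

for label, (shots, conversion) in scenarios.items():
    print(f"{label}: {draw_probability(shots, conversion):.0%} draws")
```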

That's enough about draws. What if we look at wins?

We should pause here for a moment and define what we mean by 'luck'. I'm taking it to mean that the best team doesn't win, even if that best team is only very slightly better than their opponent. If the lower skilled team wins or draws, purely through how the dice rolls on shot conversion, then they were lucky.

This isn't totally satisfactory and goes to the heart of why I don't really like talking about luck. It's easier to call the random variation in a team's total season points 'luck', than to label a single game 'lucky' but the concept is the same - it's about winning points you don't deserve, or losing points that you do. If a non-league underdog beats a Premier League team 1-0, having taken only one shot to the superior team's twenty shots, were they lucky to win? I think most people would agree that yes, they got lucky. The newspaper sports pages wouldn't, they'd call it a giant killing, but then, if Goliath should squash David nine times out of ten, David got lucky.

Luck kills much of the narrative that we love about football. I watched Exeter City get a 0-0 draw away at Old Trafford in 2005 and it's the best game I've ever been to. Were we lucky? Of course we were. On a different day, Scholes scores and Man U win. On most days, Exeter would get battered.

But to just label that result 'lucky' and dismiss it, is to dismiss a lot of what makes football great. If a manager sets up a team to have a 1% chance of winning and wins, he's lucky. But 20% chance? 30%? I'd prefer to say you're giving yourself more chance, than that you're lucky.

However, for this analysis we need a dividing line. For the rest of this post, if you beat or draw with a superior team, purely through the way the dice rolls on shot conversion, then you're lucky.

The draw example above was a fun starting point but it's not really sensible because you'll never really have two exactly matched teams. It can give us a good idea of maximum luck though because if we assume that Team A is very, very slightly better than Team B, then we get:

Team A win: 36.5% (plus a tiny marginal amount)
Team B win: 36.5% (minus a tiny marginal amount)
Draw: 27%

Team A should win. They're the marginally better team and Team B can only beat them by being lucky. The chances Team A don't win are 63.5% (the chance that Team B wins, plus the chance of a draw).

So in a very evenly matched game between two statistically average teams, we've got 63.5% of the result being down to luck.

Let's ditch the hypothetical average team and bring in some real ones. Here are average shots per game and goal conversion rates for each EPL team, averaged across the past five years.

Properly calculating the chances that a team will win a game instead of draw it, is slightly difficult, because there are many combinations of scores that will win you a game. We work out the chance that a team will score 1 goal, with the opposition only getting 0, then of the team getting 2 goals, with the opposition only getting 1 or 0, then of getting 3 goals, with the opposition scoring fewer than that... and so on. Then we add up all those different combinations that would give you a win and you have the overall chances of winning.
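The adding-up described above can be sketched in a few lines of Python. This assumes the same binomial shot model (independent shots, fixed conversion rate) and that the two teams' scoring is independent of each other. The "stronger side" numbers are made up purely for illustration and aren't taken from any real team.

```python
from math import comb


def goal_probs(shots, conversion):
    """Chance of scoring 0, 1, ..., `shots` goals under a binomial shot model."""
    return [
        comb(shots, k) * conversion**k * (1 - conversion) ** (shots - k)
        for k in range(shots + 1)
    ]


def match_chances(shots_a, conv_a, shots_b, conv_b):
    """Return (A wins, draw, B wins) by adding up every possible scoreline."""
    win = draw = loss = 0.0
    for goals_a, p_a in enumerate(goal_probs(shots_a, conv_a)):
        for goals_b, p_b in enumerate(goal_probs(shots_b, conv_b)):
            chance = p_a * p_b  # assumes the two teams score independently
            if goals_a > goals_b:
                win += chance
            elif goals_a == goals_b:
                draw += chance
            else:
                loss += chance
    return win, draw, loss


# Hypothetical stronger side (18 shots at 11%) against the average team
# (14 shots at 9.2%). The stronger side's figures are invented.
win, draw, loss = match_chances(18, 0.11, 14, 0.092)
print(f"win {win:.1%}, draw {draw:.1%}, loss {loss:.1%}")
print(f"better team fails to win: {draw + loss:.1%}")
```

Feeding in two identical average teams should give back roughly the 36.5% / 27% / 36.5% split from earlier, which is a handy sanity check.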

To see what that looks like, here are Manchester United's goal chances - based on the table above - vs. the average team that we saw earlier.

Based on these chances, Manchester United would be expected to beat a statistically average team 55% of the time, with 22% of games drawn and 23% lost.

(I know the opposition are unlikely to get their regulation 14 shots against Man U. We'll get to that in a minute...)

If we say that Man U are the better team and that the fairest result is a Man U win, then the opposition are 'lucky' 45% of the time, when the game ends in a draw or a loss for United.

Now we've got a spread of luck, from 63.5% when two teams are almost equally matched, to 45% when Man U are playing. That feels better. Man U have historically dominated games and left less space for random chance.

We can do this exercise for the whole list of teams above. How much luck is there in the result, for each EPL team, when playing against an average opponent who gets off 14 shots, with a conversion rate of 9.2%?

Luck works both ways. Manchester United have a low luck coefficient because they're likely to win; Middlesbrough also do, but because they're likely to lose.

Following me so far? There's still far too much luck here though, because we haven't introduced defence yet.

As I mentioned earlier, Manchester United don't let the average team take 14 shots, or let them convert those shots at 9.2%. They defend much better than that.

Here's each team's attacking performance - that we saw earlier - and now also average defensive performance. The extra numbers show us how many shots the average opponent manages to hit against each team and how many of those go in. Again this is all summarising over the past five years of the English Premier League.

The final piece of the puzzle is to set up each team against their own average opponent, instead of always using the general average opponent across the whole league.

Manchester United's own and opposition scoring chances now look like this, which feels a lot more like a real picture of Sir Alex on a Saturday afternoon, against a mid-table side:

Here's what that does to the luck coefficients for each team:

We end with a spread of results that are happening by chance, from Manchester United at the bottom end, where their average opponent can achieve a result through luck only 32% of the time, to Fulham where a very high proportion of the result - 63% - is being governed by chance.

It's worth stating that this doesn't mean Fulham are particularly 'lucky', or even that they're a bad team. It just means that they're incredibly average, so when pitted against an average opponent, the result could be anything.

That's it for single game probabilities and if you've got any feedback on the methodology, or you want to tell me I've got my sums wrong, please jump on the comments below. The next step is to take these and apply them to a full season, to see if Arsenal really are fortunate to keep making that top four and also maybe try to work out whether relegation really is a lottery.

Luck in football part 2. Can you have a lucky season?

Stats from EPL Index, which unfortunately doesn't do public data any more, but is still a cracking site and deserves acknowledgement!

Friday, 30 August 2013

An experiment: How much do we need Google, Facebook, Microsoft...?

It's time for a project. The house is decorated, the motorbike is running sweetly, my football data sources have dried up (for the moment, anyway) and the British weather isn't playing ball with paragliding conditions.

Spare time alert. I need a new project.

A few different things have consolidated together into Wallpapering Fog's next occasional series of articles.

Plus a family member asked how to avoid all this online tracking and I said that beyond a few anonymising and blocking tools like Adblock, Disconnect and Ghostery, I didn't really know.

What kind of a data monkey gets asked a question like that and doesn't really know? Poor form, that. Very poor form.

Today, @Wired flagged up a new website which lists some open source tools and services that can be trusted. You can take a look at

So how easy is it to run your own tools, instead of taking the commercial software and social network routes, in return for being tracked and advertised at?

Don't get me wrong, I'm a big fan of Gmail's interface and I like Google+ for its photo galleries. I don't really like Facebook but the paragliding community has started to organise itself on it, so my account needs to stay. This isn't going to be a toys-out-the-pram abandonment of the corporate web, but I want to know how easy it is to get along without the big boys' toys. Can a fairly techy analyst do it, or is it hardcore geekery, reserved for the people who wear Linux t-shirts and code their own printer drivers? How far can you stretch a £35 Raspberry Pi? Do we really need Facebook as a platform for sharing, or are we just lazy?

A few things I know I want to do are...

  • Set up a Raspberry Pi to run my own cloud storage (replacing Google Drive)
  • Stick Linux (which one? So much to learn...) on our old laptop at home, replacing Windows
  • See if I can run my own email (replacing Gmail)
  • Build a photo gallery website (replacing Google+ photos)
And there are bound to be more projects that start to reveal themselves along the way.

I've got an idea that centralising web services is, in many cases, stupid. One Facebook data centre has got so big, it generates its own indoor rain clouds... Why not take back your data, onto your own small-scale system that you control? The only reasons I can see why not, would be if it's hard, unstable, or expensive. Let's find out if it's hard, unstable, or expensive. If nothing else, it will be a fun journey!

Raspberry Pi ordered and when it arrives we'll set about finding out and documenting what can be achieved by an enthusiastic amateur with a few hours to kill.

Monday, 12 August 2013

Top 10 Excel Sins

If you work in a marketing agency, you see some horrific Excel abuses. Here are my top ten.

Do you do any of these? For the love of God, stop it. Just stop it, right now.

Typing numbers straight into a spreadsheet, with no hint of where they came from
Number 1 deadly sin. Anyone doing this deserves to lose a finger. Maybe their left hand.

Wonderful things happen in Excel when you type "=". You can add stuff together! You can multiply! The next person who comes along after you, can understand what you did! It's marvellous and you should definitely try it.

An ex-colleague used to use Excel like a piece of graph paper and work out all his sums separately with a calculator, then type the answers onto an Excel worksheet. This is second only to using Tippex on your computer monitor. If you type numbers straight into cells, instead of leaving a trail by working them out with a formula, you're just as bad.

Hiding cells
To be fair, this is sort of Microsoft's fault. The hide cells functions shouldn't exist, or if they must exist, it should be incredibly in-your-face obvious that something has been hidden.

Barclays offered to buy 179 contracts that they didn't actually want, from the bankrupt Lehman Brothers, due to hidden rows. You have been warned.

If you absolutely have to hide things, use Group. It's not so well known, but it's much more obvious what you've done.

Shrinking column widths, until the column disappears
A favourite of people who don't know how to hide cells. This is so monumentally stupid, you shouldn't be allowed to use Excel ever again.

Colouring in cells to represent data
Want to piss an analyst off? Do this. You've got a big list of something - maybe a list of customers - and you want to highlight some of them as being your best customers, so what do you do?

You could type "best customer", or even better "TRUE" in a new column next to the customer names. That would be good, because then you can filter them, or use that column in formulas, or pivot tables.

Or if you're evil, you could colour all the best customers in yellow, so that anybody who wants to work with only the best customers, has to do it by hand.

Guess which one most marketing people pick?

Using Excel's default charts
Grey and two shades of purple either screams "newbie", or "incapable". Which one would you prefer?

Using Excel's 'exotic' charts
Step away from the 3D pie charts. Here's why.

Using loads of named ranges
In moderation, named ranges are mostly ok. Excel has a bad habit of corrupting them without you realising, but they're not so terrible.

Opening a workbook that has hundreds of the things in it is horrible though. Unpicking how a number is calculated, when at every step you have to look up a name, then find out what that name refers to, can really ruin your day. Names are great while you're building a spreadsheet. Six months later, when you can't remember what you did, they're a proper pain in the neck.

Inconsistent logic
This is how big mistakes happen. Really big, expensive mistakes.

Sometimes you've got a big grid of numbers - 1000 or more rows of calculations and a few rows need to be "fixed". The tracking was out of line that week and needs to be adjusted downwards 10%, or certain rows don't have VAT added, while others do.

You could manually edit those numbers that need changing, or alter the formula in those cells, to add on VAT.

Now what you've got is a big column of numbers that look like they're all calculated the same way. Except that starting from row 800, the formula changes.

You will forget that you did this. It is inevitable.

At some point, somebody - probably you - will want to add some more rows to the data and when you do, you'll copy the formula downwards, assuming that everything below it is the same. At this point your carefully edited "VAT" rows will disappear, your final answer will change and you'll have no idea why, or what happened, or how to get back to where you were.

You're screwed. And you deserve to be.

Macros for everything
Excel Macros are tremendously useful. They're the tool that brought IT capabilities to massed ranks of analysts and even if it's getting on a bit, I still think Visual Basic for Applications (VBA) is brilliant.

But. And it's a big But. Most people who get good with VBA go through a few stages. First, you can't make it do very much. Then you get better and you can build macros to do almost anything, so that's what you start to do.

Stage three is where you realise that Excel actually had functions and shortcuts all along, to achieve the same as many of your macros, only faster and better. Don't let yourself get stuck on stage two! You'll waste tons of time programming and the workbooks you build will only ever function properly on your own PC, where your macro library lives.

Whatever the problem, Excel is the solution
Excel's great, everybody's got a copy and it's so flexible, you can do almost anything with it. But that doesn't mean you should...

Excel isn't a word processor, an illustration package, a dashboard designer, a database or a calendar, and it isn't many other things either, even though you can usually make some passable looking output with it.

When Excel starts to get frustrating, there's probably a better piece of kit out there for the job and that better piece of kit is very often free. Stop abusing Excel and go and look for it!

Tuesday, 2 July 2013

Big generic search terms are inevitably unprofitable

Before we start, I'll put my hands up; I can't prove the statement in the headline of this post. We just don't have enough data. But with that out of the way, I'd like to try to persuade you that it's true.

Whenever I look at a new or prospective client's generic search data, it invariably contains some big generic search terms for their market and also, inevitably, the cost per sale on those terms is way over target.

I'm talking about search terms like "savings account", or "holidays", or "mortgages". The really obvious generic search terms for a market, with no qualifying words attached.

This result has happened often enough now, that a pattern is starting to emerge. I think it's almost always true that the big generic search terms are overpriced. They are - by definition - unprofitable. I'd like to show you why I think that.

Take the search for "European holidays" in the image above. It's a broad enough term that the chances of an individual clicking through and converting from it will be low and it's interesting to enough companies that there will be a lot of potential bidders in the auction.

What's important here is that search ads are priced by the bids in an auction.

A favourite tool of economists when they don't have enough data, is to assume away complications and then see what the world would be like if their assumptions about a market were true. The limitations of doing this are obvious (in that if your starting assumptions are wrong...) but it's still a useful exercise. In essence, a lot of the time what economists are doing is thought experiments, with maths.

So what would the "European holidays" search auction look like if...?

- Each company had perfect information about their search ROI (including any 'attribution' or 'paths to sale')

- Each company was the same size, with the same profit margin and sold the same quality of product

- Each company had the same search quality score (so they rank the same on Google for the same bid)

- Each company had the same conversion rate from their search ads

We've got a load of identical companies, who truly understand return on investment (ROI), all bidding for the same search term. Here's what would happen.

One company bids for the term "European holidays" and starts to sell its product. Then a competitor bids a little more, because they can still generate a positive ROI, at a higher bid. Another competitor bids and the bids keep rising. The profit per sale keeps falling.

Eventually one company bids high enough that they make exactly zero profit from running ads on that search term.

And we stop. Everyone's got perfect information about ROI and nobody's going to bid high enough for the term "European holidays" that they start to lose money.

In a market with perfect information and identical competing companies, nobody makes any profit from generic search, because if they do, somebody else will always bid the auction for that term up a little higher.

Actually, it's worse than that. By owning generic search for now, a company might be able to drive its competitors bankrupt and then reduce bids in the future, once they've gone.

Even in a market with perfect information, there's an incentive to bid up a search term's price until it is unprofitable.
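The bidding-up argument can be put into a toy Python sketch. It assumes identical, fully informed bidders, that the winner simply pays what they bid (real search auctions are more complicated than that), and some invented numbers: a 2% conversion rate and £50 profit per sale. The function names are hypothetical.

```python
def break_even_bid(conversion_rate, profit_per_sale):
    """Highest price per click at which the average click still pays for itself."""
    return conversion_rate * profit_per_sale


def run_auction(conversion_rate, profit_per_sale, start_bid=0.10, step=0.01):
    """Identical bidders keep outbidding each other by `step` for as long as
    the higher bid would still not lose money.

    Returns (final bid, profit per click left over at that bid).
    Simplification: the winner pays their own bid.
    """
    ceiling = break_even_bid(conversion_rate, profit_per_sale)
    bid = start_bid
    while bid + step <= ceiling:
        bid += step  # a competitor can still raise without making a loss
    return bid, ceiling - bid


# Invented example: 2% of clicks convert, each sale makes 50 pounds profit.
final_bid, profit_per_click = run_auction(0.02, 50.0)
print(f"break-even bid:   {break_even_bid(0.02, 50.0):.2f}")
print(f"final bid:        {final_bid:.2f}")
print(f"profit per click: {profit_per_click:.2f}")
```

However small you make the step, the final bid ends up within one step of break-even, so the profit per click is squeezed to roughly nothing.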

With me so far? Now let's break one of the core assumptions I listed above. We'll leave everybody in a market with competitors who are exactly the same as them, but take away their ability to measure return on investment.

Now what happens?

Some companies will underestimate their generic search ROI, some will get it right and some will overestimate it.

The ones who overestimate ROI, will bid too much.

Maybe those companies think that generic search ads cause a lot of brand search - much more than is actually the case. Or they can't track customers all the way from clicks to sales and they wrongly guesstimate the link between the two. Whatever the reason, they overcook their bids because they can't measure ROI accurately.

This means that even if you're a clever company, with perfect measurements of ROI, you can't bid profitably on the search term. You can't bid because there's another company, with a less capable marketing director, who's sent the auction price too high.

This is disappointing.
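Here's the same point as a small Python sketch. The three companies and their numbers are entirely made up. The assumptions are that each company bids right up to what it believes is break-even, and that the winner pays its own bid.

```python
TRUE_CONVERSION = 0.02   # what the click really converts at (invented)
PROFIT_PER_SALE = 50.0   # identical for every company (invented)

# Each hypothetical company's own estimate of the conversion rate.
estimates = {
    "Cautious plc": 0.015,   # underestimates ROI
    "Accurate Ltd": 0.020,   # measures ROI perfectly
    "Optimist & Co": 0.026,  # overestimates ROI
}

# Everyone bids up to their *believed* break-even price per click.
bids = {name: est * PROFIT_PER_SALE for name, est in estimates.items()}

winner = max(bids, key=bids.get)
true_value_per_click = TRUE_CONVERSION * PROFIT_PER_SALE
winner_profit = true_value_per_click - bids[winner]

for name, bid in sorted(bids.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:14s} bids {bid:.2f}")
print(f"a click is really worth {true_value_per_click:.2f}")
print(f"{winner} wins and makes {winner_profit:.2f} per click")
```

The company with the rosiest estimate takes the top spot and loses money on every click, while the company that measured correctly can't match that bid without losing money too.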

Google doesn't just auction one paid search position, it sells a few, so you may be OK if you bid to rank lower in the list - it depends how many companies are overcooking their ROI estimates. One fool bidding means that the top spot has gone. Two or three and now you're just fighting over the smaller side-bar ads. It doesn't take much for bids on the whole search term to be overblown.

As an economist working with search data, this is what I see: a market where imperfect information means auction bids on high volume generic terms are forced up to an unprofitable level in almost every sector.

The only exceptions to this rule would come when we break some of the other assumptions that I listed above. You need a higher profit per sale than your competitors, or to have a better conversion rate, or a better quality score, to be able to profitably bid higher. This is why smaller generic terms often work well, because you bid on the ones that convert best for you - the ones that play to your company's strengths, where you have a better conversion rate than your competitors.

But when you don't have a competitive advantage - when the search term is very general - we come back to a serious issue.

Big generic search terms are inevitably unprofitable.

Thursday, 20 June 2013

How do you ask "Europe - yes or no?"

The proposed question for a referendum on Britain's membership of the EU has been published today, over on Guido Fawkes's blog.

What interests me is that the wording has changed subtly in the final draft, from:

“Do you think that the United Kingdom should remain a member of the European Union?”

to:

“Do you think that the United Kingdom should be a member of the European Union?”

Now as a eurosceptic Tory, why might you want to do that?

It's for the same reason that Alex Salmond was very keen on a certain wording for the Scottish Independence Referendum.

The first question's wording above, has an element of "status quo bias", which has nothing to do with three chord rock songs and everything to do with the fact that survey respondents will tend to agree that things should stay the same as they are now, if you encourage them to do that.

The UK should remain a member of the EU.

vs. the UK should be a member of the EU.

Subtle difference, but if you're a politician chasing votes for your own point of view, you'll grab every little advantage you can get. It's very hard to write a completely neutral question, so you might as well have the one that favours your own position!

Still, at least we can hope that this referendum won't be carried out by Premier Inn.

Wednesday, 12 June 2013

We get the social networks we deserve

Is social getting silly yet? I tend to think it got silly a while ago.

Instagram went for $1 Billion.

Waze (has anybody heard of Waze?) will shortly go for a similar amount, to Google.

Facebook has acquired 35 companies since 2007, mostly for undisclosed fees.

And Twitter could be worth $11 Billion if it floats.

Doctor Evil shouldn't have phoned the UN to demand daft sums of money, he should have threatened Facebook's bank manager.

So what's the problem? If big companies want to spend Billions acquiring smaller companies, which have lots of users but no revenue, then that's their business.

It's our business too. Here's the problem.

Yes, that's a cliché. It's a cliché because it's true.

Somebody has to pay for Facebook and Instagram and Twitter and all the other social toys. Facebook's data centre is so big, it's having problems with the air conditioning forming rain clouds and drenching the servers. Yes, really. You can't fight the weather, so instead, they've waterproofed the servers.

We can be fairly sure that data centres big enough to fit rain clouds in are very, very expensive.

Advertisers are paying for all of this, which indirectly means that you and I are paying for all of this, with our attention and our changes in purchasing behaviour.

That creates two problems and I've talked about one of them before. The pot of advertising money isn't getting bigger, which means there's a limit on how many businesses can be supported solely by advertising and how good their products can get from a user's point of view, before the money runs out.

Today it's a slightly different issue. Today's issue is that social networks aren't built to be the best social networks that they can be for you. They're built to be the best that they can be for advertisers.

That is the essence of the title of this post: We get the social networks we deserve. We want free stuff, so we get the minimum quality product that will hold our attention, so that we can be advertised at. Being the best for advertisers means lots of eyeballs at minimum cost. A news feed is 'better' if it can fit more ads in it, without driving too many users away, not because it communicates news more effectively. Mobile is 'worse' unless you can figure out a way to fit some adverts alongside the content on a smaller screen.

You ask for free? Then you deserve second best.

Imagine for a minute that you couldn't put any advertising at all on a social network. Either it's impossible, or it's illegal, or nobody believes it works, so no businesses are willing to buy the ads... Whatever reason you like.

Time for a thought experiment: What might social networks look like if their users were the customers?

The first difference is obvious - you'd have to pay for them directly somehow. So now we're looking at Facebook for $5 a month?

Actually no, I don't think so, and this is where it gets interesting.

Centralising everybody's social data onto a data centre so big that it has its own micro-climate is stupid. The only reason for doing it, is that Facebook needs everybody's information in one place so that they can data mine it for targeted advertising (well that, and to make things easier for the NSA).

If we're not advertising, we don't need to centralise the data, so what happens now? What I think happens now is that you can have a distributed social network, built along the lines of the internet itself. In essence, what Facebook does with your profile, is the social equivalent of there only being one website hosting service in the world and all of those websites having to use the same template.

Without advertising, you wouldn't have everybody paying Facebook to host their data. You'd host your own profile on any server that was always-on, in the same way as you can host a website for a few pounds a month.

You could customise that profile however you like, since in the end it's just your personal website, but with a central news feed that announces new content. There's already a good way for websites to do this, through RSS. It's easy to imagine a lively open source community of personal profile templates that would make design easy and I'm sure we'd all rather not go back to the technicolour car-crash that was the original Myspace.

If you like, imagine your social profile as an app that you can install on any server, for a few pounds a month. It can host photos, videos, blogs, plugins that let you tweet... a whole host of exciting things. Ten years ago this would have been difficult and techy, but there's no reason it should be now. The internet was actually headed this way, with lots of people having their own personal websites (slightly geeky people, because it used to be slightly tricky) before Myspace and then its more modern brethren came along. You used to get some free web hosting space with your internet connection and without Facebook or Twitter, that model would quickly re-emerge, so that hosting your profile(s) wouldn't even cost you money directly.

The final piece of the puzzle for an effective social network without a Mark Zuckerberg to own it, is another - probably open source - piece of software; an aggregator for all of your friends' news feeds, that becomes your "Facebook". That's very easy, as each friend's website is announcing new content and you just have to pick it up as it gets announced. If you use any kind of news reader then you already know how simple this is. The clue's in the name; Really Simple Syndication.
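To give a feel for how little is involved, here's a rough Python sketch of that aggregator using only the standard library. The friends, feeds and URLs are invented. A real version would fetch each feed over the web rather than reading it from a string, and this one assumes every item carries a standard RSS `pubDate`.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two made-up feeds standing in for friends' sites (names and URLs are hypothetical).
ALICE_FEED = """<rss version="2.0"><channel><title>Alice</title>
<item><title>Paragliding photos</title><link>http://alice.example.com/1</link>
<pubDate>Mon, 10 Jun 2013 09:00:00 +0000</pubDate></item>
<item><title>New bike</title><link>http://alice.example.com/2</link>
<pubDate>Wed, 12 Jun 2013 18:30:00 +0000</pubDate></item>
</channel></rss>"""

BOB_FEED = """<rss version="2.0"><channel><title>Bob</title>
<item><title>Match report</title><link>http://bob.example.com/1</link>
<pubDate>Tue, 11 Jun 2013 12:00:00 +0000</pubDate></item>
</channel></rss>"""


def read_items(feed_xml, friend):
    """Pull title, link and publication time out of one friend's RSS feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "friend": friend,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS dates use the same format as email headers.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in root.iter("item")
    ]


def build_timeline(feeds):
    """Merge every friend's items into one news feed, newest first."""
    timeline = []
    for friend, feed_xml in feeds.items():
        timeline.extend(read_items(feed_xml, friend))
    return sorted(timeline, key=lambda entry: entry["published"], reverse=True)


timeline = build_timeline({"Alice": ALICE_FEED, "Bob": BOB_FEED})
for entry in timeline:
    print(f'{entry["published"]:%d %b %H:%M}  {entry["friend"]}: {entry["title"]}')
```

Privacy, photos and comments would all need more work, but the core "news feed" is just a merge and a sort.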

You'd need a way to set some, or all, items private but that's not hugely difficult either and can be handled by the (open source) software that's driving your profile. There are sites all over the web with private and public areas and yours would have those too.

Without advertising, I'm convinced this is what the social web would look like; individual sites that each have some of the character of their owner, announcing new content to the world, so that friends can find it. The advantages for us, as users, would be substantial...

You own your data. Nobody (I'm especially looking at you Facebook) is incentivised to undermine the privacy of your data in order to sell adverts. And even if they were, they don't have it.

Social tools are built to appeal to you, enough to actually pay for them. They will be better than the ones you have now. Nobody gets to pay to 'promote' content in your news feed that you didn't ask to see.

Your social profile can be whatever you want it to be, rather than a homogenised blue and white. You're a photographer? Make it visual. Writer? Add a blog. Musician? Host music. Love Hello Kitty? *Unfollow*. Some profiles would undoubtedly be garish, but modern templates would help. We've come a long way since Geocities.

You can divide up your social world however you want and have multiple personalities - work, leisure, hobbies, whatever you want, as separate sites, or just separate news feeds within one site. Nobody's trying to get the whole you in one place so that they can sell you stuff. Google+ tried to do this with circles but they screwed it up. With your own site (or sites) you can have multiple news feeds and please do create a separate baby news one, so that the rest of us can choose not to follow it.

I'm sure there are more... The really big one is that all the tools are being designed exclusively for you. Currently, social tools are designed 'for you' in the same way that fishing bait is designed 'for fish'.

And we're back to the original point.

A modicum of effort and a couple of pounds a month, would see a social networking revolution. But we like free and so we get the social networks we deserve.

Wednesday, 29 May 2013

Fixing the advertising industry (warning: this post is unlikely to fix the advertising industry)

I got involved in a bit of a heated debate yesterday over on The Ad Contrarian's comments section as a result of a post called "Time For Sorrell To Go". The post is a lament that working in advertising isn't fun any more because huge media groups, focussed on the bottom line, have driven the craft out of the business and commoditised everything.

"They have made it leaner and meaner. They have made it more efficient. They have made it more productive. They have squeezed all the fat out of it. They have also squeezed all the life out of it.

They have replaced ideas with data. They have replaced value with efficiency."

I wasn't around for the good old days, but anybody can see how huge companies work and not just in advertising. They're relentlessly focussed on the bottom line.

This is obviously less fun than not being relentlessly focussed on the bottom line.

Oh and before we go any further, I work for Martin Sorrell. Indirectly anyway. Half the people who work in advertising work for Martin Sorrell. Best be careful with this post, eh?

Most of the complaints about WPP, Publicis, Sorrell et al. could equally be applied to the music industry, or book publishing, or newspapers, or Hollywood movies, or... Essentially, this is railing against capitalism, as a system that encourages a singular focus on short term financial profit.

Many of our clients demand a minimum acceptable standard for their advertising and they want to achieve this standard at the lowest possible cost. That's unlikely to be exactly how they'd describe it, but that's what they do.

"You know we're sitting on four million pounds of fuel, one nuclear weapon and a thing that has 270,000 moving parts built by the lowest bidder. Makes you feel good, doesn't it?"
Rockhound (Armageddon, 1998)

This is advertising built by the lowest bidder. It comes from a belief that in reality the minimum standard isn't so far short of the best that excellence is worth taking risks to achieve. Let's be honest, in many cases that's true.

Any complaint about a WPP inspired race to the bottom that doesn't start by recognising what our clients want is just howling at the moon. Some of the comments on that original blog post (which is partly what dragged me in) bemoan the fact that it's "all about the money".

Well, yes. That's capitalism unfortunately. Advertising agencies are selling a way for companies to make more money. It's what we do.

In my opinion, this is why creative agencies have (largely) lost the 'lead agency' battle; they keep trying to argue that there are good things about adverts, beyond how much product they sell relative to how much they cost, and even if the client Marketing Director falls for it, the Financial Director won't.

Let's get one thing straight right now. A famous ad man once put this better.

The only legitimate purpose of private sector advertising is to sell more of a product.

It has no other benefits to the companies which pay for it.

If you can't handle that, you're in the wrong business.

Everything else is strategy. Sales now or later? Sales through acquisition or retention? Sales by appearing to be cool, or safe? Sales through happier employees? Sales by annoying people or by making them like you?

I don't particularly like that either, but that's the way it is. I think we should all work flexi-time four day weeks, spend the extra free time in the countryside and the world would be a happier place for it, but it's not going to happen. You can't expect a client to pay extra to run their adverts, just to make you a happier employee.

We're advertising in the big media groups' world. It's probably less fun than the times when clients paid more, there was more slack in the system, everybody had more time to be creative and if you didn't come back from the pub after a heavy Friday lunch, well, that was ok.

We can rant about it, or we can do something constructive.

"Screw the system." 

"No, not screw the system. 
Massage the system, 
play the system, work the system... 
but don't screw the system because 
the system's gonna screw you more."
The Chase (a fairly watchable Charlie Sheen B-Movie) 

Big advertising agencies aren't invincible, but they're winning because they offer what clients want.

If you want to do things differently, you have to make a strong case that your way is better.

"Better" means creating more sales with your adverts.

Don't roll your eyes. Read that last point again.

If you can't make the case that your way is better, what are you doing here?

You're an ad man.

Sell it to me.

Tuesday, 21 May 2013

Football model: Beating the bookies!

Well that was a lot of fun. Starting out with a subscription to EPL Index statistics, I decided in January this year to have a go at building a predictive model for Premier League football games. Wallpapering Fog's been a record of the development and predictions (all posts here) and has done a great job of keeping me motivated. It would be easy to skip a week's predictions, when you've got a hangover on Saturday morning and real-life stuff to do, but publicly committing to posting has worked. It's also turned up a fascinating community of analysts, who I'd probably never have read otherwise. Isn't Twitter brilliant?

From a personal point of view, it's been a massively satisfying project, if only because I've wanted to build a proper agent-based model for ages, but couldn't figure out something really interesting - with good data available - to throw the technique at.

The good news is, this all means the model will definitely be back next season!

I didn't originally build the model to bet with and had only ever placed a couple of fun bets in my life until this year (you know, Exeter City to score against Man U, that sort of thing. Geddon Ciddy.) But then I lined up some simulations of past games from the model against the bookies, as a test to see how it compared. It won. So naturally, I started betting...

The first post with actual result picks in it, as opposed to just percentage chances, was on 2nd March and then we had every game until the end of the season. Richard Whittall has written an interesting post today about the psychology of forcing yourself to bet and that's exactly what I've been doing. No attempt to identify systematic errors in the bookmakers' odds, or to find the best prices in the market, just pick a winner in every game and bet on it.

Here's what happened, with a £10 bet on each game*

*Technically, I wish it had. I actually ran for a few weeks at a quid a game, because I'm a coward.

Overall the model called 55% of results correctly, which is a fraction higher than back-testing suggested and would probably settle to something closer to 52% over more games.

That number at the end's nice though, right? With a £100 bank to start and betting £10 a game, three months later you've still got your original £100, plus £166.50. Plus a £100 bonus for opening a new account too, if you've got any sense.
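For anyone who wants to check the sums on their own bets, here's a rough sketch of how a flat-stake profit figure like that gets added up. The odds and results below are invented purely for illustration (they are not the model's actual picks), and the function name `flat_stake_profit` is just something I've made up for the example. It assumes decimal odds, where a winning bet at 2.5 returns the stake plus 1.5 times the stake.

```python
def flat_stake_profit(bets, stake=10.0):
    """Total profit from betting the same stake on every game.

    bets: list of (decimal_odds, won) pairs. These are hypothetical
    inputs for illustration, not real results from the model.
    """
    profit = 0.0
    for decimal_odds, won in bets:
        if won:
            # A winning bet pays the stake times (odds - 1) as profit
            profit += stake * (decimal_odds - 1.0)
        else:
            # A losing bet costs the stake
            profit -= stake
    return profit


# Made-up example: four games, three called correctly
example_bets = [(2.0, True), (3.0, False), (1.5, True), (2.5, True)]

profit = flat_stake_profit(example_bets, stake=10.0)
hit_rate = sum(1 for _, won in example_bets if won) / len(example_bets)
final_bank = 100.0 + profit  # starting bank of £100, as in the post

print(f"Hit rate: {hit_rate:.0%}, profit: £{profit:.2f}, bank: £{final_bank:.2f}")
```

The useful thing about doing it this way is that hit rate and profit come out as separate numbers. You can call more than half of results correctly and still lose money if the winners are all short-priced favourites.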

Loads of development to come over the summer. But for now, the drinks are on Wallpapering Fog.

Sunday, 19 May 2013

Football model: Last predictions for 2012/13

Chelsea v Everton - Home win
Liverpool v QPR - Home win
Man City v Norwich - Home win
Newcastle v Arsenal - Away win
Southampton v Stoke - Draw
Swansea v Fulham - Home win
Spurs v Sunderland - Home win
West Brom v Man U - Away win
West Ham v Reading - Home win
Wigan v Villa - Ridiculously close! Home win

Tuesday, 14 May 2013

What's an analyst selling?

When we offer analysis in any field, what are we selling? I mean what do we really offer, right down at the core of it?

There isn't always an easy answer. A great story that makes this point is the one about the advertising consultant who was speaking to Parker pens and asked them what business they were in. What do Parker pens sell?

Pens, they said.

The consultant disagreed. He said Parker were in the gift business and that the gifts happened to be pens, and he was right. Nobody buys an expensive pen for themselves.

If you understand what you're selling, then you can sell it better. You can also future-proof yourself and not go the way of Kodak, who thought they were in the film business and not the 'taking photographs' business, so ignored digital cameras for a long time, despite having invented the first one in 1975. You go bankrupt by not understanding what you really sell.

So back to the original question. What do analysts offer?

Answers... Numbers... In advertising, I could tell you that spending a million pounds on TV will make you two million pounds in extra profit. In football, an analyst could tell you that signing the latest bright young thing from Italy should increase your points total for next season by ten.

That's the visible output from our work, but why does a decision maker want that information? In one word, what is the analyst selling?


In an uncertain world, we sell certainty. Confidence. The ability to know.

Since I started analysing football, two themes have kept bubbling up in conversations among football analysts on Twitter. One is a huge frustration that clubs appear not to be listening to the analytical community and what it thinks it can add to the game. The other is the large role that luck plays in winning football matches.

I think these two are related.

As analysts, we're hugely interested in chance because it sets the limits of what we can know for certain. In a football match, we obviously can't predict the bit of the result that is down to luck. If two teams are incredibly closely matched, who will win? It could come down to inches either way, on just one shot in the 90th minute that strikes a post.

In a sales model, we often have measures which we're not statistically confident about. The model says that a marketing campaign is probably working and generating extra sales, but it isn't sure.

This is a problem, when what you're selling is certainty. Undermining the certainty that you sell is a Bad Thing.

What do we do? Do we make claims that aren't substantiated? Offer 100% certainty, even though we know it doesn't exist? Some people do. They quickly get found out as charlatans. Let's not do that.

What we can do is focus on what we know and move slowly onwards from there, because if you've done a big piece of work and you still don't know anything with more certainty than the person who commissioned it, then you've wasted your time. Start with the building blocks that you're confident in.

In marketing, analysis has a foothold because we can do this. We can walk into a business, run an analysis, offer some certainty about the world and critically offer a set of decisions that will improve the current situation. "Will improve" in analyst language means "will probably improve" but we tread a fine line.

In offering this certainty in an uncertain world, part of what we do is transfer risk from decision makers onto ourselves. If you hedge your bets and prevaricate about what is the right decision, all of the uncertainty and all of the risk is still with the decision maker. They're paying you to make some of that risk go away.

I've yet to see very many of these types of conclusions in football analysis; concrete, positive strategies, based on statistics, that will help to win more games. Proving what doesn't work is easy.

We think that around 35% of match results may be down to luck.

We think that sacking your manager doesn't make any difference.

We believe that winning a cup trophy doesn't make you a better manager than the guy you beat in the final.

These are fascinating things and essential to understand, but it's positive conclusions that will make the difference in getting analysis adopted in any field. If you're a football manager, what's the knowledge above worth? A consolation prize as you sit at home, sacked, that you were just unlucky?

We sell certainty. And if we don't have any certainty to sell, we've got nothing.