Monday, 9 February 2015

How to do football analysis in Tableau | Part 3

If you've been working through these posts, so far you've copied some football data from a website into Excel, cleaned it up and learned to build simple tables and charts with that data on a Tableau worksheet.

In Part 3, we're going to create our first interactive dashboard.  We'll make a dashboard that lets you select a team and then it will show you which are that team's strongest players.

If you had fun messing about with Tableau in Part 2 and learning by building loads of different views (I really hope you did. Screw instruction manuals. Including this one), then you might want to start a brand new Tableau workbook at this stage and connect to your Excel data again. Have a look at the early bits of Part 2 if you can't remember how that worked.

Make a new worksheet and drag Mins, PS%, Assists and Goals into the view, then split the rows by Player Name and Position. You're looking for a view like this.

PS% is a downright unhelpful name for Pass Completion %, so right click it in the Measures Area (bottom left of the screen, remember?) and rename it. Tableau will then automatically change the name wherever else you've already used it.

It might be good at this point to also change Pass Completion from Sum to Average. It doesn't really matter, because you've told Tableau to split the data by player and there's only one row of raw data for each player, so Sum and Average are the same. If you'd split by team instead though, and then summed pass completion, the data wouldn't make any sense because Tableau would add all of each team's players' pass completion rates together.

Right click the little green lozenge for Pass Completion % and change it from sum to average. If you really want to see the difference that makes, take the Player Name split out of the view for a moment and try Sum and then Average.

It always helps when you're using Tableau, to keep in mind what your underlying dataset looks like, otherwise sum and average (and as you get more advanced, weighted averages) can get you into trouble.

Back to the table... At the moment, our table has got players from both teams in it and we don't want that, because we're building a dashboard that will show us one team at a time.

Drag "Team" from the Dimensions area and drop it in "Filters".

You'll get a pop up showing you Liverpool and Everton. Tick Liverpool and click OK.

Now our table only has Liverpool players in it. Name the worksheet "Player List" (double click the worksheet's tab at the bottom of the screen) and then we're done with this one for the moment.

Add a new, blank worksheet and we'll also make a chart for our dashboard.

Let's try for a chart that will show us who's played a lot of minutes and who's playing well.

Drop "Rating" from the Measures area onto the Columns shelf and "Pass Completion %" onto the Rows shelf.

Tableau will guess that you're trying to draw a scatter chart, because you've dropped Measures where you'd normally drop Dimensions.

What we've got isn't tremendously useful though. It's a scatter, but with all of the players summed into a single point and we want them split out.

Drop "Player Name" onto Detail in the Marks area and Tableau will give you a point for each player.

Detail is essential for scatter plots and for maps. It tells Tableau at what level you want to see your data split out.

This still isn't a great chart though. Nobody has really low passing percentages or a really low rating, so let's tell Tableau that we don't need to see zero on the axes. Right click each axis, choose "Edit Axis", untick "Include Zero" in the top right corner, and press OK.

We're getting there. The chart shows who has good pass completion and who gets a good rating on WhoScored. If you hover the mouse over a point, it will tell you which player it is and whoever is up in the top right hand corner has good passing and a good WhoScored rating.

We can do better than that though. Grab "Mins" from your Measures area and drop it on Size.

Now the chart is showing who's played well (ish. This isn't very advanced analysis!) and whether they've had lots of minutes this season. Larger blobs means more minutes.

Drop "Team" from Dimensions onto Color and you'll be able to See Liverpool and Everton separated out. It would be nice if Liverpool were in red though...

Double click on a colour in the legend and you can change it.

(Worried about colour blind people? You could drop "Team" onto Shape as well. Don't leave it there though, because it will look a bit rubbish and we're only going to show one team at a time in a minute.)

Finally, drop "Player Name" onto Label and suddenly you'll be able to see who's who.

Rename your worksheet to "Player Scatter". We've got two worksheets and can build a dashboard!

The idea of dashboards is that you can drop multiple worksheets onto them and link them together. It lets you build screens that contain a mix of tables, charts and maps, where each piece of the dashboard is coming from one worksheet.

Make a new dashboard screen, by clicking the button next to your sheet tabs at the bottom of the screen.

In the new dashboard screen that pops up, you should see all of your worksheets appear in a list in the top left. Grab "Player List" (you did rename your sheets, didn't you?) and drop it onto the large white dashboard space.

The table you built earlier appears.

Now grab "Player Scatter" and do the same thing. Keep the left mouse button held down and you can see what will happen if you drop the worksheet into different places - at the top, on the left, or on the right. We want it on the right.

You should get something like this. Tableau put the chart on the screen and also the legend for your chart in separate boxes on the right hand side that you can move if you like. It looks rubbish. We can fix that.

Depending on the resolution of your monitor, Tableau has drawn you a default dashboard screen size and shape, but it's unlikely to be exactly what you want.

As a little aside, you can tell Tableau to draw a fixed dashboard size - that will never change no matter who loads it - or one that will try to adapt to different users' monitors by resizing itself. In my opinion, dashboards that resize themselves are usually crap and to be avoided. Tableau's not good enough at reshaping each element on the screen and if somebody loads your lovingly constructed views on a low res monitor, they'll look awful. You can set limits to the resizing, but it's much better to just fix the size and make users with low resolution screens scroll around a bit.

Use the dashboard size options in the bottom left to make a screen that is Exactly ('Exactly' is in the dropdown box) 1150 x 600.

That gives you a nice widescreen shape that will fit on a decent resolution laptop monitor. If you find it's half off your screen and there are scroll bars, you can make it smaller if you want.

Now that you can see both of your dashboard elements properly, it's time to make this dashboard interactive.

Remember adding the Team filter to your table so that it only showed Liverpool? We want to show that filter on this dashboard screen.

If you click in the area occupied by "Player List", you'll see a grey box appear around it. In the top-right corner of that grey boundary box, there's a tiny little down arrow, next to the x. It's really small but it is there and it's one of the things brand new users tend to miss.

Click that little down arrow, find "Quick Filters" and click "Team".

The filter that you created earlier will be pulled onto the dashboard, with only Liverpool ticked. If you hadn't already created that filter, Tableau would have just made it for you and also automatically put it on the underlying worksheet.

Tableau's default filter for a text column like Team, is a list of options with a tick box next to each, so in this case you get an option for Everton and Liverpool and (All) at the top. Sometimes this is what you want, but we'd like this dashboard to just show one team at a time, with no option to select both.

Filters have options on a little down arrow, just like the one we used on the "Player List" screen. Click in the area that your filter is occupying so that its grey boundary box is shown and use the menu to select "Single Value (Dropdown)".

That's given us a tidier dropdown box that will always take up the same amount of space on the dashboard if later on we decide to add more teams to our data.

You can also get rid of the (All) option on that box, so that people are forced to choose a team. In the same menu you just used, go to "Customize" and untick "Show All Value".

Our menu is ready!

Well, nearly. Change it from Liverpool to Everton and watch what happens. The table will switch from Liverpool to Everton, but the chart stays the same. That's no good. Currently, the filter is only linked to our table, not to our chart.

Use the menu on the filter (tiny little down arrow again) one last time to select "Apply to Worksheets" and "Selected Worksheets".

Then tick "Player Scatter" in the box that pops up and click OK.

Now try changing the filter. You've got an interactive dashboard!

When you add filters, you can choose to apply them to just one worksheet, to a selection of worksheets, or to everything that uses this data. The key thing to bear in mind is to make it obvious to users how your dashboard works. Filters that randomly change some things but not others are really, really confusing.

As far as this quick introduction goes, we're done! You've acquired data, loaded it, drawn tables and charts, built a dashboard and made it interactive and that's quite enough for today.

The only thing that remains is to share your masterpiece with the world.

In the File menu at the very top of the screen, click "Save to Web".

You'll get a popup box asking for your Tableau Public login. Remember your Tableau Public login? You made it in Part 2.

Give your workbook a name and hit Save. After a few moments, Tableau will pop up your workbook in all its glory, on the web. You can share it with anybody from here and they'll have exactly the same interactivity through filters, as you did in the desktop software.

Here's my version.

Helpfully, just as I wrote this guide last weekend, Tableau completely overhauled the look and feel of the old Tableau Public site. Thanks guys. Suffice to say, there are share and download links on the web dashboard screen and you can also access your profile and any other dashboards you've uploaded. Have a browse around the site and all will become clear.

Tableau really is a fantastic tool and you can achieve amazing visualisations, much faster, than with anything else I've come across (and I reviewed quite a few). I love it and I hope these three posts will help to remove that initial trepidation for a few football analysts. Get stuck in, connect to data and start playing. You'll get it and if you don't, Google it. And if you still don't, feel free to tweet me @neilcharles_uk.

How to do football analysis in Tableau | Part 2

"Here's where the fun begins"
Han Solo

So you've read Part 1, you've downloaded and installed Tableau Public and you've got a small Liverpool and Everton dataset to play with. Or alternatively, you took the short cut, downloaded the dataset and came straight here.

Either way, it's time to hit Tableau.

Before you dive into the software, you'll need to create an online Tableau Public account. This will give you somewhere to publish your visualisations and also to save them while you work on them. One of the big restrictions of Tableau Public is that it's cloud-based and you can only save your stuff to Tableau Public's website

You've set up an account? Great, we'll make use of it later on. Now you can open up the Tableau Public software.

You should be looking at a screen like this...

Click "Open Data"and you get a long list of data sources that Tableau can access. The greyed out ones are for Professional users only - they aren't available in Tableau Public - but our dataset is in Excel, which is fine.

Click Excel, find your football data workbook and open it.

You should see Tableau's data loading screen.

You can do a lot of data manipulation here, including joining different datasets together, but we're just loading up a single Excel worksheet for now. If your Excel workbook has only got one worksheet in it, then Tableau will pick that up automatically. If you have more than one sheet in the workbook, then drag the one with the player data across from the left hand column to the empty top window.

Click "Go to Worksheet".

We're in! You should be looking at an empty Tableau worksheet.

If you can use Excel pivot tables then you're going to feel at home here quite quickly but if you can't then don't worry, it's all very straightforward.

Tableau's looked at our dataset and guessed which columns in our data are "Dimensions" and which are "Measures".

Dimensions are things you can split the data by. Player names, team names and positions go in here. "Apps" (number of appearances) shouldn't really be in here but we can deal with it later if we need that data. Tableau's guessed wrong because there are brackets for substitute appearances in the Apps column on our spreadsheet, rather than it just containing simple numbers.

Everything that isn't a Dimension, is a Measure. Measures are data columns that you want to add up, or average, or do whatever else mathematical with. Measures are your numbers. Tableau's put things like Age and Goals and Minuted Played in here, which is what we want.

So, what do you want to see first?

At last week's OptaPro Forum, Simon Gleave (@SimonGleave) showed some nice age distribution charts that plot the ages of players at a club. We could easily draw one of those.

Drag "Age" from the Measures area and drop it into the middle of the table, where it says "Drop field here". Take care to drop it in the middle, not onto the column or row headings areas.

If you're using the sample data, then it will now say 1,330 in the table. If you've put your own data together then you might get a different number because the WhoScored website is regularly updated with new player data.

1,330 is the total of all of the Everton and Liverpool players' ages in our dataset. Useful.

What about the average age of each team? That would be more useful.

Since you dragged in Age, there's now a green lozenge on the "Marks" area that says "SUM(Age)"

You can use this green lozenge to change from Sum to Average. Right click it, find "Measure (Sum)" in the popup and change it to Average,

In the sample dataset, the average age of all of Liverpool's and Everton's players is 26.6.

Let's split it by team. Drag "Team" from Dimensions onto the row shelf at the top of the screen and drop it.

Everton are older than Liverpool! We've just learned something.

Showing numbers to three decimal places is a bit much, so you can change the default formatting for Age if you want. Right click it in the Measures area, choose "Default Properties" and use "Number Format" - "Number (Custom)" to get rid of the decimals.

You could format numbers directly in the view but the nice thing about changing the default is that now whenever we use Age again, it will always appear without decimal places.

Stop reading for a minute and have a play with this table. It might seem odd to say it in a user guide, but the best way by miles to learn Tableau is to play with it. Drop more measures into the view and try splitting the rows and columns by different dimensions and see how Tableau reacts.

That's enough tables. Tableau's all about the graphics, no? Let's draw a chart.

Use the little tab button at the bottom of the screen - the one that looks like a little bar chart - to create a new empty worksheet. The other button - that looks like a little four pane window - is for creating dashboards. We'll get to that later.

Hopefully you're starting to see that in some ways, Tableau's a lot like Excel. It has worksheets and each worksheet is basically an Excel Pivot Table, with rows and columns and measures.

We're going to use this new sheet to draw an age distribution. That means we'll want to count how many players there are, split by age groups.

In our dataset, Age is a Measure to be summed or averaged, not a Dimension that you can split things by, but Tableau can sort that out for us. Drop age into the middle of the view, like you did before and then use "Show Me" to draw a histogram.

We've got a chart!

The Show Me button is the centre of Tableau and it's where you decide what kind of visualisation you want to draw. Think of it like the Excel Chart Gallery, but a lot more powerful.

Depending on what data you've dragged into the view, Tableau will offer you different types of charts in Show Me. This can sometimes get a bit confusing, e.g. you might decide to draw a scatter chart and Tableau says No and greys out the button. It looks at the data you've dragged into the main view and decides you can't draw a scatter with that.

When the type of chart you want is greyed out, look at the tip at the bottom of the "Show Me" box. Tableau will tell you exactly what it needs and when you drag those things in, the option you want will work. Once you start to get used to Tableau and the way that it works, you'll find this happens much less.

Charts in Tableau work exactly like tables and if you get confused, it can often help to think of them that way, or even switch back to a table, sort your data out and then switch back to a chart.

Charts working like tables, means that they have rows and columns and we can split our age chart by team if we want to.

Grab team from the Dimensions area and drop it just to the left of CNT(Age) on the Rows shelf.

Two charts! Now we can really see the differences in age that are driving Everton's older average.

Try dragging Team back off the Rows shelf and putting it in different places - on the Columns shelf, or into Colour or Label in the Marks area. There is loads of flexibility to create the view that you want.

Tableau has five basic ways of showing you differences in your data.

1. You can split it, using the Rows and Columns shelves.
When you do this, you'll get new rows and columns in a table, or new charts, one for each split.

2. You can vary colour

3. You can vary size
(That one doesn't really make sense as a team split - try it. Not every technique is good everywhere)

4. You can change the shape of datapoints

5. You can label different items

Tried all of those? Don't just skim through, I really meant it about playing being the best way to learn. Drag player names in. Swap Age for a different measure. Put team in Color and Label. Go nuts. Break stuff. Junk your worksheet and start again if you need to and remember that Tableau's Undo feature is pretty much bulletproof; Control-Z will always put things back the way they were!

Now that you know the basics of worksheets, you're ready to make your first dashboard...

We'll create some more charts and use them to build an interactive dashboard in Part 3.

How to do football analysis in Tableau | Part 1

I'm back from a hugely enjoyable OptaPro Forum last week and buzzing with new ideas. It was a fantastic day with some great presentations and it was brilliant to finally put faces to a few analysts' names (and Twitter handles!)

One question came up often enough at the forum to make me think that there might be an audience for this post. How do you use Tableau for football analysis...?

Want to know a secret? Tableau's easy. And it's free. There's really no need to restrict yourself to the purgatory of Excel's chart gallery. This post will cover a few basics and then if there's interest, I might do a follow up that shows how to do some more advanced visuals with X,Y pitch coordinate data.

I'm going to keep this post  (actually, since the length blew up as it was being written, these three posts) at a pretty high level: Where to get data, how to connect to it, how to draw some basic views and then make them interactive.

The guide will assume that you can handle yourself with the basics of copying and pasting and inserting columns in Excel and that "install this software; here's a link" isn't too taxing! In general, I'll try to take things step-by-step, with screenshots so you can see where we're going.

First things first, we're going to need a copy of Tableau. It comes in professional and free versions and unless you've got a grand to spare, you're going to want the free "Public" version. Tableau Public is almost fully featured, with just a few restrictions on the types of data you can load and how you save and export your dashboards (more on this later).

If you haven't already, download and install the latest version of Tableau Public from here.

Next, we'll need some data. Very often acquiring data and cleaning it up is the biggest barrier to football analysis and this post isn't going to cover the various sources you could use. Once you get into very large volumes of data, you start to need programming skills in a tool like Python or R and this is only supposed to be a basic intro! We need a small, simple dataset and WhoScored can help us out.

Let's build some visuals to compare Liverpool and Everton this season.

TL:DR. If you don't care about this bit, you can download the mini dataset that we'll use in Tableau and go straight to Part 2.

Go to the Liverpool page on WhoScored and you'll find a table that looks like this.

It's a nice little dataset with player statistics for this season, but it's not doing us any good stuck on the website - we need to extract that data.

There are a few automated tools that are worth trying for data extraction. is new and is shaping up to be a really great tool, but unfortunately it doesn't pick up the data that we're after when it scans WhoScored's web page. Excel's load data from web feature doesn't either.

We're going to have to do this the old fashioned way.

Click and drag to select everything from the little 'R' in the black title bar at the top of the table, to the bottom right hand corner of the data. Make sure you've got exactly that selected and then Control-C to copy.

(There are probably programmers yelling at their monitors right now. I'm not covering web scraping in this post!)

In Excel, hit paste (Control-V) in cell A1 and you get...

Now go and get the Everton data and do exactly the same thing, pasting it onto a different sheet.

Finally, copy the Everton data from Excel excluding the black titles row and paste it just below the Liverpool data. If you don't skip out the titles row, you'll get column titles mixed up with your player data in Tableau - we only need titles at the top of our table!

As a general rule, Tableau likes data to be in lists. You want your data to be listed downwards, not across the page as loads of columns. WhoScored's data is a list of players with facts about them and that's ideal - a separate set of columns for every player, or data scattered across different sheets won't work.

You can get rid of the Everton data sheet now, you don't need it any more. Give the worksheet with both teams on it a sensible name and save your workbook.

Now we're going to clean the data up a bit. We don't need all of the formatting, so highlight everything, find 'Clear' on the Home Menu (it's got a little pink eraser icon) and choose Clear Formats.

Simple black and white text. Proper analytics.

You could load your list of data into Tableau straight away, but it's still a little messy. Column B has got nothing in it, so highlight that whole column and delete it (Delete icon on the Home menu). The remaining data will move in from the right to fill the gap.

Those "-" signs indicating no data also aren't great for Tableau. Text saying "-" isn't no data, it's a cell with "-" in it. Find and replace "-" with nothing (Control-H) to get rid of them.

The last job in cleaning up is to sort out our player names. If you look in cell B2, you'll see that player names are currently mixed up with their ages and positions. The technical term for this is, "a pain in the arse".

Insert some columns to create a bit of space, starting from column C. Five columns should do it.

Now your data looks like this...

Highlight everything in column B and choose Text to Columns from the Data menu.

Choose "Delimited", tick "Comma" and click finish.

The positions data gets split into column C and because some players can play in a few positions, you'll get data in column D and maybe column E too. For this little project, we don't care about the extra positions, so delete columns D and E. Don't just clear the data out of them, delete them so that everything moves left and your data looks like this.

The playing positions are sorted, but our player's names are still mixed up with their ages. We can split these with a quick text formula.

Age is always a two digit number at the end of the name. Assuming no players are over 99 or under 10 years old, putting this formula in cell D2 will grab the player's age.

"Right" gets the two characters at the end of the text string and "Value" tells Excel to see the result as a number, not as text (e.g. 20, not "20").

The player's name is everything except the last two characters. Put this in cell E2 to grab that bit.

Copy and paste those two new formulas downwards, to fill up the table with names and ages.

You could have waited until Tableau to make these splits as it has exactly the same LEFT, RIGHT and LEN formulas as Excel, but I want you to be able to see the raw data table before we load it. Once you're in Tableau, the data table sits in the background and as a beginner it can sometimes be a little tricky to visualise what your formulas have actually done.

Nearly finished, I promise. If you look at your table of data, you'll notice that there's not a way to know which team each player is on. We've got a record at the top for Raheem Sterling, but it doesn't say "Liverpool" anywhere. That will limit us in Tableau because we won't be able to easily compare teams if that information isn't in our dataset.

The data in column A, labelled "R" isn't useful for anything, so let's put team names there instead. Select cells in column A down to the last Liverpool player, type "Liverpool" and hit Control-Enter to fill all of those cells. Then do the same for Everton and re-title column A as "Team".

The very last job is to add some column titles for the age, name and position columns you've created. Save your workbook and then if it looks like this, we're ready to hit Tableau in Part 2.