Tuesday, 15 July 2014

The quiet BI revolution (part one)

Three years ago on Wallpapering Fog, I wrote a post about why our company (or more precisely, since the company's huge, my department) had adopted Tableau software.

At the time, I said:

"I feel like I'm giving away a trade secret here, but what the hell, you're going to hear about it from somewhere soon anyway."

Having just attended the London Tableau Conference, I can confirm that the secret is well and truly out. It was a brilliant event, brimming with enthusiastic people and great ideas, that deserves its own write-up away from this post.

For this post, I'd like to indulge in one of my occasional crystal ball gazes and look at the future of Business Intelligence (BI). Not BI on the cutting edge - although that is an exciting topic - but BI in regular businesses. Businesses that have small analytics teams, no time and aren't PR'ing a project to the trade press, with all of the doubts and the dirty laundry Tippexed out.

So where is BI - and in particular, regular reporting - for a normal analytics team going to head over the next five to ten years?


1. Data Visualisation and Reporting

Data vis, as it applies to most businesses, is now a solved problem (what to visualise isn't; that's part two of this post). You can have good-looking reports, automatically refreshed and delivered onto any device you like, and even on paper if you must. They're quick to build, easy to adapt and easy to maintain - more so than Excel-based reports ever were, and much more flexible.



The only things you can't do easily are create weird and wonderful, innovative visuals that nobody's ever seen before, and get all of this functionality for free.

On the first of these problems, I'd argue that this isn't a business issue. Businesses need straightforward charts, tables and standard reports, not animated 3D network diagrams, so software like Tableau will do a great job. I'd also argue that if you're looking for real flexibility, Lyra is something that I'm quite excited about...

On the second problem - cost - you just have to bite the bullet. $20,000 spent on the right BI software will transform your analytics department.

(That's if you give the $20k to your analytics department. DO NOT give it to a centralised IT team. They'll very likely ask for another $230k to make a nice round number, disappear for six months and then reappear asking for more money.)

The real change in data reporting, investigation and visualisation over the next five years or so is going to be a shift from a situation where many businesses don't yet realise that it's a solved problem, to one where they do.

Tableau's solved this problem and in my opinion is by some distance the best of the new breed of reporting and investigation tools, but if it hadn't been Tableau it would have been QlikView. And if not them, Spotfire. And... you get the point.

What's going to happen over the next few years is that Tableau knowledge will become more valuable - because more businesses will want to hire those skills - and also less valuable, because loads more people are going to know how to use the software. The end result is basic supply and demand. It might swing back and forth for a bit, but we'll settle onto a situation where many (most?) analysts know Tableau as a regular part of their job. There'll be specialists, just like there are specialist Excel consultants, but most businesses will sort themselves out and nobody will be paid a fortune just for knowing how to use Tableau.


So far, no real surprises and if you read Wallpapering Fog regularly then you've probably heard those ideas before. The next two points are where I see a quiet revolution happening.


2. (not) Data Warehousing

You probably already know how this works. Analysts with Tableau do the visuals, but there's a big SQL database in the back end, looked after by a centralised IT team, which contains exactly 73% of what you want to visualise. A big enough gap that you can't just ignore data that isn't in the data warehouse, but not so big that the data warehouse as it stands is useless.

What often happens in response to an incomplete data warehouse is that analysts build a hack. The data that isn't centralised is pulled in from ad-hoc spreadsheets and mashed together in Excel or Tableau, which works OK until you need more than a couple of people to update those spreadsheets, or somebody's on holiday. This is the issue we often hit in media agencies; you can solve a problem once, but can't roll out the solution everywhere to all clients because some parts of your 'solution' are held together with gaffer tape and bits of string.

What's needed is some software that's built for analysts and allows them to merge different data sources and to schedule updates, without recourse to a database administrator.
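Tools like Alteryx and Kettle do this with drag-and-drop steps, but underneath, "blending" mostly means a join. Here's a minimal Python sketch of a left join between a warehouse extract and a user-maintained spreadsheet. The data, the column names and the `blend` helper are all hypothetical, made up for illustration; this isn't how either product works internally.

```python
import csv
import io

# Hypothetical extract from the central data warehouse: weekly sales by store.
WAREHOUSE_CSV = """store_id,week,sales
S01,2014-07-07,1200
S02,2014-07-07,950
S03,2014-07-07,780
"""

# Hypothetical user-maintained spreadsheet (saved as CSV) holding data the warehouse lacks.
SPREADSHEET_CSV = """store_id,region,manager
S01,North,Alice
S02,South,Bob
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def blend(left, right, key):
    """Left-join `right` onto `left` by `key`; unmatched rows get None in the new columns."""
    lookup = {row[key]: row for row in right}
    extra_cols = [col for col in (right[0].keys() if right else []) if col != key]
    blended = []
    for row in left:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for col in extra_cols:
            merged[col] = match.get(col)
        blended.append(merged)
    return blended


rows = blend(read_rows(WAREHOUSE_CSV), read_rows(SPREADSHEET_CSV), "store_id")
for r in rows:
    print(r)
```

A left join keeps every warehouse row, so store S03 comes through with no region or manager. That visible gap is exactly the bit a shared spreadsheet is supposed to fill, and exactly what breaks when somebody forgets to update it.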

If you were at the Tableau Conference last week, you'll have seen Alteryx sitting squarely in this area. It's drag-and-drop, hugely flexible and very friendly; I played with the demo a few months ago and loved it.

But it is quite pricey, especially if, like us, you wouldn't plan on using all of Alteryx's capabilities and are only really interested in blending data sources together.

Did somebody say "What about open source?" Here's my tip of the day: go and download the Community Edition of Pentaho Kettle and persevere through the thirty-minute skirmish it will take you to get it installed and working properly. Your reward will be drag-and-drop data acquisition, blending and output, all for free. This is how I process a lot of my football data and it's brilliant.



In terms of crystal ball gazing, the analytics department now starts to look quite different. It runs a lot of reports on schedules, freeing up time for investigation and innovation. Nobody does the whole "get into work at 7am on Monday for a frantic three hours of running board reports" routine any more, a ritual that retailers in particular are very fond of. And thank God for that.

In our new world, IT only handles data when it needs to flow in large volumes from a point-of-sale or distribution system. IT does the bit that it already does very well now, but everybody stops moaning that the data warehouse doesn't also contain lots of the smaller user-maintained pieces of information that make a business run properly.

If you're thinking that the new world sounds like the same old BI promises, then you're right, it does. We should have been able to do these things ages ago but it didn't work due to the disconnect between analysts and IT and the slow build time, inflexibility and high cost of software. Analysts received questions and understood what output was needed, but usually only IT had the (inflexible) technology to make that output happen automatically.

The big differences now are speed, cost, flexibility and the number of companies that will be working in this new way. It's no exaggeration to say that you're able to go from raw data to first-version business reports in two days. You can pin those down to a format everybody's happy with in a couple of months (faster if you make decisions quickly) and then you can fully automate them. Reports are able to evolve because they can be rebuilt and republished very quickly, in hours rather than weeks.

Then what do you do next? It's a serious question with which some reporting teams are going to struggle. When nobody needs you to move data from Google Analytics to Excel and chart the same charts every week, what will you do? The time to start thinking about that is now.


3. Data acquisition

This one's not solved; it's currently being solved and we've got a little way to go yet. Data acquisition is the last barrier between analysts, managers and an automated dashboard containing absolutely everything on which they wish to report.

Alteryx and Pentaho Kettle are fantastic data assembly (ETL) tools, provided your data isn't stored somewhere really stupid. Unfortunately, I work in marketing and our industry specialises in making data as difficult as possible to access.

- It's in untidy, bespoke web interfaces, behind login screens.

- It's in the colour key that somebody has chosen to fill cells in Excel.

- It's emailed across, with a friendly "Hello! Hope you had a good weekend. Today's spend number is £2,486."


Database that, smartarse.


What I see happening over the next few years is some new tools and some new ways of working. Provided data is delivered in a consistent format, then the likes of Alteryx or Kettle can make the data acquisition and blending problem go away.
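"Consistent format" is doing a lot of work in that sentence. Even the emailed spend number can be picked up automatically, but only by betting that the wording never changes. A minimal Python sketch, assuming (hypothetically) that the figure always follows the phrase "spend number is £":

```python
import re
from decimal import Decimal

# Hypothetical pattern: assumes the figure always follows "spend number is £".
SPEND_PATTERN = re.compile(r"spend number is £([\d,]+(?:\.\d{2})?)", re.IGNORECASE)


def extract_spend(email_body):
    """Return the spend figure as a Decimal, or None if the expected phrase isn't found."""
    match = SPEND_PATTERN.search(email_body)
    if match is None:
        return None
    # Strip thousands separators before converting.
    return Decimal(match.group(1).replace(",", ""))


email = "Hello! Hope you had a good weekend. Today's spend number is £2,486."
print(extract_spend(email))
```

The day somebody writes "we spent £2.5k" instead, this returns None. That fragility is the whole argument for proper feeds.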

Where data is in web interfaces, we can already scrape it using Python or R, but then you need an analyst who knows how to scrape and that's not such a common skill-set. (Top tip: look for a football analyst - by necessity, we're getting quite good at it.)
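To give a flavour of what that skill involves, here's a minimal sketch using only Python's standard-library `html.parser` to pull the rows out of a simple HTML table. It assumes you've already got the page source (logging in, cookies and sessions aren't shown), and the page itself is hypothetical. Real reporting interfaces are messier, which is why scraping tends to need a specialist.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped into rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page source, as it might look once downloaded from behind a login screen.
PAGE = """
<table id="report">
  <tr><th>Date</th><th>Impressions</th><th>Clicks</th></tr>
  <tr><td>14/07/2014</td><td>120,000</td><td>1,450</td></tr>
  <tr><td>15/07/2014</td><td>98,500</td><td>1,210</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(PAGE)
header, *data = scraper.rows
print(header)
print(data[0])
```

The output is still text ("120,000" is a string), so there's a cleaning step before any of this goes near Kettle or Tableau.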

We're going to evolve towards XML and other data feeds in addition to the usual user-facing tables that come from the majority of web data sources, which again brings the likes of Alteryx into play. The data providers who don't do this should gradually become extinct through a process of natural selection.
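The difference a feed makes is that the structure is declared rather than guessed. A minimal sketch with Python's standard `xml.etree.ElementTree`, reading a feed whose layout is entirely hypothetical (every provider defines its own):

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout; real providers each publish their own schema.
FEED = """<report>
  <row date="2014-07-14" campaign="Summer Sale" spend="2486.00"/>
  <row date="2014-07-15" campaign="Summer Sale" spend="2310.50"/>
</report>"""


def parse_feed(xml_text):
    """Turn each <row> element into a dict, converting spend to a number."""
    root = ET.fromstring(xml_text)
    return [
        {
            "date": row.get("date"),
            "campaign": row.get("campaign"),
            "spend": float(row.get("spend")),
        }
        for row in root.iter("row")
    ]


for record in parse_feed(FEED):
    print(record)
```

Compare this with the scraper above: no guessing which cell is which, and no breakage when somebody redesigns the web page.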

Eventually, these changes will form an almost universal API. Every provider's data is different, but you'll be able to get to the data in an automated way and that's 90% of the battle. When you've done that, you only need to solve the data transformation problem once.
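"Solve the transformation problem once" can be made concrete: when every source is reachable automatically, the per-provider work shrinks to a mapping into one common shape, and everything downstream is written against that shape. A sketch under loudly hypothetical assumptions; the provider names, field names and the pence-to-pounds conversion are all invented for illustration:

```python
# Hypothetical providers that report the same metrics under different names and units.
PROVIDER_FIELD_MAPS = {
    "provider_a": {"day": "date", "cost_gbp": "spend", "clicks": "clicks"},
    "provider_b": {"Date": "date", "SpendPence": "spend", "Clicks": "clicks"},
}

# Per-provider value conversions into the common units (spend in pounds).
CONVERTERS = {
    "provider_b": {"spend": lambda pence: pence / 100},
}


def normalise(provider, record):
    """Rename a provider's fields to the common schema and convert units where needed."""
    field_map = PROVIDER_FIELD_MAPS[provider]
    converters = CONVERTERS.get(provider, {})
    out = {}
    for source_field, common_field in field_map.items():
        value = record[source_field]
        convert = converters.get(common_field)
        out[common_field] = convert(value) if convert else value
    return out


a = normalise("provider_a", {"day": "2014-07-15", "cost_gbp": 2486.0, "clicks": 1450})
b = normalise("provider_b", {"Date": "2014-07-15", "SpendPence": 248600, "Clicks": 1450})
print(a == b)
```

Adding a new provider then means adding one entry to the mapping, not rebuilding the reports.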

We'll also see - as is happening already - advanced data providers like DataSift starting to deliver information into services such as Google Cloud Platform. A few years ago this wouldn't have helped, because you're just swapping one API for another, but when a critical mass of services all use that same cloud, easy connectors start to appear.

So why do I say that data acquisition isn't a solved problem yet?

Well, for one, too many sources are still silos, but a second issue is that user input is still much too difficult. There's no Tableau for manual data entry and we still have to call a developer to create web forms and database schemas and data validation and to link it all together for us. Either that, or we have a central spreadsheet for people to fill in and we pray that they don't break it, or all try to edit it simultaneously.
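To show what "data validation" means in practice, here's the kind of rule-checking that currently has to be hand-written for every manual input form. It's a minimal Python sketch; the fields, the allowed channels and the rules themselves are hypothetical examples for a daily spend entry.

```python
from datetime import date

# Hypothetical list of valid channels for a manually entered spend row.
ALLOWED_CHANNELS = {"Search", "Display", "Social"}


def validate_entry(entry):
    """Return a list of problems with one entered row; an empty list means accept it."""
    errors = []
    try:
        date.fromisoformat(str(entry.get("date", "")))
    except ValueError:
        errors.append("date must be an ISO date, e.g. 2014-07-15")
    if entry.get("channel") not in ALLOWED_CHANNELS:
        errors.append("channel must be one of: " + ", ".join(sorted(ALLOWED_CHANNELS)))
    try:
        spend = float(entry.get("spend", ""))
    except (TypeError, ValueError):
        errors.append("spend must be a number")
    else:
        if spend < 0:
            errors.append("spend cannot be negative")
    return errors


print(validate_entry({"date": "2014-07-15", "channel": "Search", "spend": "2486"}))
print(validate_entry({"date": "15/07/2014", "channel": "Radio", "spend": "-5"}))
```

None of this is hard, which is rather the point: it's simple enough that it ought to be drag-and-drop, but today it still means a software build.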

I'm sure this software will come, but I haven't seen it yet. Microsoft Access forms and VBA really aren't it, and neither is Google Forms. Microsoft, for all that they had a massive head start and will claim to have solutions to all of these problems, are nowhere in the BI race and are falling further behind.

If you've seen another solution to the problem of regularly taking validated user input without embarking on a software build or trying to lock down a spreadsheet, I'd love to hear about it in the comments.


The future's bright

In our future analytics department a lot has changed, but it's been a quiet revolution. A lot of things that were difficult are now easy and the business analyst's scope has extended well into traditional IT territory. Or, more accurately, that territory is more clearly delineated between the two departments, and issues that neither IT nor analysts could previously solve (for a sensible budget in a sensible time-frame) have been dealt with.

Reports have moved to web browser interfaces - except for those staff who absolutely insist that they need printed ones - and automation takes care of putting them together. Analysts can quickly and visually interrogate their data and as an aside, Excel has moved to being a secondary tool for serious analysts, behind Tableau (or a competitor of your choice).

We were promised all of this a long, long time ago. Most businesses might actually get there in the next five years or so. It's interesting that the process of assembling Business Intelligence is being solved backwards... Rather than from data collection, to merge, to visualise, solving the visualisation element has driven a requirement to be able to better blend data, which in turn drives changes in how we acquire it.

And you know what happens after that? Businesses will start to realise that a lot of the information they've spent years, and a lot of money, trying to assemble won't on its own work the miracles that they hoped it would. Not without some other major changes happening too.

My favourite quote from last week's conference came from Fawad Qureshi of Teradata.

"Old business process + expensive new technology = expensive old business process"

That will be part two of this post. When you've got to your ultimate suite of business reports and they're easy to maintain, what happens then? What changes? Does anything happen at all?

7 comments:

Joel Stellner said...

Great Post Neil,

Would you mind if I shared the link around on my LinkedIn?

- Joel @ Tableau

Neil Charles said...

Glad you enjoyed it and always grateful for the extra traffic! Please feel free.

Hell said...

Very nice post. I have one problem, maybe you can help. I am trying to install Pentaho Kettle CE on a Windows 8 machine but I can't find the installation wizard file. Every support document says "download pdi-5.0.0 bla bla .exe" but there isn't any download link! Googled it many times without success.

cheers .

Neil Charles said...

You don't need to install Kettle, just download this and unpack it into a folder (link at the bottom of the page)
http://community.pentaho.com/projects/data-integration/

Then run "Spoon.bat" to start it.

You might find you have trouble with Java when you try to run it for the first time - most people do. This should help.
http://jira.pentaho.com/browse/PDI-4281

Hell said...

thanks for the quick comment. I'll try it and let you know.

Hell said...

It didn't work at first! It showed a "javaw.exe not found" error message. I edited the bat file and set "set PENTAHO_JAVA=C:\Program Files (x86)\Java\jre7\bin\javaw.exe", then it worked! Clearly a path problem.

It will be my first try. Do you think the training videos on the Pentaho site are good enough?
What I need is basically to connect to ERP data (running on MS SQL) and ETL some of it into a data warehouse, which will be the base for my tabular modelling. We are using SQL Server Integration Services at the moment but it's not easy for a rookie like me.

thanks.

Neil Charles said...

The documentation is ok, but not great. Not sure about the videos, I haven't tried them.

Best thing is to try working it out and then Google for things you can't make work. Basics are pretty easy but it's not always intuitive when you want to transform data.

Good luck!