Written with Jonathan Adamson @ordinaryjon
You know the bit of your job you hate? You know, the data collection and data wrangling bit. That’s the good bit too.
Here are some moans I hear from fellow analysts, researchers and data chauffeurs.
“If we had better data we could produce better viz.”
“If I didn’t have to spend so long sorting the data out we could produce better viz.”
“The data was nowhere near as good or complete as the client led me to believe. If the data had been anything like what they said it was at the start… we could have produced better viz.”
I say fellow analysts, but actually it’s me too. I’ve said all these things at some point.
You know that when you start to work on a new project, you will have that conversation. The conversation about what insight the client wants to glean from the data.
Well, before you can even show insight from the data, you start with ‘where is the data?’ Even today, business-critical data doesn’t always exist in any usable format. So my starting point on any project now is: ‘the data doesn’t exist until it’s on my computer’.
Then the second, bigger issue is data quality. It’s always, always the main issue. For years, I hated cleaning and quality assuring data. I hated having to reformat data for different analysis and presentation.
Now there are, of course, tools like Alteryx to do this, and it’s getting easier. But for many, data cleaning is still tiresome, requires concentration and feels unrewarding. No one ever says thank you for sorting out their data.
If, after all your hard work cleaning and sorting the data, there was still an error in it, that is what people wanted to focus on. Sometimes, it seems, it is the only thing others can focus on. They would be quick to point out any errors. You could have a million records, and one erroneous piece of data means that’s all they can see.
At one meeting, we were so concerned about the reaction to the quality of the data that we were determined not to show it. We thought it would undermine the viz work we had done. Yet the client was insistent on seeing an early version of the viz. Of course, they assured us, they understood that the data needed checking. They would look beyond that to consider the viz that we had produced. Reluctantly, we showed the work.
And the first thing they said, almost in unison, with the familiar blend of sarcasm and contempt:
“Those figures aren’t correct.”
There’s often the suspicion that people think you’ve sabotaged their data. They look through narrowed eyes at these snake-oil salesmen in front of them with their fancy viz. You search for a polite way of saying, ‘well, it’s your data, it’s what’s in the database, we didn’t make it up!’
At this point, I used to think we had failed. We had set out to produce some analysis and data viz for a client and they didn’t want to use it. Their data was wrong and I had shown this up to everyone in wide-screen HD technicolour.
I was wrong, because a funny thing starts to happen. People start fixing the problems. They start sharing hypotheses about why the data is wrong. They come up with solutions for putting it right. Generally, these two things, reasons and solutions, come to them fast. They kind of knew them already.
The reasons for data quality problems are many, but generally come down to two things: the processes in place and the people who follow them. Sometimes poor data quality is a result of a process issue with the data collection. The means of data capture works against people. For example, data is captured in one format, via websites, paper forms or interviews, and then has to be re-entered in another system somewhere else.
Yet sometimes the process issue is the fall guy. What we stumble across is a people issue. Because the system is fragmented, people in it don’t see the end-to-end process. They don’t see how the data is a catalyst for action, and they don’t see how, or why, they could or should improve it.
By showing them their own data in a visual way, they now see it in all its flawed beauty. Yet most of all, they see the potential for it and this gives them the motivation to fix it.
This is not only a part of the process; it is the most important part.
The beautiful representation of colours that you finish up with is the tip of the iceberg. It is the 25%. It’s the Gary Lineker of data viz; the goal-hanger, tapping in over the line and wheeling away for the glory.
The other 75% of the iceberg, the other 10 players tackling and passing and busting their lungs to create the simple tap-in: that is also your job. That is what you do.
It may not be the coolest thing about it, but data viz is about improving data quality. It is about helping people to understand why they even collect data, and what they should use it for. It is, to coin a phrase, more than pixels on a screen.