They Did What?! Data Horror Stories Revealed
We all like a good horror story, especially a data one. It’s like when you pass an accident on the road: you can’t help but slow down to have a look, and then you’re thankful it wasn’t you.
Data horror stories are just like that. It gives us comfort to know that we are not the only ones suffering and that there might be people in even worse situations out there. We’re not just going to slow down here though; we’re also going to stop and see what went wrong. That way we can avoid finding ourselves in these situations in the future.
Let’s Start With One That Made the News
Firstly, there was the new £150 million children's hospital in Edinburgh that had to delay opening in 2019 due to a spreadsheet error from 2012. Not only did it halt the opening of the hospital, but it resulted in £16 million worth of remedial action to correct the error.
So, what happened? Well, according to BBC News, the Grant Thornton report stated:
“A spreadsheet called the ‘environmental matrix’ and dated from 2012 contained the ‘four air changes’ error for critical care. This looks to be, based on our review, human error in copying across the four-bedded room generic ventilation criteria into the critical care room detail. None of the independent contractors involved in the matrix picked up on the oversight.”
But it gets better. When the project went out to tender in 2013, one of the bidding companies spotted the error and corrected the spreadsheet when they submitted their bid. They did not win the bid, and the team evaluating the bids failed to pick up on the correction.
In addition, another error was spotted in the same spreadsheet in 2016, but the original error still went unnoticed, and an independent tester also failed to pick up on it.
Where to start with this one?! If we go back to the origin of the error, this is where it is so important to have people with knowledge of the subject either working with or checking the data. Remember that I’ve talked about spotting patterns in data? I have absolutely no knowledge of this area, but I bet if I had looked at the spreadsheet, I would have seen a trend of the same information over two columns and questioned this, especially if the information in all the other columns was also different.
There were also several other opportunities to spot this error, particularly in the bidding process where one of the bidding contractors actually flagged the error. I would suggest that if the bid team were looking at all these spreadsheets thoroughly, getting to know the data, which I have talked about many times in this book, then they would have picked up on this quickly.
Again, it’s down to having the right people working with the data, and keeping it consistent. If you have the same person working on the data, they’ll become familiar with it and soon spot errors.
And finally, there was another error picked up in that very spreadsheet. In my mind if there’s one error, there could be many. Wouldn’t it have been sensible to check the whole spreadsheet to make sure everything looked right?
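To show what a whole-spreadsheet check could look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the room names, the criteria and the values are made up and are not taken from the real environmental matrix. It simply flags any two columns whose values are identical, which is the kind of pattern worth putting in front of someone who knows the subject.

```python
from itertools import combinations

# Hypothetical, made-up values for illustration only.
# These are NOT the figures from the real environmental matrix.
matrix = {
    "single_bed_room": {"air_changes_per_hour": 6, "pressure": "neutral", "min_temp_c": 18},
    "four_bed_room": {"air_changes_per_hour": 4, "pressure": "neutral", "min_temp_c": 18},
    "critical_care": {"air_changes_per_hour": 4, "pressure": "neutral", "min_temp_c": 18},
    "isolation_room": {"air_changes_per_hour": 10, "pressure": "positive", "min_temp_c": 20},
}


def find_identical_columns(matrix):
    """Return pairs of column names whose values match on every criterion."""
    return [(a, b) for a, b in combinations(matrix, 2) if matrix[a] == matrix[b]]


suspects = find_identical_columns(matrix)
for a, b in suspects:
    print(f"Check with a subject expert: '{a}' and '{b}' have identical criteria")
```

A check like this can't tell you which column is wrong, or whether anything is wrong at all. It only raises the question, and a person with the right knowledge still has to answer it.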
Ultimately, it seems like there was no ownership of the spreadsheet. Had someone owned it, the outcome could have been very different, saving millions of pounds.
Stories of the Common Data People
Now let's get down to the real nitty-gritty truth about data, the things companies would never publish because they wouldn’t want them getting out. All these stories have been donated anonymously by my lovely LinkedIn followers.
“I’ve seen mobile yoga services categorized under telco mobile spend but then it could have been worse, it could have been mobile massage services. I’d have liked to have seen a rate card for that.”
This can easily happen if descriptions are misread, and then maintenance and spot checks are not carried out. In this case, think about context. Look at the supplier name, the description and the value.
If it’s a person’s name, you’d expect a personal mobile phone cost to be around the £50 mark, and you’d also expect there to be other expense charges such as travel associated with that supplier name. If you did happen to look at the description, it’s important to read the whole description. This misclassification could have happened as a result of a keyword search on “mobile,” without the data being thoroughly checked before bulk pasting the classification.
Or it could have been some code or some form of automation. In this case that is why these final checks, spot checks and maintenance are so important. It might be a small amount misclassified now, but what about in six to 12 months?
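Here is a rough Python sketch of how that can go wrong, and how a simple context check and spot check might catch it. The spend lines, the category name and the £100 threshold are all assumptions made up for illustration; the only figure taken from the text above is that a personal mobile charge would be expected to sit around the £50 mark.

```python
import random

# Hypothetical spend lines, made up for illustration.
lines = [
    {"supplier": "Telco Ltd", "description": "mobile phone contract", "value": 45.00},
    {"supplier": "J. Smith", "description": "mobile yoga services", "value": 600.00},
    {"supplier": "Telco Ltd", "description": "mobile data roaming", "value": 30.00},
]


def classify_by_keyword(line):
    """Naive rule: anything mentioning 'mobile' is treated as telco mobile spend."""
    return "Telco Mobile" if "mobile" in line["description"].lower() else "Unclassified"


def needs_review(line, category, max_expected=100.0):
    """Context check: flag telco mobile lines well above the expected value.

    The 100.0 threshold is an assumption; pick one that suits your own data.
    """
    return category == "Telco Mobile" and line["value"] > max_expected


def spot_check(classified, n, seed=0):
    """Pull a repeatable random sample of classified lines for a manual read."""
    rng = random.Random(seed)
    return rng.sample(classified, min(n, len(classified)))


classified = [(line, classify_by_keyword(line)) for line in lines]
flagged = [line for line, category in classified if needs_review(line, category)]

for line in flagged:
    print(f"Review: {line['supplier']} | {line['description']} | £{line['value']:.2f}")
```

The keyword rule puts all three lines under telco mobile spend, including the yoga. The value check only flags the yoga line for a second look, and the random sample is there for the mistakes that no rule was written to catch.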
You Should’ve Asked First
“The company I work for (a huge bank that shall remain nameless) decided to migrate its data warehouse to a new Teradata platform, but doesn’t appear to have involved users in the naming conventions (aren’t intuitive or follow what I’ve seen as best practices in naming field conventions) or verify that all fields relied upon for analysis are accounted for and mapped properly. We had some calls to discuss at a high level the project, but now I can see a big, missed opportunity to engage users. Maybe they thought they are the experts, but if the users can’t understand the new fields and/or find the fields they need to do analysis, things get delayed or short down. This is happening now. It’s very frustrating.”
I think it’s important to get the whole organization involved, and this example just proves my point. How can you build and develop something for a team to use if you don’t first find out how they are working? There could be legitimate reasons why things can’t be done a certain way, but if you don’t ask the people involved, you’ll never know, and you might end up using those very approaches in your methodology.
This could cost your company a lot of time and money, especially if you are using external consultants, and some of the best advice you might ever get is from the people who work with the system day to day.
And remember when I spoke of the technology not working, staff losing faith in the system and going back to their old ways? This is exactly the type of situation in which this could happen.
And I haven’t even spoken about COAT yet. Having a consistent naming convention is so important so that everyone using the files or documents is clear on what the terms are and what they mean. Without this, you won’t have organization, accuracy or trust in the data.
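For a migration like the one in this story, even a very small check could surface both problems before users find them the hard way. The Python sketch below is only an illustration: the field names are hypothetical, and the lowercase-with-underscores rule is an assumed convention standing in for whatever your users actually agree on.

```python
import re

# Hypothetical field names, made up for illustration.
# Fields the analysts say they rely on in the old warehouse:
required_fields = {"cust_id", "acct_open_dt", "branch_cd"}

# Old field name -> new field name, as supplied by the migration team:
field_mapping = {
    "cust_id": "customer_id",
    "acct_open_dt": "AcctOpnD8",
}

# Assumed convention: lowercase words separated by underscores.
SNAKE_CASE = re.compile(r"^[a-z]+(_[a-z0-9]+)*$")

# Fields users need that have no home in the new platform.
unmapped = sorted(required_fields - set(field_mapping))

# New names that break the agreed convention.
badly_named = sorted(new for new in field_mapping.values() if not SNAKE_CASE.match(new))

print("Needed but not mapped:", unmapped)
print("Does not follow the naming convention:", badly_named)
```

The list of required fields is the important part, and it can only come from asking the people who use the data day to day.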
Want more dirty data horror stories? Get yourself a copy of Susan’s book ‘Between the Spreadsheets: Classifying and Fixing Dirty Data’. On sale now and available on Amazon, Waterstones and Barnes & Noble.