The Dangers of Dirty Data and How to Ensure Your Data Has Its COAT on

Posted: 10/23/2020 - 02:11
Data accuracy is an investment, not a cost.

We all think we know what dirty data is, but it can mean very different things to different people. At its most basic level, dirty data is anything that’s incorrect.

Within procurement, it could be misspelt vendor names, incorrect invoice descriptions, missing product codes, a lack of standard units of measure (e.g., ltr, l, litres), currency issues, duplicate invoices, or incorrectly or partially classified data.
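Some of these issues can be tackled with simple rules. As a minimal sketch, assuming a hypothetical `UOM_MAP` lookup table that your team agrees on and maintains, the unit-of-measure variants mentioned above can be mapped to one standard code, with anything unrecognized flagged for review rather than guessed:

```python
# Hypothetical mapping from unit-of-measure variants found in raw invoice
# lines to one agreed standard code. Extend it as new variants turn up.
UOM_MAP = {
    "l": "L",
    "ltr": "L",
    "ltrs": "L",
    "litre": "L",
    "litres": "L",
    "liter": "L",
    "liters": "L",
}


def standardize_uom(raw: str) -> str:
    """Return the standard unit for a raw value, or flag it for review."""
    key = raw.strip().lower().rstrip(".")  # ignore case, spaces, trailing dot
    # Unknown units are flagged rather than guessed, so a person can decide.
    return UOM_MAP.get(key, f"REVIEW:{raw}")


for raw in ["ltr", "L", "Litres", "litres.", "gal"]:
    print(raw, "->", standardize_uom(raw))
```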

Dirty data can affect the whole organization. We each have an impact on, and responsibility for, the data we work with, so accurate data is everyone’s responsibility. However, in many organizations data is treated as the sole responsibility of one person or department, and everyone else trusts them to make sure it is accurate.

How many times have you been working with a data set and noticed a small error but not said anything, or manually corrected something in an automated report just to get it out the door on time? These small errors can filter all the way up to the top of an organization through the reports and dashboards where critical decisions are made.

How Does This Affect My Organization?

One of the most widespread and noticeable impacts is on reporting and analytics. If you’re in senior management, you will most likely receive a dashboard from your team that is used to review cost savings, supplier negotiations, rationalization, forecasting or budgets.

What if that dashboard showed £25k of cleaning spend classified under IBM? I can already hear you saying, “That’s ridiculous.” Well, it is obvious once pointed out, but I have seen it with my own eyes. It can happen easily and occurs more often than you might think.

When there are tens or hundreds of thousands of rows of data, errors will occur multiple times across many suppliers. For the wider organization, this could affect demand, planning, sales, marketing and financial decisions.

Think back to the IBM example. Each quarter, the data is refreshed automatically with the same cleaning classification, so that £25k becomes £50k, then £75k the following quarter. It’s only when the value becomes significant that someone notices the issue. By that stage, how many decisions have been based on incorrect information?
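Catching an error like the IBM example before it compounds can be partly automated. This is a minimal sketch, assuming you maintain a hypothetical list of the categories each normalized supplier is expected to appear under; the `EXPECTED_CATEGORIES` map and the `Acme Cleaning` supplier are illustrative, not real data. Any line outside that list is flagged for someone to review before the next refresh:

```python
# Hypothetical reference list: the spend categories each normalized supplier
# is expected to appear under, maintained by someone who knows the suppliers.
EXPECTED_CATEGORIES = {
    "IBM": {"IT Hardware", "IT Software", "IT Services"},
    "Acme Cleaning": {"Cleaning"},  # illustrative supplier name
}


def flag_suspect_lines(lines):
    """Yield (supplier, category, value) rows whose category looks wrong."""
    for supplier, category, value in lines:
        expected = EXPECTED_CATEGORIES.get(supplier)
        # Unknown suppliers and unexpected categories both need a human check.
        if expected is None or category not in expected:
            yield supplier, category, value


quarterly_spend = [
    ("IBM", "IT Services", 120_000),
    ("IBM", "Cleaning", 25_000),
    ("Acme Cleaning", "Cleaning", 18_000),
]
for row in flag_suspect_lines(quarterly_spend):
    print("Check:", row)
```

The check does not decide what the right category is; it only narrows tens of thousands of rows down to the few a person should look at.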

How Do I Fix It?

There’s no magic bullet or miracle solution out there to improve the accuracy of your data. You have to use your team or an experienced professional to get the job done. Get your team to familiarize themselves with the data. If they are reviewing and maintaining it regularly, they will soon be able to spot errors in the data quickly and efficiently.

Your data should always have its COAT on and be:

Consistent: Everyone working to the same standards

Organized: Categorized properly

Accurate: Correct

Trustworthy: You wouldn't drive around in a car without a regular inspection, would you?

How Do I Get a Data COAT?

With a spreadsheet of spend transactions over a period of time, such as 12 to 24 months, the first step should be supplier normalization. This is where a new column is added to consolidate several versions of the same company to obtain a true picture of spend with that supplier. For example, I.B.M, IBM Ltd and I.B.M. would all be normalized to IBM.
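As a rough illustration of that normalization step, the sketch below builds a comparison key by upper-casing the name, dropping punctuation and stripping trailing legal suffixes, then writes it to a new column. The `LEGAL_SUFFIXES` set and the sample rows are assumptions for illustration; rule-based keys like this can also merge genuinely different companies, so the results still need a human check.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing supplier names.
LEGAL_SUFFIXES = {"LTD", "LIMITED", "PLC", "INC", "LLC", "CORP"}


def normalize_supplier(name: str) -> str:
    """Collapse spelling variants of a supplier name to one comparison key."""
    upper = name.upper().replace(".", "")         # "I.B.M." -> "IBM"
    words = re.findall(r"[A-Z0-9&]+", upper)      # drop other punctuation
    while words and words[-1] in LEGAL_SUFFIXES:  # "IBM LTD" -> "IBM"
        words.pop()
    return " ".join(words)


rows = [
    {"supplier": "I.B.M", "value": 1200.00},
    {"supplier": "IBM Ltd", "value": 450.50},
    {"supplier": "I.B.M.", "value": 300.00},
]

# Add a new column rather than overwriting the original supplier name.
for row in rows:
    row["normalized_supplier"] = normalize_supplier(row["supplier"])

# Total spend per normalized supplier gives the true picture of spend.
totals = defaultdict(float)
for row in rows:
    totals[row["normalized_supplier"]] += row["value"]
print(dict(totals))  # {'IBM': 1950.5}
```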

Data can be classified using minimal information, such as supplier name, invoice/PO line description and value. To get more out of the data, other factors such as unit price can then be added. Where unit price information is unavailable, it can be derived by dividing the overall value by the quantity.
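A small sketch of that calculation, assuming line values and quantities are held as `Decimal` to avoid floating-point rounding on currency. Lines with a missing or zero quantity return `None` so they can be reviewed by hand rather than producing a misleading figure:

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def derive_unit_price(line_value: Decimal, quantity: Optional[Decimal]) -> Optional[Decimal]:
    """Unit price = overall line value divided by quantity."""
    if quantity is None or quantity == 0:
        return None  # avoid division by zero; leave for manual review
    # Round to pence/cents for reporting.
    return (line_value / quantity).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(derive_unit_price(Decimal("250.00"), Decimal("10")))  # 25.00
print(derive_unit_price(Decimal("100.00"), Decimal("3")))   # 33.33
print(derive_unit_price(Decimal("80.00"), Decimal("0")))    # None
```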

A suitable taxonomy will then be needed to classify the data. This can be an off-the-shelf taxonomy such as ProClass, UNSPSC or PROC-HE, or a taxonomy customized to be specific to your organization or industry.
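To illustrate the idea only, here is a rule-based sketch that assigns each line description to a category using keywords. The `TAXONOMY_RULES` categories and keywords are hypothetical and far simpler than ProClass, UNSPSC or PROC-HE; keyword rules like these are a starting point that still needs checking by someone who knows the data.

```python
# Hypothetical, heavily simplified taxonomy: each category is matched by
# keywords found in the invoice/PO line description.
TAXONOMY_RULES = [
    ("Facilities > Cleaning", ("cleaning", "janitorial", "hygiene")),
    ("IT > Hardware", ("laptop", "server", "monitor")),
    ("IT > Software", ("licence", "license", "subscription")),
]


def classify(description: str) -> str:
    """Return the first category whose keywords appear in the description."""
    text = description.lower()
    for category, keywords in TAXONOMY_RULES:
        if any(word in text for word in keywords):
            return category
    return "Unclassified"  # leave for a person rather than guess


for desc in ["Office cleaning - March", "Dell laptop x5", "Consultancy day rate"]:
    print(desc, "->", classify(desc))
```

Rules are checked in order, so the first match wins; anything unmatched stays "Unclassified" for manual review.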

This initial stage may take months if you are working with large volumes of data. It might be worth considering outsourcing this initial task to experienced professionals able to complete the project in a shorter time with greater accuracy.

It'll Save Money in the Long Run

Data accuracy is an investment, not a cost. Address the issues at the beginning. While it may seem like a costly exercise, you will undoubtedly spend less than if you have to resolve an issue further down the line with a time-consuming and pricey data clean-up operation. Involving the whole team or organization makes it much easier to manage and maintain the most accurate data possible.

Spend data classification shows you the whole picture, as long as it’s accurate. You can get a true view of your spend, allowing improved cost savings, better contract compliance and, possibly the most important benefit of all, preventing costly mistakes before they happen.

So, does your data have its COAT on? What does “dirty data” mean to you? Get in touch at susan@theclassificationguru.com.

About The Author

With nearly a decade of experience fixing your dirty data, Susan Walsh is The Classification Guru.
She brings clarity and accuracy to data and procurement; helps teams work more effectively and efficiently; and cuts through the jargon to address the issues of dirty data and its consequences in an entertaining and engaging way.

Susan is a specialist in data classification, supplier normalisation, taxonomy customisation, and data cleansing, and can help your business find cost savings through spend and time management, supporting better, more informed business decisions.
Susan has developed a methodology to accurately and efficiently classify, cleanse and check data for errors, which helps prevent costly mistakes and could save days, if not weeks, of laborious cleansing and classifying.

Susan is passionate about helping you find the value in cleaning your ‘dirty data’ and raises awareness of the consequences of ignoring issues through her blogs, vlogs, webinars and speaking engagements.