How Machine Learning and Natural Language Processing Help Create Structured Versus Unstructured Data

Posted: 10/16/2018 - 03:43

Author: Tim Johnson

Most procurement organizations lack the data they need to make decisions about their services spend. The fundamental reason for the gap between the data they have and the data they need is that the missing detail is buried in documents. As a result, most organizations desperately need deeper visibility into services spend. For too long, spend analytics has provided details only at the aggregate level, and in many cases that data is not accurate. The marketplace consistently asks for better insight into this part of the business.

The greatest challenge is capturing and organizing the detailed contract metadata contained in the Master Service Agreements (MSAs) and related Statements of Work (SOWs) across an organization. We refer to the detailed information stored in these documents as unstructured data. In most organizations, significant time and effort go into developing these SOWs, and they hold a wealth of information that is rarely captured in the typical S2P or P2P process. Christie Schneider of IBM Watson says, “It is estimated that 80% of the world’s data is unstructured, but businesses are only able to gain visibility into a portion of that data. Innovative companies are using data to enhance their value proposition and increase customer satisfaction.”

Companies’ documents are usually in one of four primary states:

Organized digital files stored in a procurement management system
Digital documents stored on a shared drive
Paper records stored in filing cabinets
No official document or record

In almost all of the above cases, significant details never make it into a system or database designed for reporting or analysis. Even when agreements are stored in a transactional system, the data is usually not robust enough to give procurement leaders the insights they need, because these systems typically require only the data needed to manage budgets or process payments, and perhaps a high-level categorization of spend or supplier. As a result, the ability to perform a comprehensive spend analysis or measure contract compliance is very limited. Another challenge is that getting the information out of the documents and into a transactional system requires a person with contract experience to read and interpret them, and to do so efficiently and accurately enough to make the effort worthwhile. When the data does make it into one of these transactional systems, we refer to it as structured data.

To meet the growing data needs of the market, you must be able to reach into these documents and pull out the important unstructured data elements so they can be used for reporting, analysis and improved decision-making. With natural language processing and machine learning, you can extract the rich information these documents contain. Once that data is captured in a database, machine learning models can categorize and analyze it, organizing it so analysts and procurement leaders can put it to use. In simpler terms, you can turn unstructured data into meaningful information.
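To make the idea of turning document text into structured data concrete, here is a minimal sketch in Python. It is only an illustration: simple regular expressions stand in for the natural language processing and machine learning models described above, and the field labels ("Supplier:", "End Date:", "Payment Terms:"), the SowRecord type and the sample text are all assumptions made up for this example, not the format of any real SOW.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class SowRecord:
    """Structured fields pulled from one SOW (field names are illustrative)."""
    supplier: Optional[str]
    end_date: Optional[date]
    payment_terms: Optional[str]


# Hypothetical patterns: real SOW wording varies far more than this,
# which is why a trained model is used in practice instead of fixed rules.
SUPPLIER_RE = re.compile(r"Supplier:\s*(.+)")
END_DATE_RE = re.compile(r"End Date:\s*(\d{1,2}/\d{1,2}/\d{4})")
TERMS_RE = re.compile(r"Payment Terms:\s*(.+)")


def _first_group(pattern: re.Pattern, text: str) -> Optional[str]:
    """Return the first captured value for a pattern, or None if absent."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def extract_sow_fields(text: str) -> SowRecord:
    """Turn free-form SOW text into a structured record."""
    raw_date = _first_group(END_DATE_RE, text)
    end_date = datetime.strptime(raw_date, "%m/%d/%Y").date() if raw_date else None
    return SowRecord(
        supplier=_first_group(SUPPLIER_RE, text),
        end_date=end_date,
        payment_terms=_first_group(TERMS_RE, text),
    )


# Made-up sample document for illustration only.
sample_sow = """Statement of Work
Supplier: Acme Consulting
End Date: 12/31/2018
Payment Terms: 2% 10, Net 30
"""

record = extract_sow_fields(sample_sow)
print(record)
```

Fields that cannot be found come back as None rather than a guess, so a downstream report can show which documents still need a human reviewer.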

Imagine if you could automatically extract SOW end dates and compare them to payable data to determine whether payments are being made outside of contracted dates. Or what if you could automatically extract the payment terms and details from SOWs without intervention from anyone in your organization? You’d be able to compare PO details with payable data to ensure you’re capitalizing on payment discounts, or spot variances between what’s contracted in the SOW and the terms in the vendor master or MSA to identify cash flow opportunities or lost savings.
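The end-date check described above could be sketched roughly as follows. This assumes the SOW end dates have already been extracted and that each payable record carries a SOW identifier and a service date; the Payment type, its field names and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Payment:
    """One payable record (field names are assumptions for this sketch)."""
    invoice_id: str
    sow_id: str
    service_date: date
    amount: float


def payments_after_end_date(
    payments: List[Payment], sow_end_dates: Dict[str, date]
) -> List[Payment]:
    """Return payments for services dated after the contracted SOW end date."""
    flagged = []
    for payment in payments:
        end_date = sow_end_dates.get(payment.sow_id)
        # Payments with no known SOW end date are skipped, not flagged.
        if end_date is not None and payment.service_date > end_date:
            flagged.append(payment)
    return flagged


# Made-up sample data: end dates extracted from documents, and payable data.
sow_end_dates = {"SOW-001": date(2018, 6, 30)}
payments = [
    Payment("INV-1", "SOW-001", date(2018, 6, 15), 12000.0),
    Payment("INV-2", "SOW-001", date(2018, 8, 1), 9000.0),
]

late = payments_after_end_date(payments, sow_end_dates)
print([p.invoice_id for p in late])
```

In this sample only INV-2 is flagged, because its service date falls after the SOW end date. A real comparison would also need to decide how to treat extensions and amendments, which this sketch ignores.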

If a machine could extract roles and rates from your SOWs and then compare them to the standard rates in a staff augmentation rate card, or to market rates, you could potentially identify significant cost savings. What if you could reconcile structured data from a transactional system, such as a Vendor Management System (VMS), with the unstructured document data? The end result would be an enhanced data set that gives procurement leaders unprecedented visibility into contract compliance and risk. All of these things can be achieved through artificial intelligence (AI), quickly and accurately, leaving your precious resources free to focus on the strategy for resolving these discrepancies. These are things that can bring real value to your organization!
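As a rough illustration of the rate comparison, the sketch below checks extracted SOW rates against a rate card. It assumes roles have already been extracted and normalized to the same names used on the rate card, which is itself a hard part of the problem; the function name, role names and hourly rates are invented for this example.

```python
from typing import Dict, List, Tuple


def compare_to_rate_card(
    sow_rates: Dict[str, float], rate_card: Dict[str, float]
) -> Tuple[Dict[str, float], List[str]]:
    """Compare extracted SOW rates with standard rate card rates.

    Returns a dict of roles billed above the card (role -> hourly overage)
    and a list of roles that do not appear on the card at all.
    """
    overages = {}
    unmatched = []
    for role, rate in sow_rates.items():
        standard = rate_card.get(role)
        if standard is None:
            unmatched.append(role)
        elif rate > standard:
            overages[role] = round(rate - standard, 2)
    return overages, unmatched


# Made-up hourly rates for illustration only.
rate_card = {"Project Manager": 120.0, "Business Analyst": 95.0}
sow_rates = {
    "Project Manager": 135.0,
    "Business Analyst": 90.0,
    "Architect": 160.0,
}

overages, unmatched = compare_to_rate_card(sow_rates, rate_card)
print(overages)
print(unmatched)
```

Roles missing from the rate card are reported separately instead of being treated as compliant, since an unlisted role is often exactly the kind of discrepancy a procurement team wants to review.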

In a recent case study, we note three key customer benefits realized through AI:

Document repository – documents are securely stored and viewable
Structured data – previously unused data is extracted, categorized and organized accurately and efficiently by machine learning models, creating a layer of detail that goes much deeper than aggregate supplier and category spend information
Reporting and analysis – newly structured data is accessible for visualization and export to drive supplier/sourcing decisions, pricing strategies and negotiation, as well as better business decisions

As a service provider supporting many customers, we have first-hand experience with the challenges organizations face in trying to get a handle on their labor-based services spend. Today, significant time and resources go into producing what is often an ineffective, very high-level spend analysis. With AI, we have significantly reduced the effort required to extract the needed data from contract documents, while increasing the accuracy of and confidence in the information. As more documents are processed by the machine learning models, the models get better at identifying the key data elements needed for meaningful spend analytics. As a result, procurement leaders can spend more time looking at the important data points and use them to make decisions and drive true value for their business partners through better spend management.

Jon Kesman also contributed to this piece.
