Efficiency Metrics

The Impact of using Accuracy as a Metric for assessing ROI of a Data Extraction System

August 9, 2023

As more and more of our technology needs shift towards automating routine tasks, accuracy has become one of the most frequently used evaluation metrics in business. In many contexts, it is the metric with the most direct impact on the ROI of a business process.

At the same time, there are pitfalls: when the context of the use case changes, the correlation between accuracy and ROI can break down, leading a business to make decisions that adversely impact its future revenues.

In order to understand the contexts in which accuracy is relevant, and those in which it becomes irrelevant, we need to examine one of the fastest-growing areas of ML and AI usage: Intelligent Data Processing.

But before that, we need a clear understanding of what we mean by accuracy. Let us start with some definitions in simple terms:

True Positive: Something that the system predicted as being correct and was correct in reality

True Negative: Something that the system predicted as being incorrect and was incorrect in reality

False Positive: Something that the system predicted as being correct but was incorrect in reality

False Negative: Something that the system predicted as being incorrect but was correct in reality

Now that we have these terms defined, we can set the definition of accuracy: the fraction of all predictions that were correct, i.e. Accuracy = (TP + TN) / (TP + TN + FP + FN).
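To make the definition concrete, here is a minimal sketch of the standard accuracy calculation from the four counts defined above (the example counts are made up for illustration):

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts: 90 TP, 5 TN, 3 FP, 2 FN out of 100 predictions.
print(accuracy(90, 5, 3, 2))  # 0.95
```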

At first glance, this seems like a straightforward metric. If we get the system to predict true positives and true negatives properly, it will work in a satisfactory manner. So higher accuracy is good, and this holds in most scenarios. However, it fails once you take into account the hidden ambiguity in the definition of a true positive or true negative when dealing with higher-level, interconnected data.

For example, let us consider extracting a simple set of fields, 'expenses incurred during a specific financial period', from an annual report. Here is an example sentence from which we can extract this information:

The marketing and advertising expenses incurred during the financial year 2017 and 2018 were $2.5 million and $3.5 million respectively.

Now let us assume we use a prediction system to extract this data and we got these results:

Record 1
  • Expense: $2.5 million
  • Date: 2018
  • Expense Item: marketing

Record 2
  • Expense: $3.5 million
  • Date: 2017
  • Expense Item: advertising

Record 3
  • Expense: $2.5 million
  • Date: 2018
  • Expense Item: advertising

Record 4
  • Expense: $3.5 million
  • Date: 2018
  • Expense Item: advertising

(Please note that this is a highly simplified example created only to illustrate the challenge and does not reflect the ability of current ML systems.)

Here, we notice that all six distinct captured values are correct in themselves: we have two expense items, two periods, and two currency values, each of which appears in the source. So purely from a field-prediction point of view, the accuracy looks great. But that is only because, at the value level, we are comparing apples to apples: each value exists individually in both the label set and the prediction set.

Now, if you consider the data at the record level, we run into a problem: which record is completely correct, and which is completely incorrect? Neither.

Each record is roughly 30-100% correct (depending, subjectively, on which subset of fields you consider as being correct). So is it a true positive or a true negative?
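The gap between the two views can be sketched in a few lines. The labels and predictions below are made-up records in the spirit of the example above (the sentence itself is ambiguous, so this ground truth is an assumption); the point is only that every individual value can look correct while no whole record does:

```python
FIELDS = ("item", "date", "amount")

# Assumed ground truth for the sketch.
labels = [
    {"item": "marketing", "date": 2018, "amount": 3.5},
    {"item": "advertising", "date": 2017, "amount": 2.5},
]
# Predictions that mix up the associations between fields.
predicted = [
    {"item": "marketing", "date": 2018, "amount": 2.5},
    {"item": "advertising", "date": 2017, "amount": 3.5},
]

# Field-level view: does each predicted value occur anywhere in the labels?
label_values = {f: {rec[f] for rec in labels} for f in FIELDS}
field_hits = sum(rec[f] in label_values[f] for rec in predicted for f in FIELDS)
print(f"field level: {field_hits}/{len(predicted) * len(FIELDS)} values look correct")

# Record-level view: does any predicted record match a label exactly?
record_hits = sum(rec in labels for rec in predicted)
print(f"record level: {record_hits}/{len(predicted)} records fully correct")
```

Every value is "right" in isolation (6/6), yet not a single record is right as a whole (0/2).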

  • If you try to combine the accuracies of the three fields into a single record-level score, the errors average out and the score loses meaning.
  • If you consider a more real world scenario, you will have more fields per record and a random set of fields are either correct or incorrect depending again on subjectivity of choice. That further compounds the problem.
  • Furthermore, some fields are more important than others, and hence there is an intrinsic business value associated with getting them right or wrong. This, too, becomes part of the subjectivity of the assessment.

Back to Accuracy

Let’s take a position here: a record is correct only if every field is correctly predicted. That would remove the ambiguity between true positive and true negative, and we could then use the accuracy metric to assess the system. However, let us look a little more deeply at what could happen.

If a system has predicted 100 records, each with 10 fields, it would be highly accurate if most of those records had all fields predicted correctly. But let us assume that one field is particularly difficult, and the system got that one field wrong in every record. Now each record has one wrong field, and hence the record-level accuracy for the overall set is zero. So the business would decide to ignore the system and continue with its existing manual or assisted data extraction process.

In reality, however, the system has missed only 100 fields out of 1,000, which means it has done 90% of the work a human would otherwise have had to do. If your cost of data extraction is the human time taken to perform the task, the cost reduction is 90%. This reflects far more faithfully on your ROI for the system, and with this change in perspective, the system would be well worth adopting.
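The all-or-nothing trap in this scenario is easy to verify numerically. A minimal sketch, using the 100-record, 10-field figures from the paragraph above:

```python
# 100 records x 10 fields, with exactly one (hard) field wrong in each record.
n_records, n_fields = 100, 10
wrong_per_record = 1

# Strict record-level accuracy: a record counts only if every field is right.
fully_correct_records = sum(1 for _ in range(n_records) if wrong_per_record == 0)
record_accuracy = fully_correct_records / n_records

# Field-level view: the share of manual correction work the system saved.
fields_correct = n_records * (n_fields - wrong_per_record)
work_saved = fields_correct / (n_records * n_fields)

print(f"record-level accuracy: {record_accuracy:.0%}")
print(f"manual work avoided:  {work_saved:.0%}")
```

The strict metric reports 0% while the system has in fact eliminated 90% of the manual field work, which is exactly the gap between the two business decisions described above.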

Before we conclude that this is a solved case, let us address clearly how the ambiguity of data extraction efficiency is handled.

For this, we need a properly defined metric to assess such a system.

To provide a foundation for this metric, we will now define a very simple way to address overall system performance while still keeping the link to ROI embedded in the metric.

For this we will define some new terms:

Human Touch Points (HTP):

Each interaction in which a human had to edit the value of one field. The edit could be adding a value, updating it, or deleting a wrongly predicted value. One point is given for each field where the human corrected the issue and brought the record to a state of complete correctness.

System Prediction Points (SPP):

Each value prediction or value transformation the system provided in its workflow, ultimately resulting in an output that is directly consumable by another system. Here, we assign one point to each field where the human did not have to make a change at all.

Note that there is still an aspect of subjectivity here. The human selects which combination of fields to use as the basis for correcting the predicted values. In other words, the human could select the records with the fewest correct fields to work on and thereby gain more points against the system.

However, we make the assumption that, over longer periods of time, people tend to do less work rather than more. So they will learn to pick the best records, correct the minimum number of fields, and hence reduce the time they spend.

With that in mind, we can now define a new metric for system efficiency: System Efficiency = SPP / (SPP + HTP), i.e. the fraction of the total field work that the system handled without human intervention.

This gives us a very effective means of deciding how any extraction system will affect our ROI. The higher the system efficiency, the lower the number of human touch points, and therefore the less time taken to process the documents and extract the data.
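A minimal sketch of this metric, assuming (per the definitions above) one SPP point per field the system got right unaided and one HTP point per field a human had to add, update, or delete:

```python
def system_efficiency(spp: int, htp: int) -> float:
    """Fraction of total field work done by the system rather than a human.

    spp: System Prediction Points (fields needing no human edit).
    htp: Human Touch Points (fields a human had to add/update/delete).
    """
    return spp / (spp + htp)

# The 100-record scenario: 900 fields untouched, 100 corrected by hand.
print(f"{system_efficiency(900, 100):.0%}")  # 90%
```

Unlike strict record-level accuracy, this number moves smoothly with the amount of human effort actually saved, which is what ties it to ROI.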

Other Considerations

Keep in mind that although this metric reflects faithfully on ROI, there are a few other factors to consider. For most unstructured data, the challenge is that there is no fixed way to guarantee the system keeps performing at a given level. Changes in regulatory language, business context, and even the people assigned to create these reports will produce shifts that cannot be foreseen.

So, add a certain level of verification effort to the system. This should be directly measurable per document in terms of time taken, and should ideally be an order of magnitude less than the effort of the extraction work itself.

With this one correction, we now have a way to assess the data extraction challenges of today. The metric can be further enhanced by taking into account other factors such as scriptability, adaptability, and the usability of user interfaces.

Interested in Simplifying Your Data Extraction?