A couple of years ago I was working as an HR controller for a large organization. We decided it was time to start working on strategic personnel planning, so I built a model. The model turned out to be terrible. In this article, I’ll explain why it failed and how we should have fixed it.
My model failed because I built it on missing, unreliable and outdated records. The HR administration contained errors in overhead categories, productivity data was missing, the headcount was incomplete, et cetera.
Building assumption upon assumption led to a model that was as reliable as a slot machine.
Unfortunately, a lot of other organizations suffer from the same problem. Research by Experian Data Quality has shown that 75% of organizations believe that incorrect data is obstructing an excellent customer experience.
Yet 65% of organizations wait until data quality problems arise before taking action.
If you really want to start making an impact in HR, forget about complicated analysis and predictive modeling for now. Start by building a solid administration that is complete, accurate and up to date. And yes, we can use algorithms for that, so it doesn’t have to be boring at all.
Why (HR) data is often flawed
Let’s start with why (HR) data is often flawed. In my experience, anything that is not continuously cleaned will turn into a mess at some point.
It is no secret that organizations maintain masses and masses of records. The bulk of those records are typed in manually at some point in the process.
People who have to enter a lot of records make mistakes and typos, forget things and take shortcuts (for example by filling in a dash or a zero in a required field because it helps them submit a form more quickly).
Figure 1: Top reasons for flawed data according to Experian Data Quality Research (multiple answers possible)
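Shortcuts such as a dash or a zero in a required field are easy to detect automatically. The snippet below is a minimal sketch of such a check. The field names, the sample records and the list of filler values are assumptions made for illustration, not taken from a real HR system.

```python
# Assumed list of filler values that people type to get past a required field.
PLACEHOLDERS = {"", "-", "0", "n/a", "x"}


def find_placeholder_fields(records, required_fields):
    """Return (record id, field) pairs whose value looks like a filler."""
    findings = []
    for record in records:
        for field in required_fields:
            raw = record.get(field)
            value = "" if raw is None else str(raw).strip().lower()
            if value in PLACEHOLDERS:
                findings.append((record["id"], field))
    return findings


# Hypothetical sample records.
employees = [
    {"id": 101, "overhead_category": "Finance", "fte": "0.8"},
    {"id": 102, "overhead_category": "-", "fte": "1.0"},
    {"id": 103, "overhead_category": "HR", "fte": "0"},
]

findings = find_placeholder_fields(employees, ["overhead_category", "fte"])
for record_id, field in findings:
    print(f"Record {record_id}: suspicious value in '{field}'")
```

A zero can be a legitimate value in some numeric fields, so in practice the list of filler values would probably need to be set per field.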
In practice, organizations generally notice this only when they really need the information for a report, an analysis, and so on. Usually, that’s when they take action to clean up their data.
In my example: when we noticed the overhead categories were flawed in the administration we did what most organizations would do.
We analyzed which records were flawed, instructed our administrative team to correct them, ran a check to see whether the records had been corrected, and even instructed the administrative team to pay closer attention in the future.
And we left it at that.
A comparison that springs to mind is that of shoplifting. In practice, most organizations hire a ‘security guard’. Like the security guard who makes an occasional round, they employ a controller or business analyst who performs an analysis on a specific field, creates a list of flawed or missing records, instructs the administrative team and moves on to the next analysis of a different field.
The risk is that the second the security guard takes a smoke break, the shoplifters return, and so does the flawed or incomplete data. The next step is to advance from ‘Ad-Hoc Improvement’ (the security guard) to ‘Continuous Improvement’ (24/7 security cameras).
Use case: Continuous Improvement at Leiden University
One of the leading organizations in Continuous Improvement is Leiden University in the Netherlands. Rob van den Wijngaard and his team from the Financial Shared Service Center have built a series of algorithms to continuously monitor and improve on flawed or missing records.
An example: in Finance, double (duplicate) invoices can cause unnecessary costs for an organization. One of the algorithms they use continuously scans the administration for potential double invoices.
It checks the reference, vendor, and amount of the invoice. Each element can result in either no match, a hard match (the element is exactly equal to another invoice) or a fuzzy match (the element is very similar to another invoice).
Based on the number of matches a potential double invoice automatically gets flagged by the system (called an ‘exception’). Leiden University uses dozens of algorithms like this (checking on matches, empty fields, incorrect data, et cetera).
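The matching logic described above can be sketched as follows. This is a minimal illustration, not Leiden University's actual implementation: the field names, the 0.85 similarity threshold, the 1% amount tolerance and the rule that all three elements must match are assumptions made for the example. Fuzzy text comparison uses the difflib module from the Python standard library.

```python
from difflib import SequenceMatcher
from itertools import combinations

FUZZY_THRESHOLD = 0.85  # assumed similarity cut-off for a fuzzy text match
MIN_MATCHES = 3         # assumed rule: flag only when all three elements match


def match_text(a, b):
    """Classify two text fields as 'hard', 'fuzzy' or 'none'."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return "hard"
    if SequenceMatcher(None, a, b).ratio() >= FUZZY_THRESHOLD:
        return "fuzzy"
    return "none"


def match_amount(a, b, tolerance=0.01):
    """Equal amounts are 'hard'; amounts within the tolerance are 'fuzzy'."""
    if a == b:
        return "hard"
    if abs(a - b) <= tolerance * max(abs(a), abs(b)):
        return "fuzzy"
    return "none"


def compare(inv_a, inv_b):
    """Compare the three elements of two invoices."""
    return {
        "reference": match_text(inv_a["reference"], inv_b["reference"]),
        "vendor": match_text(inv_a["vendor"], inv_b["vendor"]),
        "amount": match_amount(inv_a["amount"], inv_b["amount"]),
    }


def find_exceptions(invoices):
    """Return pairs of invoices that look like potential double invoices."""
    exceptions = []
    for inv_a, inv_b in combinations(invoices, 2):
        result = compare(inv_a, inv_b)
        matches = sum(1 for kind in result.values() if kind != "none")
        if matches >= MIN_MATCHES:
            exceptions.append((inv_a["id"], inv_b["id"], result))
    return exceptions


# Hypothetical sample invoices.
invoices = [
    {"id": 1, "reference": "INV-2019-0042", "vendor": "Acme B.V.", "amount": 1250.00},
    {"id": 2, "reference": "INV-2019-0042", "vendor": "ACME BV", "amount": 1250.00},
    {"id": 3, "reference": "PO-7781", "vendor": "Globex", "amount": 310.50},
]

exceptions = find_exceptions(invoices)
for id_a, id_b, result in exceptions:
    print(f"Potential double invoice: {id_a} and {id_b} -> {result}")
```

Comparing every pair of invoices is fine for a small example but scales quadratically. A real scan would probably narrow the candidates first, for instance by vendor or by period.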
The fact that the scans are performed automatically means that a lot of different aspects of the administration can be examined on a continuous basis, without taking up precious resources.
In addition to the algorithms, Leiden University is currently implementing a digital workflow to handle the exceptions. The exceptions will be sent to the administrative team automatically.
For each exception, the system logs who has done what and why (laying the foundation for Continuous Auditing). That way, not only the exceptions are monitored but also the follow-up.
This not only ensures that exceptions are handled, it also offers valuable feedback on the algorithms and processes used. If the exception in the example above was in fact not a double invoice, the administrative team flags the exception as a so-called ‘false positive’.
The input of the false positives ‘trains’ the algorithms to be more precise. If the same exceptions happen over and over again, it could mean the processes are not clear or need adjustment.
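The article does not describe how this feedback is processed, so the snippet below is only one simple way to use it: count the reviewed exceptions per rule, compute the share of false positives, and suggest either tightening the rule or reviewing the underlying process. The rule names, the review log and both thresholds are hypothetical.

```python
from collections import Counter

# Hypothetical review log: one entry per exception handled by the team.
reviews = [
    {"rule": "double_invoice", "outcome": "confirmed"},
    {"rule": "double_invoice", "outcome": "false_positive"},
    {"rule": "double_invoice", "outcome": "false_positive"},
    {"rule": "empty_cost_center", "outcome": "confirmed"},
    {"rule": "empty_cost_center", "outcome": "confirmed"},
    {"rule": "empty_cost_center", "outcome": "confirmed"},
]


def summarize(reviews, max_fp_rate=0.5, max_confirmed=2):
    """Give a per-rule summary with simple advice (thresholds are assumed)."""
    totals = Counter(r["rule"] for r in reviews)
    false_pos = Counter(
        r["rule"] for r in reviews if r["outcome"] == "false_positive"
    )
    summary = {}
    for rule, total in totals.items():
        fp_rate = false_pos[rule] / total
        confirmed = total - false_pos[rule]
        if fp_rate > max_fp_rate:
            advice = "tighten rule"      # too many false alarms
        elif confirmed > max_confirmed:
            advice = "review process"    # the same real error keeps recurring
        else:
            advice = "ok"
        summary[rule] = {
            "fp_rate": fp_rate,
            "confirmed": confirmed,
            "advice": advice,
        }
    return summary


summary = summarize(reviews)
for rule, stats in summary.items():
    print(rule, stats)
```

With this sample log, the double invoice rule is marked for tightening because most of its exceptions were false positives, while the recurring confirmed exceptions on the other rule point to a process that may need adjustment.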
The trefoil model
It is important to recognize that the success of the example above is not due to IT alone. Leiden University is a firm believer in the so-called trefoil model. Continuous Improvement can only be a success when the pillars IT, People & Culture, Management & Organization, and Processes work closely together.
For example, a motivated administrative team and a management that focuses on fixing the flagged exceptions are just as important as the cutting-edge software that detects these exceptions.
Figure 2: the trefoil model
Continuous Improvement in HR
The use case above focuses on improving the Finance administration. In practice, the quality of the administration tends to get more attention in Finance than in HR. I suspect this has two main causes.
The first cause is that the business case is less obvious in HR. Paying a double invoice, for example, leads to clear and direct unnecessary costs. To some, this negative effect seems less apparent in HR.
The second cause is that Finance generally pays a lot more attention to data quality because of audit risks and obligations. However, to say that Continuous Improvement adds more value in Finance than in HR would be a mistake.
An example to illustrate this: research by SD Worx shows that in Europe 44% of 4,000 respondents have experienced late payment of salary. In 48% of those cases, payment was not only late but the payslip also contained errors. In 79% of these cases, the error was discovered by the employees themselves.
This example raises a couple of points. The first point is that there is still a lot of room for improving data quality in HR. In my experience, payroll is the HR process where data quality receives the most attention.
Errors in the payroll process obviously cost money and can raise compliance issues, so data quality in this field will generally receive more attention than, for example, the registration of overhead categories. Just imagine how error-prone HR data that receives less attention must be. The second point is that flawed HR data can result in unnecessary direct costs.
There are other examples. One is the incorrect use of third-party hiring documents. Using the wrong third-party hiring document leads to tax fines. Other examples are salary conformity with collective agreements, possible fraud in overtime premiums, et cetera.
Apart from direct costs, flawed HR data will lead to substantial indirect costs. Faults in overhead categories might seem insignificant, but if they prevent the organization from reliably managing its overhead ratios, they might prove to be costly mistakes.
We’ve seen that administrations are error-prone, not just in Finance but also in HR (maybe even more so, given a lower sense of urgency).
Flawed data is costly, either directly or indirectly. Luckily, there are now examples of organizations that successfully rely on ‘Continuous Improvement’ rather than on ‘Ad-Hoc Improvement’.
So next time you’re building a model based on HR analytics, ask yourself the following question: “Is the data that I need for my model accurate and complete at this moment, and will it be in the future?” If the answer is “No”, you might want to consider investing in Continuous Improvement first.