In a previous blog I wrote about how we started a project to see if we could analyze and predict absenteeism. In this blog I will update you on our project.
We were approached by a company with over 10,000 employees. The company’s HR manager was concerned about the increasing rates of absenteeism. He wanted to know why it kept increasing and what specific factors influence it. Insight into this increase would be invaluable in creating interventions that reduce absence. We were given a clear question with a strong business case behind it.
About a week ago, we had a brainstorm session with the entire Analytics in HR-team. The team consists of two consultants, two organizational psychologists with extensive experience in statistics and two data scientists. The core question for the brainstorm was: “Which hypotheses do we need to validate in order to help the organization?”
We defined a number of testable hypotheses which we will try to validate in order to find out how we can help lower the organization’s absenteeism. As this brainstorm was crucial for the direction of the whole project, I would like to share the hypotheses with you.
The hypotheses we came up with depend on the data we have access to; it is, of course, impossible to test a hypothesis without data. We will have access to (anonymized) demographic data and engagement survey data. The latter consists of multiple employee attitude measures.
- There are clear patterns in absenteeism data. In line with scientific research, we anticipate that younger people have lower absenteeism rates than older people. In addition, we expect that people who have been sick more often in the past are also likely to be sick more often in the future. These patterns could on the one hand explain the company’s increasing absenteeism rates, and on the other hand provide us with valuable input for interventions.
- Engagement survey data has an impact on absenteeism. This is a more open hypothesis because we do not know the full extent of our data set yet. We believe that factors like engagement, job involvement and organizational commitment have an impact on absenteeism. These variables are expected to be helpful in explaining the increasing absenteeism rates and, as with the first hypothesis, they provide us with input for interventions. In addition, we anticipate that some factors will have a greater impact than others. We expect age to be more strongly related to absenteeism than salary, for example. By conducting a number of regression analyses, we hope to identify the most relevant variables that contribute to absenteeism. We also want to predict long-term absenteeism. With all the information we will have access to, we expect to be able to forecast how long-term absenteeism will increase when no interventions take place. The problem, however, is that long-term absenteeism is not prevalent in our data set. In fact, we were afraid that we would have insufficient ‘outcome’ data to do the analysis for predicting long-term absenteeism. In other words: we needed different outcome data.
- Short-term absenteeism predicts long-term absenteeism. To solve this, we wanted to see if we could use short-term absenteeism as a predictor for long-term absence. If short-term absenteeism is indeed a moderately accurate predictor, it would be easier for us to measure how different variables impact short-term absenteeism. This would mean that people who take a ‘sickie’ more often than the rest of the population (e.g. due to a conflict at work) would also be more likely to be absent in the long term (e.g. the person experiences a burnout because of an ongoing work conflict). Because short-term absenteeism is much more prevalent in the data set than long-term absenteeism, it will be easier to make analyses and predictions based on this outcome.
- We are able to predict (long-term) absenteeism. We anticipate that different variables, such as age, sex, tenure and salary scales, impact absenteeism. The inclusion of these variables is quite self-explanatory. We will use these variables as controls in our case study and for later analyses. In addition, we expect work and personal factors to predict long-term absenteeism. As mentioned before, we do not have a full picture of the exact scope of our data. However, in recent literature, engagement survey data proved to act as a predictor of absenteeism (Schaufeli, Bakker & van Rhenen, 2009). Factors such as the extent to which someone is involved in his/her job and the commitment one has to his/her organization predict absenteeism as well. We expect these factors to be relevant in predicting at least some of the ‘preventable’ absenteeism.
- Attaching financial value to individual absenteeism will help us pinpoint where the absenteeism problem is most severe, and where interventions will be most cost-efficient. One of the things we want to find out is whether we can pinpoint absenteeism better by attaching monetary value to it. If a director is absent for two months, it will be much more expensive than when a secretary is absent for the same period. Both are valuable to the company, but the director is more difficult and costlier to replace. By creating a quantified risk analysis, we expect to be able to estimate the financial absenteeism risk. The analysis will also show where in the organization this risk is greatest, and how interventions reduce this risk. Companies make risk analyses of almost everything, so they should do the same for their most valuable assets: their employees.
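To make the last hypothesis a bit more concrete, here is a minimal sketch of what a quantified risk analysis could look like. It is not our actual model. It assumes that for each employee we have an estimated probability of a long-term absence, an expected duration and a daily cost, and it simply multiplies the three. The `Employee` fields, the function names and all the numbers are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Employee:
    department: str
    daily_cost: float            # assumed cost to the company of one absent day
    absence_probability: float   # assumed chance of a long-term absence this year
    expected_days_absent: float  # assumed duration if the absence occurs


def expected_absence_cost(employee: Employee) -> float:
    """Expected yearly cost: probability x duration x cost per day."""
    return (employee.absence_probability
            * employee.expected_days_absent
            * employee.daily_cost)


def risk_by_department(employees: list[Employee]) -> dict[str, float]:
    """Sum the expected absence cost per department."""
    totals: dict[str, float] = defaultdict(float)
    for employee in employees:
        totals[employee.department] += expected_absence_cost(employee)
    return dict(totals)


def risk_after_intervention(employee: Employee, relative_reduction: float) -> float:
    """Expected cost if an intervention lowers the absence probability
    by the given fraction (0.25 means 25% lower)."""
    return expected_absence_cost(employee) * (1.0 - relative_reduction)


# Hypothetical figures, for illustration only.
employees = [
    Employee("Management", daily_cost=1000.0, absence_probability=0.05, expected_days_absent=40),
    Employee("Administration", daily_cost=250.0, absence_probability=0.10, expected_days_absent=40),
    Employee("Administration", daily_cost=250.0, absence_probability=0.20, expected_days_absent=40),
]

risk = risk_by_department(employees)
for department, cost in sorted(risk.items(), key=lambda item: item[1], reverse=True):
    print(f"{department}: expected yearly absence cost {cost:,.0f}")
```

Sorting departments by expected cost shows where the financial risk is concentrated, and `risk_after_intervention` gives a rough idea of what an intervention would save. In practice the probabilities would have to come from a predictive model, and the cost per day would need input from finance.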
Answer the question
By validating these five hypotheses we expect to provide a useful answer to the question: “What causes my increasing long-term absenteeism?” The focus of our hypotheses will be on analyzing which interventions would be most appropriate and efficient to reduce absenteeism.
Of course some of our hypotheses will change. Maybe there are even a few we will not answer at all because we don’t have the necessary data. This is an exploratory process. Depending on the data and our initial findings, we will probably redefine our hypotheses and see how we can help the client best.
In the next blog, we will provide an overview of the factors that cause absenteeism. In the meantime, check out our blog on how to link engagement to business performance using analytics!
Schaufeli, W. B., Bakker, A. B., & Van Rhenen, W. (2009). How changes in job demands and resources predict burnout, work engagement, and sickness absenteeism. Journal of Organizational Behavior, 30(7), 893-917.