If only we could know things in advance, especially those things that bother us the most today. Machine learning and predictive analytics are often touted as the ultimate solution, and they can be. However, certain misconceptions limit the effectiveness of predictive models or even hinder their development to begin with. In this article, I will address these misconceptions by offering three guidelines for predictive analytics.
Predictive analytics: the context
Predictive analytics has long played a role in areas such as market research, where we predict a context, e.g. a market trend. This type of model typically forecasts something that is beyond our control: a development we need to deal with and adapt to when making decisions, such as the expected average salary for IT professionals over the next few years.
Predictive models take on a new role in people analytics, because we do not limit ourselves to forecasting an environment beyond our control. Quite the contrary: we want to predict things so that we may intervene!
Machine learning is a tool that helps to do so. It can serve a wide range of purposes, such as offering employees the next best training, figuring out and leveraging what makes a team successful, and much more.
But interfering with what you are predicting complicates things. It requires some considerations that are typically glossed over when predicting a context in the classical sense.
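In my experience, the chances of success with predictive analytics increase greatly when you keep the following three guidelines in mind:
- agree on what “predictive” means
- favor transparency over raw performance
- work towards actionable results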
In this article, I will discuss each of these points and how they contribute to your success in an analytics project aimed at developing predictive models.
Predictive analytics is so popular that it has become a go-to solution simply because the term is top of mind. It is also a broad concept with an inherent risk of miscommunication: does everyone involved mean the same thing when they say “predictive model”?
In analytics, the term predictive is not always used in a temporal sense. Yes, sometimes it’s about estimating next quarter’s reimbursement costs, but it could as easily be about determining a match between an applicant and a position. Predictive models are about likely outcomes, often regardless of time.
The trick is to get on the same page with stakeholders and figure out what needs to be “predicted” in order to solve the case at hand. Start with the business issue and work from there to see how a (predictive) model may support it. This will allow you to tackle potential mismatches in expectations among stakeholders, as the word ‘predictive’ can have different connotations. One stakeholder may want to know a predicted number, while another wants a predicted match or even a model that “predicts where and how they can take action”.
Do not get hung up on the term ‘predictive’. Instead, work with your stakeholders to decide what insight is needed and how a model may contribute.
In the media we often read about highly accurate models, e.g. for predicting attrition. Claims of 95 percent accuracy from a machine learning model sound impressive, but what do they mean?
These numbers actually mean less than most people think – but that will be the topic for another article. For now, let’s assume it means that the model is reliable. If so, we want the performance scores of any given model to go through the roof, right?
Not necessarily. Generally speaking, there is a tradeoff between model performance and transparency: the more powerful and accurate the model, the harder it is for us humans to grasp how it transforms data into a particular output or prediction.
I would argue that in most cases the actual performance of a model is secondary to transparency about the patterns in the data that it uses. Without a clue about those patterns, the model becomes like a magic 8-ball: it tells you what may happen, but not why. It is exactly the ‘why’ that we need to know about to inform our decisions.
What helps us is knowing what patterns in the data are used by the model. The performance of the model then indicates how informed our decision-making process may be when we understand and consider the same patterns. If the model performs well, both model and decision may be considered ‘well-informed’.
But beware of performance measures! Normally, these emphasize how often a model gets its answers right or wrong. That is only part of the story. Another thing to consider is “lift”, which is how model performance compares to chance level. After all, a 60% accuracy score seems underwhelming, but it could be a tenfold improvement over chance level, for instance when random guessing would be right only 6% of the time.
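The comparison with chance level comes down to a simple ratio. The snippet below is a minimal sketch, assuming lift is defined as model accuracy divided by chance-level accuracy, in line with the informal description above. The function name `lift` and the numbers are hypothetical and serve only as illustration.

```python
def lift(model_accuracy: float, chance_level: float) -> float:
    """Return how many times better than chance the model performs.

    Both arguments are proportions between 0 and 1. Defining lift as
    accuracy divided by chance level is an assumption that follows the
    informal description in the text.
    """
    if not 0 < chance_level <= 1:
        raise ValueError("chance_level must be in (0, 1]")
    if not 0 <= model_accuracy <= 1:
        raise ValueError("model_accuracy must be in [0, 1]")
    return model_accuracy / chance_level


# Hypothetical case: 60% accuracy where guessing is right 6% of the time.
print(round(lift(0.60, 0.06), 2))

# The same 60% accuracy on a balanced yes/no question is a far smaller gain.
print(round(lift(0.60, 0.50), 2))
```

The same accuracy score can therefore mean very different things, depending on how hard the problem is to guess.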
In order to understand the model, wouldn’t you want to know what the model uses to improve its performance over mere guessing by that much? Even if a model is not reliable enough to justify implementation as a digital solution, it can still be valuable for decision making.
Trading in some performance for transparency is rarely an issue in people analytics. Take flight risk, for example: models use various parameters to deduce what sort of employees might leave.
Ultimately, the goal is to get an idea of the underlying motivations for leaving so that we can intervene where needed. However, without mind-reading data, the performance you can reasonably expect from a model will often be limited when predicting complex human behavior.
Increased transparency rarely harms the usefulness of a model. More likely, usefulness increases even though some performance is sacrificed. This is because you build trust by making the model more accessible and understandable. After all, people are more likely to use what they understand and less likely to run a business on “magic 8-ball results” from an opaque model.
Keep in mind that the end result need not be a digital solution to be useful. It could be an organization-specific list of top risk factors in work-related stress, for example.
Even when everyone agrees on what should be predicted and your model is transparent, you may find that there is no clear way to use the insights gained from that model.
To counter this, take into account what can be influenced or controlled. Absenteeism models may rely heavily on age or gender to provide an outcome, neither of which an organization can do much about. Take care not to develop a model using mostly or only such features, which is a risk when using HR data exclusively.
Instead, consider adding data on working night shifts, short or long shifts, alone or in teams, and so on. These are factors that can be changed and are therefore more likely to yield actionable results.
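One rough way to check whether a model leans too heavily on factors that cannot be changed is to look at how much of its feature importance comes from actionable features. The snippet below is a minimal sketch under stated assumptions: the feature names, the importance values and the helper `actionable_share` are all hypothetical, and the importances are assumed to come from whatever model you have trained.

```python
def actionable_share(importances: dict[str, float], actionable: set[str]) -> float:
    """Return the fraction of total feature importance carried by actionable features."""
    total = sum(abs(value) for value in importances.values())
    if total == 0:
        raise ValueError("importances must contain at least one non-zero value")
    usable = sum(abs(value) for name, value in importances.items() if name in actionable)
    return usable / total


# Hypothetical importances for an absenteeism model (illustrative values only).
importances = {
    "age": 0.35,
    "gender": 0.15,
    "night_shifts": 0.25,
    "shift_length": 0.15,
    "works_alone": 0.10,
}

# Features the organization could realistically change (an assumption).
actionable = {"night_shifts", "shift_length", "works_alone"}

share = actionable_share(importances, actionable)
print(f"{share:.0%} of the importance comes from features that can be changed")
```

A low share would suggest that the model mostly describes who is at risk, without pointing to anything the organization can act on.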
Gather information on relevant business processes, identify related data sources and use those if you can. This way, your model is more likely to provide insight into how HR themes and business processes are related and what could be done to improve both.
When talking about machine learning and predictive analytics, I commonly hear that there needs to be a “digital solution” at the end. Models very rarely make it into production in one go, and that makes sense: predictive model development is largely about research and that means that you cannot know in advance what you will find. Does that mean that these projects tend to fail? Not if they provide useful insights, in my opinion.
The guidelines described in this article are aimed at working towards a useful outcome from the outset by:
- ensuring agreement on the goal and common expectations (consensus about what “predictive” means)
- favoring insight over having a digital tool (the performance/transparency tradeoff)
- working towards relevance and actionable results (by factoring in what can be targeted by actions)
Chances are that your model will not end up embedded in some dashboard immediately. However, if you follow the guidelines in this article, I daresay that it will still be valuable. Providing useful results tends to lead to more specific questions, and these can lead to the development of more specialized models that are ready for production.
Please let me know if the guidelines in this article helped you, and especially if you subsequently put a model into production as a result!