Six Sigma applied to HR Analytics: An Introduction

Today, data science touches every aspect of business functions. It enables greater speed, accuracy, and quality in business decision making. As an inherent part of core business operations, the "people function" hasn't been immune to this development.

The best data science practices are known to combine management tools with statistical and machine-learning insights. These combinations have helped immensely in strategic decision making, directly improving the productivity and profitability of a large number of businesses worldwide.

Six Sigma is perhaps the most established and documented approach in this regard. Fundamentally, Six Sigma is a data-driven methodology to improve the quality of a process (i.e. any repetitive business function) by reducing the variation around the mean of the process.

In other words, it means ensuring that the process output falls within the acceptable tolerance range as far as possible. The best performance a process can achieve is referred to as its "entitlement level" in Six Sigma.

In theory, a perfect score in Six Sigma is 6. This would mean that 99.99966% of all data points fall within the tolerance range (roughly 3.4 defects per million opportunities). However, in practice, a good sigma score depends on the dynamics of the particular process in question.
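As a quick illustration of that relationship, here is a minimal sketch that converts a sigma level into an approximate long-term yield, assuming the conventional 1.5-sigma shift used in standard sigma-level tables. The code and values are illustrative and not part of the original case study:

```python
# Minimal sketch: relating a sigma level to the expected long-term yield,
# using the conventional 1.5-sigma shift assumption from Six Sigma practice.
from scipy.stats import norm

def long_term_yield(sigma_level: float, shift: float = 1.5) -> float:
    """Approximate fraction of output within spec for a given sigma level."""
    # One-sided approximation commonly used in Six Sigma tables:
    # defects are dominated by the tail nearest the shifted mean.
    return norm.cdf(sigma_level - shift)

for s in (2.1, 3.0, 6.0):
    dpmo = (1 - long_term_yield(s)) * 1_000_000
    print(f"sigma level {s}: yield = {long_term_yield(s):.6%}, DPMO = {dpmo:,.1f}")
```

For a sigma level of 6 this reproduces the familiar figure of about 3.4 defects per million opportunities.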

Six Sigma projects typically follow the well-documented DMAIC framework (Define, Measure, Analyze, Improve & Control):

A. Identifying the business process to be improved and its dependent sub-processes.

B. Gathering data relevant to the process and sub-processes.

C. Forming hypotheses through a host of brainstorming tools.

D. Testing the sigma level through statistical calculations on the accumulated data.

E. Improving the sigma level by implementing the suggestions and reports gained from testing the hypotheses.

F. Monitoring the process long term, in order to ensure the continuity of the improvement.

Part A (Define)

Business Problem

In this particular case study, we applied Six Sigma to solve an important HR business problem: "Improving the efficiency of the HR recruitment function".

Our client for this project is a prominent international tech headhunting firm. The client estimates that its ROI (return on investment) on job portals and social networking platforms isn't up to the best industry benchmarks and can be improved upon.

This is negatively impacting our client in terms of:

  1. Overall service standards relative to competitors
  2. Inefficient use of scarce funds
  3. Poor-quality, untimely, and insufficient candidate profiles
  4. Branding and market positioning

Defining the CTQ (Critical to Quality) via QFD (Quality Function Deployment)

To get the project started, we needed to define the CTQ aspect of the client's business problem: "Improving the efficiency of the HR recruitment function".

The preferred tool for high-level pre-analysis in Six Sigma projects is the QFD (Quality Function Deployment), usually employed as the first component of the "Measure" phase. This tool maps the sub-processes of the CTQ against their functional components (i.e. engineering parameters).

[Figure: Quality Function Deployment matrix]

QFD applies an explicit, quantitative, correlative method to the functional components required by the sub-processes (of the core process) and then uses weighting functions to prioritize the parameters of those functional components.

This aids in the selection and customization of functional components in order to improve the quality of the process. Important functional components can be handled as individual Six Sigma projects.
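To make the mechanics concrete, below is a minimal, hypothetical sketch of the weighting step: each functional component's priority is the weighted sum of its correlation scores with the customer requirements. The requirement names, weights, and scores are illustrative placeholders, not the project's actual QFD data:

```python
# Illustrative QFD-style prioritization: weighted sum of correlation scores.
# Weights and scores here are made-up placeholders, not the project's data.
import numpy as np

requirements = ["Quality of profiles", "Turnaround time", "Discretion"]
weights = np.array([5, 4, 3])                      # customer importance (1-5)

components = ["Job-board engagement", "Resume indexing", "Hiring methodology"]
# rows = requirements, columns = functional components; common 0/1/3/9 scale
correlation = np.array([
    [9, 3, 3],
    [3, 9, 1],
    [1, 0, 9],
])

priority = weights @ correlation                   # weighted column sums
for name, score in sorted(zip(components, priority), key=lambda t: -t[1]):
    print(f"{name}: {score}")
```

The highest-scoring functional components are the candidates to be handled as individual Six Sigma projects, as described above.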

In the context of our project, the sub-processes discovered were: quality of profiles, turnaround time, overall processing time, effective authentication of candidate profiles, and the ability to maintain discretion about hires and vacancies vis-a-vis competitors and the market.

These were then plotted against quality characteristics, such as the methodology deployed for hiring, effective management, indexing of the resume database, and professional engagement methodologies with the major job boards.

On the basis of the QFD, we identified the CTQ for this Six Sigma project as the "Optimum valuation of professional engagement methodologies with major job portals and online networking platforms for maximization of ROI".

No reliable data recording system existed for the "Optimum valuation of professional engagement methodologies with major job portals and online networking platforms for maximization of ROI".

However, through a detailed and innovative analysis, including the extraction of historical email records exchanged over the last three years, we estimated that our client's ROI on "the professional services of job portals and online networking platforms" was around 230%.

Based on business intelligence estimates, the market leaders (Korn Ferry, Manpower, etc.) enjoy an ROI of 400% or above for this particular business process.

The Goal of this Project

Based on the business problem, the QFD analysis, and the CTQ identification, the goal for this project was set as "ROI to be increased from 230% to 300% or above (compounded monthly)".

A set of principal factors was identified, based on business process expertise, that directly impact the "efficiency of the recruitment function and engagement with job portals". Hypotheses were tested regarding:

  1. The optimal monetary investment in professional association with job portals & professional networking portals
  2. The most effective and efficient distribution of the recruiters' resume collection time among the 4 job portals/databases
  3. Relative strengths and weaknesses among the job portals & professional networking sites
  4. Whether there should be a distinct methodology and approach in dealing/negotiating with the different job portals & professional networking sites

Summing up this section: We understood the core business problem of our customer and delved deeper into it via the application of QFD. This helped us identify the key sub-processes involved, the CTQ of the project, and eventually the project goal.

Our project goal is "ROI to be increased from 230% to 300% or above (compounded monthly)".

Part B (Measure)

Process Capability

In line with the project goal, ROI was defined as the "percentage of revenue per week over the expenditure/investment in job portals and social networking sites".

The revenue in a given week wasn't always correlated with that week's expenditure/investment in job portals and social networking sites, as benefits were often realized much later. However, in order to maintain computational uniformity and practicality, it was assumed to be so.
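Under that simplifying assumption, the weekly ROI reduces to a simple ratio. A minimal sketch with placeholder figures:

```python
# Weekly ROI as defined above: revenue attributed to the week divided by the
# week's job-portal/networking spend, expressed as a percentage.
def weekly_roi(revenue: float, spend: float) -> float:
    return 100.0 * revenue / spend

# Placeholder figures for illustration only.
print(weekly_roi(revenue=11_500, spend=5_000))  # -> 230.0, the baseline level
```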

Data sampling for process capability and other statistical analyses

There were no formal records or reliable data available for our project. Therefore, data was meticulously extracted from informal records and from the email records of four recruiters, retrieved selectively via the IMAP protocol.

After preliminary preprocessing and evaluation of the extracted data, it was decided to:

  1. Use a combination of stratified and random sampling. The data was stratified on a 3-month cycle: 10 strata were created, each containing 3 months of data divided into units of one week (12 units per stratum). Based on a power curve for a one-sample t-test (see the graph below), a random sample was drawn in equal proportion from each stratum (6 units per stratum), giving the optimum sample size of 60 units.
  2. Use the power curve for a one-sample t-test to calculate the optimum sample size under the given circumstances (a sketch of this calculation follows this list).
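Here is a sketch of the sample-size calculation behind such a power curve, using statsmodels' one-sample t-test power solver. The effect size, alpha, and power values are illustrative assumptions, not the study's actual inputs:

```python
# Sketch: solving for the sample size of a one-sample t-test at a given
# effect size, alpha and power (values below are illustrative assumptions).
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
n = analysis.solve_power(effect_size=0.4, alpha=0.05, power=0.9,
                         alternative="two-sided")
print(f"required sample size = {n:.1f} units")
```

With these illustrative inputs the solver returns roughly 68 units; the study's own power curve led to a sample of 60 units.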

[Figure: Power curve for one-sample t-test]

Summing up this section: In this section we looked at the process map, collected and preprocessed data for the analysis, and determined the optimal sample size. Calculating an optimal sample size is important for any statistical analysis to be reliable.

Based on the sample data collected in the step above, a process capability analysis for continuous data was created:

[Figure: Process capability of ROI (percentage), based on sample data]

Sigma level (adjusted), calculated from Cpk as 3 × Cpk + 1.5, is 2.1.
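For reference, Cpk measures the distance from the process mean to the nearer specification limit in units of three standard deviations, and the adjusted sigma level then follows as 3 × Cpk + 1.5. A minimal sketch follows; the specification limits and sample statistics are placeholders chosen only so the output reproduces the Cpk and sigma level reported here, since the actual limits are not stated in the article:

```python
# Sketch: process capability (Cpk) and the adjusted sigma level used above.
# Specification limits and sample statistics below are placeholders.
def cpk(mean: float, std: float, lsl: float, usl: float) -> float:
    return min(usl - mean, mean - lsl) / (3 * std)

def sigma_level(cpk_value: float, shift: float = 1.5) -> float:
    return 3 * cpk_value + shift

# Placeholder values giving Cpk = 0.2, i.e. the 2.1 sigma level reported above.
c = cpk(mean=230.0, std=50.0, lsl=100.0, usl=260.0)
print(f"Cpk = {c:.2f}, adjusted sigma level = {sigma_level(c):.2f}")
```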

A run chart of the continuous sample data was drawn in date order, in order to better visualize trends over time (clustering, oscillation).

[Figure: Run chart and Xbar-R control chart of the sample data]

Observations and Points to note here

Our focus for this Six Sigma improvement project is to get the process to hover between the target and the USL, thereby improving the process capability score:

  1. Process capability has historically been low in the fast-moving and intensely competitive tech recruitment industry. Based on business intelligence estimates, the market leaders (Korn Ferry, Manpower, etc.) operate at about a 3.2 sigma level for this particular CTQ.
  2. The target level of 320% has deliberately been kept ambitious, in order to surface improvement areas.
  3. The run chart indicated a cyclic trend, with an interesting spike in performance between the 25th and the 35th week. Data from this period will be mined intensively to discover the root cause of the spike. The other main parameters of the run chart appear to be in control.
  4. The control chart indicated that the process is, overall, already under control, though the variation can still be reduced.

Summing up this section: We calculated the sigma score of our process (2.1). We were also able to identify trends in the process through SPC (statistical process control) charts and noted the spikes. This information is a starting point for further analysis.

Part C (Analyze)

Based on the observations of the process capability analysis, the core competencies of the Job Data Sources/Social Networking Services were statistically analyzed using an individual value plot and analysis of variance (ANOVA).

Records were extracted for the total number of resumes procured per week (excluding duplicates) from the four Job Data Sources/Social Networking Services [A,B,C,D] for all 60 weeks (units) by the four recruiters.

Please note, this data only gave information about the total number of all matching resumes extracted, irrespective of skill set and conversion ratio.

Analysis of this data gives us a broad picture of the overall resource strength of all the four Data Sources under evaluation. This analysis can help us prioritize/rank our resources at the preliminary level.

[Figure: Individual value plot]

One-way ANOVA

Data was collected for the 60 weeks, matching the number and type of resources received from each specific source.

Numeric identities [1, 2, 3, 4] were given to the Job Data Sources/Technical Networking Data Sources, and numeric identities [1 to 13] were given to the specific technical requirements.
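A minimal sketch of how such a one-way (unstacked) ANOVA can be run on the weekly counts; the four arrays are random placeholders standing in for the 60 weekly values per Data Source:

```python
# Sketch: one-way ANOVA across the four Data Sources' weekly resume counts.
# The arrays below are random placeholders, not the project's 60 weekly values.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
source_1 = rng.normal(100, 25, 60)   # high mean, high variation
source_2 = rng.normal(85, 12, 60)
source_3 = rng.normal(95, 10, 60)    # close to source 1, less variation
source_4 = rng.normal(60, 15, 60)    # weakest source

f_stat, p_value = f_oneway(source_1, source_2, source_3, source_4)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that at least one source's mean weekly count differs from the others; the individual value plot then shows where the differences and the variation lie.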

The individual value plot and the one-way (unstacked) ANOVA indicate that Data Source 1 was the marginal leader as far as the total number of resources is concerned. However, its variation was unacceptably high.

Data Source 3 is a very close second, with much less variation, and should therefore be the preferred source as per preliminary visual inspection. Data Source 2's performance was slightly below the other two.

Data Source 4's performance was unsatisfactory on all parameters (as a medium for sourcing quality resumes). A policy decision will be taken during the "Improve" phase based on these analyses.

Of stronger relevance to our requirements is information on which source should be preferred for each specific tech requirement.

Summing up this section: We evaluated all four data sources in terms of the total number of matching profiles (for all skill sets) obtained over 60 weeks. This information will be a critical component, as it will help in prioritizing engagement and investment with our Data Sources.

Multi-Vari Chart of Resumes

Multi-vari charts let us see the competencies of the Job Data Sources/Technical Networking Sites with regard to specific technical requirements. An interesting observation is that, while Data Source 4's overall performance is the lowest, it still has core competencies in "software testing and QA". Likewise, Data Source 2 has core competencies in "storage tools".
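A simple way to build such a multi-vari style view is to average the counts per skill within each source and plot one line per source. The sketch below uses entirely made-up placeholder values, not the project's records:

```python
# Sketch: a simple multi-vari style view of resume counts per skill id (1-13)
# within each Data Source (1-4). All values below are illustrative placeholders.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
records = pd.DataFrame({
    "source": np.repeat([1, 2, 3, 4], 13),
    "skill": np.tile(range(1, 14), 4),
    "resumes": rng.poisson(8, 52),           # mean weekly counts, placeholder
})

means = records.pivot(index="skill", columns="source", values="resumes")
means.plot(marker="o")                        # one line per Data Source
plt.xlabel("Technical requirement (skill id)")
plt.ylabel("Resumes per week")
plt.title("Multi-vari view: source competency by skill")
plt.show()
```

Skills on which every line sits low correspond to the gaps discussed below, where no channel delivers enough profiles.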

The graphical analysis also indicates inadequate competencies in some areas. For instance, the operating system/mainframe requirement has low resource output from all the channels, indicating that our client may have to depend on direct headhunting or other channels to source that talent.

Summing up this section: We evaluated all four data sources in terms of the total number of matching profiles (for individual skill sets) obtained over 60 weeks. This information was a critical component, as it helped tailor our approach for each specific technical requirement to be fulfilled by our client.

There are two principal models for resume collection from the Job Data Sources/Social Networking Sites:

  1. Direct database access of the Data Sources.
  2. Advertising in the respective Data Sources and receiving resumes in response.

When a client raises a particular requirement, both the time and the budget available to come up with the required resources are limited. A comparative statistical analysis of these two approaches helps prioritize our approach here.

Records were extracted with respect to the total number of resumes received by each principal method, for each of the 60 sample weeks.

A box plot was created to get a clear overview of the strength and distribution of the data for both principal methods:

[Figure: Box plot of resumes received via advertisements vs. direct database access]

The direct database access method was clearly the more productive, with all parameters (mean, median, quartiles) higher than for the advertisement method. Based on this analysis, a policy decision was subsequently taken during the "Improve" phase.
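A minimal sketch of that comparison; the two series are random placeholders standing in for the 60 weekly counts per collection method:

```python
# Sketch: comparing weekly resume counts for the two collection methods.
# The series are random placeholders for the 60 sampled weeks.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
direct_db = rng.normal(120, 20, 60)      # direct database access
adverts = rng.normal(80, 25, 60)         # responses to advertisements

plt.boxplot([direct_db, adverts])
plt.xticks([1, 2], ["Direct database", "Advertisements"])
plt.ylabel("Resumes per week")
plt.show()
```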

Summing up this section: We scrutinized all four data sources and both mechanisms of resource collection with statistical tools such as ANOVA, the individual value plot, the multi-vari chart, and the box plot. This analysis helped us gain insights into:

  1. The overall strength of the four Data Sources and their specialist skill strengths
  2. Evaluating the efficiency of direct database access versus advertising

Policy decisions regarding the effective use of these findings will be taken during the "Improve" phase of this analysis.

Part D – Controlled Design of Experiment (Advanced Analyze)

During the QFD and the brainstorming sessions, it was suspected that one of the factors significantly affecting our goal of "improving the percentage of revenue per week over the expenditure/investment in job portals and social networking sites" was the effective and efficient utilization of the recruiters' engagement time in sourcing profiles from these four Data Sources.

It so happens that, on a few working days, the recruiters do not correspond with headhunters. On those days, the recruiter's task is to open-surf the Data Sources and collect profiles according to anticipated future requirements.

Afterwards, they tag and index these profiles in the database.

As per standard practice, recruiters are free to choose the Data Sources they wish to surf. A recruiter may surf these sites randomly or divide the time equally among the four Data Sources; it all depends on the individual recruiter's choice.

In order to determine the optimal allocation of the time invested in surfing these 4 Data Sources, a controlled design of experiment was conducted:

What is a DOE (Design of Experiments)?

In a design of experiments, the values of x are experimentally controlled, unlike in observational studies, in which they are merely observed.

The purpose of a DOE is to understand the y = f(x) relationship to the maximum extent possible and to tune it for the best achievable performance.

The key areas of understanding in a Design of Experiments are:

  1. The x's that have the maximum effect on Y (in our case, X is the recruiter time invested in the respective Data Sources and Y is the revenue).
  2. The exact (or closest) mathematical relationship between the significant x's and Y.
  3. Statistically confirming that an improvement has been made, or that a difference exists with respect to different values of X.
  4. Discovering where to set the values/levels of the significant x's to have the maximum positive effect on Y.

Methodology of the DOE for this project: Since interaction effects were not considered a factor, and in order to minimize the time and cost of the experiment by reducing the number of runs, the Plackett-Burman DOE methodology was used.

Defining the architecture of the design: The four Data Sources 1, 2, 3, and 4 were the factors, with levels +1 and -1. There were eighteen experimental runs in all.

The task in relation to the architecture of the design: The experiment was conducted over a period of 18 consecutive days by one recruiter.

The recruiter's task was to surf a total of 2 hours (120 minutes) across the 4 Data Sources and to collect, tag, and index up to 24 resumes (2 resumes each for 12 different skill sets, randomly numbered).

+1, +1, +1, +1 in a particular run translates to the recruiter investing 30 minutes in each of the four Data Sources.

+1, +1, -1, +1 translates to investing 40 minutes in each of the three Data Sources carrying a +1 sign, leaving out the Data Source with the -1 sign.

+1, +1, -1, -1 translates to investing 60 minutes in each of the two Data Sources carrying a +1 sign, leaving out the two Data Sources with the -1 sign.

+1, -1, -1, -1 translates to investing all 120 minutes in the single Data Source carrying a +1 sign, leaving out the three Data Sources with the -1 sign.

And so on, for a total of 18 runs (the sketch below shows how a run's coding maps to minutes per source).
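A minimal sketch of that time-allocation rule, i.e. 120 minutes split equally among the sources coded +1 in a run (the runs shown are the examples above, not the full 18-run design):

```python
# Sketch: translating a +1/-1 run from the design into minutes per Data Source,
# as described above (120 minutes split equally among the +1 sources).
def minutes_per_source(run: list[int], total_minutes: int = 120) -> list[int]:
    active = sum(1 for level in run if level == 1)
    return [total_minutes // active if level == 1 else 0 for level in run]

# Illustrative runs (not the actual 18-run design used in the project).
for run in ([1, 1, 1, 1], [1, 1, -1, 1], [1, 1, -1, -1], [1, -1, -1, -1]):
    print(run, "->", minutes_per_source(run), "minutes")
```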

The experiment was conducted, and the total number of resumes collected, tagged, and indexed was duly recorded. The results of the experiment were analyzed, with the following observations:

Analysis and Observations of the experiment

  1. Data Source 1 emerged as the strongest positive factor, whereas Data Source 4 emerged as the strongest negative factor in the experiment. This was evident from the regression analysis, ANOVA, main effects plot, and standardized effects plot (a sketch of the main-effect calculation follows this list).
  2. Data Sources 2 and 3 did not emerge as statistically significant factors (as evident from the p-values of the ANOVA and the regression analysis). However, they still have some business significance for certain mandates.
  3. The factor effects of Data Sources 2 and 3 were found to be close to each other. However, a casual visual analysis revealed that Data Source 3 performed better in terms of the quality of resumes, whereas Data Source 2 performed better with regard to the size of its database. A mathematical evaluation of this is beyond the scope of this Six Sigma project and will be looked into in subsequent projects.
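The main effects referred to above can be recovered from the run results by comparing the average response at the +1 and -1 levels of each factor. The sketch below uses an orthogonal 8-run two-level design and made-up responses, purely for illustration; it is not the project's 18-run Plackett-Burman design or its data:

```python
# Sketch: main effect of each Data Source from a two-level DOE, computed as
# mean(response at +1) - mean(response at -1). Design and responses below are
# illustrative placeholders, not the project's 18-run design.
import numpy as np

X = np.array([                 # columns: Data Source 1..4, levels +1/-1
    [-1, -1, -1, -1],
    [ 1, -1, -1,  1],
    [-1,  1, -1,  1],
    [ 1,  1, -1, -1],
    [-1, -1,  1,  1],
    [ 1, -1,  1, -1],
    [-1,  1,  1, -1],
    [ 1,  1,  1,  1],
])
y = np.array([0, 18, 12, 20, 14, 19, 13, 23])   # resumes per run (placeholders)

for j, name in enumerate(["DS1", "DS2", "DS3", "DS4"]):
    effect = y[X[:, j] == 1].mean() - y[X[:, j] == -1].mean()
    print(f"Main effect of {name}: {effect:+.2f}")
```

A large positive effect means the runs that included that source yielded more resumes on average, which is how Data Source 1 and Data Source 4 were singled out as the strongest positive and negative factors respectively.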

Summing up this section: A controlled design of experiment gave us deeper insights than casual observation alone. Through the DOE, we were able to confirm a difference in productivity with respect to the time invested among the four Data Sources. This will help us optimize recruiter time allocation among the four Data Sources in the Improve phase.

Part E – Improve

Based on the overall conclusions of the Define, Measure, and Analyze phases, the following strategic steps were initiated and implemented:

  1. The professional services of Data Source 4 were not renewed.
  2. The budget allocation was reformulated: 60% of the budget was allocated to Data Source 1, while Data Sources 2 and 3 were allocated 20% each.
  3. The multi-vari chart would serve as a guiding tool for the sequential use of the respective Data Sources, based on the specific requirement given by the client.
  4. 75% of the funding for resource collection on a specific project would go to direct database access and 20% to advertisements, with 5% held in reserve.
  5. "Software testing", "QA", "storage tools", "mail servers", and "EJB design patterns" would be marketed as core competency skills of the consultancy.
  6. A policy was created for the business days on which recruiters surf the Data Sources and collect resumes for anticipated future requirements: 60% of the time would be invested in Data Source 1, with the remaining 40% divided equally between Data Sources 2 and 3.
  7. Quarterly reviews, fresh data accumulation, and a refresh of the graphical and analytical tools used in this project. If the status quo changes, the results are to be re-analyzed and the policies updated accordingly.

A 30-day window was given for the implementation of the recommendations. As with the first project, suitable improvements in performance were observed, even from casual observation, within a month of starting the improvement program.

Due to time constraints, eight weeks of data was collected post-implementation and the process capability was evaluated accordingly.

[Figure: Process capability of ROI (post-implementation)]

The sigma level (adjusted), calculated from Cpk as 3 × Cpk + 1.5, is 3.0: a substantial improvement from 2.1, and already comparable to the best industry standards.

There has been a 32% increase in revenue over the past two weeks. However, since multiple Six Sigma projects were conducted side by side, it is difficult to attribute the real monetary benefit to each project individually at this point in time.

The sigma level is expected to increase further over the next 6 months, as the full benefits of the improvement initiatives are realized.

Part F – Control

[Figure: I-MR chart of ROI percentages (post-implementation)]

The control chart shows the process to be well under control, as it already was for this CTQ even before this particular Six Sigma initiative.
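For reference, the individuals chart limits on an I-MR chart are conventionally set at the mean plus or minus 2.66 times the average moving range. A minimal sketch; the ROI series below is a placeholder, not the client's post-implementation data:

```python
# Sketch: individuals (I) chart limits from an I-MR chart, using the standard
# 2.66 * average moving range rule. The series is a placeholder for the
# post-implementation weekly ROI percentages.
import numpy as np

roi = np.array([285, 300, 310, 295, 305, 298, 315, 302], dtype=float)

moving_range = np.abs(np.diff(roi))          # week-to-week absolute changes
mr_bar = moving_range.mean()
center = roi.mean()
ucl, lcl = center + 2.66 * mr_bar, center - 2.66 * mr_bar
print(f"center = {center:.1f}, UCL = {ucl:.1f}, LCL = {lcl:.1f}")
```

Points falling inside these limits, with no unusual runs or trends, are what "well under control" means here.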

Summing up the key long-term implications of applying the Six Sigma framework to HR:

This Six Sigma applied to HR project has several implications, as listed below:

  1. These types of statistical analyses, based on data collected by improved HR systems, will ensure that investments in HR are more data-driven, which helps HR become more strategic in nature.
  2. The analysis can also help companies build their own unique algorithms, optimize process flows, and even support their robotic process automation (RPA) efforts, further improving the efficiency and effectiveness of their people (in this case, recruiters).
  3. The service providers, namely job portals and social networking sites, will increasingly have to showcase such evidence-based data to secure business. Many job portals today sell the same database to multiple players without offering such differentiating insights into the actual effectiveness of their database for closing positions.

We believe that, as HR continues to digitalize its operations and collect more data about its processes, it becomes possible for HR to integrate strategic, evidence-based approaches like Six Sigma.

We firmly believe that the age of analytics in general, and HR Analytics in particular, is already upon us, and we can all work together to improve our business processes and deliver a valuable ROI.


Amol Pawar, co-author, is a senior business consultant focused on organizational development. He is an expert in implementing HR technology by combining the human, process, and technological aspects.
