HR Analytics and the Art of Testing Hypotheses!

diksha tiwari
Towards Data Science
Jun 12, 2020

As technology takes root in HR departments and tech giants explore the many possibilities that AI and ML can bring to HR, the simple art of testing a hypothesis can still go a long way toward helping HR draw meaningful and impactful conclusions.

What is the idea behind hypothesis testing and the p-value?

Suppose a company produces a COVID-19 test kit and claims that the test is 99% accurate, i.e. if every test kit in the population were used to test individuals, 99% of the test results would be correct. However, if we take samples from this population of test kits, not every sample will be exactly 99% accurate: some samples may be 100% accurate, while others may be only 95% accurate. Suppose someone with infinite wisdom gave us the probabilities (shown below) of obtaining samples with different accuracy levels from a population of tests that is 99% accurate. The table below shows the probabilities (given to us by this someone) of obtaining a sample with x% accuracy or less, given that the population is 99% accurate:
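
A table like this can also be approximated by simulation. The minimal sketch below assumes that each kit independently gives a correct result with probability 0.99. That is a simplifying assumption, since the article does not say how its table was produced. The sketch draws many samples of 100 kits and records how often each sample accuracy occurs. The function names and the choice of 20,000 simulated samples are illustrative.

```python
import random
from collections import Counter


def simulate_sample_accuracy(n_kits=100, accuracy=0.99, n_samples=20_000, seed=0):
    """Draw n_samples random samples of n_kits kits each, where every kit is
    (by assumption) independently correct with probability `accuracy`.
    Returns a Counter mapping sample accuracy (in %, rounded down) to the
    number of simulated samples that had it."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        correct = sum(rng.random() < accuracy for _ in range(n_kits))
        counts[100 * correct // n_kits] += 1
    return counts


def cumulative_table(counts):
    """Estimated P(sample accuracy <= x%) for every accuracy level x observed."""
    total = sum(counts.values())
    table, running = {}, 0
    for level in sorted(counts):
        running += counts[level]
        table[level] = running / total
    return table


if __name__ == "__main__":
    for level, prob in cumulative_table(simulate_sample_accuracy()).items():
        print(f"accuracy <= {level}%: {prob:.4f}")
```

Under this independence assumption the exact probabilities follow a binomial distribution, so the simulation serves only as an intuition-building check. Increasing `n_samples` tightens the estimates.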

Now if we want to test the company’s claim about the test kit’s accuracy using a sample of 100 test kits from the population, we can do so by performing a hypothesis test. Let us begin by stating the “null” and “alternate” hypotheses for this test.

Ho (null): the test is 99% accurate

Ha (alternate): the test has less than 99% accuracy

Suppose the sample we picked was 95% accurate. If our null hypothesis were true, i.e. if the test kits were indeed 99% accurate, then the probability of picking a sample of kits with 95% accuracy or less would be 0.3% (that is, the p-value would be 0.003). In other words, under the null hypothesis, obtaining a sample this inaccurate is very unlikely. Hence, we reject the null hypothesis in favor of the alternate hypothesis.

One might wonder how we decided on the cut-off p-value below which we reject the null hypothesis. The answer lies in a threshold probability known as the “significance level” (α): if the p-value is below the significance level, we reject the null hypothesis. A 5% significance level is typical, but another value can be chosen depending on the level of certainty required. A stricter level such as 1% lowers the chance of rejecting a null hypothesis that is actually true.
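
The decision rule can be made concrete with a minimal sketch. It keeps the same simplifying assumption that each kit is independently correct with probability 0.99, a binomial model the article does not state. Under that assumption, the sketch computes the exact one-sided p-value for 95 correct results out of 100. It also finds the largest number of correct kits that would still lead to rejecting H0 at several significance levels. The function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def critical_correct_count(alpha: float, n: int = 100, p0: float = 0.99) -> int:
    """Largest count c of correct kits with P(X <= c | H0) < alpha.

    In this one-sided test, a sample with c or fewer correct results
    leads to rejecting H0. Returns -1 if even 0 correct is not extreme enough.
    """
    c = -1
    for k in range(n + 1):
        if binom_cdf(k, n, p0) < alpha:
            c = k
        else:
            break  # the CDF only grows with k, so no larger k can qualify
    return c


if __name__ == "__main__":
    observed_correct = 95  # 95 of the 100 sampled kits gave correct results
    p_value = binom_cdf(observed_correct, 100, 0.99)
    for alpha in (0.05, 0.01, 0.001):
        print(f"alpha={alpha}: reject H0 if <= {critical_correct_count(alpha)} correct; "
              f"p={p_value:.4f} -> reject: {p_value < alpha}")
```

Under this model, the p-value for 95 correct kits is about 0.0034. H0 is therefore rejected at the 5% and 1% levels but not at 0.1%, which shows how the choice of α determines the conclusion.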

How can hypothesis tests help in analyzing HR data?

An organization wants to determine whether employees who resigned early (within six months of joining) had been interviewed for a shorter duration than other employees. Let us assume the organization had reason (from feedback and surveys) to suspect that shorter interviews could be one of the factors behind early resignations. To investigate, HR collects interview data for employees hired from January to December 2019 and splits them into two groups: those who resigned within six months and those who stayed beyond six months. To hold possible confounding factors constant, the data covers only employees with similar backgrounds and experience who were hired for similar roles. Below are the descriptive statistics for the two groups:

Descriptive statistics:

The descriptive statistics show some difference between the average interview times of the two groups. Suppose the analysis is conducted in May 2020. Based on the difference observed in this sample, can we say that the same difference would hold for employees hired in January 2020, or that it held for employees hired in January 2018? The data collected by HR is only a sample of all the employees the organization has hired over the course of its operation, so how can we say that results obtained for the sample hold for the population? This is where hypothesis tests help bridge the gap: they use samples to determine statistically whether two or more populations (or groups of data) differ significantly. To test the above data, let us start by stating the null and alternate hypotheses:

Ho: The mean interview time of employees who continued beyond 6 months is the same as that of employees who resigned within 6 months

Ha: The mean interview time of employees who continued beyond 6 months is greater than that of employees who resigned within 6 months

A two-sample t-test assuming unequal variances (Welch’s t-test) was used to test this hypothesis, with the significance level (α) set at 0.05.
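
For reference, the standard forms of Welch’s test statistic t and of the Welch–Satterthwaite approximation ν to its degrees of freedom are shown below. Group 1 is the employees who continued beyond six months and group 2 is those who resigned within six months. For each group, x̄ is the mean interview time, s is the sample standard deviation and n is the group size. Because Ha is one-sided, the p-value is the upper-tail probability of Student’s t distribution with ν degrees of freedom.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}},
\qquad
p = P\left(T_\nu \ge t\right) \text{ under } H_0
```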

Welch’s t-test:

Degrees of freedom:

Test results:

The p-value of 4.108 × 10⁻⁷ means the following: if the null hypothesis were true and we took many random samples from the two populations, the probability that the difference between the sample means would be 1.1 (5.9 − 4.8) or more is 0.000041%. This probability is far below the 5% significance level, so we reject the null hypothesis. The difference between the population means is unlikely to be zero, which indicates that employees who resigned within 6 months had shorter interviews than employees who stayed.
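
Below is a minimal, standard-library sketch of the same one-sided Welch test. It works from per-group summary statistics (mean, sample standard deviation, size), such as those in a descriptive-statistics table. The p-value relies on a standard identity: the two-sided tail probability of Student’s t equals the regularized incomplete beta function I_x(ν/2, 1/2) at x = ν/(ν + t²). The sketch evaluates that function with the continued-fraction method described in Numerical Recipes. All function names are hypothetical, and the interview times in the example are made up rather than taken from the article, so the sketch does not reproduce the p-value above. With SciPy available, `scipy.stats.ttest_ind_from_stats(..., equal_var=False, alternative="greater")` performs the same test.

```python
from math import exp, lgamma, log, sqrt
from statistics import mean, stdev


def summarize(times):
    """Mean, sample standard deviation and size of one group's interview times."""
    return mean(times), stdev(times), len(times)


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def _betacf(a, b, x, max_iter=300, eps=1e-12, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies the even step, then the odd step.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_upper_tail(t, df):
    """P(T >= t) for Student's t with df degrees of freedom (df may be fractional)."""
    tail = 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))
    return tail if t >= 0 else 1.0 - tail


if __name__ == "__main__":
    # Hypothetical interview times (illustrative only, NOT the article's data).
    stayed = [6.5, 5.0, 7.0, 6.0, 5.5, 6.5, 7.5, 5.0]
    resigned = [4.5, 5.0, 4.0, 5.5, 4.5, 3.5]
    t, df = welch_t(*summarize(stayed), *summarize(resigned))
    print(f"t = {t:.3f}, df = {df:.1f}, one-sided p = {t_upper_tail(t, df):.4g}")
```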

Subsequent course of action:

A hypothesis test is not the end result but the beginning of further analysis. Once we have established that employees who left early had shorter interviews, we should try to identify how shorter interview time could be contributing to early resignations. Analyzing the data for questions such as “Are shorter interviews causing a communication gap that leads to mismatched expectations when employees join?” or “Are quick decisions based on short interviews causing a mismatch between hired employees’ skills and their roles?” could help us get to the root of the issue. Beyond addressing such questions, we should also try to identify the optimum interview time that would resolve the issues identified and help improve employee retention.

Once the optimum interview time has been identified, the interview experience should be redesigned based on the results of the data analysis. The new interview process or format should then be tested on a sample of employees. If the results are satisfactory, the redesign could then be rolled out across the organization.
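
One way to decide whether pilot results are “satisfactory” is to run another hypothesis test. The minimal sketch below uses a one-sided two-proportion z-test. It assumes that the metric tracked is the share of hires who resign within six months. It also assumes that the pilot group (new interview format) and a comparison group (current format) are independent samples. This test is an illustrative choice, not one the article prescribes, and the counts in the example are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def early_resignation_z_test(resigned_old, hired_old, resigned_new, hired_new):
    """One-sided two-proportion z-test (hypothetical helper).

    H0: the early-resignation rate is the same under both interview formats.
    Ha: the rate is lower under the new (redesigned) format.
    Returns (z, p_value).
    """
    p_old = resigned_old / hired_old
    p_new = resigned_new / hired_new
    pooled = (resigned_old + resigned_new) / (hired_old + hired_new)
    se = sqrt(pooled * (1 - pooled) * (1 / hired_old + 1 / hired_new))
    z = (p_new - p_old) / se
    return z, normal_cdf(z)  # lower tail, because Ha predicts p_new < p_old


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 hires resigned early under the current
    # format, 15 of 100 under the pilot format.
    z, p = early_resignation_z_test(30, 100, 15, 100)
    print(f"z = {z:.2f}, one-sided p = {p:.4f}, reject H0 at 5%: {p < 0.05}")
```

The normal approximation behind this test needs reasonably large groups. For small pilots, an exact test such as Fisher’s exact test is a common alternative.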

MBA @ IIM Ranchi | Business Analytics @ Purdue | Marketing Data Scientist @ JPMC