Category: HR employee Attrition

Classification Models – Employee attrition

Modeling for prediction

To find a model that could help with the prediction process, we ran several data mining models.


 

From the previous results it's clear that the decision tree stole the show!

However, let's think practically:

  • We often have to explain to the business why we think a person could leave, so we need a model whose output we can explain. In our case that means a decision tree or logistic regression.
  • Sometimes HR would just like to run our model on arbitrary data sets, so it is not always possible to balance the data using techniques like SMOTE.
  • Our model should at least predict better than random. But imagine the cost of trying to retain an employee who was not going to leave but whom our system flagged. Reducing such false positives is a future improvement for our model.
  • The XGBoost model created a nice ensemble of trees for us, and its accuracy could overtake the decision tree's if we get more data.
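
To make the false-positive point concrete, here is a minimal sketch that counts the cases where the system flags someone who was not going to leave, and compares the model with a baseline that always predicts "stays". The labels below are made up for illustration and are not results from our models; the function names are hypothetical too.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

def precision_recall(y_true, y_pred, positive=1):
    """Precision: share of flagged employees who really left.
    Recall: share of leavers that the model managed to flag."""
    c = confusion_counts(y_true, y_pred, positive)
    flagged = c["tp"] + c["fp"]
    actual = c["tp"] + c["fn"]
    precision = c["tp"] / flagged if flagged else 0.0
    recall = c["tp"] / actual if actual else 0.0
    return precision, recall

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy labels: 1 = left the company, 0 = stayed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
model_accuracy = accuracy(y_true, y_pred)

# Baseline that never flags anyone: on imbalanced data it already scores well.
baseline_accuracy = accuracy(y_true, [0] * len(y_true))

print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"model accuracy={model_accuracy:.2f} baseline accuracy={baseline_accuracy:.2f}")
```

On imbalanced data like ours, accuracy alone can look good even for a model that flags nobody, which is why precision (how many flagged employees really leave) matters for the cost argument above.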

 

We successfully created an early warning system that immediately tells the Human Resources department whether an employee is prone to leave.

We built this early warning system using several data mining techniques to make the supervised classification as accurate as possible.

EDA and Data Cleaning

Well, the data is here.

So we start with EDA:

  • The data is imbalanced by class: 83% have not left the company and 17% have left.
  • The age of IBM employees in this data set is concentrated between 25 and 45 years.
  • Attrition is more common in the younger age groups and more likely among females. As expected, it is more common among single employees.
  • People who leave the company get fewer opportunities to travel for the company.
  • People with a very high level of education tend to have lower attrition.
  • The correlation plot was as expected.
  • The link to the EDA workbook in Python is here.
  • From the Tableau plots we can conclude that the following categories have a higher attrition rate:
    • The Sales department, among all departments
    • Human Resources and Technical Degree in education
    • Singles in marital status (we will not use this due to GDPR)
    • Males compared to females in gender (we will not use this due to GDPR)
    • Employees with a job satisfaction value of 1
    • Job level 1 in job level
    • Life balance with a value of 1
    • Employees living far from the office
    • Environment satisfaction with a value of 1
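
The per-category comparisons above boil down to an attrition rate per group. Here is a minimal sketch of that calculation, assuming each row has an `Attrition` field with "Yes"/"No" values. The rows are made up for illustration and the helper name is hypothetical.

```python
from collections import defaultdict

def attrition_rate_by(records, key):
    """Return {category: share of employees who left} for the given field."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in records:
        totals[row[key]] += 1
        if row["Attrition"] == "Yes":
            leavers[row[key]] += 1
    return {group: leavers[group] / totals[group] for group in totals}

# Hypothetical rows: the field names are assumptions and the values are made up.
records = [
    {"Department": "Sales", "Attrition": "Yes"},
    {"Department": "Sales", "Attrition": "No"},
    {"Department": "Research & Development", "Attrition": "No"},
    {"Department": "Research & Development", "Attrition": "No"},
    {"Department": "Research & Development", "Attrition": "Yes"},
    {"Department": "Research & Development", "Attrition": "No"},
]

rates = attrition_rate_by(records, "Department")
for department, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{department}: {rate:.0%}")
```

The same helper can be pointed at any other categorical field, such as job level or job satisfaction, by changing the `key` argument.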

 

First of all, we have categorical data, and to run machine learning algorithms in Python we need to convert the nominal categorical variables to dummy variables and the ordinal ones to integer values.
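
As a rough illustration of the two conversions, here is a minimal sketch using only the standard library (in practice `pandas.get_dummies` handles the dummy-variable part). The column names, category values, and the ordering of the travel levels are assumptions for this example, not taken from the notebook.

```python
def one_hot(rows, column):
    """Replace a nominal column with one 0/1 dummy column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = int(row[column] == cat)
        encoded.append(new_row)
    return encoded

def ordinal(rows, column, order):
    """Map an ordered category onto integers 0, 1, 2, ... in the given order."""
    rank = {level: i for i, level in enumerate(order)}
    return [{**row, column: rank[row[column]]} for row in rows]

# Hypothetical rows: column names and values are assumptions for illustration.
rows = [
    {"Department": "Sales", "BusinessTravel": "Travel_Rarely"},
    {"Department": "Human Resources", "BusinessTravel": "Non-Travel"},
    {"Department": "Sales", "BusinessTravel": "Travel_Frequently"},
]

# Assumed ordering of the ordinal levels, from least to most travel.
travel_order = ["Non-Travel", "Travel_Rarely", "Travel_Frequently"]

encoded = one_hot(ordinal(rows, "BusinessTravel", travel_order), "Department")
for row in encoded:
    print(row)
```

Nominal columns get dummy variables because their categories have no natural order, while ordinal columns keep their order as integers.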

Once that is done, we need to accept that our data is imbalanced, so to equalize the classes we use the Synthetic Minority Oversampling Technique (SMOTE), which creates synthetic examples of the minority class.
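
To give a feel for what SMOTE does, here is a simplified sketch of its core idea: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the line between them. This is not the implementation used in the notebook, and the function name, parameters, and sample points are hypothetical.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class points (two numeric features each).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]

new_points = smote_sketch(minority, n_new=6)
for point in new_points:
    print(point)
```

Oversampling like this is normally applied to the training split only, so that the evaluation data keeps the real class balance.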

The code file is located here for your reference: https://github.com/mmd52/3XDataMining/blob/master/DataCleaning_And_Smote.ipynb

IBM Employee HR Attrition

It's a new day; a client walks in and says he needs your help.

Our client is ABC, a leading firm that is doing well in its sector. It has recently faced a steep increase in employee attrition, which has gone up from 14% to 25% in the last year. We are asked to prepare a strategy to tackle this issue immediately so that the firm's business is not hampered, and to propose an efficient employee satisfaction program for the long run. Currently, no such program is in place. Further salary hikes are not an option.

The data is here.

Well, this is a nice business problem, so let's do some more research on it.

The attrition problem is not unique to ABC. Other IT companies face it too, such as XYZ, India's second largest IT services company, which is also battling high attrition, with a peak of 20.4% in the October-December quarter of FY15.

Now that we know the market situation, what can we do?


 

From this decision tree it should be clear that we will create an early warning system to help the company identify the employees who are most likely to leave.

In the following posts we will go through

  1. EDA
  2. Data cleansing
  3. Classification models

 

But why is a company so affected by employee attrition?

  • Cost of training a new employee
  • Cost of acquiring a new employee
  • Most importantly, an employee is an asset that adds value to a company, and when an employee leaves, part of that value leaves too. In the end, the company spends an enormous sum trying to replace the employee and recreate the value it lost.