Modeling for prediction
To find a model that could support the prediction process, we ran several data mining models:
- Decision Tree and Random Forest
- Logistic Regression
- Support Vector Machines
- Artificial Neural Networks
- Extreme Gradient Boosting
From the previous results, it's clear that the decision tree stole the show!
However, let's think practically:
- We are often required to explain to the business why we think a person could leave, so we need a model whose output we can explain. In our case, that means a decision tree or logistic regression.
- Sometimes HR would just like to run our model on arbitrary data sets, so it's not always possible to balance our datasets using techniques like SMOTE.
- Our model only needs to predict better than random, but consider the cost of making a retention effort for an employee who was never going to leave but was flagged by our system (a false positive). Reducing such false positives is a future improvement for our model.
- The XGBoost model created a nice ensemble of trees for us, and its accuracy could surpass that of the decision tree if we get more data.
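The false-positive concern above can be made concrete with a small sketch. It assumes the model outputs a probability of leaving per employee, and that the business can put a rough price on each kind of mistake. The names `fp_cost` and `fn_cost`, the toy labels and scores, and the candidate thresholds are all hypothetical, chosen only for illustration. They are not results from our models.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # flagged, and the employee did leave
    fp: int = 0  # flagged, but the employee stayed
    fn: int = 0  # not flagged, but the employee left
    tn: int = 0  # not flagged, and the employee stayed


def confusion_at(y_true, scores, threshold):
    """Count outcomes when every score >= threshold is flagged as 'will leave'."""
    c = Confusion()
    for actual, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and actual:
            c.tp += 1
        elif flagged and not actual:
            c.fp += 1
        elif not flagged and actual:
            c.fn += 1
        else:
            c.tn += 1
    return c


def precision(c):
    """Share of flagged employees who really left."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    """Share of leavers that the system flagged."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def total_cost(c, fp_cost, fn_cost):
    """Hypothetical business cost: each false alarm and each missed leaver has a price."""
    return c.fp * fp_cost + c.fn * fn_cost


def best_threshold(y_true, scores, fp_cost, fn_cost, candidates):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(confusion_at(y_true, scores, t), fp_cost, fn_cost),
    )


# Toy data (made up): 1 = left, 0 = stayed, with a predicted probability of leaving.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.2, 0.7, 0.4, 0.1, 0.3, 0.55]

if __name__ == "__main__":
    c = confusion_at(y_true, scores, 0.5)
    print(c, precision(c), recall(c))
    # If a missed leaver is costlier than a false alarm, a lower threshold wins,
    # and the reverse when false alarms are the expensive mistake.
    print(best_threshold(y_true, scores, fp_cost=1, fn_cost=5, candidates=[0.3, 0.5, 0.7]))
    print(best_threshold(y_true, scores, fp_cost=5, fn_cost=1, candidates=[0.3, 0.5, 0.7]))
```

Because this works on raw counts, it can be applied to unbalanced data as it arrives from HR, without resampling. The cost figures would have to come from the business.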
We successfully created an early warning system that immediately tells the Human Resources department whether an employee is prone to leave.
We built this early warning system using several data mining techniques for supervised classification, with the aim of making it as accurate as possible.