Project for EDSB project in EDSA 2019 post-graduation (NOVA - IMS)

HenrikPereira/EDSBproject

Human Resources Analysis - Predicting Attrition

EDS Bootcamp project | Enterprise Data Science and Analytics (2019)

Use Case: Human Resources Analysis Predicting Attrition

Authors: Bruno Candeias (1), David Oliveira (2), Henrique Pereira (3) & Manuel Oom (4)

  1. M20180313: [email protected]
  2. M20181430: [email protected]
  3. M20181395: [email protected]
  4. M20181431: [email protected]

I. Important Files

The most important files in this work are the following:

  • Final Presentation_EDSB_20191209_v0.01.ppt
  • HumanResourcesAnalysis_PredictAttrition.pbix
  • Data_models.ipynb
  • data_pre_proc.py
  • auxiliary.py

Please check the requirements file for further information.

II. Status Report

We chose this use case mainly because it lets us explore different models, which gives us a broader view of the problem, both descriptive and predictive. In addition, employee turnover is very present in our professional lives, which also motivated our choice.

Our approach follows this workflow:

  1. Data Exploration;
  2. Model Evaluation & Selection;
  3. Results Presentation.

III. Dataset Exploration

The dataset (HR_DS.csv) used in this use case (Human Resources Analysis Predict Attrition) contains 1470 records and 35 columns:

  • Age; Attrition; BusinessTravel; DailyRate; Department; DistanceFromHome; Education; EducationField; EmployeeCount; EmployeeNumber; EnvironmentSatisfaction; Gender; HourlyRate; JobInvolvement; JobLevel; JobRole; JobSatisfaction; MaritalStatus; MonthlyIncome; MonthlyRate; NumCompaniesWorked; Over18; OverTime; PercentSalaryHike; PerformanceRating; RelationshipSatisfaction; StandardHours; StockOptionLevel; TotalWorkingYears; TrainingTimesLastYear; WorkLifeBalance; YearsAtCompany; YearsInCurrentRole; YearsSinceLastPromotion; YearsWithCurrManager
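A quick sanity check of a loaded file against the layout described above could look like the sketch below. It assumes pandas is installed and that HR_DS.csv is in the working directory; the helper name `check_hr_dataset` is hypothetical and not part of this repository.

```python
import pandas as pd

# The 35 column names listed above.
EXPECTED_COLUMNS = (
    "Age;Attrition;BusinessTravel;DailyRate;Department;DistanceFromHome;"
    "Education;EducationField;EmployeeCount;EmployeeNumber;"
    "EnvironmentSatisfaction;Gender;HourlyRate;JobInvolvement;JobLevel;"
    "JobRole;JobSatisfaction;MaritalStatus;MonthlyIncome;MonthlyRate;"
    "NumCompaniesWorked;Over18;OverTime;PercentSalaryHike;PerformanceRating;"
    "RelationshipSatisfaction;StandardHours;StockOptionLevel;"
    "TotalWorkingYears;TrainingTimesLastYear;WorkLifeBalance;YearsAtCompany;"
    "YearsInCurrentRole;YearsSinceLastPromotion;YearsWithCurrManager"
).split(";")


def check_hr_dataset(df: pd.DataFrame) -> dict:
    """Compare a loaded frame with the documented column layout."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    unexpected = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    # Columns with a single distinct value carry no information for a model.
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    return {
        "shape": df.shape,
        "missing": missing,
        "unexpected": unexpected,
        "constant": constant,
    }


# Usage (assumes the file path; adjust to where the CSV actually lives):
# report = check_hr_dataset(pd.read_csv("HR_DS.csv"))
# print(report)
```

For the full file, the reported shape should be (1470, 35) with empty `missing` and `unexpected` lists; any columns reported under `constant` are candidates for removal before modelling.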

We explored the dataset to evaluate each variable and how the variables are correlated. Our first findings were:

  • Most of our data (DistanceFromHome, MonthlyIncome, NumCompaniesWorked, PercentSalaryHike, TotalWorkingYears, YearsAtCompany, YearsSinceLastPromotion) is skewed and not normally distributed;
  • Columns such as YearsWithCurrManager and YearsInCurrentRole appear to combine two different distributions, split at about 5 years;
  • Several variables have outliers: MonthlyIncome, NumCompaniesWorked, PerformanceRating, StockOptionLevel, TotalWorkingYears, TrainingTimesLastYear, YearsAtCompany, YearsInCurrentRole, YearsSinceLastPromotion, YearsWithCurrManager;
  • Regarding Attrition and its relationship with the other variables, we found that younger employees leave more often in all categories except SalesExecutive, ManufacturingDirector, Manager and Divorced. We also found that gender matters under some conditions.
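The checks behind findings like these can be sketched with pandas as follows. This is an illustration, not the code in `data_pre_proc.py` or `auxiliary.py`: the function names are hypothetical, the 1.5 × IQR fence is one common outlier rule among several, the age split at 35 is an arbitrary placeholder, and Attrition is assumed to be coded as "Yes"/"No".

```python
import numpy as np
import pandas as pd


def skew_and_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Per numeric column: sample skewness and number of IQR-rule outliers."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # A value is flagged when it lies more than k * IQR beyond a quartile.
    outside = (numeric < (q1 - k * iqr)) | (numeric > (q3 + k * iqr))
    return pd.DataFrame({"skew": numeric.skew(), "n_outliers": outside.sum()})


def attrition_rate_by(df: pd.DataFrame, by: str, age_split: int = 35) -> pd.DataFrame:
    """Share of leavers per category of `by`, split into two age bands.

    Assumes Attrition is "Yes"/"No"; `age_split` is a placeholder threshold.
    """
    tmp = pd.DataFrame(
        {
            by: df[by],
            "AgeBand": np.where(df["Age"] < age_split, "younger", "older"),
            "left": df["Attrition"].eq("Yes"),
        }
    )
    # Mean of a boolean column is the fraction of True values.
    return tmp.groupby([by, "AgeBand"])["left"].mean().unstack("AgeBand")


# Usage (assumes the file path):
# df = pd.read_csv("HR_DS.csv")
# print(skew_and_outliers(df).sort_values("skew", ascending=False))
# print(attrition_rate_by(df, "JobRole"))
# print(attrition_rate_by(df, "MaritalStatus"))
```

A row where the "older" rate is at or above the "younger" rate marks a category that departs from the general pattern.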

For the Model Evaluation & Selection step, we will study several classification models to apply to the dataset, namely:

  • XGBoost;
  • Logistic Regression Classifier;
  • Linear Support Vector Classification;
  • C-Support Vector Classification;
  • Random Forest Classifier;
  • Keras Deep Neural Network.
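One way to compare several of these candidates on equal terms is stratified cross-validation over a shared preprocessing pipeline, sketched below with scikit-learn. This is not the procedure used in `Data_models.ipynb`: the function name `compare_models`, the hyperparameters, the fold count and the F1 metric are all illustrative assumptions, and Attrition is assumed to be coded as "Yes"/"No". XGBoost and Keras are left out because they need extra dependencies; `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be added to the dictionary in the same way.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC, LinearSVC


def compare_models(df, target="Attrition", positive="Yes", folds=5, seed=0):
    """Mean cross-validated F1 per model, best first (illustrative settings)."""
    y = df[target].eq(positive).astype(int)
    X = df.drop(columns=[target])
    numeric = X.select_dtypes(include="number").columns.tolist()
    categorical = [c for c in X.columns if c not in numeric]

    def preprocess():
        # Scale numeric columns and one-hot encode the categorical ones.
        return ColumnTransformer(
            [
                ("num", StandardScaler(), numeric),
                ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
            ]
        )

    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "LinearSVC": LinearSVC(),
        "SVC": SVC(),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    # Stratified folds keep the leaver/stayer ratio similar in every split.
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = {}
    for name, model in models.items():
        pipe = make_pipeline(preprocess(), model)
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="f1").mean()
    return pd.Series(scores).sort_values(ascending=False)


# Usage (assumes the file path):
# print(compare_models(pd.read_csv("HR_DS.csv")))
```

Fitting the preprocessing inside each pipeline means the scaler and encoder are learned on the training folds only, so the validation folds do not leak into the scores.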
