A predictive model which predicts if a customer will subscribe to a Fixed Deposit based on his financial and personal status. This model is implented using the python language. The data-set used to train this model, has the following columns/features.
This is a classification problem, that predicts if a customer will subscribe to a term deposit based on his personal data, given below.
Packages used: The coding portion of this project was done in python entirely, in the Jupyter notebook. In the initial stages of the code three python packages were mainly used which were;
-
pandas: package dealing with creation and manipulation of data sets using data frames.
-
NumPy: has many advanced mathematical functions.
-
matplotlib.pyplot: for plotting various graphs.
Later on, during the model building stage the following package was used.
-
train_test_split: A sub module from the python module sklearn.model_selection which partitions the data set as specified by the user.
-
LogisticRegression: A sub module from the module sklearn.linear_model used to perform Logistical Regression.
-
DecisionTreeClassifier: A sub module from the module sklearn.tree used to implement decision trees.
VARIABLE - DEFINITION
1.Id - Unique id of client
2.age - Age of client
3.Job - Job of the client
4.Marital - Marital status of the client
5.Education - Education of client
6.Default - Credit in default(yes/no). Tells us if the customer has any unpaid loans.
7.Housing - Housing loan(yes/no). Tells us if the customer has taken a housing loan in the past.
8.Loan - Personal loan(yes/no). Tells us if the customer has taken a personal loan in the past.
9.Subscribed(target variable) - Has the client subscribed to the term deposit.
From the dataset the first 8 columns(variables) are the predictor variables, and the final column is the target variable. Using a training data set the model is trained, and the testing data set is used to test the performance of the model when it's given unseen data.
The first 5 rows of the data set looks something like this.
Results of bivariate analysis. Here I compared 2 variables. One predictor variable and one target variable. I've analysed the relation ships between the following 5 sets of variables.
- Education of the customer and Subscribed(target variable)
- Default status of the customer and Subscribed(target variable)
- Housing loan history of customer and Subscribed.
- Personal loan history of customer and Subscribed.
- Martial status of customer and Subscribed.
For the predictive model I initially used the linear regression model , and I got an accuracy of 85.1%. Then later i used the decision tree model object, and my accuracy of the model went up to 88.56%.
The first 5 rows of the testing data set before the prediction was made by the model.
The first 5 rows of the testing data set after the prediction has been made.