Predicting Income based on Demographics - Team Insight

This project is part of the course Data Analytics (UE18CS312) at the Department of Computer Science, PES University, Electronic City Campus. We aim to predict the income class of a person from demographics such as age, education, race, gender, and marital status. We achieved a maximum test score of 0.864524 using the CatBoost algorithm.

Dataset being used

UCI Machine Learning US Census
This dataset was extracted from the 1994 US Census database.

Tools and Tech used

We used R and Python for this project: R for the exploratory data analysis and Python for building the models. Splitting the work this way, with R for analysis and Python for modeling, is a common workflow.

How to clone and run the project

To clone the code to your local system, run:

git clone https://github.com/vishnureddys/income-prediction

After cloning, open the notebook in Jupyter Notebook (with Anaconda or Miniconda). If you have the ipynb extension for VS Code, you can use that instead. Make sure the following packages are installed, and install any that are missing:

catboost
scikit-learn
matplotlib
numpy
pandas
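
A quick way to see which of these packages are missing is sketched below. It is not part of the repository; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced for illustration. The only subtlety is that scikit-learn is installed under that name but imported as `sklearn`.

```python
import importlib.util

# Maps each pip package name to the module name used in `import` statements.
# scikit-learn is the one that differs: it is imported as `sklearn`.
REQUIRED = {
    "catboost": "catboost",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it with the same Python interpreter that Jupyter uses, since packages installed in a different environment will not be visible to the notebook.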

To perform data cleaning, run:

pip install pandas numpy
python cleaning.py
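
The contents of cleaning.py are not reproduced in this README, so the following is only a minimal sketch of the kind of cleaning this dataset usually needs, not the script itself. It assumes the raw file marks unknown values with `?` and pads fields with a leading space, as the UCI census income files do. The sample rows, the column subset, and the `load_and_clean` name are hypothetical.

```python
import io

import pandas as pd

# Tiny inline sample standing in for the real data file (hypothetical rows).
SAMPLE_CSV = (
    "age,workclass,education,income\n"
    "39, State-gov, Bachelors, <=50K\n"
    "50, ?, HS-grad, >50K\n"
    "39, State-gov, Bachelors, <=50K\n"
)


def load_and_clean(source):
    """Read the CSV, treat '?' as missing, and drop incomplete or duplicate rows."""
    # skipinitialspace removes the space after each comma;
    # na_values turns the '?' placeholder into NaN.
    df = pd.read_csv(source, skipinitialspace=True, na_values="?")
    return df.dropna().drop_duplicates().reset_index(drop=True)


cleaned = load_and_clean(io.StringIO(SAMPLE_CSV))
print(cleaned)
```

Dropping incomplete rows is the simplest choice; imputing the missing values instead would keep more data at the cost of extra assumptions.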

To run the Exploratory Data Analysis (EDA), you need R installed for use with Jupyter Notebook, which can be done from the shell or the Anaconda console.
In Jupyter, click Run to execute the code. The notebook is interactive, so no further commands are needed. For more information on how to use Jupyter, please refer to this article.

Work by

  1. Pranav L Nambiar
  2. Vishnu S Reddy
  3. P Varshith