This is a model to predict housing prices in the city of St. Petersburg, Russia.
The data used to build the model is taken from the Yandex Realty database. It consists of various features such as area, floor, and number of rooms, and covers the period from January 1, 2017 to August 1, 2018. The data was cleaned, outliers were removed, and the result was split into two sets: training data and test data. Training data (approx. 70% of the whole dataset) is taken from January 1, 2017 to April 1, 2018. Test data (approx. 30% of the whole dataset) is taken from April 1, 2018 to August 1, 2018.
The dataset was cleaned to handle missing data and remove outliers.
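The date-based split described above can be sketched as follows. This is a minimal illustration using only the standard library, not the project's actual preprocessing code: the field name `listing_date` and the toy records are hypothetical, and because the text names April 1, 2018 as both the end of the training period and the start of the test period, assigning that day to the test set is an assumption.

```python
from datetime import date

# Boundary taken from the text: training data ends and test data starts on April 1, 2018.
SPLIT_DATE = date(2018, 4, 1)


def split_by_date(records, date_key="listing_date"):
    """Return (train, test); records dated before SPLIT_DATE go to train.

    The boundary day itself is assigned to the test set (an assumption).
    """
    train = [r for r in records if r[date_key] < SPLIT_DATE]
    test = [r for r in records if r[date_key] >= SPLIT_DATE]
    return train, test


# Toy records; the field names and values are made up for illustration.
listings = [
    {"listing_date": date(2017, 3, 15), "area": 42.0, "rooms": 1},
    {"listing_date": date(2018, 3, 31), "area": 70.0, "rooms": 2},
    {"listing_date": date(2018, 4, 1), "area": 55.5, "rooms": 2},
    {"listing_date": date(2018, 7, 20), "area": 30.0, "rooms": 1},
]

train, test = split_by_date(listings)
print(len(train), len(test))
```

Splitting by date rather than at random keeps the test set strictly later than the training set, which mirrors how the model would be used on new listings.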
Below we can see some descriptive statistics
Here is what the correlation matrix looks like
Here is the relationship between the price of the apartment and whether it is a studio
In this project, two models have been created. The first model uses the CatBoost regressor with the features open plan, rooms, area, and renovation. CatBoost applies gradient boosting on decision trees to predict the prices.
The second model uses the Random Forest regressor with the features floor, open plan, rooms, and area.
The primary reason for building two models is to cross-check the results. CatBoost is used for its ability to reduce the error through gradient boosting, and Random Forest is used because it is appropriate when working with large datasets. Both models have very low MAE, MSE, and RMSE.
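For reference, the three error metrics mentioned above can be computed as in the sketch below. It uses plain Python rather than the project's own evaluation code, and the function name `regression_errors` and the sample prices are hypothetical.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MAE, MSE, and RMSE for two equal-length sequences of prices."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    return {"mae": mae, "mse": mse, "rmse": rmse}


# Made-up actual and predicted prices, for illustration only.
errors = regression_errors([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(errors)
```

MAE and RMSE are expressed in the same units as the price, while MSE is in squared units, so RMSE is usually the easier of the two to compare against typical prices.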
It is possible to run the models on a virtual machine (VM) by installing the required Python 3 libraries: NumPy, CatBoost, Flask, and scikit-learn. Then follow these steps on the VM.
- Create a Git folder on the VM.
- Download the project folder from GitHub and start the app with the command: python app.py
- To run the app, use 5444 as the port.
- Select which model to run: 1 for the Random Forest and 2 for CatBoost.
- If an incorrect model is chosen, the app will return an error.
- Then query the app from Postman using the link below:
XX.XXX.XX.XXX:5444/predict_price?model=1&floor=1&open_plan=2&rooms=1&area=70&renovation=1
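The model selection described in the steps above could look roughly like the sketch below. It is not the code from app.py: the function name `build_feature_vector`, the feature order, and the error messages are assumptions. Only the model numbering, the feature sets, and the query parameter names come from this README.

```python
# Feature sets as listed in this README; the order within each tuple is an assumption.
MODEL_FEATURES = {
    1: ("floor", "open_plan", "rooms", "area"),        # Random Forest
    2: ("open_plan", "rooms", "area", "renovation"),   # CatBoost
}


def build_feature_vector(query):
    """Map URL query parameters (a dict of strings) to (model_id, feature values).

    Raises ValueError for an unknown model or a missing parameter.
    """
    try:
        model_id = int(query["model"])
    except (KeyError, ValueError):
        raise ValueError("'model' must be 1 (Random Forest) or 2 (CatBoost)") from None
    if model_id not in MODEL_FEATURES:
        raise ValueError("'model' must be 1 (Random Forest) or 2 (CatBoost)")
    missing = [name for name in MODEL_FEATURES[model_id] if name not in query]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    return model_id, [float(query[name]) for name in MODEL_FEATURES[model_id]]


# The parameters from the example link, as they would arrive from the URL.
example = {"model": "1", "floor": "1", "open_plan": "2",
           "rooms": "1", "area": "70", "renovation": "1"}
print(build_feature_vector(example))
```

Parameters that the selected model does not use (here `renovation` for model 1) are ignored in this sketch, so the same link can be reused for both models by changing only `model`.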
Docker is an application that makes it possible to package the app and run a copy of it on different devices. A Dockerfile should be downloaded. This file includes the commands required to build a container image and make a copy of the application.
To use Docker, install and configure it following the instructions here: install Docker on Ubuntu; tune Docker on Ubuntu
- To build a container image, use the following command (the trailing dot is the build context, i.e. the folder containing the Dockerfile):
docker build -t smartcoder9/gsom_predictor:v.1.0 .
- To launch a container, use the following command:
docker run --network host -d smartcoder9/gsom_predictor:v.1.0
- To see active containers, use the following command:
docker ps
- To push the image to a Docker registry, use the following command:
docker push myDockerLogin/myDockerfolder:v.X
Just download the Docker application, pull the image using the repository name, and then start it with docker run:
docker pull smartcoder9/gsom_predictor:v.1.0
docker run --network host -d smartcoder9/gsom_predictor:v.1.0
Make a query by setting the values of the parameters as needed and start predicting!
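As an alternative to Postman, a query can also be built and sent from Python. The sketch below uses only the standard library. The helper names `build_predict_url` and `fetch_prediction`, the `http` scheme, and the `127.0.0.1` host are assumptions; replace the host with the address of the machine running the app. The response format is not documented here, so the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_predict_url(host, port=5444, **params):
    """Build a /predict_price URL from keyword parameters."""
    return f"http://{host}:{port}/predict_price?{urlencode(params)}"


def fetch_prediction(url, timeout=10):
    """Send the GET request and return the raw response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Same parameter values as in the example link above; the host is a placeholder.
url = build_predict_url("127.0.0.1", model=1, floor=1, open_plan=2,
                        rooms=1, area=70, renovation=1)
print(url)
# With the app running, the prediction can be requested with:
# print(fetch_prediction(url))
```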
MAINTAINER Madiya Bano
RUN apt-get update -y
COPY . /opt/gsom_predictor
WORKDIR /opt/gsom_predictor
RUN apt install -y python3-pip
RUN pip3 install -r requirements.txt
CMD python3 app.py
Just pull the gsom_predictor image from Docker Hub and run it using the following commands:
#pull the app
docker pull smartcoder9/gsom_predictor:v.1.0
#run the app
docker run --network host -d smartcoder9/gsom_predictor:v.1.0