hzhz2020/DataMining-Hotel-Recommendation
CSE 5243 - Final Project
Zhuoer Wang, Zhe Huang

Data
Our data can be retrieved here: https://www.kaggle.com/c/expedia-hotel-recommendations/data

FinalReport.pdf: Final group report and individual report

Directory
Folder code includes all the code we wrote:
- RFpredictor.py: the final model we used for the prediction
- downSize.py: downsizes the data
- booking.py: retrieves only the booking records in the training set
- ca.py: formats the prediction output generated by R for Kaggle submission
- EDAforWhole.R: performs EDA on the whole training set
- EDAforSample.R: performs EDA on the sample set
- Preparedata.R: handles missing values and removes features that were not used
- DT.R: 5-fold cross-validation of the DT model
- KNN.R: 5-fold cross-validation of the KNN model
- NB.R: 5-fold cross-validation of the Naïve Bayes model
- NN.R: 5-fold cross-validation of the Neural Network model
- RandomForest.R: 5-fold cross-validation of the RandomForest model
- SVM.R: 5-fold cross-validation of the SVM model
- Others.R: code used for PCA and basic parameter adjustment using R packages
- Outputgeneration.R: generates the prediction output for the test set

Folder predictions includes all the predictions we made at the different stages of developing our random forest model. You can submit them directly to Kaggle at https://www.kaggle.com/c/expedia-hotel-recommendations/submissions/attach for MAP@5 evaluation.

Run
All of our Python libraries can be installed with pip install (lib_name).

Run predictor.py: python3 predictor.py
- Set training set path: line 10
- Set testing set path: line 40
- Cross validation on train: uncomment lines 36 & 37
- Predictions will be output to "result.csv" under the default directory

Run other Python code: python2 (name).py

DT.R, KNN.R, NB.R, NN.R, RandomForest.R, and SVM.R are the complete R code for building each model and performing 5-fold cross-validation in R. Load the data, then run Preparedata.R to perform the basic data manipulation that is shared by all 6 models. After that, each model can be run; the required packages must be installed before running each model.

Others.R includes some code that was used for parameter testing and principal component analysis, and it may be reused in the future. Note that this code can be used for different models with minor adjustments, so it is only included once. Other parameter testing was done by manually trying different input values and is not included in the file.

Outputgeneration.R can be used to write the testing results to a CSV file.
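
For readers who want to check predictions locally before submitting, the MAP@5 metric mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from this repository: the function names apk and mapk are hypothetical, and it assumes each test row has exactly one true hotel cluster and up to five ranked predictions.

```python
def apk(actual, predicted, k=5):
    """Average precision at k for one row with a single true label.

    With one true label, this is 1/rank of the first correct
    prediction within the top k, or 0.0 if it is not there.
    """
    for rank, label in enumerate(predicted[:k], start=1):
        if label == actual:
            return 1.0 / rank
    return 0.0


def mapk(actuals, predictions, k=5):
    """Mean of apk over all rows (hypothetical helper, not from the repo)."""
    if not actuals:
        return 0.0
    total = sum(apk(a, p, k) for a, p in zip(actuals, predictions))
    return total / len(actuals)


# Two rows: the first is correct at rank 1, the second at rank 2.
true_clusters = [91, 41]
ranked_predictions = [[91, 5, 2, 1, 0], [3, 41, 7, 8, 9]]
print(mapk(true_clusters, ranked_predictions))  # 0.75
```

A correct cluster at rank 1 scores 1.0, at rank 2 scores 0.5, and so on, so putting the most likely clusters first matters as much as including them at all.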