CSE 5243 - Final Project
Zhuoer Wang, Zhe Huang
Data
Our data can be retrieved here: https://www.kaggle.com/c/expedia-hotel-recommendations/data
FinalReport.pdf: Final group report and individual report
Directory
Folder code includes all the code we wrote
RFpredictor.py: The final model we used for the prediction.
downSize.py: downsize the data
booking.py: retrieve only booking records in the training set
ca.py: format the prediction output generated by R for Kaggle submission
EDAforWhole.R: perform EDA on the whole training set
EDAforSample.R: perform EDA on the sample set
Preparedata.R: handle missing values and remove features that were not used
DT.R: 5-fold cross-validation of DT model
KNN.R: 5-fold cross-validation of KNN model
NB.R: 5-fold cross-validation of Naïve Bayes model
NN.R: 5-fold cross-validation of Neural Network model
RandomForest.R: 5-fold cross-validation of RandomForest model
SVM.R: 5-fold cross-validation of SVM model
Others.R: Code used for PCA and basic parameter adjusting using R packages
Outputgeneration.R: generate prediction output for test set
Folder predictions includes all the predictions we made at different stages of developing our random forest model. You can submit them directly to Kaggle at https://www.kaggle.com/c/expedia-hotel-recommendations/submissions/attach for MAP@5 evaluation.
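For readers who want to score predictions locally before submitting, here is a minimal sketch of MAP@5. It assumes each test row has exactly one correct hotel cluster and each prediction is a ranked list of up to five clusters. The function names are hypothetical and are not part of our code.

```python
def average_precision_at_k(actual, predicted, k=5):
    """AP@k for one row, assuming a single correct label per row.

    With one correct label, AP@k reduces to 1/rank of the first hit
    within the top k predictions, or 0 if the label is not there.
    """
    for rank, label in enumerate(predicted[:k], start=1):
        if label == actual:
            return 1.0 / rank
    return 0.0


def map_at_k(actuals, predictions, k=5):
    """Mean of AP@k over all rows; returns 0.0 for empty input."""
    if not actuals:
        return 0.0
    total = sum(
        average_precision_at_k(actual, predicted, k)
        for actual, predicted in zip(actuals, predictions)
    )
    return total / len(actuals)
```

A correct cluster in first place scores 1, in second place 0.5, and so on down to 0.2 in fifth place.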
Run
All of the Python libraries we use can be installed with pip install (lib_name)
Run predictor.py: python3 predictor.py
- Set training set path: line 10
- Set testing set path: line 40
- Cross-validation on the training set: uncomment lines 36 and 37
- Predictions are written to "result.csv" in the default directory
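To illustrate what k-fold cross-validation on the training set involves, the sketch below splits row indices into 5 folds using only the standard library. It is an assumption-laden illustration, not the code in the predictor script: the function name, the seed, and the round-robin fold assignment are all hypothetical.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Rows are shuffled once with a fixed seed, then dealt round-robin
    into k folds; each fold serves as the held-out set exactly once.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx
```

Each pair can then be used to fit a model on the training rows and score it on the held-out rows, averaging the k scores at the end.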
Run the other Python code: python2 (name).py
DT.R, KNN.R, NB.R, NN.R, RandomForest.R, and SVM.R contain the complete R code for building each model and performing 5-fold cross-validation.
Load the data, then run Preparedata.R to perform the basic data manipulation shared by all 6 models. After that, each model can be run; install the packages a model requires before running it.
Others.R includes some code that was used for parameter testing and principal component analysis, and that may be reused in the future. Note that this code can be applied to different models with minor adjustments, so it is included only once. Other parameter testing was done by manually trying different input values and is not included in the file.
Outputgeneration.R can be used to write the test results to a CSV file.
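As a rough Python counterpart to this output step, the sketch below writes ranked predictions to a CSV file. It assumes the submission layout is a header of id,hotel_cluster followed by one row per test record, with up to five space-separated clusters, and that ids follow row order starting at 0. Check both assumptions against the competition's sample submission. The function name is hypothetical.

```python
import csv


def write_submission(path, predictions):
    """Write ranked cluster predictions to a submission-style CSV file.

    Assumes ids are the 0-based row positions and that at most five
    clusters per row are kept, joined by single spaces.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "hotel_cluster"])
        for row_id, clusters in enumerate(predictions):
            top_five = " ".join(str(cluster) for cluster in clusters[:5])
            writer.writerow([row_id, top_five])
```

For example, write_submission("result.csv", [[91, 41, 48, 64, 5]]) produces a header line followed by the row 0,91 41 48 64 5.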