A Kaggle competition to predict whether a user will download an app after clicking a mobile app ad.
You can download the data from the competition page.
I built an XGBoost model fitted on numerically encoded data. The model was trained on 55,000,000
records from the train dataset.
Training took between 25 and 45 minutes (it varies with data size) on a machine with 16 GB of RAM and 8 cores. The algorithm implementation is very robust even with very large datasets.
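
As a rough illustration, the training setup might look like the sketch below. It assumes the competition's `train.csv` with the usual TalkingData columns (`ip`, `app`, `device`, `os`, `channel`, `is_attributed`); the file path, row count, and parameter values are placeholders, not the exact configuration used here.

```python
import pandas as pd
import xgboost as xgb

# Assumed dtypes for the TalkingData columns; keeps memory usage low.
dtypes = {
    'ip': 'uint32', 'app': 'uint16', 'device': 'uint16',
    'os': 'uint16', 'channel': 'uint16', 'is_attributed': 'uint8',
}

# Load a slice of the training data (55M rows were used here).
train = pd.read_csv('train.csv', nrows=55_000_000, dtype=dtypes,
                    usecols=list(dtypes.keys()))

X = train.drop(columns='is_attributed')
y = train['is_attributed']

# Illustrative parameters only; the actual tuned values are not listed in this README.
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'max_depth': 6,
    'eta': 0.1,
    'tree_method': 'hist',  # histogram-based training, fast on large data
}

dtrain = xgb.DMatrix(X, label=y)
model = xgb.train(params, dtrain, num_boost_round=200)
```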
After some extensive hyperparameter tuning, I got an AUC of 0.9638 on the public leaderboard.
The model predictions (my two top-scoring submissions):
- The submission file is available via this Google Drive link
A LightGBM model has many hyperparameters, and they need careful tuning; a good way to do that is grid search optimization, as in the sketch below.
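
A minimal grid-search sketch over a few LightGBM hyperparameters, using scikit-learn's `GridSearchCV`; the parameter grid and the data variables (`X`, `y`) are illustrative assumptions, not the grid actually used.

```python
from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid; widen or narrow it based on available compute.
param_grid = {
    'num_leaves': [31, 63, 127],
    'learning_rate': [0.05, 0.1],
    'n_estimators': [200, 500],
}

search = GridSearchCV(
    estimator=LGBMClassifier(objective='binary'),
    param_grid=param_grid,
    scoring='roc_auc',  # the competition metric
    cv=3,
)
search.fit(X, y)  # X, y: numerically encoded features and the is_attributed target
print(search.best_params_, search.best_score_)
```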
According to the LightGBM documentation, to increase the prediction accuracy of the algorithm you can start by doing one of the following (a parameter sketch follows the list):
- Use a large `max_bin` (but it may be slower)
- Use a small `learning_rate` with a large `num_iterations`
- Use a large `num_leaves` (but it may cause over-fitting)
- Use bigger training data
- Try dart; you can choose the `boosting_type` of the algorithm from `gbdt`, `dart`, `rf`, or `goss`
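
For instance, these suggestions translate into a LightGBM parameter dictionary along these lines; the concrete values are placeholders chosen for illustration.

```python
import lightgbm as lgb

# Accuracy-oriented settings from the tips above; values are illustrative.
params = {
    'objective': 'binary',
    'metric': 'auc',
    'boosting_type': 'dart',  # alternatives: 'gbdt', 'rf', 'goss'
    'max_bin': 511,           # large max_bin: finer splits, slower training
    'learning_rate': 0.02,    # small learning_rate ...
    'num_leaves': 255,        # large num_leaves can over-fit; watch validation AUC
}

dtrain = lgb.Dataset(X, label=y)  # X, y as before
# ... paired with a large num_iterations
booster = lgb.train(params, dtrain, num_boost_round=2000)
```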
To Do:
- TalkingData provides a huge amount of data (200 million records); if you have a powerful enough machine, you would definitely want to train on the whole dataset.
- You can try to implement a deep learning model for such a huge dataset.
- Try to do more feature engineering and see if the results improve.
- Try downsampling, given the imbalanced ratio of fraud to non-fraud records (see the sketch after this list).
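
A minimal downsampling sketch, assuming the data sits in a pandas DataFrame `train` with the `is_attributed` target; it balances the classes by sampling the majority (non-download) class down to the minority count.

```python
import pandas as pd

# Split by class: is_attributed == 1 marks an app download (the rare class).
pos = train[train['is_attributed'] == 1]
neg = train[train['is_attributed'] == 0]

# Sample the majority class down to the minority class size.
neg_down = neg.sample(n=len(pos), random_state=42)

# Shuffle the balanced frame before training.
balanced = pd.concat([pos, neg_down]).sample(frac=1, random_state=42)
```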