https://medium.com/@rohanhazra4/sparkify-user-churn-analysis-eeb1ed88775f
Predicting user churn using PySpark
Sparkify is a fictitious music streaming app similar to Spotify or Apple Music. Churn analysis, or user churn prediction, identifies customers of a service who are likely to downgrade or cancel it. For Sparkify, downgrading means moving from a paid subscription to an ad-supported tier. Churn analysis is crucial for a business because identifying at-risk customers lets the company intervene and prevent lost revenue.
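As a rough illustration of the churn definition above, the sketch below labels a user as churned if any of their events is a cancellation or a downgrade. It is plain Python rather than PySpark so that it runs without a cluster. The field names `userId` and `page` and the two page names are assumptions made for this example, not details confirmed by this README.

```python
# Assumed page names that mark a churn event (hypothetical, for illustration).
CHURN_PAGES = {"Cancellation Confirmation", "Submit Downgrade"}


def label_churn(events):
    """Return {user_id: 1 if the user ever hit a churn page, else 0}."""
    labels = {}
    for event in events:
        user = event["userId"]
        churned = 1 if event["page"] in CHURN_PAGES else 0
        # Once a user is marked as churned, keep that label.
        labels[user] = max(labels.get(user, 0), churned)
    return labels


# Hypothetical event log: one dict per user action.
events = [
    {"userId": "u1", "page": "NextSong"},
    {"userId": "u1", "page": "Submit Downgrade"},
    {"userId": "u2", "page": "NextSong"},
    {"userId": "u3", "page": "Cancellation Confirmation"},
]
labels = label_churn(events)
print(labels)
```

In PySpark the same idea would typically be expressed as a per-user aggregation over the event log. The notebook contains the labelling logic actually used in this project.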
Sparkify.ipynb: Jupyter notebook containing the code and in-depth explanations.
- pandas
- NumPy
- PySpark
- datetime
- matplotlib
PySpark makes it possible to train churn-prediction models at scale. This project uses a smaller subset of the data; training on the full 12 GB dataset would likely produce more accurate results.
A Random Forest classifier is used to predict churn. The F1 score achieved is 0.91.
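For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score is computed from true and predicted labels. It shows the binary F1 for the churned class. This README does not state which F1 variant the notebook reports (for example, a weighted average over both classes), so treat this as an illustration of the metric rather than the exact evaluation used. The labels are made up.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute the F1 score for the positive (churned) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(score)
```

F1 is a common choice for churn problems because churned users are usually a small minority, so plain accuracy can look high even when the model misses most of them.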
The dataset is provided by Udacity.