curiousrohan/sparkify

Medium Post

https://medium.com/@rohanhazra4/sparkify-user-churn-analysis-eeb1ed88775f

Sparkify

Predicting user churn using PySpark

Table of Contents

  1. Project Motivation
  2. File Descriptions
  3. Libraries
  4. Summary
  5. Acknowledgements

Project Motivation

Sparkify is a fictitious music streaming app similar to Spotify or Apple Music. Churn analysis identifies customers of a service who are likely to downgrade or cancel it. For Sparkify, a downgrade means moving from a paid subscription to an ad-supported model. Churn analysis is crucial for a business because it identifies customers at risk of leaving, which helps prevent a loss of revenue.

File Descriptions

Sparkify.ipynb: Jupyter notebook containing the code and in-depth explanations.

Libraries

  1. pandas
  2. NumPy
  3. PySpark
  4. datetime
  5. matplotlib

Summary

Machine learning at scale with PySpark can predict user churn. Although I used a smaller subset of the data in this project, running the model on the full 12 GB dataset should produce more accurate results.

A Random Forest classifier is used to predict churn. The F1 score achieved is 0.91.
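
For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score is computed from true and predicted churn labels. It is plain Python and is not the notebook's code. The function name `f1_score` and the sample labels are made up for illustration. It computes the binary F1 for the churn class; the README does not state whether the reported 0.91 is this variant or an average over both classes.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class (hypothetical helper, not from the notebook)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
```

F1 is a common choice for churn problems because churned users are usually a small minority, so plain accuracy can look high even when the model misses most of them.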

Acknowledgements

The dataset is provided by Udacity.

About

Udacity DSND Capstone Project
