In this project, I gathered the listening history from my Spotify account and analyzed it to better understand my listening habits, music taste, and choice of artists.
I also performed a cluster analysis on the dataset.
- Editor used: VS Code
- Python version: 3.11.3
- Packages used: pandas, matplotlib, seaborn, calplot, sklearn, spotipy, requests
- Data scraper 1: Blog and GitHub
- Data scraper 2: Blog and GitHub
The data is scraped from Spotify using its API.
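As a rough illustration of what the collection step has to produce, the sketch below flattens a "recently played" style API response into one row per play. It assumes the history comes from Spotify's recently-played endpoint (for example via spotipy's `current_user_recently_played`), which this README does not confirm. The function name `flatten_recently_played`, the chosen output fields, and the sample payload are hypothetical and hand-written for the example.

```python
from typing import Any


def flatten_recently_played(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a recently-played style response into flat rows (one per play)."""
    rows = []
    for item in payload.get("items", []):
        track = item["track"]
        rows.append(
            {
                "played_at": item["played_at"],
                "track_id": track["id"],
                "track_name": track["name"],
                # A track can have several artists; join their names.
                "artists": ", ".join(a["name"] for a in track["artists"]),
                "album": track["album"]["name"],
                "duration_ms": track["duration_ms"],
            }
        )
    return rows


# Hand-written sample payload (made-up values, not real listening data).
sample_payload = {
    "items": [
        {
            "played_at": "2023-05-01T10:00:00Z",
            "track": {
                "id": "track-1",
                "name": "Example Song",
                "artists": [{"name": "Artist One"}, {"name": "Artist Two"}],
                "album": {"name": "Example Album"},
                "duration_ms": 200000,
            },
        }
    ]
}

rows = flatten_recently_played(sample_payload)
print(rows)
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame(rows)` for the analysis steps.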
The data is then preprocessed and transformed as required to avoid any aberrations that might later skew the results.
The following data cleaning steps are performed:
- Dropping duplicate columns
- Dropping duplicate rows
- Null check
- Data format check
- Value check
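The cleaning steps listed above could look roughly like the following in pandas. This is a minimal sketch, not the repository's actual code: the column names (`track_name`, `played_at`, `duration_ms`), the specific checks, and the sample rows are assumptions chosen for illustration.

```python
import pandas as pd


def clean_history(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the five cleaning steps to a listening-history frame."""
    # 1. Drop duplicate columns (keep the first column for each name).
    df = df.loc[:, ~df.columns.duplicated()].copy()
    # 2. Drop duplicate rows.
    df = df.drop_duplicates()
    # 3. Null check: discard rows missing essential fields.
    df = df.dropna(subset=["track_name", "played_at"])
    # 4. Data format check: unparseable timestamps become NaT and are dropped.
    df = df.assign(
        played_at=pd.to_datetime(df["played_at"], errors="coerce", utc=True)
    )
    df = df.dropna(subset=["played_at"])
    # 5. Value check: a track duration must be positive.
    df = df[df["duration_ms"] > 0]
    return df.reset_index(drop=True)


# Made-up sample rows that exercise each step.
raw = pd.DataFrame(
    [
        ["Song A", "2023-05-01T10:00:00Z", 200000, "Song A"],
        ["Song A", "2023-05-01T10:00:00Z", 200000, "Song A"],  # duplicate row
        [None, "2023-05-01T11:00:00Z", 180000, None],          # missing name
        ["Song B", "not a date", 210000, "Song B"],            # bad timestamp
        ["Song C", "2023-05-02T09:30:00Z", 0, "Song C"],       # invalid duration
        ["Song D", "2023-05-03T08:15:00Z", 240000, "Song D"],
    ],
    columns=["track_name", "played_at", "duration_ms", "track_name"],
)

cleaned = clean_history(raw)
print(cleaned)
```

Running the checks in this order means the later steps only see rows that already passed the earlier ones, which keeps each check simple.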
A few of the visualization highlights are:
Several cluster analyses are performed to group the songs and define the profile of each cluster.
The clustering algorithms applied are:
- KMeans clustering
- Agglomerative clustering
- Affinity Propagation Clustering
- BIRCH
- DBSCAN
- Mini-Batch KMeans
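To show how these algorithms can be compared side by side, here is a minimal scikit-learn sketch that fits all six on the same scaled feature matrix and then builds a simple cluster profile. It uses synthetic data from `make_blobs` as a stand-in for the song features, and the hyperparameters (`n_clusters=3`, `eps=0.5`, and so on) and feature names are illustrative assumptions, not the values used in this project.

```python
import pandas as pd
from sklearn.cluster import (
    DBSCAN,
    AffinityPropagation,
    AgglomerativeClustering,
    Birch,
    KMeans,
    MiniBatchKMeans,
)
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the song feature matrix (assumption: 4 numeric features).
X, _ = make_blobs(
    n_samples=300, centers=3, n_features=4, cluster_std=0.6, random_state=42
)
# Scale features so no single feature dominates the distance computations.
X = StandardScaler().fit_transform(X)

models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=42),
    "Agglomerative": AgglomerativeClustering(n_clusters=3),
    "AffinityPropagation": AffinityPropagation(random_state=42),
    "BIRCH": Birch(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    "MiniBatchKMeans": MiniBatchKMeans(n_clusters=3, n_init=10, random_state=42),
}

# Fit every model on the same data and keep the cluster label of each sample.
labels = {name: model.fit_predict(X) for name, model in models.items()}

for name, lab in labels.items():
    # DBSCAN marks noise points with -1, which is not a cluster.
    n_found = len(set(lab) - {-1})
    print(f"{name}: {n_found} clusters")

# Cluster profile: mean of each feature per KMeans cluster.
features = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
profiles = features.groupby(labels["KMeans"]).mean()
print(profiles)
```

KMeans, Agglomerative, BIRCH, and Mini-Batch KMeans take the number of clusters as an input, while Affinity Propagation and DBSCAN determine it from the data, so their cluster counts can differ from the others.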
If you prefer an in-depth explanation of the code in this repository, see the following articles:
- https://medium.com/analytics-vidhya/spotify-music-data-analysis-part-1-c8457bfc53a
- https://medium.com/analytics-vidhya/spotify-music-data-analysis-part-2-3a69ae0f7f01
- https://medium.com/analytics-vidhya/spotify-music-data-analysis-part-3-9097829df16e
- https://medium.com/analytics-vidhya/spotify-music-data-analysis-part-4-4016e2954795