This dataset includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay Area, and it represents trips taken by members of the service during February 2019. This project helped me improve my data assessment and data cleaning skills as I carried out an Exploratory Data Analysis (EDA) of the dataset.
The Ford GoBike organization would like to know whether there is a relationship between age and the number of subscribers/trips, so I used the dataset to answer the following questions:
- The number of trips taken by age group.
- The proportion of Subscribers to Customers.
- The number of Subscribers compared to Customers in each age group.
- The trend of trips taken over the course of the month.
This dataset is taken from the Ford GoBike System Data.
After loading the dataset, I took the following steps in my wrangling/cleaning process:
- I started by assessing the data visually and programmatically.
- I created a copy of the dataset before cleaning.
- I found several columns stored with the wrong data types and converted them to the correct ones.
- I also found columns with missing values; I filled them where an appropriate value was available and dropped the ones I couldn't fill.
- I dropped the columns that were not necessary for my analysis.
- I also did feature engineering to add columns/features that would be helpful for my analysis.
After wrangling/cleaning the data, I moved on to the following data exploration:
- Univariate Exploration.
- Bivariate Exploration.
- Multivariate Exploration.
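The cleaning steps listed above (working on a copy, fixing data types, handling missing values and dropping unneeded columns) can be sketched in pandas as follows. This is a minimal sketch, not the exact code used in the project: the column names are assumed to follow the public February 2019 Ford GoBike CSV, the sample rows are made up, and the rules "fill a missing gender with `Other`" and "drop rows with no birth year" are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; column names assumed from the public Ford GoBike CSV.
raw = pd.DataFrame({
    "duration_sec": [600, 1200, 300, 900],
    "start_time": ["2019-02-01 08:05:00", "2019-02-07 17:10:00",
                   "2019-02-14 03:20:00", "2019-02-26 12:00:00"],
    "end_time": ["2019-02-01 08:15:00", "2019-02-07 17:30:00",
                 "2019-02-14 03:25:00", "2019-02-26 12:15:00"],
    "start_station_latitude": [37.79, 37.77, 37.80, 37.33],
    "bike_id": [4902, 5905, 6638, 4898],
    "user_type": ["Subscriber", "Customer", "Subscriber", "Customer"],
    "member_birth_year": [1984.0, np.nan, 1972.0, 1995.0],
    "member_gender": ["Male", None, None, "Female"],
    "bike_share_for_all_trip": ["No", "No", "Yes", "No"],
})

def clean_trips(raw: pd.DataFrame) -> pd.DataFrame:
    # Work on a copy so the raw data stays untouched.
    df = raw.copy()

    # Timestamps arrive as strings; convert them to datetimes.
    df["start_time"] = pd.to_datetime(df["start_time"])
    df["end_time"] = pd.to_datetime(df["end_time"])

    # Hypothetical rule: an unknown gender is filled with "Other".
    df["member_gender"] = df["member_gender"].fillna("Other")

    # A missing birth year cannot be recovered, so drop those rows,
    # then give the remaining columns their proper types.
    df = df.dropna(subset=["member_birth_year"]).astype({
        "member_birth_year": int,     # whole years once NaNs are gone
        "bike_id": str,               # IDs are labels, not quantities
        "user_type": "category",
        "member_gender": "category",
    })

    # Drop columns not needed for this analysis (an illustrative choice).
    return df.drop(columns=["start_station_latitude", "bike_share_for_all_trip"])

trips = clean_trips(raw)
print(trips.dtypes)
```

Converting the birth year only after dropping its missing values matters: pandas cannot cast a column containing `NaN` to a plain integer type.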
The above summarises my data wrangling and analytics process for this project.
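The feature-engineering step (adding columns that help the analysis) might look like the sketch below. The new column names (`age`, `age_group`, `duration_min`, `start_hour`, `start_day`, `week_of_month`) and the age-band cut-offs are hypothetical; age is approximated as 2019 minus the birth year, so it can be off by one, and "week of the month" here simply groups calendar days 1-7, 8-14 and so on.

```python
import pandas as pd

# Made-up cleaned rows; column names assumed from the public Ford GoBike CSV.
trips = pd.DataFrame({
    "duration_sec": [600, 1200, 300, 900],
    "start_time": pd.to_datetime(["2019-02-01 08:05:00", "2019-02-07 17:10:00",
                                  "2019-02-14 03:20:00", "2019-02-26 12:00:00"]),
    "member_birth_year": [1984, 1972, 1995, 1950],
})

def add_features(df: pd.DataFrame, data_year: int = 2019) -> pd.DataFrame:
    df = df.copy()
    # Approximate age in the year the trips were taken.
    df["age"] = data_year - df["member_birth_year"]
    # Age bands (illustrative): (17, 30] -> "18-30", (30, 40] -> "31-40", ...
    # Ages outside (17, 120] become NaN.
    df["age_group"] = pd.cut(df["age"], bins=[17, 30, 40, 50, 120],
                             labels=["18-30", "31-40", "41-50", "51+"])
    df["duration_min"] = df["duration_sec"] / 60
    df["start_hour"] = df["start_time"].dt.hour
    df["start_day"] = df["start_time"].dt.day_name()
    # Calendar-day weeks: days 1-7 -> 1, 8-14 -> 2, 15-21 -> 3, 22-28 -> 4.
    df["week_of_month"] = (df["start_time"].dt.day - 1) // 7 + 1
    return df

features = add_features(trips)
print(features[["age", "age_group", "start_hour", "start_day", "week_of_month"]])
```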
For this project, I did all of my data cleaning and visualizations in Python, using functions such as:
After using univariate, bivariate and multivariate exploration to examine my variables of interest, I arrived at the following findings:
- I found that riders aged 18-50 take more trips and also subscribe more than older riders.
- Most of the people who take trips are Customers (rather than Subscribers) aged 18-50.
- The number of trips starts to decline after the third week of the month.
During exploration, I found the following questions and insights particularly interesting:
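The first three research questions (trips per age group, the Subscriber/Customer split, and that split within each age group) reduce to simple counts. The sketch below assumes an `age_group` column produced during feature engineering and the `user_type` column from the dataset; the rows are made up, so the numbers do not reflect the actual findings.

```python
import pandas as pd

# Made-up rows; `age_group` is a hypothetical engineered column.
trips = pd.DataFrame({
    "age_group": ["18-30", "18-30", "31-40", "31-40", "31-40", "51+"],
    "user_type": ["Subscriber", "Customer", "Subscriber",
                  "Subscriber", "Customer", "Subscriber"],
})

# Number of trips taken by each age group.
trips_per_group = trips["age_group"].value_counts().sort_index()

# Overall proportion of Subscribers to Customers.
user_share = trips["user_type"].value_counts(normalize=True)

# Subscribers vs Customers within each age group: raw counts, then row shares.
by_group = pd.crosstab(trips["age_group"], trips["user_type"])
by_group_share = pd.crosstab(trips["age_group"], trips["user_type"], normalize="index")

print(trips_per_group, user_share, by_group, by_group_share, sep="\n\n")
```

Looking at both the counts and the row-normalised shares helps separate "this age group takes more trips" from "this age group is more likely to subscribe".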
- When are most trips taken, in terms of time of day and day of the week?
Is there a relationship between trip_time and trip duration?
Most trips were taken at 5 PM and 8 AM, but I also noticed that the longest trips were taken at 3 AM and 2 AM.
Thursday and Tuesday were the days with the most trips, although I had expected weekends to be busiest.
- Does a particular gender take more trips?
Male riders take more trips, but their trips appear to be shorter in distance/duration.
- Does gender affect the trip duration?
- Does age affect the duration of trips?
- What is the trend of trips over the weeks for: user_type, member_gender & age_group?
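The exploration questions above can each be answered with a count or a grouped summary. The sketch below assumes engineered columns named `start_hour`, `start_day`, `duration_min`, `age_group` and `week_of_month` (hypothetical names) alongside the dataset's `member_gender` and `user_type`; the rows are invented for illustration. It uses the median duration because trip durations tend to have a long right tail, which would pull a mean upward.

```python
import pandas as pd

# Made-up rows with hypothetical engineered columns.
trips = pd.DataFrame({
    "start_hour": [8, 8, 17, 17, 17, 3],
    "start_day": ["Tuesday", "Thursday", "Thursday", "Tuesday", "Saturday", "Sunday"],
    "duration_min": [10.0, 12.0, 9.0, 11.0, 14.0, 45.0],
    "member_gender": ["Male", "Male", "Female", "Male", "Female", "Other"],
    "age_group": ["18-30", "31-40", "18-30", "31-40", "18-30", "51+"],
    "user_type": ["Subscriber", "Subscriber", "Customer",
                  "Subscriber", "Customer", "Customer"],
    "week_of_month": [1, 2, 3, 3, 4, 4],
})

# When are trips taken? Counts per start hour and per weekday (in calendar order).
trips_by_hour = trips["start_hour"].value_counts().sort_index()
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
trips_by_day = trips["start_day"].value_counts().reindex(day_order, fill_value=0)

# Start time vs duration: median trip length for each start hour.
median_by_hour = trips.groupby("start_hour")["duration_min"].median()

# Gender and age vs duration: trip counts and median durations.
by_gender = trips.groupby("member_gender")["duration_min"].agg(["count", "median"])
by_age = trips.groupby("age_group")["duration_min"].median()

# Weekly trend, split by user type (missing combinations filled with 0).
weekly = trips.groupby(["week_of_month", "user_type"]).size().unstack(fill_value=0)

print(trips_by_hour, trips_by_day, median_by_hour, by_gender, by_age, weekly, sep="\n\n")
```

Swapping `user_type` for `member_gender` or `age_group` in the last `groupby` gives the other two weekly trends.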
- I found that riders aged 18-50 take more trips and also subscribe more than older riders.
- Most of the people who take trips are Customers.
- Most trips were taken at 5 PM and 8 AM, but the longest trips were taken at 3 AM and 2 AM.
- Male riders take more trips, but their trips appear to be shorter in distance/duration.
- Thursday and Tuesday were the days with the most trips, although I had expected weekends to be busiest.
- The number of trips starts to decline after the third week of the month.
- The marketing team should target their ads/marketing campaigns towards a younger audience.
- Ford GoBike could offer incentives for being a Subscriber, to encourage Customers to become Subscribers.
- Ford GoBike could also create an app that records each Subscriber's total trip duration and rewards them with free trips to encourage more usage.