Commit

Update README.md
Spacing
mnirkko authored Aug 12, 2023
1 parent 7257803 commit 7be3bdb
Showing 1 changed file with 19 additions and 10 deletions.
29 changes: 19 additions & 10 deletions README.md
@@ -15,7 +15,9 @@ This repository contains all the notebooks and scripts used to complete the fina
**Technical skills acquired**: Data collection (API/web scraping), data wrangling (numpy/pandas), exploratory data analysis (matplotlib/SQL), interactive visualisation (folium/plotly dash), predictive analysis (classification using LogReg, SVM, decision tree, KNN)

![A successful Falcon 9 rocket landing.](https://github.com/mnirkko/datascience/assets/6942556/1727f30d-174d-4337-9186-d3a9b8280f44)
**Figure 1** -- A successful Falcon 9 rocket landing.

**Figure 1** — A successful Falcon 9 rocket landing.


### Executive summary
* In this project, criteria for successful booster landings of SpaceX launches were evaluated. The relevant features that were investigated included payload mass, launch site and orbit.
@@ -25,25 +27,32 @@ This repository contains all the notebooks and scripts used to complete the fina
* We found that the percentage of successful launches increased significantly over time. Certain launch sites and orbits had a higher success rate, as did missions with heavier payloads. Predictive modeling yielded 83% accuracy for predicting successful landings.

![Success rate of Falcon 9 launches over the past decade](https://github.com/mnirkko/datascience/assets/6942556/2c16bf9a-659c-4735-b314-2d493ee73868)
**Figure 2** -- Success rate of Falcon 9 launches over the past decade.

**Figure 2** — Success rate of Falcon 9 launches over the past decade.


### Methodology
* Data collection -- Loading data via API and web scraping, decoding the data and converting it to DataFrames
* Data wrangling -- Filling in missing values, converting labels to numerical values
* Exploratory data analysis -- Analysing the data using visualization and SQL
* Interactive visual analytics -- Enabling potential stakeholders to navigate the data using Folium and Plotly Dash
* Predictive analysis using classification models -- Building the dataset, standardising the data, splitting into training/test datasets, tuning and optimizing models using training data, evaluating prediction accuracy using test data, comparing results.
* Data collection: Loading data via API and web scraping, decoding the data and converting it to DataFrames
* Data wrangling: Filling in missing values, converting labels to numerical values
* Exploratory data analysis: Analysing the data using visualization and SQL
* Interactive visual analytics: Enabling potential stakeholders to navigate the data using Folium and Plotly Dash
* Predictive analysis using classification models: Building the dataset, standardising the data, splitting into training/test datasets, tuning and optimizing models using training data, evaluating prediction accuracy using test data, comparing results.
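
The wrangling and preparation steps listed above can be sketched in a few lines of plain Python. This is only an illustration, not the code from the project notebooks: the helper names (`fill_missing`, `encode_labels`, `standardise`, `train_test_split`), the `"success"`/`"failure"` labels and the toy payload values are all assumptions made for the example.

```python
import random
from statistics import mean, pstdev


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def encode_labels(labels, positive="success"):
    """Convert text labels to 1 (positive class) or 0 (anything else)."""
    return [1 if label == positive else 0 for label in labels]


def standardise(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: payload mass in kg (one value missing) and outcomes.
raw_payloads = [500.0, None, 3100.0, 5300.0, 2400.0]
raw_outcomes = ["success", "failure", "success", "failure", "success"]

payloads = fill_missing(raw_payloads)
labels = encode_labels(raw_outcomes)
features = standardise(payloads)
train, test = train_test_split(list(zip(features, labels)), test_fraction=0.2)
```

The sketch follows the order given in the list (standardise, then split). A stricter setup would compute the scaling statistics on the training rows only, so that no information from the test rows influences the features.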

![Flowchart detailing the methodology used](https://github.com/mnirkko/datascience/assets/6942556/c35c8c7c-77a0-4090-892d-cb54e8b4e192)
**Figure 3** -- Flowchart detailing the methodology used.

**Figure 3** — Flowchart detailing the methodology used.


### Results
* Exploratory data analysis showed that successful launches are correlated with the flight number (better performance in recent years), the payload (better performance with higher mass), and to some extent with the launch site and orbit used.
* Interactive analytics showed that the launch sites are close to the coastline and away from densely populated areas and infrastructure (roads, cities etc.), clearly to reduce collateral damage in the event of a failed launch.
* Predictive analysis showed decent results for logistic regression (LogReg), support vector machines (SVM), and k-nearest neighbors (KNN). For these methods, the classifier correctly predicted the outcome of a launch 83% of the time. The main remaining issue is false positives, as indicated by the confusion matrix.
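
To make the link between the accuracy figure and the false-positive issue concrete, here is a minimal sketch of how such metrics follow from a binary confusion matrix. The function name `confusion_metrics` is made up for this example, and the counts are hypothetical, chosen only so that the accuracy comes out near the reported 83%; the actual counts are those shown in the confusion matrix figure and are not reproduced here.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive common metrics from the four cells of a binary confusion matrix."""
    total = tp + fp + tn + fn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total,
        # Share of predicted successes that really were successes.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real successes that were predicted as successes.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of real failures that were wrongly predicted as successes.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical counts (NOT taken from the project): 18 test samples in total.
metrics = confusion_metrics(tp=12, fp=3, tn=3, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, every error is a false positive: recall is perfect, while precision and the false-positive rate carry the entire cost. That is the pattern the last bullet describes.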

![Classification accuracy using different ML techniques](https://github.com/mnirkko/datascience/assets/6942556/6695f459-11c0-4a5d-967e-2b34b720a291)
**Figure 4** -- Classification accuracy using different ML techniques.

**Figure 4** — Classification accuracy using different ML techniques.


![Confusion matrix for final prediction on test dataset](https://github.com/mnirkko/datascience/assets/6942556/939f384b-fd2e-4e6c-9e81-fdbd1107eac6)
**Figure 5** -- Confusion matrix for final prediction on test dataset.

**Figure 5** — Confusion matrix for final prediction on test dataset.
