Launch Project Done
drguthals committed Jul 28, 2020
1 parent bbfc782 commit 2a25f73
Showing 47 changed files with 411 additions and 282 deletions.
Binary file added LaunchProject/images/anaconda-prompt.jpg
Binary file added LaunchProject/images/change-python.jpg
Binary file added LaunchProject/images/choose-conda.jpg
Binary file added LaunchProject/images/decision_tree.jpg
Binary file added LaunchProject/images/ensure-python.jpg
Binary file modified LaunchProject/weather/RocketLaunchDataCompleted.xlsx
Binary file not shown.
Binary file not shown.
@@ -1,5 +1,27 @@
# Determine the Precise Rocket Launch Question to Ask

"And we will use the AI and Machine Learning algorithms to process these data to learn and discover patterns, then we will use these pattern to decide if a day is good for a rocket launch or not."
Data science is an iterative process that moves between what is known today, the data that has been collected, and the questions that are being asked. New questions yield more information and prompt the collection of more data.

In a future learning path, we will add the NASA satelite imagery data to better decide if the earth atmosphere conditions are good for launch
The question that NASA scientists have to ask when a new mission is being planned is "What day in X years will be the least likely day to cause a launch push due to weather?" Furthermore, in the days leading up to the launch and on launch day itself, the most critical question NASA scientists ask is "Will the weather in this area at this time cause any potential issues for the launch?" To answer these questions, NASA has rocket, weather, and flight experts who create guidelines and models to follow to make a determination. Additionally, they have data from their own sensors and weather balloons, as well as from trusted sources such as [NOAA - National Oceanic and Atmospheric Administration](https://www.noaa.gov/).

In this module, we don't have all of the data or expertise that NASA has on the day of a launch, but we do have simple weather data that is publicly available to leverage. This module will look at:
- Conditions (Cloudy, partly cloudy, fair, rain, thunder, heavy storm)
- Temperature
- Humidity
- Wind speed
- Wind direction
- Precipitation
- Visibility
- Sea Level Pressure

Throughout the rocket launches learning path, you will use artificial intelligence and machine learning to discover weather patterns on days where rocket launches did successfully happen. Using those patterns, you will predict whether a launch is likely to be able to happen given specific weather conditions.

## Additional Challenge

While this module will walk you through a specific way of solving this problem, you're encouraged to pause for a moment here to make predictions and think about other data or questions you might be able to ask related to the safety of rocket launches.

For example, do you think Temperature is a more important indicator of launch safety than Precipitation?

Could you use [Azure Cognitive Services](https://azure.microsoft.com/services/cognitive-services) to take real-time satellite images and use image classification to determine the types of clouds and therefore the likelihood of a safe launch?

What ideas do you have?
@@ -1,9 +1,31 @@
# Explore the Rocket Launch Data to Gain an Understanding

To train a Machine Learning model, we have to collect as much data as possible
- AI and Machine Learning systems need data to learn, without data they cannot learn anything!
Therefore we first have to collect as much as data about launches as possible. For this learning path we have collected and used Microsoft excel to store them, here is a screenshot of the data collected.
As described in the Introduction to Rocket Launches module (link to unit 7), machine learning models need to be trained on enough data to avoid mistakes. Without enough data, a machine learning model could be too general. For example, if you trained a machine learning model with only temperature data and nothing else, you might not discover that precipitation is significantly more important and doesn't always correlate to lower temperatures in Florida, USA. If that were to happen, the model might indicate that it was safe to launch a rocket on a day with a favorable temperature but heavy precipitation, when launching would in fact be unsafe.

## Collecting Data

The first step in any data science/machine learning solution is to collect and understand the data. For this learning path we have collected publicly available data from [NOAA](https://www.noaa.gov/) and [Weather Underground](https://www.wunderground.com/history) for the dates of NASA rocket launches taken from the [list of NASA missions Wikipedia page](https://en.wikipedia.org/wiki/List_of_NASA_missions) and compiled it into one Excel file.

The Excel file contains the weather data for the individual crewed and uncrewed launches, as well as the two days before and after each launch. We added data for the days surrounding each launch to see if there were any patterns that might be interesting. Here is a screenshot from the [Excel doc that you can download here](https://nasadata.blob.core.windows.net/rocketlaunches/RocketLaunchDataCompleted.xlsx).

![excel data](../Media/excel.png)

As you can see, this data covers a variety of launches with a variety of attributes. Looking at the Pioneer 3 rocket on row 3 of the Excel document, it is apparent that we have a lot of data about it. The rocket launched on December 6th, 1958 at 1:45 from the Cape Canaveral Space Center. Furthermore, there is weather data that has been gathered from the area around the launch site for that day.
## Missing Data

What we find in this Excel doc is extensive data about each launch. However, as you start to explore this data, you might find one big issue:
There is only one row that represents a rocket launch that was supposed to happen, but was pushed due to weather concerns:
Row 294 - Space X Dragon - May 27, 2020

A list of every single launch that was attempted but pushed due to weather is not as easily discoverable as the list of successful launches. Furthermore, the dates that were considered and moved prior to announcing the expected launch date are also not easily discoverable.

## SMEs: Subject Matter Experts

The [United States Air Force's 45th Space Wing](https://www.patrick.af.mil/About-Us/Weather/) has one mission: "Exploit the weather to assure *safe* access to air and space." Combined with the incredible minds at NASA, the likelihood of choosing a date that will have weather concerns is small. The subject matter experts on weather and flight take into account climate changes, weather patterns, and existing known data to ensure the fewest changes to a launch schedule.

You can start to explore this on your own by heading to the [NASA Launch Schedule](https://www.nasa.gov/launchschedule/). Even without machine learning, you can start to look at predicted weather patterns in Cape Canaveral and see if you can identify why that date/time was chosen over one a week before or after.

## Finding More Data

The goal of this rocket launch learning path is to start you on the curious journey of weather and its relationship with launches. We encourage you to discover more data to improve your own machine learning model. This is part of the data science journey!

What do you think you could use to discover launches that had to be pushed due to weather? News articles? Archives?
@@ -1,6 +1,52 @@
# Exercise - Import Python Libraries and Rocket Launch Data

Now that we know a little bit about what we want to accomplish, let's start creating the machine learning model. The first step is to import some libraries that will help us create the model and import the weather data.
With a goal (*Is a launch likely to be able to happen given specific weather conditions?*) and a data set containing weather data from successful launch days and one pushed launch day, as well as the days leading up to and following each launch, you can start to actually code!

## Machine Learning in Code

While there are a number of tools and services used to solve machine learning problems, these space-themed learning paths will be using Visual Studio Code, Python, scikit-learn, and Azure. Microsoft has a [video about downloading and configuring a similar environment](https://www.youtube.com/watch?v=5E3WMb8_T3s&list=PLlrxD0HtieHjDop2DtiCmwTTcrlwKAVHE&index=8) to the one we need. You can also see additional instructions in LP2M7.

When setting up your local programming environment, we recommend creating an Anaconda environment to ensure you have exactly what you need for that particular project. If you have another way or set of tools you prefer to use, the majority of these modules do not explicitly require Visual Studio Code or Azure.

## Local Environment Setup

Before continuing, be sure that you have:
- Visual Studio Code installed with the Python and Jupyter Notebook Extension (Link to LP1M1)
- An Anaconda environment with Pandas, NumPy, scikit-learn, pydotplus, and the Azure Machine Learning SDK
- A folder to store all of the code and data
- The data downloaded and saved to the folder
- A blank Jupyter notebook saved in the folder
- The folder open in Visual Studio Code
- The Visual Studio Code Python environment set to the Anaconda environment

### Local Setup

To set up your local environment:
1. Install [Anaconda](https://www.anaconda.com/products/individual)
2. Open the Anaconda prompt
![Anaconda Prompt](Learn\launch-project\2-data-collection-and-manipulation\media\anaconda-prompt.jpg)
3. In the Anaconda Prompt, create a new Anaconda environment:
```
conda create -n myenv python=3.7 pandas numpy jupyter seaborn scikit-learn pydotplus
```
4. In the Anaconda Prompt, activate the new environment:
```
conda activate myenv
```
5. In the Anaconda Prompt, install AzureML-SDK:
```
pip install --upgrade azureml-sdk
```
6. In the Anaconda Prompt, install an Excel reader:
```
pip install xlrd
```
7. With the folder open in Visual Studio Code, make sure your Python Interpreter and Jupyter Kernel are both set to your Anaconda environment. Click on both the Jupyter Kernel Python version (top right) and the Python Interpreter (bottom left) and set them both to use the Anaconda environment you created:
![Visual Studio Code with Anaconda Environment](Learn\launch-project\2-data-collection-and-manipulation\media\ensure-python.jpg)
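
Once the environment is selected, a quick check from a notebook cell can confirm that the packages installed in the steps above are importable. The following is only a sketch: the helper name `check_environment` is hypothetical, and the import names are assumptions based on the install commands (scikit-learn is imported as `sklearn`, and the Azure ML SDK is assumed to expose `azureml.core`).

```Python
import importlib

# Import names assumed from the conda/pip commands above.
REQUIRED = ["pandas", "numpy", "sklearn", "pydotplus", "seaborn", "xlrd", "azureml.core"]

def check_environment(modules=REQUIRED):
    """Return {module name: version string, or None if the import fails}."""
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
            # Some packages do not expose __version__; report "unknown" for those.
            report[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # Any failure to import is treated as "missing" in this sketch.
            report[name] = None
    return report

for name, version in check_environment().items():
    print(f"{name:15} {'MISSING' if version is None else version}")
```

Any package reported as `MISSING` can be installed from the Anaconda Prompt with the environment activated.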

## Import Libraries

With your Visual Studio Code local environment set up, you can now import the libraries that will help us import and clean the weather data, and create and test the machine learning model.

Copy the following code into a cell and run it to import all of the needed libraries.

@@ -25,15 +71,19 @@ import pydotplus
from IPython.display import Image
```

## Read Data into a Variable

Now that we have all of the libraries imported, we can use the pandas library to import our data. Use the command `pd.read_excel` to read the data and save it in a variable. Then, we will use the `.head()` function to print out the first 5 rows of the data. This will ensure that we have read everything correctly.

```Python
lanch_data = pd.read_excel('../Media/weather/RocketLaunchDataCompleted.xlsx')
lanch_data.head()
launch_data = pd.read_excel('RocketLaunchDataCompleted.xlsx')
launch_data.head()
```

## Begin Exploring Data

Finally, we can use the `.columns` attribute to view all of the columns in our data. This will show us the different attributes the data has. You will see some common attributes like the names of past rockets that have been scheduled to launch, the date they were scheduled, whether they actually launched, and many more. Look at these columns and try to guess which ones will have the greatest impact on determining if a rocket will launch or not.

```Python
lanch_data.columns
launch_data.columns
```
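
When guessing which columns matter, it can help to separate the numeric columns from the text columns. The sketch below uses a small hypothetical sample rather than the real spreadsheet: only the column names come from the launch data, and the values are made up for illustration. The same `select_dtypes` calls should work on `launch_data`.

```Python
import pandas as pd

# Hypothetical three-row sample; only the column names come from the launch data.
sample = pd.DataFrame({
    "Name": ["Pioneer 3", None, None],
    "High Temp": [75.0, 78.0, 73.0],
    "Wind Direction": ["E", "NE", "N"],
    "Max Wind Speed": [16.0, 14.0, 15.0],
})

# Numeric columns can be used by a model directly; text columns need encoding first.
numeric_columns = sample.select_dtypes(include="number").columns.tolist()
text_columns = sample.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_columns)
print("text:   ", text_columns)
```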
@@ -4,35 +4,91 @@ Now that we have the data imported, will we need to apply a machine learning pra

We do this because computers will get confused if they look at inconsistent data or if lots of values in the data are null.

- AI and Machine Learning systems need data to learn, without data they cannot learn anything!
Therefore we first have to collect as much as data about launches as possible. For this learning path we have collected and used Microsoft excel to store them, here is a screenshot of the data collected.
## Data Cleansing

The first step that we will take to clean our data is to replaces all the missing values with something. Replacing these values usually requires your best judgement because you might now know what the data should be. In our case, we have some blank values where we are missing some weather data. To not mess with our real data that much, we will replace this missing data with weather for a normal day (ie fair weather).
The first step that we will take to clean our data is to replace all the missing values with something. Replacing these values usually requires subject matter expertise, but in this case you will use your best judgement. In our case, we have some rows (remember, rows represent days) where we are missing some weather or launch data.

To get started, first get an overview of the launch data by typing this into a cell:
```Python
launch_data.info()
```

This gives an overview of the data, showing us that of 300 rows, there are some columns with missing information:
```Output
RangeIndex: 300 entries, 0 to 299
Data columns (total 26 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   Name                          60 non-null     object
 1   Date                          300 non-null    datetime64[ns]
 2   Time (East Coast)             59 non-null     object
 3   Location                      300 non-null    object
 4   Crewed or Uncrewed            60 non-null     object
 5   Launched?                     60 non-null     object
 6   High Temp                     299 non-null    float64
 7   Low Temp                      299 non-null    float64
 8   Ave Temp                      299 non-null    float64
 9   Temp at Launch Time           59 non-null     float64
10   Hist High Temp                299 non-null    float64
11   Hist Low Temp                 299 non-null    float64
12   Hist Ave Temp                 299 non-null    float64
13   Percipitation at Launch Time  299 non-null    float64
14   Hist Ave Percipitation        299 non-null    float64
15   Wind Direction                299 non-null    object
16   Max Wind Speed                299 non-null    float64
17   Visibility                    299 non-null    float64
18   Wind Speed at Launch Time     59 non-null     float64
19   Hist Ave Max Wind Speed       0 non-null      float64
20   Hist Ave Visibility           0 non-null      float64
21   Sea Level Pressure            299 non-null    object
22   Hist Ave Sea Level Pressure   0 non-null      float64
23   Day Length                    298 non-null    object
24   Condition                     298 non-null    object
25   Notes                         3 non-null      object
```
Most notably, we can see that `Hist Ave Max Wind Speed`, `Hist Ave Visibility`, and `Hist Ave Sea Level Pressure` have no data.

It makes sense that `Wind Speed at Launch Time`, `Temp at Launch Time`, `Launched?`, `Crewed or Uncrewed`, `Time (East Coast)`, and `Name` have at most 60 values, since we only have 60 launches in our data; the rest are the days preceding and following each launch.
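
The `info()` output reports non-null counts; the missing counts can also be computed directly. The following sketch uses a small hypothetical sample (column names follow the launch data, values are illustrative) to count missing values per column and to list columns that contain no data at all. The same two lines should work on `launch_data`.

```Python
import numpy as np
import pandas as pd

# Hypothetical four-row sample; column names follow the launch data.
sample = pd.DataFrame({
    "Launched?": ["Y", np.nan, np.nan, np.nan],
    "Condition": ["Fair", "Cloudy", np.nan, "Rain"],
    "Hist Ave Visibility": [np.nan, np.nan, np.nan, np.nan],
    "High Temp": [75.0, 78.0, 73.0, 76.0],
})

# Number of missing values in each column.
missing_counts = sample.isna().sum()

# Columns where every single row is missing.
empty_columns = missing_counts[missing_counts == len(sample)].index.tolist()

print(missing_counts)
print("columns with no data:", empty_columns)
```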

Here are a few ways to cleanse the data:
- We know that the rows that do not have a Y in the Launched column did not have a rocket launch, so we will make those missing values 'N'
- For rows missing information on whether the rocket was crewed or uncrewed we will assume uncrewed. There were fewer crewed missions so it is likely it was uncrewed.
- For missing wind direction we will just mark them as "unknown"
- For missing Condition data we will just assume it was a typical day and put "Fair"
- For any other data, just put the value as 0

In the next cell, paste and run this code:

```Python
## To handle missing values, we will fill the missing values with appropriate values
lanch_data['Launched?'].fillna('N',inplace=True)
lanch_data['Crewed or Uncrewed'].fillna('Uncrewed',inplace=True)
lanch_data['Wind Direction'].fillna('unknown',inplace=True)
lanch_data['Condition'].fillna('Fair',inplace=True)
lanch_data.fillna(0,inplace=True)
lanch_data.head()
launch_data['Launched?'].fillna('N',inplace=True)
launch_data['Crewed or Uncrewed'].fillna('Uncrewed',inplace=True)
launch_data['Wind Direction'].fillna('unknown',inplace=True)
launch_data['Condition'].fillna('Fair',inplace=True)
launch_data.fillna(0,inplace=True)
launch_data.head()
```

Next, since computers only know how to read numbers, we will convert the text into numbers. As an example, we will use a "1" if a rocket is crewed and a "0" if a rocket is un-crewed.
Try running `launch_data.info()` again to see the changes to the data you just made.

*NOTE*: You are changing the data that is stored in the `launch_data` variable, *not* the data saved in the Excel doc. So if you find that you have modified or removed any data that you didn't mean to, you can always re-run your notebook to bring the original data back in.
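
Some newer pandas releases warn when `fillna(..., inplace=True)` is called on a single selected column. As an alternative sketch (not the approach used in this module), the same cleansing rules can be written as one dictionary of per-column defaults that returns a new DataFrame and leaves the original untouched. The sample data and the name `column_defaults` are hypothetical.

```Python
import numpy as np
import pandas as pd

# Hypothetical three-row sample; column names follow the launch data.
sample = pd.DataFrame({
    "Launched?": ["Y", np.nan, np.nan],
    "Crewed or Uncrewed": ["Crewed", np.nan, np.nan],
    "Wind Direction": ["E", np.nan, "NE"],
    "Condition": ["Cloudy", "Fair", np.nan],
    "Temp at Launch Time": [75.0, np.nan, np.nan],
})

# Per-column defaults, mirroring the cleansing rules listed above.
column_defaults = {
    "Launched?": "N",
    "Crewed or Uncrewed": "Uncrewed",
    "Wind Direction": "unknown",
    "Condition": "Fair",
}

# fillna returns a new DataFrame, so `sample` itself is not modified.
cleaned = sample.fillna(column_defaults)
# Anything still missing (the numeric columns here) becomes 0.
cleaned = cleaned.fillna(0)

print(cleaned)
```

Keeping the result in a separate variable makes it easy to compare the data before and after cleansing.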

## Data Manipulation

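Next, since computations are best suited for numerical inputs, we will convert all text into numbers. As an example, the labels in the `Crewed or Uncrewed` column are replaced with 0 and 1. `LabelEncoder` assigns codes in sorted label order, so with the labels `Crewed` and `Uncrewed`, crewed launches become 0 and uncrewed launches become 1.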

```Python
## As part of the data cleaning process we have to convert text data to numerical because computers only understand numbers
label_encoder = preprocessing.LabelEncoder()

# There are 3 columns that have categorical text info and we convert them to numbers
lanch_data['Crewed or Uncrewed'] = label_encoder.fit_transform(lanch_data['Crewed or Uncrewed'])
lanch_data['Wind Direction'] = label_encoder.fit_transform(lanch_data['Wind Direction'])
lanch_data['Condition'] = label_encoder.fit_transform(lanch_data['Condition'])
launch_data['Crewed or Uncrewed'] = label_encoder.fit_transform(launch_data['Crewed or Uncrewed'])
launch_data['Wind Direction'] = label_encoder.fit_transform(launch_data['Wind Direction'])
launch_data['Condition'] = label_encoder.fit_transform(launch_data['Condition'])
```
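
Because a single `LabelEncoder` is refit for every column, only the mapping for the last column remains available afterwards. If you want to look up what each number means, one option is to keep a separate encoder per column. This is a sketch on a hypothetical sample; the `encoders` dictionary is an illustrative name, not part of the module's code.

```Python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample; column names follow the launch data.
sample = pd.DataFrame({
    "Crewed or Uncrewed": ["Uncrewed", "Crewed", "Uncrewed"],
    "Condition": ["Fair", "Cloudy", "Rain"],
})

# One encoder per column, so each mapping can be inspected or reversed later.
encoders = {}
for column in ["Crewed or Uncrewed", "Condition"]:
    encoder = LabelEncoder()
    sample[column] = encoder.fit_transform(sample[column])
    encoders[column] = encoder

# classes_ holds the original labels in sorted order; the position is the code.
crew_mapping = {
    label: code
    for code, label in enumerate(encoders["Crewed or Uncrewed"].classes_)
}
print(crew_mapping)
```

`encoders["Condition"].inverse_transform(...)` converts codes back to the original text labels.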

Now let's look at all the data again after it has been cleaned. Looking all nice and fresh!

```Python
lanch_data.head()
launch_data.head()
```