Develop a Machine Learning model that can take sequential data and generate the Sea Surface Temperature (SST). The model will be trained on labeled data and used to predict SST on unlabeled data.
The dataset includes daily measurements of various atmospheric and oceanographic variables from multiple locations. The key variables include:
- Year, Month, Day
- Latitude, Longitude
- Zonal Winds, Meridional Winds
- Humidity, Air Temperature
- Sea Surface Temperature (Target Variable)
There are two Jupyter notebooks containing the project code:
- train.ipynb (for training)
- predict.ipynb (for prediction)
- Load and clean the data: Fill missing values using forward fill.
- Feature Selection: Select relevant features excluding unnecessary lag features.
- Model Selection: LSTM (Long Short-Term Memory) network is chosen due to its effectiveness in handling sequential data.
- Data Preparation: Reshape the data to be suitable for LSTM input.
- Model Definition: Define and compile an LSTM model with one LSTM layer followed by a Dense layer.
- Training: Train the model using the training dataset with a validation split to monitor performance.
- Validation: Evaluate the model on the validation dataset using Mean Absolute Percentage Error (MAPE) as the metric.
- Load new data: Preprocess new data similarly to the training data.
- Predict: Use the trained model to predict SST for the new data.
- Save Predictions: Save the predictions to a CSV file.
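The preprocessing steps above (forward fill, feature selection, reshaping for the LSTM) can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the column names in `FEATURES` and the single-timestep layout are assumptions, since the CSV headers and sequence length are not stated here. Sharing one helper like the hypothetical `prepare_inputs` between `train.ipynb` and `predict.ipynb` is one way to keep prediction inputs identical to training inputs.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the real CSV headers may differ.
FEATURES = [
    "year", "month", "day",
    "latitude", "longitude",
    "zonal_winds", "meridional_winds",
    "humidity", "air_temp",
]
TARGET = "sst"  # hypothetical name for the Sea Surface Temperature column


def prepare_inputs(df: pd.DataFrame) -> np.ndarray:
    """Forward-fill gaps, keep a fixed feature list, reshape for an LSTM."""
    # Carry the last observed value forward so the temporal order is preserved.
    filled = df.ffill()
    # Selecting by the shared FEATURES list fixes both the set and the order
    # of columns, so training and prediction see the same layout.
    X = filled[FEATURES].to_numpy(dtype="float32")
    # Assumed layout: (samples, timesteps=1, features).
    return X.reshape(X.shape[0], 1, X.shape[1])
```

Forward fill cannot repair a gap in the very first row, because there is no earlier value to carry forward, so any leading missing values would need separate handling.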
The trained model was used to predict SST on new data from the two provided CSV files. The model's performance was evaluated on the validation set using MAPE, which measures the average prediction error as a percentage of the true value.
- MAPE on Validation Set: The model achieved a MAPE of 91.34%.
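For reference, the standard definition of MAPE is the mean of |actual − predicted| / |actual|, expressed as a percentage, so lower values indicate better predictions. The sketch below implements that definition with NumPy. Whether the notebook computes the metric in exactly this way is an assumption; scikit-learn's `mean_absolute_percentage_error`, for example, returns a fraction rather than a percentage.

```python
import numpy as np


def mape(y_true, y_pred) -> float:
    """Mean Absolute Percentage Error, returned in percent.

    Undefined when any true value is zero (division by zero).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```

Sea surface temperatures in degrees Celsius are usually well away from zero, so the division is typically safe for this target, but the check is worth keeping in mind if the data are ever centered or normalized before scoring.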
The predictions for the new data have been saved as follows:
- data_1997_1998.csv → data_1997_1998_predicted_sst.csv
- evaluation.csv → evaluation_predicted_sst.csv
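The naming convention above (input name plus a `_predicted_sst` suffix) can be reproduced with a small helper. This is a sketch under assumptions: the function names and the `predicted_sst` column name are hypothetical and not taken from `predict.ipynb`.

```python
from pathlib import Path

import pandas as pd


def predicted_path(input_csv: str) -> Path:
    """Map e.g. 'evaluation.csv' to 'evaluation_predicted_sst.csv'."""
    src = Path(input_csv)
    return src.with_name(f"{src.stem}_predicted_sst{src.suffix}")


def save_predictions(input_csv: str, predictions) -> Path:
    """Append predictions to the input rows and write them next to the input."""
    df = pd.read_csv(input_csv)
    # Hypothetical column name; must have one prediction per input row.
    df["predicted_sst"] = predictions
    out = predicted_path(input_csv)
    df.to_csv(out, index=False)
    return out
```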
- Handling Missing Data: Used forward fill to handle missing values to maintain the temporal sequence.
- Feature Engineering: Initially considered lag features but decided against them for simplicity and consistency.
- Model Choice: Chose LSTM due to its capability to handle time series data effectively.
- Hyperparameters: Used a single LSTM layer with 50 units and a Dense output layer. Opted for the 'adam' optimizer and 'mse' loss function.
- Data Consistency: Ensuring the input features during prediction matched those used during training.
- Model Performance: Achieving a balance between training accuracy and generalization on validation data.
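The architecture described above (one LSTM layer with 50 units, a Dense output layer, the 'adam' optimizer and 'mse' loss) could be written in Keras roughly as shown below. The `build_model` helper and its parameters are hypothetical, and the input shape is an assumption, since the number of timesteps and features used in `train.ipynb` is not stated here.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential


def build_model(n_timesteps: int, n_features: int) -> Sequential:
    """Single LSTM layer (50 units) followed by a one-unit Dense output."""
    model = Sequential([
        Input(shape=(n_timesteps, n_features)),  # assumed input layout
        LSTM(50),   # 50 units, as stated in the hyperparameters
        Dense(1),   # one regression output: the predicted SST
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```

Training with a validation split would then be a call such as `model.fit(X, y, validation_split=0.2)`, where the split fraction and the number of epochs are choices the README does not specify.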
This project successfully developed an LSTM-based model to predict Sea Surface Temperature, crucial for understanding and forecasting ENSO events. The approach highlighted the importance of proper data preprocessing, model selection, and evaluation techniques. Future work can focus on enhancing model accuracy and robustness through advanced feature engineering and model tuning.
- Python 3.x
- Jupyter Notebook
- Libraries: numpy, pandas, tensorflow, scikit-learn (imported as sklearn), matplotlib
- Clone the repository:
git clone https://github.com/UtkarshRaj130/justOne.git
- Navigate to the project directory:
cd justOne
- Install the required libraries:
pip install numpy pandas tensorflow scikit-learn matplotlib
- Training the Model:
  - Open train.ipynb in Jupyter Notebook.
  - Run the cells to preprocess data, define, and train the LSTM model.
- Predicting SST:
  - Open predict.ipynb in Jupyter Notebook.
  - Run the cells to preprocess new data and generate predictions using the trained model.
  - Save the predictions to a CSV file.
- train.ipynb: Notebook for training the LSTM model.
- predict.ipynb: Notebook for predicting SST using the trained model.
I would like to show my gratitude to all those involved in creating this project for us. Looking forward to working on interesting projects in AI, ML, DL, and NN with both the clubs.
Thanking You,
Utkarsh Raj