Learning Objectives

In this task, we will explore:
- Different steps in a generic Machine Learning pipeline
- Machine Learning classification and training models
- How to split the dataset into training and testing data
- How to prepare the Machine Learning model
- How to evaluate the model's effectiveness
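
The splitting step can be illustrated with a minimal, standard-library-only sketch. The helper `split_dataset` and the toy `emails` list are hypothetical names made up for this example. The notebook used in the task may rely on a library routine instead, so treat this as an approximation of the idea rather than the task's actual code.

```python
import random

def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy dataset of (message, label) pairs.
emails = [(f"message {i}", "spam" if i % 4 == 0 else "ham") for i in range(20)]

train, test = split_dataset(emails, test_fraction=0.2)
print(len(train), len(test))  # 16 4
```

Shuffling before slicing matters: if the data is sorted by label, an unshuffled split could leave the test set with only one class.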
This is the continuation of Day 14.
After the machine is up, run all the cells.
QUESTIONS
- What is the key first step in the Machine Learning pipeline?
Answer
Data Collection
- Which data preprocessing feature is used to create new features or modify existing ones to improve model performance?
Answer
Feature Engineering
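
As a rough illustration of feature engineering, the sketch below derives a few numeric features from a raw message. The function `engineer_features` and the chosen features are assumptions for this example only. They are not taken from the task's notebook.

```python
def engineer_features(message):
    """Derive simple numeric features from a raw email message."""
    words = message.split()
    upper_words = sum(1 for w in words if w.isupper())
    return {
        "length": len(message),                # total characters
        "word_count": len(words),              # whitespace-separated tokens
        "exclamations": message.count("!"),    # spam often overuses "!"
        "upper_ratio": upper_words / len(words) if words else 0.0,
    }

features = engineer_features("WIN a FREE prize now!!!")
print(features)
```

Each derived value is a new column a model could learn from, in addition to the raw text itself.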
- During the data splitting step, 20% of the dataset was split for testing. What is the weighted average precision of spam detection?
Answer
0.98
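
For readers unsure what a weighted average of precision means, here is a small sketch. It computes precision per class and weights each by that class's share of the true labels. The function name and the toy labels are hypothetical. The 0.98 above comes from the task's own report, not from this code.

```python
from collections import Counter

def weighted_precision(y_true, y_pred):
    """Average per-class precision, weighted by each class's true count."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        # Precision is defined as 0 here when a class is never predicted.
        precision = correct / predicted if predicted else 0.0
        score += precision * n_true / total
    return score

# Toy example: 8 ham and 2 spam messages, with two mistakes.
y_true = ["ham"] * 8 + ["spam"] * 2
y_pred = ["ham"] * 7 + ["spam"] + ["spam", "ham"]

score = weighted_precision(y_true, y_pred)
print(round(score, 3))  # 0.8
```

Here ham precision is 7/8 and spam precision is 1/2, so the weighted result is 0.875 × 0.8 + 0.5 × 0.2 = 0.8.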
- How many of the test emails are marked as spam?
Answer
3
- One of the emails that is detected as spam contains a secret code. What is the code?
Answer
I_Hate_BesT_FestiVal
![Screenshot 2024-01-05 at 5 14 24 PM](https://private-user-images.githubusercontent.com/44930131/294480252-4aa26534-aa8c-4d02-a319-e4b16efe078f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxNDI1NDIsIm5iZiI6MTczOTE0MjI0MiwicGF0aCI6Ii80NDkzMDEzMS8yOTQ0ODAyNTItNGFhMjY1MzQtYWE4Yy00ZDAyLWEzMTktZTRiMTZlZmUwNzhmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDIzMDQwMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPThmZmNlOWU3N2VkODIzODJkOGI2NjA3ZDcwODIwMzNlNzE4NWY3OTIxZTk2MjhhYjI0MTcyMTYxMGRlMmQ1YzgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.b9dEc_njpr6NlFGYv55FrbX7_XP0CDloVDPWflyf-DU)