Detects spam SMS messages and emails using machine learning algorithms
This project designs and develops a crowd-sourced solution that analyses and verifies the source of any SMS or email based on input from end users. Spam emails are filtered out using a machine learning model based on the Naïve Bayes algorithm.
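To make the Naïve Bayes idea concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, written with the Python standard library only. The class name `NaiveBayesSpamFilter`, the whitespace tokenization, and the toy messages are assumptions made for illustration; they are not taken from the project's code or data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of messages
        self.vocabulary = set()

    def fit(self, messages, labels):
        for text, label in zip(messages, labels):
            words = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label in self.doc_counts:
            # Start from the log prior P(label).
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add the smoothed log likelihood P(word | label).
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        # Return the label with the highest log posterior score.
        return max(scores, key=scores.get)


# Toy, made-up training data (illustration only).
messages = [
    "win a free prize now",
    "free cash claim your prize",
    "are we meeting for lunch today",
    "please send the report today",
]
labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesSpamFilter().fit(messages, labels)
print(model.predict("claim your free prize"))
print(model.predict("lunch meeting today"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a word that was never seen for a class from driving that class's probability to zero.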
Technology Used: Python, Machine Learning, Matplotlib Pyplot, Connectors, Streamlit
Process involved in creating a model:
- Data cleaning: Pre-processing the data is essential before training.
- Data pre-processing: It is crucial to remove unnecessary or missing data before training. Punctuation marks, special characters, and stop words (common words such as "the" and "an" that carry little meaning) can all be removed in this step.
- Tokenization: Tokenization is the process of dividing the text into smaller pieces that the model can process, such as words or characters. This matters because ML models can only interpret numerical data, so text must first be split into units that can be mapped to numbers.
- Vocabulary creation: Tokenizing the text produces a vocabulary made up of the unique words or characters in the data. Each word or character is converted to a distinct numerical value using this vocabulary before being passed to the model.
- Data normalization: To ensure that the model can handle the values efficiently, the data is scaled to a particular range, such as 0 to 1. This is especially important for data with a wide range of numerical values, such as word counts from both short and long messages.
- Data balancing: The number of samples in each class should be kept roughly equal, so that the model does not favour the class with more samples.
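The steps above can be sketched in a few small functions. This is an illustrative outline using only the Python standard library, not the project's actual pipeline: the stop-word set, the function names, the choice of max-count scaling, and random undersampling as the balancing method are all assumptions.

```python
import random
import string
from collections import Counter

# Illustrative subset only; a real stop-word list would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of"}


def clean_and_tokenize(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in text.split() if word not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Map each distinct token to an integer index, in first-seen order."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_count_vector(tokens, vocab):
    """Encode a token list as per-word counts; unknown words are ignored."""
    vector = [0] * len(vocab)
    for token, count in Counter(tokens).items():
        if token in vocab:
            vector[vocab[token]] = count
    return vector


def normalize(vector):
    """Scale counts into the range 0 to 1 by dividing by the largest count."""
    peak = max(vector, default=0)
    return [value / peak for value in vector] if peak else list(vector)


def undersample(samples, labels, seed=0):
    """Balance classes by randomly keeping only as many samples per class
    as the smallest class has."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    smallest = min(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        balanced.extend((sample, label) for sample in rng.sample(items, smallest))
    return balanced


# Toy, made-up messages (illustration only).
messages = ["Win a FREE prize!!!", "Lunch at noon?",
            "See you at the office.", "The report is ready."]
labels = ["spam", "ham", "ham", "ham"]

tokens = [clean_and_tokenize(m) for m in messages]
vocab = build_vocabulary(tokens)
vectors = [normalize(to_count_vector(t, vocab)) for t in tokens]
balanced = undersample(tokens, labels)
print(tokens[0], len(vocab), len(balanced))
```

Undersampling discards data from the larger class; oversampling the smaller class or weighting classes during training are alternatives when the dataset is small.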
How we can improve:
- Use multiple detection methods
- Train the system with diverse data
- Keep the system up-to-date
- Use user feedback
- Implement strong access controls
- Monitor the system
- Encrypt data
THANK YOU!