This project involves building and deploying a convolutional neural network (CNN) for landmark classification. The goal is to create a model that can accurately identify various landmarks from images. This project is significant as it demonstrates the power of deep learning in image recognition tasks, a key area of computer vision with numerous applications, including tourism, education, and augmented reality.
I chose this project due to my interest in deep learning and its applications in image processing. My background in data science and machine learning has equipped me with the skills needed to tackle this challenge. This project aligns with my career goals of becoming a proficient data scientist with a specialization in computer vision. It also provides an opportunity to apply theoretical knowledge in a practical setting, enhancing my problem-solving skills.
The dataset used for this project was provided by Udacity as part of their AWS Fundamentals of Machine Learning Scholarship Program. It includes images of various landmarks from around the world.
- Data Sources: The primary data source is the Udacity dataset, which contains labeled images of landmarks.
- Data Collection: The dataset was downloaded from the Udacity platform.
- Challenges: Handling a large number of images and ensuring data quality were significant challenges.
- Data Cleaning and Preprocessing: This involved resizing images, normalizing pixel values, and augmenting data to increase the diversity of the training set. Missing values were handled by ensuring all images met the required format and dimensions.
EDA was performed to understand the distribution and characteristics of the dataset. Key insights included:
- Class Distribution: The number of images per landmark varied, necessitating techniques to handle class imbalance.
- Image Characteristics: Variability in image quality and dimensions was observed.
- Visualizations: Histograms and bar plots were used to depict class distributions and image properties.
Two primary approaches were considered:
- CNN from Scratch: Building a custom CNN architecture tailored to the dataset.
- Transfer Learning: Using pre-trained models like ResNet and fine-tuning them for landmark classification.
The final model chosen was a combination of both approaches:
- Custom CNN: To understand the basics of CNNs and their architecture.
- Transfer Learning: To leverage the power of pre-trained models for better accuracy and efficiency.
The models were trained using PyTorch, with hyperparameter tuning and validation performed to optimize performance.
The models were implemented using PyTorch and various other libraries:
- Custom CNN: A sequential model with multiple convolutional, pooling, and dense layers.
- Transfer Learning: Fine-tuning a pre-trained ResNet model.
- Training: The training process involved using cross-entropy loss and the Adam optimizer.
- Validation: A separate validation set was used to monitor model performance and prevent overfitting.
Key code snippets include:
# Custom CNN architecture
class CustomCNN(nn.Module):
def __init__(self):
super(CustomCNN, self).__init__()
self.conv1 = nn.Conv2d(3, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.fc1 = nn.Linear(64*6*6, 128)
self.fc2 = nn.Linear(128, num_classes)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2)
x = x.view(-1, 64*6*6)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)
# Transfer Learning with ResNet
model = models.resnet18(pretrained=True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)
Performance metrics used included accuracy, precision, recall, and F1-score. The models were evaluated on a test set:
- Custom CNN: Achieved an accuracy of 53%.
- Transfer Learning (ResNet): Achieved an accuracy of 76%.
Visualizations such as confusion matrices and ROC curves were used to analyze model performance.
The model's performance has practical implications:
- Tourism Apps: Can help tourists identify landmarks and provide information about them.
- Education: Useful in educational tools for learning about world landmarks.
- Augmented Reality: Enhances AR applications by recognizing landmarks in real-time.
- Data Imbalance: Addressed using data augmentation techniques.
- Training Time: High computational cost mitigated by using cloud-based GPU resources.
- Model Overfitting: Handled by using regularization techniques and early stopping.
- Data Augmentation: Techniques such as rotation, scaling, and flipping were used.
- Regularization: Dropout layers were added to the models.
- Early Stopping: Implemented to stop training when validation loss stopped improving.
The project successfully built and deployed a landmark classification model using CNNs. The transfer learning approach provided better accuracy and efficiency. The final model can classify landmarks with high accuracy, demonstrating the power of deep learning in image recognition tasks.
- More Data: Incorporating more diverse datasets to improve model robustness.
- Advanced Models: Experimenting with more advanced architectures like EfficientNet.
- Real-Time Deployment: Developing a mobile app for real-time landmark recognition.
This project enhanced my understanding of CNNs, transfer learning, and their applications in computer vision. It also improved my skills in data preprocessing, model training, and performance evaluation. The experience has contributed significantly to my professional development, preparing me for more advanced roles in data science and machine learning.
This project has reinforced my passion for deep learning and computer vision. I am grateful for the support from my mentors and peers. I look forward to applying these skills in future projects and continuing my journey in the field of data science.
- Code: GitHub Repository
- Data Files: Provided upon request.
- Udacity AWS Fundamentals of Machine Learning Scholarship Program
- PyTorch Documentation
- Research papers on CNN and Transfer Learning