Skip to content

Building and deploying a convolutional neural network (CNN) for accurately classifying landmarks from images using both custom CNNs and transfer learning techniques.

Notifications You must be signed in to change notification settings

paschalugwu/Landmark-Classification

Repository files navigation

Comprehensive Report on Landmark Classification Using Convolutional Neural Networks

Introduction

Project Overview

This project involves building and deploying a convolutional neural network (CNN) for landmark classification. The goal is to create a model that can accurately identify various landmarks from images. This project is significant as it demonstrates the power of deep learning in image recognition tasks, a key area of computer vision with numerous applications, including tourism, education, and augmented reality.

Personal Motivation

I chose this project due to my interest in deep learning and its applications in image processing. My background in data science and machine learning has equipped me with the skills needed to tackle this challenge. This project aligns with my career goals of becoming a proficient data scientist with a specialization in computer vision. It also provides an opportunity to apply theoretical knowledge in a practical setting, enhancing my problem-solving skills.

Methodology

Data Collection and Preparation

The dataset used for this project was provided by Udacity as part of their AWS Fundamentals of Machine Learning Scholarship Program. It includes images of various landmarks from around the world.

  • Data Sources: The primary data source is the Udacity dataset, which contains labeled images of landmarks.
  • Data Collection: The dataset was downloaded from the Udacity platform.
  • Challenges: Handling a large number of images and ensuring data quality were significant challenges.
  • Data Cleaning and Preprocessing: This involved resizing images, normalizing pixel values, and augmenting data to increase the diversity of the training set. Missing values were handled by ensuring all images met the required format and dimensions.

Exploratory Data Analysis (EDA)

EDA was performed to understand the distribution and characteristics of the dataset. Key insights included:

  • Class Distribution: The number of images per landmark varied, necessitating techniques to handle class imbalance.
  • Image Characteristics: Variability in image quality and dimensions was observed.
  • Visualizations: Histograms and bar plots were used to depict class distributions and image properties.

Modeling and Implementation

Model Selection

Two primary approaches were considered:

  1. CNN from Scratch: Building a custom CNN architecture tailored to the dataset.
  2. Transfer Learning: Using pre-trained models like ResNet and fine-tuning them for landmark classification.

The final model chosen was a combination of both approaches:

  • Custom CNN: To understand the basics of CNNs and their architecture.
  • Transfer Learning: To leverage the power of pre-trained models for better accuracy and efficiency.

The models were trained using PyTorch, with hyperparameter tuning and validation performed to optimize performance.

Implementation Details

The models were implemented using PyTorch and various other libraries:

  • Custom CNN: A sequential model with multiple convolutional, pooling, and dense layers.
  • Transfer Learning: Fine-tuning a pre-trained ResNet model.
  • Training: The training process involved using cross-entropy loss and the Adam optimizer.
  • Validation: A separate validation set was used to monitor model performance and prevent overfitting.

Key code snippets include:

# Custom CNN architecture
class CustomCNN(nn.Module):
    def __init__(self):
        super(CustomCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.fc1 = nn.Linear(64*6*6, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 64*6*6)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

# Transfer Learning with ResNet
model = models.resnet18(pretrained=True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)

Results and Evaluation

Model Performance

Performance metrics used included accuracy, precision, recall, and F1-score. The models were evaluated on a test set:

  • Custom CNN: Achieved an accuracy of 53%.
  • Transfer Learning (ResNet): Achieved an accuracy of 76%.

Visualizations such as confusion matrices and ROC curves were used to analyze model performance.

Business Impact

The model's performance has practical implications:

  • Tourism Apps: Can help tourists identify landmarks and provide information about them.
  • Education: Useful in educational tools for learning about world landmarks.
  • Augmented Reality: Enhances AR applications by recognizing landmarks in real-time.

Challenges and Solutions

Obstacles Encountered

  • Data Imbalance: Addressed using data augmentation techniques.
  • Training Time: High computational cost mitigated by using cloud-based GPU resources.
  • Model Overfitting: Handled by using regularization techniques and early stopping.

Solutions

  • Data Augmentation: Techniques such as rotation, scaling, and flipping were used.
  • Regularization: Dropout layers were added to the models.
  • Early Stopping: Implemented to stop training when validation loss stopped improving.

Conclusion and Future Work

Project Summary

The project successfully built and deployed a landmark classification model using CNNs. The transfer learning approach provided better accuracy and efficiency. The final model can classify landmarks with high accuracy, demonstrating the power of deep learning in image recognition tasks.

Future Improvements

  • More Data: Incorporating more diverse datasets to improve model robustness.
  • Advanced Models: Experimenting with more advanced architectures like EfficientNet.
  • Real-Time Deployment: Developing a mobile app for real-time landmark recognition.

Personal Reflection

Skills and Growth

This project enhanced my understanding of CNNs, transfer learning, and their applications in computer vision. It also improved my skills in data preprocessing, model training, and performance evaluation. The experience has contributed significantly to my professional development, preparing me for more advanced roles in data science and machine learning.

Conclusion

This project has reinforced my passion for deep learning and computer vision. I am grateful for the support from my mentors and peers. I look forward to applying these skills in future projects and continuing my journey in the field of data science.

Attachments and References

Supporting Documents

References

  • Udacity AWS Fundamentals of Machine Learning Scholarship Program
  • PyTorch Documentation
  • Research papers on CNN and Transfer Learning

About

Building and deploying a convolutional neural network (CNN) for accurately classifying landmarks from images using both custom CNNs and transfer learning techniques.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published