Welcome to my GitHub profile! I’m a Data Scientist and Machine Learning Innovator with a strong foundation in computer science and finance, and a particular interest in finance and healthcare research. Here you’ll find my work at the intersection of AI, machine learning, and analytics.
Education
Johns Hopkins University
Master of Science in Data Science (Expected: 2026)
Relevant Courses: Advanced Data Science, Machine Translation, ML for Healthcare
LM Thapar School of Management
MBA in Finance (2024)
Relevant Courses: Financial Derivatives, Options Pricing, Portfolio Management
Thapar University
Bachelor’s in Computer Science (2023)
Relevant Courses: Deep Learning, Probability and Statistics, Algorithm Design
Experience
November 2024 – Present | Baltimore, USA
Engineered an ETL pipeline to process unstructured SEC filings (Forms 10-K and 10-Q) of Business Development Companies (BDCs) and prepare them for financial forecasting.
- Streamlined Data Analysis: Designed Python and Stata workflows for data wrangling, feature engineering, and regex-based classification that convert financial data into machine-readable formats.
- Predictive Modeling: Prepared datasets for financial time-series forecasting and used ARIMA models to predict trends.
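To make the regex-based classification step more concrete, here is a minimal Python sketch. It is not the actual workflow: it assumes the filing has already been converted to plain text, and the form patterns, the amount regex, and the function names `classify_form` and `parse_amounts` are hypothetical illustrations.

```python
import re

# Hypothetical form-type patterns; assumes the filing is already plain text.
FORM_PATTERNS = {
    "10-K": re.compile(r"\bform\s+10-k\b", re.IGNORECASE),
    "10-Q": re.compile(r"\bform\s+10-q\b", re.IGNORECASE),
}

# Matches dollar amounts such as "$1,234.5 million" or "$250 thousand".
AMOUNT_RE = re.compile(
    r"\$\s*([\d,]+(?:\.\d+)?)\s*(thousand|million|billion)?", re.IGNORECASE
)
# findall() returns "" for an unmatched optional group, so "" means no scale word.
SCALE = {"": 1.0, "thousand": 1e3, "million": 1e6, "billion": 1e9}


def classify_form(text):
    """Return the first form type whose pattern matches, else 'unknown'."""
    for label, pattern in FORM_PATTERNS.items():
        if pattern.search(text):
            return label
    return "unknown"


def parse_amounts(text):
    """Convert every dollar amount in the text to a plain float."""
    values = []
    for number, unit in AMOUNT_RE.findall(text):
        values.append(float(number.replace(",", "")) * SCALE[unit.lower()])
    return values


if __name__ == "__main__":
    sample = "FORM 10-Q. Total investments of $1,234.5 million and fees of $250 thousand."
    print(classify_form(sample), parse_amounts(sample))
```

Converting scaled amounts to plain floats like this is one way to make extracted figures usable as model features; real filings would need far more patterns (tables, negative values in parentheses, footnote markers).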
February 2024 – July 2024 | Noida, India
Designed and implemented customer insight pipelines to enhance product development strategies.
- Web Scraping Expertise: Built web scraping solutions using Selenium and Beautiful Soup to extract Voice of Customer data from platforms like YouTube and Flipkart.
- NLP Innovations: Fine-tuned BERT for sentiment analysis and utilized Latent Dirichlet Allocation (LDA) for topic modeling, transforming customer feedback into actionable insights.
June 2023 – August 2023 | Mumbai, India
Harnessed big data technologies to optimize marketing strategies for India’s leading telecom provider.
- ETL Pipeline Development: Designed a scalable pipeline using Apache Spark to process over 1M tweets, integrating MongoDB for efficient storage.
- Predictive Analytics: Applied Naive Bayes for sentiment classification and SVM for engagement prediction, and visualized the results in Tableau; these insights helped boost campaign effectiveness by 15%.
June 2022 – May 2023 | Patiala, India
Pioneered a deep learning-based waste classification system, contributing to environmental sustainability.
- Innovative Edge Processing: Developed a waste classification boat using ResNet and TensorFlow, achieving 93.07% accuracy in real-time classification of biodegradable vs. non-biodegradable waste.
- Deployment Ready: Deployed the system on a Raspberry Pi for efficient edge processing of live camera input.
Projects
An innovative AI-powered solution for environmental management, featuring a patented system for waste classification.
- Smart Waste Management: Leveraged TensorFlow and ResNet on a Raspberry Pi to distinguish between biodegradable and non-biodegradable waste with 93.07% classification accuracy.
- Impact in Action: This prototype supports environmental sustainability by improving waste segregation in water bodies.
MelanoViT | GitHub
A robust melanoma classification framework leveraging state-of-the-art Vision Transformers (ViT) and DINOv2.
- Breakthrough Performance: Achieved 99.03% accuracy with metadata integration, surpassing traditional CNN-based models.
- Advanced Techniques: Addressed class imbalance with weighted loss functions and improved generalization through preprocessing and data augmentation (hair removal, geometric transformations).
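The weighting scheme used in MelanoViT isn't specified above. A common choice, assumed for this sketch, is inverse class frequency, `n_samples / (n_classes * count_c)`, which is the same formula scikit-learn uses for `class_weight="balanced"`. The helpers `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical names for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights n_samples / (n_classes * count_c), ordered by sorted class label."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[c]) for c in sorted(counts)]


def weighted_cross_entropy(probs, target, weights):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    return -weights[target] * math.log(probs[target])


if __name__ == "__main__":
    # 90 benign (class 0) vs. 10 melanoma (class 1): the minority class gets a larger weight.
    weights = inverse_frequency_weights([0] * 90 + [1] * 10)
    print(weights)
    print(weighted_cross_entropy([0.5, 0.5], 1, weights))
```

In PyTorch, such weights could be passed as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))`; with the default `reduction="mean"`, PyTorch divides by the sum of the target weights in the batch rather than by the batch size.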
Pandemic Tracker | GitHub | Interactive Website
A comprehensive COVID-19 global dashboard designed for dynamic data exploration.
- Time-Series Analysis: Utilized Python, Pandas, and Plotly for rolling averages and trend visualizations.
- Deployed Application: Interactive Streamlit app featuring choropleth maps, bar charts, line plots, and filters for exploring pandemic trends in detail.
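Rolling averages smooth noisy daily case counts into readable trends. The sketch below shows a trailing rolling mean in plain Python, assuming a list of daily values; the function name `rolling_mean` and the 7-day default are illustrative choices, not taken from the Pandemic Tracker code.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing mean over the last `window` values; None until the window is full."""
    out, buf, total = [], deque(), 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Drop the oldest value so the sum covers exactly `window` items.
            total -= buf.popleft()
        out.append(total / window if len(buf) == window else None)
    return out


if __name__ == "__main__":
    daily_cases = [120, 95, 130, 160, 140, 110, 90, 150]
    print(rolling_mean(daily_cases))
```

With Pandas, the equivalent is `df["new_cases"].rolling(window=7).mean()` (the column name is hypothetical), which likewise yields NaN until seven values are available.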
EchoTranslate | GitHub
A Transformer-based pipeline for automatic speech recognition (ASR) and multilingual translation, targeting medical applications.
- Domain-Specific Excellence: Fine-tuned on a preprocessed medical corpus, achieving a 5.8% improvement in BLEU score and a 12.5% reduction in word error rate (WER).
- Advanced Features: Used Torchaudio and Librosa for spectrogram extraction, signal denoising, and preprocessing of doctor-patient conversation audio.
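WER, the metric behind the reported reduction, is the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. Here is a minimal sketch, assuming simple whitespace tokenization with no case or punctuation normalization; `word_error_rate` is a hypothetical helper, not part of EchoTranslate or any toolkit.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edit distance between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the patient has a fever", "the patient has fever"))
```

Production toolkits usually normalize text (casing, punctuation, numerals) before scoring, which can change WER noticeably.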
Skills
- Programming Languages: Python, SQL (MySQL), R, C++, Java, JavaScript
- Technical Skills and Tools:
  - Data Processing: NumPy, Pandas, OS, scikit-learn, SciPy, Datasets, LIWC, NLTK
  - Web Scraping: Selenium, Beautiful Soup
  - Model Building and Training: PyTorch, Transformers, WandB, SageMaker, spaCy, Flair, TensorFlow, OpenCV, Amazon Lex, XGBoost
  - Data Visualization: Matplotlib, Seaborn, Tableau
  - Software & Frameworks: Power BI, Apache Spark, Hadoop, AWS, Azure, Docker, Microsoft Excel, Figma
- Competencies: Machine Learning, Generative AI, Feature Engineering, Deep Learning, Data Analysis, Financial Derivatives
Achievements
- Millennium Fellowship 2021 by United Nations Academic Impact
- Head of Administration for IAESTE, TIET, India Chapter
- Top 5 Finalist in Microsoft Learn Student Chapter Hackathon at Thapar University
Whether you're interested in discussing a project, exploring collaboration opportunities, or simply want to chat about data science, feel free to reach out on LinkedIn or GitHub.