In this exercise session, we consolidated our machine learning (ML) modelling skills by using a popular regression model, the Decision Tree, to predict taxi tips. We trained the model on a real dataset. The dataset includes information about taxi tips and was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). We then used the trained model to predict the amount of tips paid.
In the current exercise session, we practised the Scikit-Learn Python interface and the Python API offered by the Snap Machine Learning (Snap ML) library.
Scikit-learn is a free, open-source and popular machine-learning library for Python. It features various classification, regression and clustering algorithms including support-vector machines, random forests, and k-means. It is easy to use and has a well-documented API. It is constantly being updated with new features and algorithms. This makes it a valuable tool for data scientists and machine learning practitioners.
Snap ML is a high-performance IBM library for ML modelling. It provides highly efficient CPU/GPU implementations of linear models and tree-based models. Snap ML not only accelerates ML algorithms through system awareness, but also offers novel ML algorithms with best-in-class accuracy.
It was exciting to learn how to use this popular model to predict taxi tips. We believe that this knowledge will be valuable in our future careers as data scientists. We also looked forward to practising the Scikit-Learn Python interface and the Snap ML library, and we were confident this would help us become more proficient ML modellers. For more information, please visit the Snap ML information page and the Scikit-Learn information page.
- Perform basic data preprocessing using Scikit-Learn
- Model a regression task using the Scikit-Learn and Snap ML Python APIs
- Train a Decision Tree Regressor model using Scikit-Learn and Snap ML
- Run inference and assess the quality of the trained models
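The objectives above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the exercise's actual notebook. The data is synthetic (three made-up features and a tip of roughly 15% of the fare), the hyperparameters are arbitrary, and the standard-scaling step is just one possible form of basic preprocessing. The Snap ML part runs only if the `snapml` package is installed, and assumes its `DecisionTreeRegressor` accepts `max_depth`, `random_state` and `n_jobs`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the taxi data (an assumption, NOT the TLC dataset).
rng = np.random.default_rng(0)
n_samples = 2000
fare = rng.uniform(5.0, 60.0, n_samples)
distance = rng.uniform(0.5, 20.0, n_samples)
passengers = rng.integers(1, 5, n_samples).astype(float)
X = np.column_stack([fare, distance, passengers])
y = 0.15 * fare + rng.normal(0.0, 0.5, n_samples)  # hypothetical tip amount

# Basic preprocessing: hold out a test set, then scale the features using
# statistics computed on the training split only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Train a Decision Tree Regressor with Scikit-Learn and run inference.
sklearn_dt = DecisionTreeRegressor(max_depth=8, random_state=35)
sklearn_dt.fit(X_train, y_train)
y_pred = sklearn_dt.predict(X_test)

# Assess quality against a trivial baseline that always predicts the mean tip.
sklearn_mse = mean_squared_error(y_test, y_pred)
baseline_mse = mean_squared_error(y_test, np.full_like(y_test, y_train.mean()))
print(f"Scikit-Learn MSE: {sklearn_mse:.3f} (baseline: {baseline_mse:.3f})")

# Optional: the same model through the Snap ML Python API, if available.
try:
    from snapml import DecisionTreeRegressor as SnapDecisionTreeRegressor
except ImportError:
    SnapDecisionTreeRegressor = None

if SnapDecisionTreeRegressor is not None:
    snapml_dt = SnapDecisionTreeRegressor(max_depth=8, random_state=45, n_jobs=4)
    snapml_dt.fit(X_train, y_train)
    snapml_mse = mean_squared_error(y_test, snapml_dt.predict(X_test))
    print(f"Snap ML MSE: {snapml_mse:.3f}")
```

Comparing the mean squared error against the mean-prediction baseline is a quick sanity check that the tree has learned something from the features. On the real dataset the same comparison applies, although the numbers will differ.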
You can download the dataset from here.