Exercise 1: Data Exploration in R with Iris Dataset. Correlation, Pairwise Visualization, and PCA.
Exercise 2: Clustering with Iris Dataset. K-Means & Hierarchical Clustering
Exercise 3: Classification with Iris and Iris3 Datasets. Decision Tree & K-Nearest Neighbor (KNN)
Exercise 4: Testing overfitting on regression models
Exercise 5: Diagnose Breast Cancer with Decision Trees
Exercise 1: Association rule mining with Apriori and Eclat Algorithms on Last.fm dataset
Exercise 2: Association rule visualization on Last.fm dataset
Exercise 1: Recommender systems using movie ratings from MovieLense
Exercise 2: Crime Analysis from City of Chicago's Data Portal
Exercise 1: Predict Futbol match outcomes using Random Forests, Multilayer Perceptron Neural Network (MLP), K-Nearest Neighbor Classifiers, Naive Bayesian Classifier, and Multinomial Logistic Regression (MLogit Regression).
Exercise 1: Efficiently working in R
Exercise 2: Databases in R: SQL, MySQL, and NoSQL.
Based on the lab in the Data Mining course of the Machine Learning and Data Mining (MLDM) Master's Program at University Jean Monnet. Lab taught by Dr. Fabrice Muhlenbach.
Wisconsin Breast Cancer Dataset provided by the Machine Learning Repository of the University of California, Irvine.
LastFM data made available by Ledolter & Wiley at University of Iowa.
Link: https://www.biz.uiowa.edu/faculty/jledolter/DataMining/datatext.html
Crime analysis data made available by the City of Chicago Data Portal.
Link to data portal: https://data.cityofchicago.org/
Link to data set: https://data.cityofchicago.org/api/views/6zsd-86xi/rows.csv?accessType=DOWNLOAD&bom=true&query=select+*
Movie data provided by MovieLense.
Futbol data provided by the Panini Digital futbol database for the 2010-20100 Serie A season. Panini Digital is a leader in the collection of statistical data on futbol, providing data services to clubs and the media. The futbol database contains detailed information about plays made during each match (free kicks and shots, fouls, crosses, recovered balls, goal assists, average time of ball possession, saves, goals on free kicks, etc.). www.paninidigital.com
Code compiling section of lab 5 exercise 1 is based on:
Gillespie, C. and R. Lovelace (2017). Efficient R Programming – A Practical Guide to Smarter Programming. O’Reilly.
MLDM Program Webpage: http://mldm.univ-st-etienne.fr/