amschwinn/data_mining_lab

Data Mining Lab (MLDM Grad Program)

Authors: Austin Schwinn

Dates: Jan - Mar 2017

Subject: Repository for my grad school data mining lab

Lab 1:

Exercise 1: Data Exploration in R with Iris Dataset. Correlation, Pairwise Visualization, and PCA.
Exercise 2: Clustering with Iris Dataset. K-Means & Hierarchical Clustering
Exercise 3: Classification with Iris and Iris3 Datasets. Decision Tree & K Nearest Neighbor (KNN)
Exercise 4: Testing overfitting on regression models
Exercise 5: Diagnose Breast Cancer with Decision Trees
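
As a rough illustration of the K Nearest Neighbor idea from Exercise 3, here is a minimal sketch. The lab code itself is written in R; this sketch uses Python, and the function name `knn_predict`, the two-feature layout, and the sample values are all hypothetical (made-up numbers that only loosely resemble iris measurements, not the actual dataset).

```python
import math
from collections import Counter

# Hypothetical (petal length, petal width) samples with species labels.
# These values are illustrative only and are not taken from the Iris dataset.
training = [
    ((1.4, 0.2), "setosa"),
    ((1.5, 0.2), "setosa"),
    ((1.3, 0.3), "setosa"),
    ((4.5, 1.5), "versicolor"),
    ((4.7, 1.4), "versicolor"),
    ((4.0, 1.3), "versicolor"),
    ((6.0, 2.5), "virginica"),
    ((5.8, 2.2), "virginica"),
    ((5.9, 2.1), "virginica"),
]


def knn_predict(point, labeled_points, k=3):
    """Predict a label by majority vote among the k closest labeled points."""
    # Sort the labeled points by Euclidean distance to the query point.
    nearest = sorted(
        labeled_points, key=lambda pair: math.dist(point, pair[0])
    )[:k]
    # Count the labels of the k nearest neighbors and return the most common.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict((1.6, 0.25), training))
    print(knn_predict((4.4, 1.4), training))
    print(knn_predict((5.7, 2.3), training))
```

With k=3 and well-separated groups like these, each query point is assigned the label of the group it sits in. On real data the choice of k and feature scaling matter much more.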

Lab 2:

Exercise 1: Association rule mining with Apriori and Eclat Algorithms on Last.fm dataset
Exercise 2: Association rule visualization on Last.fm dataset
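
To give a sense of what the association rule exercises measure, the sketch below computes support and confidence for a single rule. The lab code itself is written in R; this sketch uses Python, and the baskets, artist names, and function names are hypothetical and not drawn from the Last.fm data. It shows only the two basic metrics, not the Apriori or Eclat search itself.

```python
# Hypothetical toy baskets: each set holds the artists one user listened to.
baskets = [
    frozenset({"radiohead", "coldplay", "muse"}),
    frozenset({"radiohead", "coldplay"}),
    frozenset({"coldplay", "muse"}),
    frozenset({"radiohead", "muse"}),
    frozenset({"radiohead", "coldplay", "beck"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = support(frozenset(antecedent) | frozenset(consequent), transactions)
    base = support(antecedent, transactions)
    return both / base if base > 0 else 0.0


# Rule under test (hypothetical): {radiohead} => {coldplay}
rule_support = support({"radiohead", "coldplay"}, baskets)
rule_confidence = confidence({"radiohead"}, {"coldplay"}, baskets)

if __name__ == "__main__":
    print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Apriori and Eclat both search for itemsets whose support clears a minimum threshold. They differ in how they traverse and count candidates, which this sketch does not attempt to show.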

Lab 3:

Exercise 1: Recommender systems using movie ratings from MovieLense
Exercise 2: Crime Analysis from City of Chicago's Data Portal

Lab 4:

Exercise 1: Predict futbol match outcomes using Random Forests, Multilayer Perceptron Neural Network (MLP), K-Nearest Neighbor Classifiers, Naive Bayesian Classifier, and Multinomial Logistic Regression (MLogit Regression).

Lab 5:

Exercise 1: Efficiently working in R
Exercise 2: Databases in R: SQL, MySQL, and NoSQL.

References:

Based on the lab in the Data Mining course of the Machine Learning and Data Mining (MLDM) Master's Program at University Jean Monnet. Lab taught by Dr. Fabrice Muhlenbach.

Wisconsin Breast Cancer Dataset provided by the UCI Machine Learning Repository at the University of California, Irvine.

LastFM data made available by Ledolter & Wiley at University of Iowa.
	Link: https://www.biz.uiowa.edu/faculty/jledolter/DataMining/datatext.html

Crime analysis data made available by City of Chicago Data Portal
	Link to data portal: https://data.cityofchicago.org/
	Link to data set: https://data.cityofchicago.org/api/views/6zsd-86xi/rows.csv?accessType=DOWNLOAD&bom=true&query=select+*

Movie data provided by MovieLense.

Futbol data provided by the Panini Digital futbol database for the 2010-20100 Serie A season. Panini Digital is a leader in the collection of statistical data on futbol, providing data services to clubs and the media. The futbol database contains detailed information about plays made during each match (free kicks and shots, fouls, crosses, recovered balls, goal assists, average time of ball possession, saves, goals on free kicks, etc.). www.paninidigital.com

The code compiling section of Lab 5, Exercise 1 is based on:
	Gillespie, C. and R. Lovelace (2017). Efficient R Programming – A Practical Guide to Smarter Programming. O’Reilly.

MLDM Program Webpage: http://mldm.univ-st-etienne.fr/
