In this short IAP I will teach concepts and algorithms that are used repeatedly in practical supervised learning. I hope that taking this IAP will encourage you to use machine learning (ML) in your research and will make it easier to read the ML literature.
Machine learning is a very broad field, and naturally I will not be able to cover all aspects, topics, and practices of ML. But the good news is that learning only two main families of algorithms can be sufficient for almost all practical purposes of supervised ML:
- Decision tree based models (i.e. Random Forests and Gradient Boosting Machines), successful mainly for structured data (tabular data)
- Neural networks, successful mainly for unstructured data (such as audio, vision, and natural language), although recently also becoming popular in tabular data (see fastai courses, which I have used repeatedly while preparing this course).
Most other ML algorithms (that gained popularity at some point during their lifetime) are outdated and are not very useful in most cases.
In this course I will not invest a lot of time in rigorous derivations, proofs, etc. Instead, we will use our time to gain some intuition for how ML models work and get our hands "dirty" with coding. This is very different from typical academic courses, which are usually very rigorous and invest their time explaining every aspect of the material in detail.
Every session will be divided into a teaching session (given with jupyter notebooks - lesson*.ipynb) and a practice session where you will try to code the things that we learn (basically using machine learning to get predictions for some data set).
The main topics of the sessions are:
- Classification with Random Forest
- Regression with Random Forest and XGBoost
- Fully connected Neural Networks (using pytorch)
- Convolutional Neural Networks
- Transfer Learning (using fastai)
The easiest way to get all the course material is to clone this git repository. To do so, type:
git clone https://github.com/yaniyuval/ML1_IAP.git
Alternatively, you can download a zip file.
To install Anaconda, please follow the instructions here. Since the installation depends on your OS, I cannot provide the exact steps for installing it.
After you have installed anaconda, please update your conda packages before continuing with these instructions. This is done by typing:
conda update --all
If you do not want to update all your packages, please read here about other options for doing a partial update (not recommended unless you know what you are doing).
I am using Python 3.7.5, and I encourage you to update your python version to be at least 3.6. You can update to the latest python version by typing:
conda update python
To create a new virtual environment called ML_IAP with anaconda, type:
conda create --name ML_IAP python=3.7.5
If you want to use a different python version, you can change the version number in this command, but take into account that I verified that the code in the notebooks runs with version 3.7.5 (I certainly do not recommend any version prior to 3.6).
To activate the environment (you should always do this before opening the course notebooks or installing packages), type:
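A minimal sketch for confirming, from inside Python, that the interpreter you are running meets the 3.6 minimum mentioned above. The helper name `python_is_recent_enough` is hypothetical and used only for illustration; it is not part of the course repository.

```python
import sys

# Minimum version suggested in the text above (3.6); adjust if needed.
MIN_VERSION = (3, 6)


def python_is_recent_enough(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    print("At least 3.6:", python_is_recent_enough())
```

Run it in the environment you intend to use for the notebooks, since each conda environment can have its own python version.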
conda activate ML_IAP
In older anaconda installations you might need to type:
source activate ML_IAP
to activate the virtual environment (if you installed a recent anaconda, you won't need this).
If you want to understand what a virtual environment is, please read here.
If you want more details regarding creating virtual environments, please read here.
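If you are unsure whether the environment is really active, the following sketch checks it from Python. It relies on the `CONDA_DEFAULT_ENV` environment variable, which conda sets when an environment is activated; the helper name `active_conda_env` is hypothetical and only for illustration.

```python
import os

# Name of the environment created above.
EXPECTED_ENV = "ML_IAP"


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if none is set."""
    if environ is None:
        environ = os.environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    name = active_conda_env()
    if name == EXPECTED_ENV:
        print("Environment", EXPECTED_ENV, "is active.")
    else:
        print("Active environment is", name, "- expected", EXPECTED_ENV)
```

If you run this inside a notebook, it reports the environment from which the notebook server was started.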
Here you will install packages that are necessary for the course. Please type:
conda install scikit-learn numpy matplotlib scipy IPython pandas
Also type the following command:
conda install jupyter notebook
Continue by typing:
pip install xgboost
pip install pandas_summary
pip install category_encoders
If you have any problems with the category_encoders package, try typing:
conda install -c conda-forge category_encoders
pytorch is the package we will use for deep learning. The specific command you need to run in order to install pytorch depends on your OS and your python version. To find out what you should type, please go here and choose your OS and versions. I installed a version without GPU support (choosing CUDA none) and needed to type:
conda install pytorch torchvision -c pytorch
But this command is specific to my mac, and you might need to type something else. Note that even if you do have a GPU (which is great!) I will not teach anything that requires it, and all the code was written to run on a CPU.
A recent release of pytorch still has a small issue (which should be solved in a couple of days), so I recommend that you type:
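A minimal sketch for checking the pytorch installation once it is done. It only uses `torch.__version__` and `torch.cuda.is_available()`; the helper name `describe_torch` is hypothetical and not part of the course code. A CPU-only result is fine for this course.

```python
import importlib


def describe_torch():
    """Return a one-line summary of the PyTorch installation, if any."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "PyTorch is not installed in this environment."
    # torch.cuda.is_available() is False for a CPU-only build,
    # which is all that the course notebooks need.
    device = "GPU (CUDA)" if torch.cuda.is_available() else "CPU only"
    return "PyTorch {} found, device: {}".format(torch.__version__, device)


if __name__ == "__main__":
    print(describe_torch())
```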
pip install "pillow<7"
This issue might be fixed in the next few days (see here), in which case this step will not be necessary.
To install fastai, type:
pip install fastai
If you have problems installing fastai, please read here.
Go to the IAP directory (where you cloned the course repository) and type (don't forget to first activate your ML_IAP virtual environment):
jupyter notebook
Now choose a notebook that you want to run.
Throughout the IAP, we will use Jupyter notebooks. If you are not familiar with Jupyter notebooks, please go over a basic tutorial (e.g., the fastai tutorial notebook or a general introduction to jupyter notebooks) and make sure you are able to run code in notebooks. I also added a tutorial notebook to the repository (jupyter-intro_RASP.ipynb), which is taken from Stephan Rasp's repository.
To test that the installation of all packages has worked, you can run the single cell in the notebook test_packages.ipynb.
If you get an error, please try to work out which package was not installed properly and try to reinstall it.
The notebooks might still have some small changes, so please clone the repository again before each session.
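To narrow down which package is failing, a sketch like the following tries each import separately and reports the result. This is not the content of test_packages.ipynb; it is an independent illustration, and the names `COURSE_PACKAGES` and `check_packages` are hypothetical. The list contains the import names of the packages installed above.

```python
import importlib

# Import names for the packages installed above. Note that the import name
# can differ from the install name (scikit-learn is imported as sklearn).
COURSE_PACKAGES = [
    "sklearn", "numpy", "matplotlib", "scipy", "IPython", "pandas",
    "xgboost", "pandas_summary", "category_encoders",
    "torch", "torchvision", "fastai",
]


def check_packages(names):
    """Try to import each name; return a dict mapping name -> status string."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as err:  # report any import-time failure
            status[name] = "FAILED ({}: {})".format(type(err).__name__, err)
        else:
            version = getattr(module, "__version__", "unknown")
            status[name] = "ok, version {}".format(version)
    return status


if __name__ == "__main__":
    for name, result in check_packages(COURSE_PACKAGES).items():
        print("{:20s} {}".format(name, result))
```

Any line that starts with FAILED names a package to reinstall in the ML_IAP environment.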
There are numerous resources for learning ML. During this course I tried to put links to the sources that I took content from and that I find useful. Three amazing sources of knowledge that I use repeatedly during the IAP are:
- Python Machine Learning (book by Sebastian Raschka). Go to this book when you want to learn about basic concepts of ML.
- Fast.ai courses (1, 2) given by Jeremy Howard, who is a world-class ML practitioner. Take these courses if you want the newest and best ML techniques available. Also see the initial version of the book Jeremy Howard is writing.
- If you are interested in a basic, soft intro to ML, Andrew Ng's course at coursera is a good starting point. The course uses matlab (Ahhh) and is a bit outdated, but Andrew Ng is a great lecturer and the course is great.