diff --git a/code/README.md b/code/README.md new file mode 100644 index 00000000..d80ccea0 --- /dev/null +++ b/code/README.md @@ -0,0 +1,28 @@ +## Table of Contents and Code Notebooks + + +Simply click on the `ipynb`/`nbviewer` links next to the chapter headlines to view the code examples (currently, the internal document links are only supported by the NbViewer version). +**Please note that these are just the code examples accompanying the book, which I uploaded for your convenience; be aware that these notebooks may not be useful without the formulae and descriptive text.** + + +1. Machine Learning - Giving Computers the Ability to Learn from Data [[dir](./ch01)] [[ipynb](./ch01/ch01.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch01/ch01.ipynb)] +2. Training Machine Learning Algorithms for Classification [[dir](./ch02)] [[ipynb](./ch02/ch02.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch02/ch02.ipynb)] +3. A Tour of Machine Learning Classifiers Using Scikit-Learn [[dir](./ch03)] [[ipynb](./ch03/ch03.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch03/ch03.ipynb)] +4. Building Good Training Sets – Data Pre-Processing [[dir](./ch04)] [[ipynb](./ch04/ch04.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch04/ch04.ipynb)] +5. Compressing Data via Dimensionality Reduction [[dir](./ch05)] [[ipynb](./ch05/ch05.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch05/ch05.ipynb)] +6. Learning Best Practices for Model Evaluation and Hyperparameter Optimization [[dir](./ch06)] [[ipynb](./ch06/ch06.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch06/ch06.ipynb)] +7. Combining Different Models for Ensemble Learning [[dir](./ch07)] [[ipynb](./ch07/ch07.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch07/ch07.ipynb)] +8. Applying Machine Learning to Sentiment Analysis [[dir](./ch08)] [[ipynb](./ch08/ch08.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch08/ch08.ipynb)] +9. Embedding a Machine Learning Model into a Web Application [[dir](./ch09)] [[ipynb](./ch09/ch09.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch09/ch09.ipynb)] +10. Predicting Continuous Target Variables with Regression Analysis [[dir](./ch10)] [[ipynb](./ch10/ch10.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch10/ch10.ipynb)] +11. Working with Unlabeled Data – Clustering Analysis [[dir](./ch11)] [[ipynb](./ch11/ch11.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch11/ch11.ipynb)] +12. Training Artificial Neural Networks for Image Recognition [[dir](./ch12)] [[ipynb](./ch12/ch12.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch12/ch12.ipynb)] +13. 
Parallelizing Neural Network Training via Theano [[dir](./ch13)] [[ipynb](./ch13/ch13.ipynb)] [[nbviewer](http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch13/ch13.ipynb)] + +## Contact + +I am happy to answer questions! Just write me an [email](mailto:mail@sebastianraschka.com) +or consider asking the question on the [Google Groups Email List](https://groups.google.com/forum/#!forum/python-machine-learning-book). + +If you are interested in keeping in touch, I have quite a lively twitter stream ([@rasbt](https://twitter.com/rasbt)) all about data science and machine learning. I also maintain a [blog](http://sebastianraschka.com/articles.html) where I post all of the things I am particularly excited about. diff --git a/code/_convenience_scripts/md_toc.py b/code/_convenience_scripts/md_toc.py new file mode 100644 index 00000000..cd678dc3 --- /dev/null +++ b/code/_convenience_scripts/md_toc.py @@ -0,0 +1,13 @@ +# Sebastian Raschka, 2015 +# convenience function for myself to create nested TOC lists +# use as `python md_toc.py /blank_tocs/ch01.toc` + +import sys + +ipynb = sys.argv[1] +with open(ipynb, 'r') as f: + for line in f: + out_str = ' ' * (len(line) - len(line.lstrip())) + line = line.strip() + out_str += '- %s' % line + print(out_str) diff --git a/code/ch01/README.md b/code/ch01/README.md index 6d3c91d9..4c443773 100644 --- a/code/ch01/README.md +++ b/code/ch01/README.md @@ -1,16 +1,35 @@ Sebastian Raschka, 2015 -# Python Machine Learning +Python Machine Learning - Code Examples -# Chapter 1 Code Examples -## Giving Computers the Ability to Learn from Data +## Chapter 1 - Giving Computers the Ability to Learn from Data -
+- Building intelligent machines to transform data into knowledge +- The three different types of machine learning + - Making predictions about the future with supervised learning + - Classification for predicting class labels + - Regression for predicting continuous outcomes + - Solving interactive problems with reinforcement learning + - Discovering hidden structures with unsupervised learning + - Finding subgroups with clustering + - Dimensionality reduction for data compression +- An introduction to the basic terminology and notations +- A roadmap for building machine learning systems + - Preprocessing – getting data into shape + - Training and selecting a predictive model + - Evaluating models and predicting unseen data instances +- Using Python for machine learning + - Installing Python packages +- Summary + + + +--- **Chapter 1 does not contain any code examples.** -
+--- ## Installing Python packages diff --git a/code/ch02/README.md b/code/ch02/README.md index 045cd558..c4e0ccc0 100644 --- a/code/ch02/README.md +++ b/code/ch02/README.md @@ -1,6 +1,15 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 2 Code Examples +Python Machine Learning - Code Examples -## Training Machine Learning Algorithms for Classification \ No newline at end of file +## Chapter 2 - Training Machine Learning Algorithms for Classification + +- Artificial neurons - a brief glimpse into the early history of machine learning +- Implementing a perceptron learning algorithm in Python + - Training a perceptron model on the Iris dataset +- Adaptive linear neurons and the convergence of learning + - Minimizing cost functions with gradient descent + - Implementing an Adaptive Linear Neuron in Python + - Large scale machine learning and stochastic gradient descent +- Summary \ No newline at end of file diff --git a/code/ch03/README.md b/code/ch03/README.md index 2a9fe4b9..c0e3f078 100644 --- a/code/ch03/README.md +++ b/code/ch03/README.md @@ -1,6 +1,27 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 3 Code Examples +Python Machine Learning - Code Examples -## A Tour of Machine Learning Classifiers Using Scikit-learn \ No newline at end of file + +## Chapter 3 - A Tour of Machine Learning Classifiers Using Scikit-learn + +- Choosing a classification algorithm +- First steps with scikit-learn + - Training a perceptron via scikit-learn +- Modeling class probabilities via logistic regression + - Logistic regression intuition and conditional probabilities + - Learning the weights of the logistic cost function + - Training a logistic regression model with scikit-learn + - Tackling overfitting via regularization +- Maximum margin classification with support vector machines + - Maximum margin intuition + - Dealing with the nonlinearly separable case using slack variables + - Alternative implementations in scikit-learn +- Solving nonlinear problems using a kernel SVM + - Using the kernel trick to find separating hyperplanes in higher dimensional space +- Decision tree learning + - Maximizing information gain – getting the most bang for the buck + - Building a decision tree + - Combining weak to strong learners via random forests +- K-nearest neighbors – a lazy learning algorithm +- Summary \ No newline at end of file diff --git a/code/ch04/README.md b/code/ch04/README.md index d9e2e06a..62a96c76 100644 --- a/code/ch04/README.md +++ b/code/ch04/README.md @@ -1,6 +1,21 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 4 Code Examples +Python Machine Learning - Code Examples -## Building Good Training Sets – Data Preprocessing \ No newline at end of file +## Chapter 4 - Building Good Training Sets – Data Preprocessing + +- Dealing with missing data + - Eliminating samples or features with missing values + - Imputing missing values + - Understanding the scikit-learn estimator API +- Handling categorical data + - Mapping ordinal features + - Encoding class labels + - Performing one-hot encoding on nominal features +- Partitioning a dataset in training and test sets +- Bringing features onto the same scale +- Selecting meaningful features + - Sparse solutions with L1 regularization + - Sequential feature selection algorithms +- Assessing feature importance with random forests +- Summary \ No newline at end of file diff --git a/code/ch05/README.md b/code/ch05/README.md index 50e81dbe..6215bf79 100644 --- a/code/ch05/README.md +++ b/code/ch05/README.md @@ 
-1,6 +1,23 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 5 Code Examples +Python Machine Learning - Code Examples -## Compressing Data via Dimensionality Reduction +## Chapter 5 - Compressing Data via Dimensionality Reduction + +- Unsupervised dimensionality reduction via principal component analysis + - Total and explained variance + - Feature transformation + - Principal component analysis in scikit-learn +- Supervised data compression via linear discriminant analysis + - Computing the scatter matrices + - Selecting linear discriminants for the new feature subspace + - Projecting samples onto the new feature space + - LDA via scikit-learn +- Using kernel principal component analysis for nonlinear mappings + - Kernel functions and the kernel trick + - Implementing a kernel principal component analysis in Python + - Example 1 – separating half-moon shapes + - Example 2 – separating concentric circles + - Projecting new data points + - Kernel principal component analysis in scikit-learn +- Summary \ No newline at end of file diff --git a/code/ch06/README.md b/code/ch06/README.md index 612793b3..212610a8 100644 --- a/code/ch06/README.md +++ b/code/ch06/README.md @@ -1,9 +1,25 @@ Sebastian Raschka, 2015 -# Python Machine Learning +Python Machine Learning - Code Examples -# Chapter 6 Code Examples - -## Learning Best Practices for Model Evaluation and Hyperparameter Tuning +## Chapter 6 - Learning Best Practices for Model Evaluation and Hyperparameter Tuning +- Streamlining workflows with pipelines + - Loading the Breast Cancer Wisconsin dataset + - Combining transformers and estimators in a pipeline +- Using k-fold cross-validation to assess model performance + - The holdout method + - K-fold cross-validation +- Debugging algorithms with learning and validation curves + - Diagnosing bias and variance problems with learning curves + - Addressing overfitting and underfitting with validation curves +- Fine-tuning machine learning models via grid search + - Tuning hyperparameters via grid search + - Algorithm selection with nested cross-validation +- Looking at different performance evaluation metrics + - Reading a confusion matrix + - Optimizing the precision and recall of a classification model + - Plotting a receiver operating characteristic + - The scoring metrics for multiclass classification +- Summary diff --git a/code/ch07/README.md b/code/ch07/README.md index 566eba76..8ddbdb27 100644 --- a/code/ch07/README.md +++ b/code/ch07/README.md @@ -1,6 +1,14 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 7 Code Examples +Python Machine Learning - Code Examples -## Combining Different Models for Ensemble Learning +## Chapter 7 - Combining Different Models for Ensemble Learning + + +- Learning with ensembles +- Implementing a simple majority vote classifier + - Combining different algorithms for classification with majority vote +- Evaluating and tuning the ensemble classifier +- Bagging – building an ensemble of classifiers from bootstrap samples +- Leveraging weak learners via adaptive boosting +- Summary \ No newline at end of file diff --git a/code/ch08/README.md b/code/ch08/README.md index d6d0e550..7bfe8234 100644 --- a/code/ch08/README.md +++ b/code/ch08/README.md @@ -1,6 +1,15 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 8 Code Examples +Python Machine Learning - Code Examples -## Applying Machine Learning to Sentiment Analysis \ No newline at end of file +## Chapter 8 - Applying Machine Learning to Sentiment Analysis + +- 
Obtaining the IMDb movie review dataset +- Introducing the bag-of-words model + - Transforming words into feature vectors + - Assessing word relevancy via term frequency-inverse document frequency + - Cleaning text data + - Processing documents into tokens +- Training a logistic regression model for document classification +- Working with bigger data – online algorithms and out-of-core learning +- Summary \ No newline at end of file diff --git a/code/ch09/README.md b/code/ch09/README.md index 1b9a753d..2083bc58 100644 --- a/code/ch09/README.md +++ b/code/ch09/README.md @@ -1,10 +1,20 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 9 Code Examples +Python Machine Learning - Code Examples -## Embedding a Machine Learning Model into a Web Application +## Chapter 9 - Embedding a Machine Learning Model into a Web Application +- Serializing fitted scikit-learn estimators +- Setting up a SQLite database for data storage +- Developing a web application with Flask +- Our first Flask web application + - Form validation and rendering + - Turning the movie classifier into a web application +- Deploying the web application to a public server + - Updating the movie review classifier +- Summary + +--- The code for the Flask web applications can be found in the following directories: diff --git a/code/ch10/README.md b/code/ch10/README.md index 1632dc89..15a3a88f 100644 --- a/code/ch10/README.md +++ b/code/ch10/README.md @@ -1,6 +1,21 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 10 Code Examples +Python Machine Learning - Code Examples -## Predicting Continuous Target Variables with Regression Analysis +## Chapter 10 - Predicting Continuous Target Variables with Regression Analysis + +- Introducing a simple linear regression model +- Exploring the Housing Dataset + - Visualizing the important characteristics of a dataset +- Implementing an ordinary least squares linear regression model + - Solving regression for regression parameters with gradient descent + - Estimating the coefficient of a regression model via scikit-learn +- Fitting a robust regression model using RANSAC +- Evaluating the performance of linear regression models +- Using regularized methods for regression +- Turning a linear regression model into a curve – polynomial regression + - Modeling nonlinear relationships in the Housing Dataset + - Dealing with nonlinear relationships using random forests + - Decision tree regression + - Random forest regression +- Summary \ No newline at end of file diff --git a/code/ch11/README.md b/code/ch11/README.md index e3df0de3..dc8b2b1e 100644 --- a/code/ch11/README.md +++ b/code/ch11/README.md @@ -1,6 +1,17 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 11 Code Examples +Python Machine Learning - Code Examples -## Working with Unlabeled Data – Clustering Analysis \ No newline at end of file +## Chapter 11 - Working with Unlabeled Data – Clustering Analysis + +- Grouping objects by similarity using k-means + - K-means++ + - Hard versus soft clustering + - Using the elbow method to find the optimal number of clusters + - Quantifying the quality of clustering via silhouette plots +- Organizing clusters as a hierarchical tree + - Performing hierarchical clustering on a distance matrix + - Attaching dendrograms to a heat map + - Applying agglomerative clustering via scikit-learn +- Locating regions of high density via DBSCAN +- Summary \ No newline at end of file diff --git a/code/ch12/README.md b/code/ch12/README.md index 67e3a7c3..6f9e97c8 100644 --- 
a/code/ch12/README.md +++ b/code/ch12/README.md @@ -1,6 +1,24 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 12 Code Examples +Python Machine Learning - Code Examples -## Training Artificial Neural Networks for Image \ No newline at end of file +## Chapter 12 - Training Artificial Neural Networks for Image Recognition + +- Modeling complex functions with artificial neural networks + - Single-layer neural network recap + - Introducing the multi-layer neural network architecture + - Activating a neural network via forward propagation +- Classifying handwritten digits + - Obtaining the MNIST dataset + - Implementing a multi-layer perceptron +- Training an artificial neural network + - Computing the logistic cost function + - Training neural networks via backpropagation +- Developing your intuition for backpropagation +- Debugging neural networks with gradient checking +- Convergence in neural networks +- Other neural network architectures + - Convolutional Neural Networks + - Recurrent Neural Networks +- A few last words about neural network implementation +- Summary \ No newline at end of file diff --git a/code/ch13/README.md b/code/ch13/README.md index e1bde282..416b2f2f 100644 --- a/code/ch13/README.md +++ b/code/ch13/README.md @@ -1,6 +1,18 @@ Sebastian Raschka, 2015 -# Python Machine Learning -# Chapter 13 Code Examples +Python Machine Learning - Code Examples -## Parallelizing Neural Network Training with Theano \ No newline at end of file +## Chapter 13 - Parallelizing Neural Network Training with Theano + +- Building, compiling, and running expressions with Theano + - What is Theano? + - First steps with Theano + - Configuring Theano + - Working with array structures + - Wrapping things up – a linear regression example +- Choosing activation functions for feedforward neural networks + - Logistic function recap + - Estimating probabilities in multi-class classification via the softmax function + - Broadening the output spectrum by using a hyperbolic tangent +- Training neural networks efficiently using Keras +- Summary \ No newline at end of file