faq: Why is logistic regression considered a linear model?

pursuit-of-42 · Feb 11, 2016 · 0222925
1 parent 8a0d57a
commit 0222925

Showing 6 changed files with 45 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -151,6 +151,7 @@ I have set up a separate library, [`mlxtend`](http://rasbt.github.io/mlxtend/),
 
 ##### Logistic Regression
 
+- [Why is logistic regression considered a linear model?](./faq/logistic_regression_linear.md)
 - [What is the probabilistic interpretation of regularized logistic regression?](./faq/probablistic-logistic-regression.md)
 - [Does regularization in logistic regression always results in better fit and better generalization?](./faq/regularized-logistic-regression-performance.md)
 - [What is the major difference between naive Bayes and logistic regression?](./faq/naive-bayes-vs-logistic-regression.md)

diff --git a/faq/README.md b/faq/README.md
@@ -74,6 +74,7 @@ Sebastian
 
 ##### Logistic Regression
 
+- [Why is logistic regression considered a linear model?](./logistic_regression_linear.md)
 - [What is the probabilistic interpretation of regularized logistic regression?](./probablistic-logistic-regression.md)
 - [Does regularization in logistic regression always results in better fit and better generalization?](./regularized-logistic-regression-performance.md)
 - [What is the major difference between naive Bayes and logistic regression?](./naive-bayes-vs-logistic-regression.md)

diff --git a/faq/logistic_regression_linear.md b/faq/logistic_regression_linear.md
@@ -0,0 +1,43 @@
+# Why is logistic regression considered a linear model?
+
+The short answer is: Logistic regression is considered a linear model because the outcome **always** depends on the weighted **sum** of the inputs. Or in other words, the output **cannot** depend on the product (or quotient, etc.) of the input features!
+
+So, why is that? Let’s recapitulate the basics of logistic regression first, which hopefully makes things clearer. Logistic regression is an algorithm that learns a model for binary classification. A nice side-effect is that it gives us the *probability* that a sample belongs to class 1 (or vice versa: class 0). To obtain this probability, the model passes its net input *z* through the so-called logistic function &Phi; (a certain kind of sigmoid function); it looks like this:
+
+![](./logistic_regression_linear/2.png)
+
+Now, if *&Phi;(z)* is larger than *0.5* (alternatively: if *z* is larger than *0*), we classify an input as class 1 (and class 0, otherwise). This logistic (activation) function doesn't look very linear at all, right!? So, let's dig a bit deeper and take a look at the equation we use to compute *z* -- the net input function!
+
+![](./logistic_regression_linear/1.png)
+
+The net input function is simply the dot product of our input features and the respective model coefficients **w**:
+
+![](./logistic_regression_linear/3.png)
+
+Here, x<sub>0</sub> refers to the bias unit, which is always equal to 1, and w<sub>0</sub> is its weight (a detail we don’t have to worry about here). I know, mathematical equations can be a bit "abstract" at times, so let's look at a concrete example.
+
+Let's assume we have a sample training point **x** consisting of 4 features (e.g., *sepal length*, *sepal width*, *petal length*, and *petal width* in the [*Iris dataset*](https://archive.ics.uci.edu/ml/datasets/Iris)):
+
+    x = [1, 2, 3, 4]
+
+Now, let's assume our weight vector looks like this:
+
+    w = [0.5, 0.5, 0.5, 0.5]
+
+Let's compute *z* now!
+
+    z = w^T x = 1*0.5 + 2*0.5 + 3*0.5 + 4*0.5 = 5
+
+---
+
+Not that it is important, but we have a 99.3% chance that this sample belongs to class 1:
+*&Phi;(z) = 1 / (1 + e<sup>-5</sup>) = 1 / (1 + 1/148.41) &asymp; 0.993*
+
+---
+
+The key is that our model is ***additive***:
+our outcome *z* is a sum of the weighted input values, e.g.,
+
+*z = w<sub>1</sub>x<sub>1</sub> + w<sub>2</sub>x<sub>2</sub>*
+
+There's no interaction between input values, nothing like x<sub>1</sub> &times; x<sub>2</sub> or so, which would make our model non-linear!
diff --git a/faq/logistic_regression_linear/1.png b/faq/logistic_regression_linear/1.png
diff --git a/faq/logistic_regression_linear/2.png b/faq/logistic_regression_linear/2.png
diff --git a/faq/logistic_regression_linear/3.png b/faq/logistic_regression_linear/3.png
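
The worked example from `faq/logistic_regression_linear.md` can be checked with a few lines of plain Python. This is only a minimal sketch and not part of the commit: the helper names `net_input` and `logistic` are made up for illustration, and the bias term is left out, as in the FAQ's example.

```python
import math


def net_input(w, x):
    """Weighted sum z = w^T x (hypothetical helper; bias term omitted)."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def logistic(z):
    """Logistic (sigmoid) function: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Sample and weights taken from the FAQ's worked example
x = [1, 2, 3, 4]
w = [0.5, 0.5, 0.5, 0.5]

z = net_input(w, x)        # 1*0.5 + 2*0.5 + 3*0.5 + 4*0.5
p = logistic(z)            # probability of class 1
label = 1 if z > 0 else 0  # same decision as checking p > 0.5

print(z, round(p, 4), label)
```

The class label depends only on the sign of *z*, which is a plain weighted sum of the features. The logistic function turns *z* into a probability but does not move the decision boundary.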