Gradient Descent for Logistic Regression#
Gradient descent for classification follows the same procedure as algorithm GD_algo_multi in Section Gradient Descent in Multilinear Regression, using the definition of the cost function from Equation (19) above.
Derivatives in the linear case#
Consider the linear combination \(\boldsymbol{x} \cdot \boldsymbol{\theta} = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n\) as input to the sigmoid function \(\sigma\). The partial derivatives of the cost function take the form:

\[
\frac{\partial J(\boldsymbol{\theta})}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( \sigma\!\left(\boldsymbol{x}^{(i)} \cdot \boldsymbol{\theta}\right) - y^{(i)} \right) x_j^{(i)}
\]
Wait: we’ve seen this somewhere! This takes the same form as the derivatives for linear regression (see Equation (15) in Section Gradient Descent in Multilinear Regression); only the hypothesis inside the sum has changed, from the linear model to the sigmoid.
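As a sanity check on this observation, the gradient can be sketched in a few lines of NumPy. This is an illustrative sketch, not the lecture's reference code: the function names, array shapes, and the convention of a leading all-ones column for \(\theta_0\) are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def gradient(X, y, theta):
    """Gradient of the cross-entropy cost for logistic regression.

    X     : (m, n+1) design matrix whose first column is all ones (for theta_0)
    y     : (m,) vector of 0/1 labels
    theta : (n+1,) parameter vector
    """
    m = X.shape[0]
    # Same (prediction - label) structure as the linear-regression gradient,
    # with the sigmoid applied to the linear combination.
    errors = sigmoid(X @ theta) - y
    return (X.T @ errors) / m
```

Swapping `sigmoid(X @ theta)` for `X @ theta` recovers exactly the linear-regression gradient, which is the observation above made concrete.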
Gradient descent: pseudo-code for logistic regression#
This is left as an exercise 😉 to check your understanding and practice writing pseudo-code.
Exercise
Adapt the algorithm GD_algo_multi from Section Gradient Descent in Multilinear Regression to logistic regression and write the corresponding pseudo-code, highlighting the differences and including the elements specific to classification.
You will put this into practice in the second tutorial, where you’ll code a classifier by hand!
Alternative techniques#
Besides logistic regression, other algorithms have been designed for binary classification.
The Perceptron is a single-layer neural network that, in its original form, uses a step function instead of a sigmoid as its activation. It will be introduced in the lecture on neural networks.
Support Vector Machines (SVMs) are robust classifiers widely used in classification problems. We will not cover them here, but below are some links for further reading.
Learn more
R. Berwick, An Idiot’s guide to Support vector machines (SVMs) on web.mit.edu
Support Vector Machines: A Simple Explanation, on KDNuggets
"Support Vector Machines: Main Ideas" by Josh Starmer, on the StatQuest YouTube channel
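To make the contrast with logistic regression concrete, here is a minimal sketch of the original Perceptron learning rule mentioned above. The function names, learning rate, and the toy AND dataset are illustrative assumptions, not material from the lecture:

```python
import numpy as np

def step(z):
    """Heaviside step: 1 if z >= 0 else 0 (the Perceptron's activation)."""
    return np.where(z >= 0, 1, 0)

def perceptron_train(X, y, lr=0.1, epochs=10):
    """Original Perceptron rule: update weights only on misclassified points."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - step(xi @ theta)   # -1, 0 or +1
            # No gradient here: the step function is not differentiable,
            # unlike the sigmoid used in logistic regression.
            theta += lr * error * xi
    return theta

# Toy example: the AND function (linearly separable, so the Perceptron converges)
X = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])  # bias + 2 inputs
y = np.array([0, 0, 0, 1])
theta = perceptron_train(X, y)
```

Note the design difference: because the step function has no useful derivative, the Perceptron cannot be trained by gradient descent on a cost function; it relies on this error-driven update rule instead.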
Numerous methods have been developed to find optimized model parameters more efficiently than plain gradient descent. These optimizers are beyond the scope of this course and are usually available as libraries in Python (and other languages). Below is a list of the most popular ones:
Learn more
The BFGS algorithm: Wikipedia article
Limited-memory BFGS (L-BFGS): Wikipedia article
Conjugate gradient method: Wikipedia article
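As an illustration of using such an optimizer from a library, the sketch below fits a logistic regression with SciPy's L-BFGS-B implementation instead of hand-written gradient descent. The synthetic dataset, variable names, and numerical guard are assumptions made for the example:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Cross-entropy cost and its gradient, returned together
    (scipy's minimize accepts this form when jac=True)."""
    m = X.shape[0]
    p = sigmoid(X @ theta)
    eps = 1e-12  # guard against log(0)
    cost = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / m
    return cost, grad

# Synthetic data: bias column + 2 features, labels drawn from a known model
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]
true_theta = np.array([0.5, 2.0, -1.0])
y = (sigmoid(X @ true_theta) > rng.uniform(size=100)).astype(float)

res = minimize(cost_and_grad, x0=np.zeros(3), args=(X, y),
               method="L-BFGS-B", jac=True)
```

After the call, `res.x` holds the fitted parameters and `res.fun` the final cost; the optimizer handles step sizes and stopping criteria internally, which is why such library routines are preferred in practice over a hand-tuned learning rate.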