In this article, I’ll show you how to program gradient descent in Python using ONLY MATH. In this way, I will help you understand how this algorithm works deep down.
Let’s get started.
Disclaimer
I have already written an article that discusses gradient descent thoroughly, explaining the mathematical concepts and steps of the algorithm with pictures and examples.
I suggest you read it before continuing.
Quick summary
Gradient descent optimizes a function's parameter by searching for a local minimum, that is, the value of x at which y is lower than at any nearby point.
It starts with a random value of x and, at each iteration, calculates the derivative at the current point. The derivative is a value that expresses the rate of change of a function, that is, how y varies as x increases.
If we move x a small step in the direction of its derivative, y increases. So we go in the opposite direction.
To update x, we subtract from it the derivative multiplied by a learning rate, which adjusts the size of our steps.
The algorithm continues until the change in x becomes negligible or it reaches a maximum number of iterations.
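The summary above can be sketched in a few lines of plain Python. This is only a minimal sketch, not the code built step by step later in the article: the helper name `gradient_descent` and the `tolerance` parameter are assumptions I introduce here to illustrate the "change in x is minimal" stopping rule, using f(x) = x² and its derivative 2x as the example.

```python
def gradient_descent(derivative, x_start, learning_rate=0.1,
                     tolerance=1e-6, max_steps=1000):
    """Hypothetical helper: step against the derivative until x barely changes."""
    x = float(x_start)
    steps_used = 0
    for _ in range(max_steps):
        # size of this update: derivative scaled by the learning rate
        change = learning_rate * derivative(x)
        # move in the direction opposite to the derivative
        x -= change
        steps_used += 1
        # stop early once the update is smaller than the tolerance
        if abs(change) < tolerance:
            break
    return x, steps_used


# f(x) = x ** 2 has derivative 2 * x and its minimum at x = 0
x_min, steps_used = gradient_descent(lambda x: 2 * x, x_start=10)
print(x_min, steps_used)
```

With these assumed defaults the loop should stop well before `max_steps`, because each update shrinks as x approaches the minimum.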
Gradient descent in Python
Problem statement
We have a function f(x) = x² and we want to find the value of x for which y is as low as possible.
We can use the gradient descent algorithm to solve our task.
1. Import necessary libraries
#deal with arrays
import numpy
#deal with random values
from numpy import random
#plot graphs
from matplotlib import pyplot
2. Define the function and the derivative
#return the y value of an input x
def y_function(x):
    return x ** 2
#return the derivative at an input point x
def y_derivative(x):
    return 2 * x
If you don’t understand what a derivative is and why it is 2x, check the article about gradient descent or a more detailed source on calculus and derivatives.
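If you would like to convince yourself that 2x really is the slope of x², one option is to estimate the slope numerically and compare. The sketch below assumes a helper I call `numerical_derivative` (it is not part of the article's code) that uses a central difference with a small step `h`.

```python
def y_function(x):
    return x ** 2


def y_derivative(x):
    return 2 * x


def numerical_derivative(f, x, h=1e-5):
    """Hypothetical helper: central-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# compare the numerical estimate with the analytic derivative 2 * x
for point in (-3.0, 0.0, 2.5):
    estimate = numerical_derivative(y_function, point)
    print(point, estimate, y_derivative(point))
```

The two columns should agree up to floating-point rounding, which is a quick sanity check you can reuse for any derivative you write by hand.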
3. Define the points of the function to plot
#array of integers from -100 to 100 (the stop value 101 is excluded)
x = numpy.arange(-100, 101)
#array of y values for each input in x
y = y_function(x)
4. Start with an initial random value
x_now = random.randint(-100, 100)
5. Choose hyperparameters
#the size of the steps
learning_rate = 0.1
#when the algorithm stops iterating
max_steps = 1000
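To get a feeling for how much the learning rate matters, here is a small sketch that runs the same update with different rates. The helper `run_descent` is a name I made up for this illustration. For f(x) = x², each update multiplies x by (1 - 2 × learning rate), so small rates converge slowly, a rate of 0.5 lands on the minimum in one step, a rate of 1.0 bounces between x and -x forever, and anything larger moves away from the minimum.

```python
def run_descent(x_start, learning_rate, steps):
    """Hypothetical helper: apply the update rule for f(x) = x ** 2 a fixed number of times."""
    x = float(x_start)
    for _ in range(steps):
        # the derivative of x ** 2 is 2 * x
        x -= learning_rate * 2 * x
    return x


# same starting point and number of steps, different learning rates
for rate in (0.01, 0.1, 0.5, 1.0, 1.1):
    print(rate, run_descent(x_start=10, learning_rate=rate, steps=20))
```

These exact thresholds hold only for this particular function; for other functions the safe range of learning rates is different and usually found by trial.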
6. Update the parameter and display the graph
for i in range(max_steps):
    #update the parameter going towards the local minimum
    x_now -= learning_rate * y_derivative(x_now)
#show the function's graph and the final x
pyplot.plot(x, y, x_now, y_function(x_now), "o")
pyplot.show()
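If you want to see how x travels towards the minimum instead of only where it ends up, one option is to store every value along the way. The sketch below is an assumption of mine, not part of the original code: `descent_path` is a hypothetical helper that returns the list of x values visited for f(x) = x².

```python
def descent_path(x_start, learning_rate=0.1, max_steps=1000):
    """Hypothetical helper: return every x visited while minimizing f(x) = x ** 2."""
    path = [float(x_start)]
    for _ in range(max_steps):
        # the derivative of x ** 2 is 2 * x
        x_next = path[-1] - learning_rate * 2 * path[-1]
        path.append(x_next)
    return path


# print each step with its x and y value
path = descent_path(x_start=50, max_steps=10)
for step, value in enumerate(path):
    print(step, value, value ** 2)
```

Passing `path` and the matching y values to `pyplot.plot` with the `"o-"` format would draw the whole trajectory on top of the curve.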
Gradient descent full code
#deal with arrays
import numpy
#deal with random values
from numpy import random
#plot graphs
from matplotlib import pyplot
#return the y value of an input x
def y_function(x):
    return x ** 2
#return the derivative at an input point x
def y_derivative(x):
    return 2 * x
#array of integers from -100 to 100 (the stop value 101 is excluded)
x = numpy.arange(-100, 101)
#list of y values for each input in x
y = y_function(x)
x_now = random.randint(-100, 100)
#the size of the steps
learning_rate = 0.1
#when the algorithm stops iterating
max_steps = 1000
for i in range(max_steps):
    #update the parameter going towards the local minimum
    x_now -= learning_rate * y_derivative(x_now)
#show the function's graph and the final x
pyplot.plot(x, y, x_now, y_function(x_now), "o")
pyplot.show()