Gradient descent in Python from scratch

In this article, I’ll show you how to program gradient descent in Python using ONLY MATH. In this way, I will help you understand how this algorithm works deep down.

Let’s get started.

Disclaimer

I have already written an article that discusses gradient descent thoroughly, explaining the mathematical concepts and steps of the algorithm with pictures and examples.

I suggest you read it before continuing.

Quick summary

Gradient descent optimizes a function's parameter by searching for a local minimum: a value of x at which y is lower than at any nearby value.

It starts with a random value of x, and at each iteration, it calculates the current derivative. The derivative is a value that expresses the rate of change of a function, that is, how y varies as x increases.
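
To make "rate of change" concrete, here is a minimal sketch that estimates a derivative numerically and compares it with the exact value. It uses f(x) = x², the function from the problem statement later in this article. The helper numerical_derivative and the step size h are my own illustrative choices, not part of the article's code.

```python
def f(x):
    # the example function used later in this article
    return x ** 2

def numerical_derivative(func, x, h=1e-5):
    # hypothetical helper: slope of the line through two points close to x
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (-3.0, 0.0, 4.0):
    estimate = numerical_derivative(f, x)
    # for f(x) = x ** 2 the exact derivative is 2 * x
    print(x, estimate, 2 * x)
```

The estimate and the exact value should agree to several decimal places. A negative derivative means y falls as x increases, and a positive one means y rises.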

The sign of the derivative tells us in which direction y increases: a small step in that direction raises y. So we go in the opposite direction.

To update x, it subtracts from x the derivative multiplied by a learning rate, which adjusts the size of the steps.

It continues until the change in x becomes negligible or it reaches a maximum number of iterations.
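
The steps in this summary can be sketched in a few lines. This is only one possible way to write them: the function name gradient_descent, the tolerance value, and the starting point 50 are assumptions I picked for illustration.

```python
def gradient_descent(derivative, x_start, learning_rate=0.1,
                     max_steps=1000, tolerance=1e-6):
    # hypothetical helper: returns the final x and the number of updates made
    x = float(x_start)
    for step in range(max_steps):
        # move against the derivative, scaled by the learning rate
        x_new = x - learning_rate * derivative(x)
        # stop early once x barely changes (tolerance is an assumed threshold)
        if abs(x_new - x) < tolerance:
            return x_new, step + 1
        x = x_new
    # the maximum number of iterations was reached
    return x, max_steps

# derivative of f(x) = x ** 2 is 2 * x
x_min, steps_used = gradient_descent(lambda x: 2 * x, x_start=50)
print(x_min, steps_used)
```

With these settings the loop should stop well before max_steps, with x_min very close to 0.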

Gradient descent in Python

Problem statement

We have a function f(x) = x² and we want to find the value of x for which y is as low as possible.

We can use the gradient descent algorithm to solve our task.

1. Import necessary libraries

#deal with arrays
import numpy

#deal with random values
from numpy import random

#plot graphs
from matplotlib import pyplot

2. Define the function and the derivative

#return the y value of an input x
def y_function(x):
    return x ** 2

#return the derivative at an input point x
def y_derivative(x):
    return 2 * x

If you don’t understand what a derivative is and why it is 2x, check the article about gradient descent or a more detailed source on calculus and derivatives.
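
As a quick reminder, the result 2x follows from the limit definition of the derivative. Here h is a small increment that shrinks to zero:

```latex
\frac{d}{dx}x^2
  = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h}
  = \lim_{h \to 0} \frac{2xh + h^2}{h}
  = \lim_{h \to 0} (2x + h)
  = 2x
```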

3. Define the function list

#array of integers from -100 to 100 (arange excludes the stop value, so we pass 101)
x = numpy.arange(-100, 101)

#list of y values for each input in x
y = y_function(x)

4. Start with an initial random value

#random integer from -100 to 99 (randint excludes the upper bound)
x_now = random.randint(-100, 100)

5. Choose hyperparameters

#the size of the steps
learning_rate = 0.1

#the maximum number of iterations
max_steps = 1000
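
Why 0.1? For f(x) = x², each update turns x into x - learning_rate * 2x, which is x multiplied by (1 - 2 * learning_rate). So the learning rate decides whether the steps shrink, bounce, or grow. Here is a small sketch to see it. The helper run_steps, the starting point 10, and the list of rates are my own choices for illustration.

```python
def y_derivative(x):
    # derivative of x ** 2
    return 2 * x

def run_steps(x_start, learning_rate, steps):
    # hypothetical helper: apply the update rule a fixed number of times
    x = float(x_start)
    for _ in range(steps):
        x -= learning_rate * y_derivative(x)
    return x

for learning_rate in (0.01, 0.1, 0.5, 1.0, 1.1):
    # for this function each update multiplies x by (1 - 2 * learning_rate)
    print(learning_rate, run_steps(10, learning_rate, steps=20))
```

With these values, 0.01 and 0.1 bring x toward 0 (the first much more slowly), 0.5 lands on 0 in a single step, 1.0 bounces between 10 and -10 without ever improving, and 1.1 moves farther from the minimum on every step.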

6. Update the parameter and display the graph

for i in range(max_steps):
    #update the parameter going towards the local minimum
    x_now -= learning_rate * y_derivative(x_now)

    #show the function's graph and the current x (in a script, close the window to continue)
    pyplot.plot(x, y, x_now, y_function(x_now), "o")
    pyplot.show()
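
The loop above opens a plot on every iteration. If you prefer a single picture of the whole descent, one option is to record every x that was visited and draw the path once at the end. This is a sketch of that idea, not the method used in this article: descent_path and plot_path are hypothetical helpers, and the starting point 80 and the 25 steps are arbitrary.

```python
def y_function(x):
    return x ** 2

def y_derivative(x):
    return 2 * x

def descent_path(x_start, learning_rate, steps):
    # hypothetical helper: return every x visited, starting point included
    path = [float(x_start)]
    for _ in range(steps):
        path.append(path[-1] - learning_rate * y_derivative(path[-1]))
    return path

def plot_path(path):
    # imported here so the numeric part also runs without matplotlib installed
    from matplotlib import pyplot
    curve_x = list(range(-100, 101))
    # the function's curve
    pyplot.plot(curve_x, [y_function(v) for v in curve_x])
    # all visited points, joined in the order they were visited
    pyplot.plot(path, [y_function(v) for v in path], "o-")
    pyplot.show()

path = descent_path(x_start=80, learning_rate=0.1, steps=25)
print(path[:3], path[-1])
# plot_path(path)  # uncomment to draw the curve and the path in one figure
```

The points should crowd together near x = 0, because each step is proportional to the derivative and the derivative shrinks as x approaches the minimum.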

Gradient descent full code

#deal with arrays
import numpy

#deal with random values
from numpy import random

#plot graphs
from matplotlib import pyplot


#return the y value of an input x
def y_function(x):
    return x ** 2

#return the derivative at an input point x
def y_derivative(x):
    return 2 * x


#array of integers from -100 to 100 (arange excludes the stop value, so we pass 101)
x = numpy.arange(-100, 101)

#list of y values for each input in x
y = y_function(x)

#random integer from -100 to 99 (randint excludes the upper bound)
x_now = random.randint(-100, 100)

#the size of the steps
learning_rate = 0.1

#the maximum number of iterations
max_steps = 1000


for i in range(max_steps):
    #update the parameter going towards the local minimum
    x_now -= learning_rate * y_derivative(x_now)

    #show the function's graph and the current x (in a script, close the window to continue)
    pyplot.plot(x, y, x_now, y_function(x_now), "o")
    pyplot.show()
