Here’s how a machine learns, clearly, for everyone

I first heard about artificial intelligence during the boom of ChatGPT and other programs that generate images, text, and audio.

At that time I already had experience in programming, and I understood that a computer is essentially a simple calculator that translates our instructions into numbers and executes them.

I could not understand how machines so closely related to mathematics and numbers could do magnificent things like write, draw, and talk like a human being.

After I began to study machine learning, the field that deals with the development of such programs, I realized that “artificial intelligence does not exist”.
There are no conscious machines acting on their own, only programs that, through mathematical processes, can extract patterns from the large amounts of data they are trained on.

This process is called training, and we will now look in detail at how it works.

Do I need to be an expert to read this article?

Absolutely not! In the explanations I provide, I make sure to never just blurt out the mathematical formula, but to include the logical reasoning behind it to enhance learning for everyone, expert and non-expert.

In any case, the level of math used in this article is low and understandable by all.

The intent of this article

In this article, I want to give you an idea of how an artificial intelligence can learn, by studying the linear regression algorithm. To keep it easy for everyone to understand, I do not even cover this algorithm in full (I leave out gradient descent).

Of course, not all models behave the same way, and not all of them are designed to solve the same problem.

This article’s ultimate goal is to make you understand that there is no consciousness behind artificial intelligence, only mathematical algorithms.

What do I talk about in this article?

In this article, I focus on the training phase of a model, which is where “the magic,” the real learning, happens.
Building a complete machine learning model involves several other steps as well; this article concentrates on training.

How a machine learns: the mathematical training of a model

Suppose we have a dataset that contains two columns, one with the area in m² of some houses and the other with their prices.

We can plot the data on a graph, which would look something like this:

We are investors and want to predict the price of a house based on its area. This means that the column with the area is the input column X. The price is the output column y.

Knowing that the price of a house depends on its area, we can make some observations by eye, for example that the relationship between X and y is linear: as the area increases, so does the price, and vice versa.

For an area of 20 square meters, we can guess that the price will be between $35,000 and $100,000.

Estimating by eye like this is impractical; we need accurate predictions.
So we decide to rely on an artificial intelligence algorithm called linear regression, which is useful for problems where the relationship between the data is linear.

Linear regression goal

The algorithm aims to find the line that best represents the relationship between X and y. To make a prediction, we start from a value on the x-axis and move up vertically until we meet the line; the height of that meeting point is the predicted value on the y-axis.

But for the predictions to be accurate, the line has to pass as close as possible to the points. Otherwise, as in the example below, the difference between the real y and the predicted y can be large: here it is $30,000.

This means that the error of a prediction is equal to the vertical distance between the point and the line.
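As a small sketch of this idea, the snippet below computes the error for a single house as the gap between its real price and the price read off the line. The two prices and the function name `prediction_error` are made up for illustration; they are chosen only so that the result matches the $30,000 of the example.

```python
def prediction_error(actual_price, predicted_price):
    """Vertical distance between a data point and the line, in dollars."""
    return abs(actual_price - predicted_price)

# Hypothetical values: the house really costs $90,000,
# but the line predicts $60,000 for a house of that area.
error = prediction_error(90_000, 60_000)
print(error)  # 30000
```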

The task of the algorithm is then to move the line so that it is as close as possible to all the data points by adjusting two parameters:

  • The slope, the tangent of the angle between the line and the x-axis, which tells us how much y changes when X increases by one unit.
  • The intercept, the value of y at which the line crosses the y-axis (that is, where X = 0).

So the mathematical formula for y is:

y = intercept + slope * X

The formula above is also displayed and explained in the accompanying image.
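To see the formula at work, here is a minimal sketch that turns it into a small function. The name `predict` and the parameter values (an intercept of $10,000 and a slope of $2,000 per square meter) are assumptions made up for this example, not values taken from the dataset.

```python
def predict(area, intercept, slope):
    """Predicted price y for a house of the given area X."""
    return intercept + slope * area

# Hypothetical line: it starts at $10,000 and rises by
# $2,000 for every extra square meter.
price = predict(20, intercept=10_000, slope=2_000)
print(price)  # 50000
```

With these invented parameters, a house of 20 square meters gets a predicted price of $50,000.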

Linear regression algorithm

We start with a horizontal line (slope = 0) whose intercept equals the mean value of y.
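The starting line can be sketched in a few lines of code. The dataset below is a hypothetical toy example (five invented houses), used only to show how the two starting parameters are obtained.

```python
# Hypothetical toy dataset: areas in m² and prices in dollars.
areas = [20, 35, 50, 65, 80]
prices = [50_000, 80_000, 110_000, 140_000, 170_000]

# A horizontal line has slope 0 ...
slope = 0.0
# ... and its height is the mean of all the prices.
intercept = sum(prices) / len(prices)

print(slope, intercept)  # 0.0 110000.0
```

At this stage the line predicts the same price, the average one, for every house, whatever its area.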

For each point, the algorithm measures the distance from the line and, based on that, slightly modifies the two parameters to reduce the error.

This process is repeated many times until the line is as near as possible to all the data points.
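The loop described above can be sketched as follows. The article does not spell out how the two parameters are modified, so the update rule here is an assumption: each parameter is nudged in proportion to the error of the current point, which is essentially a simple form of stochastic gradient descent. The toy dataset, the function name `train`, and the `step` and `rounds` values are also hypothetical.

```python
# Hypothetical toy dataset: areas in m², prices in thousands of dollars.
areas = [20, 35, 50, 65, 80]
prices = [50, 80, 110, 140, 170]

# Assumption: areas are rescaled to hundreds of m² so that the
# small adjustments below stay numerically stable.
xs = [area / 100 for area in areas]

def train(xs, ys, step=0.1, rounds=5000):
    """Fit y = intercept + slope * x with small per-point adjustments."""
    slope = 0.0
    intercept = sum(ys) / len(ys)  # horizontal line at the mean of y
    for _ in range(rounds):
        for x, y in zip(xs, ys):
            # Signed vertical distance between the point and the line.
            error = y - (intercept + slope * x)
            # Nudge both parameters a little in the direction
            # that reduces this error.
            intercept += step * error
            slope += step * error * x
    return intercept, slope

intercept, slope = train(xs, prices)
print(intercept, slope)          # close to 10 and 200
print(intercept + slope * 0.20)  # predicted price for 20 m², close to 50
```

Because the invented prices lie exactly on a line, the loop settles on that line: an intercept of about 10 (thousand dollars) and a slope of about 200 (thousand dollars per hundred square meters). With real, noisy data the line would only get as close as possible to the points.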
