BLOG
The derivative of a function explained clearly
Suppose we have a graph representing the population of a village as a function of time. Let us take two time instants on the x-axis where the population is equal. Now…
Code random forest from scratch in Python
In this post, I’ll show you how to program a random forest from scratch in Python using ONLY MATH. Why is coding a random forest from scratch useful? When studying…
The complete guide to handling missing values
What are missing values in machine learning? Missing values in a dataset indicate the absence of observations. The danger of missing values Why are missing values a problem for our…
The complete guide to encoding categorical features
What are categorical features – recap In categorical features, measurements can take a limited, fixed number of values, called “categories”. There are 2 types of categorical features: Why can’t…
What is feature engineering? Definition, techniques and importance
What is feature engineering? Feature engineering is selecting, extracting, and transforming features from raw data to create a new dataset useful for building predictive models. This new dataset is compatible…
What is a confusion matrix?
Types of classification outputs Positive and negative outputs In a classification problem, there are 2 types of categories, positive and negative. Positive categories are labels with a particular characteristic that…
Machine learning fundamental validation metrics
What is a validation metric? A validation metric is a formula we use during model validation to compare the model predictions with the actual output values. The choice of a…
Model validation: definition, examples and Python implementation
What is model validation? Model validation is the step that comes after training. During model validation, we evaluate the accuracy of our model by seeing how it performs with data…
Data in machine learning: collection, types and structure
In machine learning, we can define data as a set of observations or measurements, called a dataset, used to train and test a machine learning model. Data are crucial because artificial…
Bias and variance of a model
Bias and variance are 2 fundamental metrics to describe a model’s ability to solve a problem. Let’s say we have a dataset like this one. We want to represent the…