What is Supervised Learning?

Supervised learning is one of the most common types of machine learning in use today. When people talk about machine learning, chances are they are talking about supervised learning.

In supervised learning, algorithms typically learn from past examples to predict new ones. Examples of such algorithms include linear regression, decision trees, and neural networks.

These machine learning algorithms make predictions in the following steps:

  • First, they learn from labelled training data, where the data includes one or more input variables, \(x\), and one or sometimes a few output variables, \(y\).
  • Then, the algorithm finds patterns in the data and maps the inputs to the output (this is where training happens).
  • Once training is complete, a model is formed, and we can provide it with new, unseen, unlabelled data.
  • The model then predicts the output variable/label.
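The steps above can be sketched with a minimal, self-contained example, using scikit-learn and a tiny made-up dataset purely for illustration:

```python
# A minimal sketch of the supervised learning workflow,
# using scikit-learn and made-up data for illustration.
from sklearn.tree import DecisionTreeClassifier

# 1. Labelled training data: inputs (x) with known outputs (y)
train_x = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
train_y = ['small', 'small', 'small', 'large', 'large', 'large']

# 2-3. Training: the algorithm finds patterns mapping x to y
model = DecisionTreeClassifier()
model.fit(train_x, train_y)

# 4. Predict labels for new, unseen, unlabelled data
print(model.predict([[2.5], [11.5]]))  # ['small' 'large']
```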

In its most basic mathematical form, a supervised learning algorithm can be written as:

\[y = f(x)\]

Where \(y\) is the output variable/label, \(x\) represents the input variables, and \(f()\) is the mapping function (algorithm) from the input variables, \(x\), to the output variable, \(y\).

Supervised learning problems can also be grouped into two types:

  • Classification. A problem where we want to predict a discrete label/category.
    (e.g. Emails, Spam or Not Spam; People, Weeb or Gachi; Pokemon Type, Grass, Water, or Fire)
  • Regression. A problem where we want to predict a continuous value.
    (e.g. Stock price, Dogecoin price, the weight of a fish)


In classification problems, supervised machine learning models predict the label/category based on the training data previously provided. These machine learning models are also sometimes known as classifiers.

For example, let’s say we have training data consisting of various Pokemon. The data includes their names, three weakness multipliers (fire, water, grass), and their types.

Pokemon dataset

If we wanted to find out future Pokemon types based on their attributes, we can consider against_fire, against_water, against_grass (weaknesses) as the input variables, and type1 as the output variable we want to predict.

We can use a supervised machine learning algorithm like a Decision Tree to classify Pokemon into various types based on their weaknesses. An example of a Decision Tree model classifying Pokemon types is shown below:

Pokemon Type Decision Tree
import graphviz
from sklearn import tree
from sklearn.model_selection import train_test_split

# Keep only fire, water, and grass Pokemon, and the relevant columns
data_m = data_m.query('type1 == "fire" | type1 == "water" | type1 == "grass"')
data_m = data_m[['against_fire', 'against_water', 'against_grass', 'type1']]
poke_x = data_m.loc[:, data_m.columns != 'type1']
poke_y = data_m['type1']

train_x, test_x, train_y, test_y = train_test_split(poke_x, poke_y, test_size=0.4, random_state=69)

# Train a Decision Tree classifier using Gini impurity
model = tree.DecisionTreeClassifier(criterion='gini')
model.fit(train_x, train_y)

# Export the fitted tree and render it as a PNG with graphviz
dot_data = tree.export_graphviz(model, out_file=None,
                                feature_names=poke_x.columns,
                                class_names=model.classes_,
                                filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph.format = 'png'

As we can see in the graph above, the Decision Tree algorithm learns from our training data and builds a tree of rules to help classify Pokemon types.

So, if we had a Pokemon that is weak against fire (against_fire > 1.25), it would be classified as a grass-type Pokemon by our Decision Tree model.

While it’s all fun and games to create a simple model and start predicting things, it’s also good to know how well our model is doing. To do so, we can calculate the accuracy of a classifier: the percentage of correctly classified observations out of all the predictions made.

Mathematically, that can be written as:

\[accuracy = \frac{correct\ predictions}{total \ predictions} \times 100\]
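As a quick sketch, the calculation amounts to the following (the actual and predicted labels here are made up for illustration):

```python
# Accuracy: correctly classified predictions out of all predictions.
# These actual/predicted labels are made up for illustration.
actual    = ['grass', 'fire', 'water', 'fire', 'grass']
predicted = ['grass', 'fire', 'fire',  'fire', 'water']

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual) * 100
print(accuracy)  # 60.0
```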


For regression problems, models predict a continuous value such as the stock price, fish weight, and avocado prices.

Fish dataset

Let’s say we have a fish dataset (which you might have seen before in Linear Regression basics for Machine Learning) and we want to predict a fish’s weight based on its width.

In this case, Width is our input variable and Weight is our output variable.

If we use linear regression to model the data, we would end up with a line like the one below:

Simple Linear Regression – Width (x) vs Weight (y)

Our algorithm has created a line of best fit to predict the Weight (y) based on the Width (x).
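A minimal sketch of such a fit, assuming a dataset with Width and Weight columns like the fish dataset (the values below are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# A few made-up (Width, Weight) pairs for illustration
width = np.array([[3.0], [4.0], [5.0], [6.0]])   # input variable (x)
weight = np.array([200.0, 300.0, 400.0, 500.0])  # output variable (y)

# Fit the line of best fit: weight = slope * width + intercept
reg = LinearRegression()
reg.fit(width, weight)

# Predict the weight of a fish with an unseen width
new_weight = reg.predict([[5.5]])
print(new_weight)  # ~[450.]
```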

Because a regression model predicts a continuous value, calculating its accuracy is not as straightforward as in a classification problem (where a prediction is simply correct or not).

One of the many ways to evaluate a regression model is the Root Mean Square Error, or RMSE for short, which measures the typical difference between the values predicted by a model and the actual values.

Mathematically, it can be described as follows:

\[RMSE = \sqrt{\frac{\sum_{i=1}^n(\hat{y}_i-y_i)^2}{n}}\]
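A small sketch of the RMSE calculation (the predicted and actual values here are made up for illustration):

```python
import numpy as np

# RMSE: square the errors, average them, then take the square root.
# These predicted/actual weights are made up for illustration.
predicted = np.array([210.0, 340.0, 500.0])
actual = np.array([200.0, 350.0, 480.0])

rmse = np.sqrt(np.mean((predicted - actual) ** 2))
print(round(rmse, 2))  # 14.14
```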


All in all, supervised learning is one of the most commonly used types of machine learning. We know it can tackle classification and regression problems, and for each of these problems, there are different ways to calculate the accuracy of the models.

Next, you might want to know the difference between generative and discriminative classifiers.