The definition I like most is, “the semi-automated extraction of knowledge from data”. I say semi-automated because machine learning (ML from now on) requires both humans and computers to work properly. ML starts with a question that might be answerable with data. That data is fed to an ML algorithm to build a predictive model, which can then be used to generate insight.
Supervised learning seeks to predict a specific outcome, for example, is this email spam or not? The first step in supervised learning is training an ML model on labeled data. For example, an ML algorithm might be fed thousands of emails (inputs) and be told whether each email is spam or not (output). The algorithm builds a predictive model that learns the relationship between each data point’s attributes and its outcome, perhaps that emails with lots of links in the body and uppercased words in the subject line are likely to be spam.
That predictive model is then used to make predictions on new data for which the label is unknown. For example, is this new email I’ve never seen spam or not? The primary goal in supervised learning is to build a predictive model that “generalizes”, and accurately predicts the future rather than the past.
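The two steps above, training on labeled emails and then predicting on an unseen one, can be sketched in a few lines of Python. This is a toy stand-in for a real ML algorithm: a nearest-centroid classifier over two hypothetical hand-made features (number of links in the body, number of uppercased words in the subject), not any particular library’s implementation.

```python
# Minimal supervised-learning sketch: each email is reduced to two
# hypothetical features [links_in_body, uppercased_words_in_subject].

def train(examples):
    """Learn a centroid (average feature vector) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc]
            for label, acc in sums.items()}

def predict(model, features):
    """Assign the label whose centroid is closest to the features."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))

# Step 1: train on labeled data, (features, label) pairs.
training = [
    ([9, 7], "spam"), ([8, 5], "spam"), ([7, 6], "spam"),
    ([1, 0], "ham"),  ([0, 1], "ham"),  ([2, 0], "ham"),
]
model = train(training)

# Step 2: generalize to a new, unlabeled email.
print(predict(model, [6, 4]))  # a link-heavy, shouty email -> spam
```

The point is the workflow, not the algorithm: labeled inputs go in, a model captures the input–output relationship, and the model is then asked about data it has never seen.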
Unsupervised learning works with unlabeled data, so it has less information to go on than supervised learning. It aims to extract structure from that unlabeled data in order to learn how best to represent it. In contrast to supervised learning, there is no “right answer”. For example, if I have a data set representing the behaviors of e-commerce shoppers, an unsupervised learning task might be to group shoppers into clusters that exhibit similar behavior. Notice there are no labels here: the model produces clusters rather than predicted labels.
Using the e-commerce shopper example from before, an unsupervised learning model would group shoppers into clusters with similar behavior, say, urban males 18-35, suburban females 45-60, and single mothers. The shoppers within a cluster behave similarly to each other and differently from the shoppers in other clusters.
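The clustering idea can be sketched with a tiny k-means loop. The two behavior features here (monthly visits, average order value) are hypothetical, chosen just so the shoppers fall into two visible groups; note that no labels are ever supplied, the algorithm only discovers which points sit near each other.

```python
# Minimal unsupervised-learning sketch: k-means clustering from scratch.
import random

def kmeans(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k random points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster)
                                     for c in zip(*cluster))
    return clusters

# Hypothetical shoppers as (monthly_visits, avg_order_value) pairs.
shoppers = [(2, 30), (3, 35), (2, 28),          # occasional small orders
            (12, 200), (11, 210), (13, 190)]    # frequent big spenders
for cluster in kmeans(shoppers, k=2):
    print(cluster)
```

Run on this data, the loop separates the occasional small-order shoppers from the frequent big spenders without ever being told those groups exist; naming the groups (“urban males 18-35”, and so on) is still a human job afterward.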
In contrast to supervised learning (input, correct output) and unsupervised learning (input, no output), reinforcement learning provides an input, some output, and a grade for that output. This is interesting because it closely mimics the way humans naturally learn.
For example, think of a toddler looking at a hot cup of coffee. The toddler is curious and touches the cup (input) and receives some output (the feeling of touching the hot cup) with a grade (ouch!). The toddler just received a negative reward (pain) for undesirable behavior (touching a cup of coffee with steam coming out of it). The toddler may experience this model of reinforcement learning several times before learning to identify and not touch hot cups of coffee.
Reinforcement learning is often used to help computers learn how to play games, using a current state and a target function. To teach a computer how to play chess, the computer needs to know the current state of the board and identify the move with the greatest chance of winning the game given that state; the function that picks this move is called the target function. Since the computer doesn’t know anything about playing chess, it starts by selecting each move at random and playing the game to completion. When the game ends and the computer wins or loses, that grade (positive for a win, negative for a loss) is propagated back to each move in that game. When the computer plays another game, it then has some information to feed to the target function that decides which move to choose. Over time, the computer gets better and better at playing chess, all based on the rewards and punishments allocated to specific moves given a win or a loss.
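The play-to-completion-then-propagate loop described above can be sketched on a game far simpler than chess. In this invented toy game the “board state” is a position on a line from 0 to 3, each move steps left or right, stepping left loses immediately (-1) and reaching 3 wins (+1); a table of (state, move) values plays the role of the target function. This is a bare Monte Carlo sketch of the idea, not a chess engine.

```python
# Minimal reinforcement-learning sketch: learn a toy game purely from
# end-of-game grades propagated back to the moves that were played.
import random

rng = random.Random(42)
MOVES = (-1, +1)   # step left (loses instantly here) or step right
value = {}         # (state, move) -> learned value: the "target function"

def play_episode(explore=0.2):
    """Play one game; return the moves made and the final grade."""
    state, history = 0, []
    while True:
        if rng.random() < explore:   # sometimes try a random move
            move = rng.choice(MOVES)
        else:                        # otherwise follow the value table
            move = max(MOVES, key=lambda m: value.get((state, m), 0.0))
        history.append((state, move))
        if move == -1:
            return history, -1       # stepping left loses the game
        state += 1
        if state == 3:
            return history, +1       # reaching the goal wins

def train(episodes=300, alpha=0.1):
    for _ in range(episodes):
        history, reward = play_episode()
        # Propagate the game's grade back to every move that was made.
        for state, move in history:
            old = value.get((state, move), 0.0)
            value[(state, move)] = old + alpha * (reward - old)

train()
# After training, purely greedy play should win every time.
history, reward = play_episode(explore=0.0)
print(reward)
```

Early games are essentially random and mostly lose; each loss pushes down the value of every move in that game, each win pushes them up, and after a few hundred games the table steers greedy play straight to the goal, the same mechanism the chess example relies on at a much larger scale.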
Visualized, the reinforcement learning model looks something like this: