Supervised vs. Unsupervised
Thefield of machine learning has two major branches—supervised learningand unsupervised learning—and plenty of sub-branches that bridge the two.
In supervised learning, the AI agent has access to labels, which it can use to improve its performance on some task. In the email spam filter problem, we have a dataset of emails with all the text within each and every email. We also know which of these emails are spam or not (the so-called labels). These labels are very valuable in helping the supervised learning AI separate the spam emails from the rest.
In unsupervised learning, labels are not available. Therefore, the task of the AI agent is not well-defined, and performance cannot be so clearly measured. Consider the email spam filter problem—this time without labels. Now, the AI agent will attempt to understand the underlying structure of emails, separating the database of emails into different groups such that emails within a group are similar to each other but different from emails in other groups.
This unsupervised learning problem is less clearly defined than the supervised learning problem and harder for the AI agent to solve. But, if handled well, the solution is more powerful.
Here’s why: the unsupervised learning AI may find several groups that it later tags as being “spam”—but the AI may also find groups that it later tags as being “important” or categorize as “family,” “professional,” “news,” “shopping,” etc. In other words, because the problem does not have a strictly defined task, the AI agent may find interesting patterns above and beyond what we initially were looking for.