Intro
The approach we explore, called reinforcement learning, is much more focused on goal-directed learning from interaction than are other approaches to machine learning.
Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal.
RL vs. Supervised
Reinforcement learning differs fundamentally from supervised learning, the paradigm studied in most current machine learning research. Supervised learning relies on:
- Learning from labeled training examples
- External supervision providing situation-label pairs
- Direct mapping between situations and their correct labels
RL vs. Unsupervised
Reinforcement learning also differs from unsupervised learning, which is about finding structure hidden in collections of unlabeled data:
- Supervised and unsupervised learning may seem to exhaust the machine learning paradigms between them, but they do not
- Reinforcement learning does not rely on labeled examples, yet it is not unsupervised learning, because:
  - It is trying to maximize a reward signal
  - It is not trying to uncover hidden structure in the data
Tradeoff
One of the key challenges unique to reinforcement learning is the exploration-exploitation trade-off:
- Exploitation
  - The agent uses actions that have proven effective in the past
  - This is how it obtains reward in the present
- Exploration
  - The agent tries actions it has not selected before
  - This is the only way to discover actions that may turn out to be better
This creates a fundamental dilemma: pursuing either exploration or exploitation exclusively leads to failure. The agent must balance the two by:
- Trying a variety of actions
- Progressively favoring those that appear best
- Trying each action many times, since in a stochastic task a single outcome gives only a noisy estimate of an action's expected reward
Despite decades of mathematical research, this exploration-exploitation dilemma remains an open challenge in the field.
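The balance described above can be sketched with an epsilon-greedy agent on a simple multi-armed bandit. This is a minimal illustration, not a method from the text: the arm reward probabilities, the value of `EPSILON`, and the step count are all invented for the sketch. The agent exploits its current best estimate most of the time and explores a random arm with probability `EPSILON`, updating an incremental running mean of each arm's reward.

```python
import random

# Hypothetical 3-armed bandit; the true payout probabilities are
# assumptions for this sketch and are unknown to the agent.
TRUE_PROBS = [0.2, 0.5, 0.8]
EPSILON = 0.1          # fraction of steps spent exploring
STEPS = 10_000

random.seed(0)
counts = [0] * len(TRUE_PROBS)    # times each arm was pulled
values = [0.0] * len(TRUE_PROBS)  # running estimate of each arm's reward

for _ in range(STEPS):
    if random.random() < EPSILON:
        # Explore: pick any arm uniformly at random
        arm = random.randrange(len(TRUE_PROBS))
    else:
        # Exploit: pick the arm with the highest current estimate
        arm = max(range(len(TRUE_PROBS)), key=values.__getitem__)
    reward = 1.0 if random.random() < TRUE_PROBS[arm] else 0.0
    counts[arm] += 1
    # Incremental mean: each arm must be tried many times before
    # its estimate is reliable in a stochastic task
    values[arm] += (reward - values[arm]) / counts[arm]

best = max(range(len(TRUE_PROBS)), key=values.__getitem__)
print(best, [round(v, 2) for v in values])
```

With enough steps the estimates converge toward the true probabilities and exploitation concentrates on the best arm, while the small `EPSILON` keeps exploration alive in case the estimates are wrong. Setting `EPSILON = 0` (pure exploitation) can lock the agent onto whichever arm happened to pay off first.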
Agent type approach