Intro
The approach we explore, called reinforcement learning, is much more focused on goal-directed learning from interaction than are other approaches to machine learning.
Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal.
RL vs. Supervised
Reinforcement learning differs fundamentally from supervised learning, the paradigm studied in most current machine learning research. Supervised learning relies on:
- Learning from labeled training examples
- External supervision providing situation-label pairs
- Direct mapping between situations and their correct labels
RL vs. Unsupervised
Reinforcement learning also differs from unsupervised learning, which is about finding structure hidden in collections of unlabeled data:
- Supervised and unsupervised learning may seem to exhaust the machine learning paradigms between them, but they do not
- Reinforcement learning does not rely on labeled examples, yet it is not unsupervised learning, because:
  - It is trying to maximize a reward signal
  - It is not trying to uncover hidden structure in the data
Tradeoff
One of the key challenges unique to reinforcement learning is the exploration-exploitation trade-off:
- Exploitation
  - The agent uses actions that have proven effective in the past
  - This is how it obtains reward in the present
- Exploration
  - The agent tries actions it has not selected before
  - This is the only way to discover actions that may turn out to be better
This creates a fundamental dilemma: pursuing either exploration or exploitation exclusively leads to failure. The agent must balance the two by:
- Trying a variety of actions
- Progressively favoring those that appear best
- Trying each action many times, since in a stochastic task a single outcome gives only a noisy estimate of an action's expected reward
Despite decades of mathematical research, this exploration-exploitation dilemma remains an open challenge in the field.
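The balance described above can be sketched with an epsilon-greedy agent on a simple multi-armed bandit. This is a minimal illustration, not a method from the text: the arm reward probabilities, the value of `EPSILON`, and the step count are all invented for the sketch. The agent exploits its current best estimate most of the time and explores a random arm with probability `EPSILON`, updating an incremental running mean of each arm's reward.

```python
import random

# Hypothetical 3-armed bandit; the true payout probabilities are
# assumptions for this sketch and are unknown to the agent.
TRUE_PROBS = [0.2, 0.5, 0.8]
EPSILON = 0.1          # fraction of steps spent exploring
STEPS = 10_000

random.seed(0)
counts = [0] * len(TRUE_PROBS)    # times each arm was pulled
values = [0.0] * len(TRUE_PROBS)  # running estimate of each arm's reward

for _ in range(STEPS):
    if random.random() < EPSILON:
        # Explore: pick any arm uniformly at random
        arm = random.randrange(len(TRUE_PROBS))
    else:
        # Exploit: pick the arm with the highest current estimate
        arm = max(range(len(TRUE_PROBS)), key=values.__getitem__)
    reward = 1.0 if random.random() < TRUE_PROBS[arm] else 0.0
    counts[arm] += 1
    # Incremental mean: each arm must be tried many times before
    # its estimate is reliable in a stochastic task
    values[arm] += (reward - values[arm]) / counts[arm]

best = max(range(len(TRUE_PROBS)), key=values.__getitem__)
print(best, [round(v, 2) for v in values])
```

With enough steps the estimates converge toward the true probabilities and exploitation concentrates on the best arm, while the small `EPSILON` keeps exploration alive in case the estimates are wrong. Setting `EPSILON = 0` (pure exploitation) can lock the agent onto whichever arm happened to pay off first.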
Agent type approach