made by https://cneuralnets.netlify.app/

We’re gonna discuss how the Deepseek R1 model actually works in detail but with very less math!

The blog will have 3 main parts —

Chain of Thought Reasoning
Reinforcement Learning
GRPO
Distillation

Original Paper Link - https://arxiv.org/abs/2501.12948

Chain of Thought Reasoning

This is basically a prompt engineering thing that we apply to the model. We force the model to think rather than just giving us the answer. We add a simple prompt to the user prompt.

Let’s say the user inputs what is the solution of 2+2?, from our side we add a prompt that will be something like please think step by step and explain everything step by step . Let’s now discuss how actually the prompt looked for Deepseek R1 (approximate).

<aside> 👉

<think>

Understand the Question: Begin by carefully reading and interpreting the question to clarify what is being asked.
Identify Key Components: Break down the question into its essential elements. What are the main concepts or variables involved?
Outline Relevant Information: Consider any formulas, definitions, or prior knowledge that may apply to this problem. What information do I need to solve it?
Step-by-Step Reasoning:
- Clearly articulate each step of your reasoning process.
- Apply logical reasoning to derive conclusions from the information provided.
- If applicable, perform necessary calculations or analyses in a systematic manner.
Summarize Key Points: After completing your reasoning, summarize the main points that are relevant to the question.
Final Answer: Provide a concise answer that directly addresses the question. </think> <answer>{{final_answer}}</answer> </aside>

This would always give an answer in the following format

<think>{{thoughts}}</think>
<answer>{{final_answer}}</answer>

Let’s see an example run on the question What is the sum of all even numbers from 1 to 100?

<aside> 👉

<think> To find the sum of all even numbers from 1 to 100, I first need to identify the even numbers in this range. The even numbers between 1 and 100 are: 2, 4, 6, ..., 100.

These numbers can be expressed as a sequence where each number can be represented as \(2n\), where \(n\) is an integer. The smallest even number is \(2(1) = 2\) and the largest is \(2(50) = 100\).

Now, I can see that there are 50 even numbers in total (from \(n=1\) to \(n=50\)).

To calculate the sum of these even numbers, I can use the formula for the sum of an arithmetic series: \[ S_n = \frac{n}{2} \times (a + l) \] where:

\(S_n\) is the sum,
\(n\) is the number of terms,
\(a\) is the first term,
\(l\) is the last term.

Plugging in the values:

\(n = 50\),
\(a = 2\),
\(l = 100\).

Now substituting these values into the formula: \[ S_{50} = \frac{50}{2} \times (2 + 100) = 25 \times 102 = 2550. \] Therefore, the sum of all even numbers from 1 to 100 is 2550. </think> <answer>2550</answer>

</aside>

As you can see in the paper itself, the model pauses, and then waits and then, continues to give it a second thought.

Keep in mind CoT is not system prompting! In CoT, it involves appending specific instructions to a query that prompt the model to explain its reasoning step by step (e.g., "Explain your answer step by step”), whereas in system prompting, it involves setting broader parameters for interaction, such as defining the role of the assistant (e.g., "You are a helpful assistant")