Deep Dive into LLMs like ChatGPT
Step 1 — Pretraining Data (Internet)
- Training on large-scale text data produces a base model.
Step 2 — Tokenization
- Raw text is converted into symbols — we call them tokens.
- tiktokenizer can be used to test how many tokens a given piece of text would be divided into for a specific model.
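The tiktoken library exposes the same counting programmatically. A minimal sketch, assuming tiktoken is installed; "gpt-4o" is just an example model name:

```python
# pip install tiktoken
import tiktoken

text = "Deep dive into LLMs like ChatGPT"

# Look up the tokenizer that a specific model uses ("gpt-4o" is an example).
enc = tiktoken.encoding_for_model("gpt-4o")

tokens = enc.encode(text)
print(len(tokens))                        # how many tokens this text costs for that model
print([enc.decode([t]) for t in tokens])  # how the text was split into token strings
```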
Step 3 — Neural Network Training
- Inference: generating new data from the model.
- What is the relationship between model parameters and tokens?
- Regurgitation: a model that has memorized a passage of training data can recite it near-verbatim instead of generating novel text. How can this be mitigated?
- Pretraining data for large models usually has a cutoff date, so it doesn’t include the most recent information.
- When you only have a base model and want an LLM assistant, you can prompt it with a defined identity plus a few example question-and-answer turns (few-shot prompting), as sketched below.
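A minimal sketch of that trick: a base model only continues text, so the "assistant" lives entirely in the prompt as an identity plus a few hand-written example turns. The template below is illustrative, not taken from the source.

```python
# A base model is just a text continuer; the assistant persona is created
# in the prompt itself: an identity line plus a few worked example turns.
FEW_SHOT_PROMPT = """\
The following is a conversation between a human and a helpful, honest AI assistant.

Human: What is the capital of France?
Assistant: The capital of France is Paris.

Human: Translate "good morning" into Spanish.
Assistant: "Good morning" in Spanish is "buenos dias".

Human: {question}
Assistant:"""

def build_prompt(question: str) -> str:
    """Insert the user's question as the next turn of the template."""
    return FEW_SHOT_PROMPT.format(question=question)

print(build_prompt("Why is the sky blue?"))
# Fed to a base model, the most likely continuation now reads like an
# assistant reply, even though the model was never post-trained.
```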
Step 4 — Post-Training
- Turn the LLM into an assistant.
- Post-training data: conversation datasets.
- Key focus areas: hallucinations, tool use, knowledge/working memory.
How to Reduce AI Hallucinations:
Question Reconstruction Method: Convert existing articles into question–answer pairs, then compare the model’s answers with the known correct ones. If the model answers incorrectly or with high uncertainty, add a training example for that question whose correct response is “Sorry, I don’t know” (sketched below).
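A hedged sketch of that loop, assuming the question–answer pairs already exist and using a hypothetical ask_model() stub in place of a real API call; repeated sampling stands in for measuring uncertainty:

```python
import random
from collections import Counter

def ask_model(question: str) -> str:
    """Hypothetical stub for querying the model being probed; replace with a real API call."""
    return random.choice(["Paris", "Lyon", "I'm not sure"])

def probe(question: str, correct_answer: str, n_samples: int = 5):
    """Ask the same question several times. Wrong or inconsistent answers turn
    into a training example whose target response is an admission of ignorance."""
    answers = [ask_model(question) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    if count == n_samples and top_answer == correct_answer:
        return None  # the model reliably knows this; no new data needed
    return {"prompt": question, "response": "Sorry, I don't know."}

example = probe("What is the capital of France?", "Paris")
if example:
    print("new post-training example:", example)
```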
Allow the Model to Perform Web Searches: Add “use the web search to make sure” at the end of your prompt — this encourages the model to include verified information sources in its response.
Provide Context Directly: Compared to relying on the model’s memory, directly embedding contextual information into the prompt greatly improves accuracy and quality.
“Rereading always leads to a better summary” — this is the underlying philosophy at work.
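A tiny illustration of the difference; source_article is a placeholder for the text you paste in:

```python
source_article = "...full text of the article you want discussed, pasted here..."

# Relying on the model's memory: it must recall (or hallucinate) the details.
memory_prompt = "Summarize the key findings of the article about X."

# Providing context: the model re-reads the material inside its context window.
context_prompt = (
    "Here is the article:\n\n"
    f"{source_article}\n\n"
    "Using only the text above, summarize the key findings."
)
```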
Let the Model Use Code for Computation: When your prompt involves arithmetic or precise numerical results, asking the AI to “mentally calculate” often leads to errors. Instead, let it use the tools it excels at: add “use code” to the prompt. Models can’t see individual characters, only tokens, so they are poor at counting. For instance, asked how many “r”s there are in the word raspberry, many models answer 2, while the correct answer is 3; some models even appear to hard-code answers to famous examples like this. Adding “use code” lets the model solve the problem via Python or other scripting logic instead. → Models need tokens to think.
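Concretely, “use code” means the model writes and runs something like the following instead of counting over tokens:

```python
word = "raspberry"
print(list(word))        # ['r', 'a', 's', 'p', 'b', 'e', 'r', 'r', 'y']
print(word.count("r"))   # 3 -- trivial for code, unreliable when done over tokens
```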
Step 5 — Reinforcement Learning
- After post-training we obtain a supervised fine-tuned (SFT) model. Reinforcement learning is then applied on top of this model (a simplified sketch follows this list).
- After this stage, we obtain a Reinforcement Learning Model.
- DeepSeek-R1: In its research paper, the team showed that reinforcement learning improves response accuracy by encouraging models to “think longer.” Multi-angle and multi-step reasoning enhances output quality.
- The so-called reasoning models are essentially reinforcement learning models.
- If you are concerned about data security, you can use AI hosting platforms like together.ai to safely test and compare various large models.
- AlphaGo Analogy:
- Supervised learning is like imitating the strongest player — performance improves steadily but eventually plateaus.
- Reinforcement learning, on the other hand, can keep improving through self-play and eventually surpass the strongest human players (AlphaGo’s famous move 37 came out of this process).
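As referenced above, a heavily simplified sketch of RL on a verifiable task. The helpers are hypothetical stubs, and a real system would update the model’s weights with policy-gradient methods rather than just collecting the winning rollouts:

```python
import random

def sample_solution(problem: str) -> tuple[str, str]:
    """Hypothetical stub: the model produces a chain of thought and a final answer."""
    answer = random.choice(["4", "5"])
    return (f"some step-by-step reasoning about {problem} ...", answer)

def reward(final_answer: str, correct_answer: str) -> float:
    """Verifiable reward: 1 if the final answer checks out, else 0."""
    return 1.0 if final_answer == correct_answer else 0.0

def rl_step(problem: str, correct_answer: str, n_rollouts: int = 8):
    """Sample many attempts and keep the ones that earned reward. In a real RL
    loop the winning rollouts drive a weight update, which over many steps is
    what teaches the model to 'think longer' before answering."""
    rollouts = [sample_solution(problem) for _ in range(n_rollouts)]
    return [(cot, ans) for cot, ans in rollouts if reward(ans, correct_answer) > 0]

winners = rl_step("2 + 2 = ?", "4")
print(len(winners), "winning rollouts would be reinforced")
```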
Reinforcement Learning from Human Feedback (RLHF)
- What’s the difference between RL and RLHF? Are their reward mechanisms the same? Not quite: in RL on verifiable tasks (math, code) the reward is computed automatically by checking the answer, while in RLHF the reward comes from a reward model trained to imitate human preference rankings.
- RLHF is therefore better suited for tasks without a single correct answer, such as judging whether a joke is funny, i.e., creative behavior beyond math or code (see the reward sketch at the end of these notes).
- Base models can be tried on hosting platforms such as Hyperbolic, which serves many base (foundation) model variants.
- To run locally, you can use LM Studio.
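For the local route, LM Studio can expose an OpenAI-compatible server (by default on localhost port 1234), so the standard openai Python client works against it. The model identifier and dummy API key below are placeholders for whatever you have loaded locally:

```python
# pip install openai -- LM Studio's local server speaks the OpenAI chat API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # dummy key

resp = client.chat.completions.create(
    model="local-model",  # placeholder: use the identifier shown in LM Studio
    messages=[{"role": "user", "content": "How many 'r's are in raspberry? Use code."}],
)
print(resp.choices[0].message.content)
```

And, returning to the RL vs. RLHF question above, a toy contrast of the two reward mechanisms. The ToyRewardModel is a stand-in heuristic; a real RLHF reward model is a neural network trained on human pairwise preference rankings:

```python
# RL on verifiable tasks: the environment itself can check the answer.
def verifiable_reward(final_answer: str, correct_answer: str) -> float:
    return 1.0 if final_answer.strip() == correct_answer else 0.0

class ToyRewardModel:
    """Stand-in for a learned reward model. The real thing is trained so that
    responses humans preferred score higher than the ones they rejected."""
    def score(self, response: str) -> float:
        return 1.0 if "pun" in response.lower() else 0.1

# RLHF: no ground truth exists ("is this joke funny?"), so the learned
# reward model supplies the score used during reinforcement learning.
def rlhf_reward(response: str, reward_model: ToyRewardModel) -> float:
    return reward_model.score(response)

print(verifiable_reward("4", "4"))                                     # 1.0: checked automatically
print(rlhf_reward("Here is a pun about tokens...", ToyRewardModel()))  # judged, not checked
```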