
How to Use Different Prompt Techniques to Make LLMs More Reliable?

This article covers more than prompt-writing tips: it traces the evolution from prompt design techniques, to building workflows out of prompts, to full LLM application and system design patterns that use prompts as interfaces.

I. Prompt Expression Layer (Prompting as Text)

  1. Zero-Shot Prompting
  • Definition: Without providing any examples, use a single clear instruction to have the model complete a task directly. Examples include: classification, translation, summarization, simple Q&A, etc.
Classify the text into neutral, negative or positive.

Text: I think the vacation is okay.

Sentiment:
  2. Few-Shot Prompting
  • Provide the model with a few examples (demonstrations/exemplars) in the prompt before having it complete a new task. This is essentially in-context learning. Use it when the response must follow a particular tone, style, or format, or align with rules specific to your use case.
Text: That movie was great!
Sentiment: Positive

Text: The food was terrible.
Sentiment: Negative

Text: I think the vacation is okay.
Sentiment:
  3. Chain-of-Thought Prompting
  • Chain-of-thought prompting makes the “intermediate reasoning process” explicit, forcing the model to decompose problems step by step and calculate progressively, thus avoiding jumping to answers. This is useful for math problems, calculations, etc.
Solve the problem step by step, then give the final answer:
If you have 3 apples and you buy 2 more, how many apples do you have?
  4. Self-Consistency

What is Self-Consistency? Self-Consistency is a prompting technique for reasoning tasks, designed to replace the instability of relying on a single path, single generation in Chain-of-Thought. Its core approach is: generate multiple independent reasoning paths for the same problem, then select the most frequently occurring, mutually consistent final answer as output. It’s not about making the reasoning process more complex, but about improving the stability and reliability of reasoning results in engineering practice through “multiple independent thinking + consistency-based selection.”

Intuitive Understanding Think of Self-Consistency as: having multiple people solve the same problem separately, then not looking at who wrote the most elegantly, but rather which answer was reached by the most people. Here, “multiple people” corresponds to different reasoning paths generated by the model, and “voting” corresponds to consistency filtering for the final answer.

Why Use It? When a single Prompt asks the model to both “complete the reasoning process” and “give a final conclusion” in one generation, it’s easy for the whole thing to go off track due to a misjudgment in one intermediate step. Self-Consistency explicitly exposes this randomness by introducing multiple reasoning paths and constrains it with consistency rules, reducing the impact of single reasoning failures on the final result. It’s especially suitable for arithmetic reasoning, common sense inference, and explanation tasks that require result stability.

Incorrect One-Line Approach (Example)

When I was 6 my sister was half my age. Now I'm 70, how old is my sister?

This approach compresses reasoning and conclusion into a single generation: once the model picks the wrong reasoning path, it follows that path all the way to an incorrect answer.

The minimal Self-Consistency approach isn’t complicated and can be broken into three steps:

  • Step 1: Have the model complete reasoning using chain-of-thought
  • Step 2: Generate multiple independent responses for the same problem
  • Step 3: Statistically analyze final answers and select the most frequently occurring result

Minimal Universal Template (Reusable)

You will perform multiple independent reasoning attempts for the same problem.
Each attempt must provide a complete thought process and give a clear answer at the end.
After completing all reasoning, output only the final answer that appears most frequently.

Problem:
{Insert your problem here}
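
A minimal sketch of these three steps in code. The `llm(prompt)` callable and the `Answer:` extraction convention are assumptions of the sketch, standing in for whatever model client and output format you actually use:

from collections import Counter
import re

def self_consistency(llm, question: str, n_paths: int = 5) -> str:
    """Sample several independent chain-of-thought answers, then keep the majority answer."""
    prompt = (
        "Solve the problem step by step, then write the final answer "
        "on the last line in the form 'Answer: <answer>'.\n\n"
        f"Problem:\n{question}"
    )
    answers = []
    for _ in range(n_paths):
        # Each call should sample with non-zero temperature so the reasoning paths differ.
        reply = llm(prompt)
        match = re.search(r"Answer:\s*(.+)", reply)
        if match:
            answers.append(match.group(1).strip())
    # Consistency-based selection: the most frequent final answer wins.
    return Counter(answers).most_common(1)[0][0] if answers else ""
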
  5. Meta Prompting

Meta Prompting is a prompting method of “defining structure first, then filling content.” Instead of teaching the model how to answer through examples, you first tell the model what structure the answer should follow, then have the model complete specific content within that structure.

Intuitive Understanding: You’re not giving the model a “reference answer,” but rather a response template. The template only specifies order and sections, not what to write specifically.

  • Template → Structural rules in the Prompt
  • Filling content → Model’s reasoning and generation

In engineering practice, Meta Prompting is commonly used in scenarios requiring unified output structure. The model no longer imitates a specific example but consistently thinks and expresses according to the same set of structural rules.

A more intuitive way to judge: If your Prompt simultaneously asks the model to learn from examples, actually solve the problem, and explain its thinking, the output will likely become chaotic. Meta Prompting’s approach is to specify “how to think, how to write” in advance, letting the model focus only on filling content rather than making trade-offs during generation.

Incorrect One-Line Approach Example:

Please refer to the example below, solve this problem, and explain your thought process.

This approach mixes example imitation, problem solving, and process explanation together, causing output structure to easily become chaotic as examples change.

Minimal Improvement Using Meta Prompting:

Please complete the task following this structure:

  • Step 1: List known conditions
  • Step 2: Provide reasoning steps
  • Step 3: Reach final conclusion

Requirement: Each step should only complete its corresponding content.

No specific examples are provided here—only defining how the answer should be organized.
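
One way to make this reusable is a small template builder that separates the structural rules from the task; the function name and the example task below are illustrative only:

def build_meta_prompt(task: str, structure: list[str]) -> str:
    """Turn a task plus a list of structural rules into a structure-first prompt."""
    steps = "\n".join(f"Step {i + 1}: {rule}" for i, rule in enumerate(structure))
    return (
        "Please complete the task following this structure:\n"
        f"{steps}\n"
        "Requirement: each step should only contain its corresponding content.\n\n"
        f"Task:\n{task}"
    )

# Usage: the structure stays fixed; only the task changes between calls.
prompt = build_meta_prompt(
    task="Estimate how long migrating the user table to the new schema will take.",
    structure=["List known conditions", "Provide reasoning steps", "Reach a final conclusion"],
)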

  6. Directional Stimulus Prompting

Directional Stimulus Prompting is a prompting method of “highlighting key points first, then generating.” Instead of letting the model judge what’s important, you explicitly tell the model where the key points you want to focus on are, then have the model complete answers around these points, reducing omissions and going off-track.

Intuitive Understanding: You’re not asking the model to “summarize freely,” but first giving it a key points checklist. This checklist only specifies what points must be covered, not how to write them specifically.

  • Key points checklist → stimulus/hint in the Prompt
  • Output around key points → Model’s generation and expression organization

In engineering practice, Directional Stimulus Prompting is commonly used in scenarios requiring stable coverage of critical information, such as summaries, information extraction, report generation, etc. The model’s goal is no longer “how similar the writing looks” but rather “not missing a single point that should be mentioned.”

A more intuitive way to judge:

  • If the model’s response looks fine but always misses the things you care most about, it means key points weren’t constrained in advance—suitable for directional stimulus prompting.

Incorrect One-Line Approach Example:

Please summarize the following content in 2-3 sentences.

This approach mixes “judging key points” and “generating summary” together. Key points are left entirely to the model’s discretion, easily resulting in missing critical information or going off-track.

Using Directional Stimulus Prompting for minimal improvement:

Summarize content referring to these key points:

Key Points:
- {Point that must be mentioned 1}
- {Point that must be mentioned 2}
- {Point that must be mentioned 3}
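
The same checklist idea expressed as a small prompt builder, with the key points passed in explicitly; the function name and parameters are illustrative, not a fixed API:

def build_directional_prompt(text: str, key_points: list[str], max_sentences: int = 3) -> str:
    """Constrain a summary so it must cover an explicit list of key points."""
    hints = "\n".join(f"- {point}" for point in key_points)
    return (
        f"Summarize the content below in at most {max_sentences} sentences.\n"
        "The summary must cover every key point listed.\n\n"
        f"Key points:\n{hints}\n\n"
        f"Content:\n{text}"
    )
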
  7. Generated Knowledge Prompting

Generated Knowledge Prompting is a prompting method: first have the model explicitly generate background knowledge relevant to the question, then complete the final judgment or answer based on this knowledge.

It’s like having someone first write relevant information on scratch paper before formally answering. The scratch paper content isn’t the answer—it’s the background information needed to solve the problem.

  • Scratch paper → Generated Knowledge
  • Formal answer → Final response based on Knowledge

Rule of thumb: when a question looks simple on the surface but actually depends on implicit common sense or real-world rules, errors are likely. In one-shot answers, models often optimize directly for the conclusion rather than first making sure the knowledge supporting it is complete and correct. The shorter and more definite the output goal (e.g., Yes/No), the more likely the model skips necessary knowledge verification.

Three Typical Applicable Situations

Situation One: Common Sense Judgment Questions

Problem Characteristics: Judgment depends on real-world rules, but rules aren’t explicitly present in the question.

Incorrect approach:
Part of golf is trying to get a higher point total than others. Yes or No?

Using Generated Knowledge Prompting:
Question: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: {generated golf scoring rules}
Answer:

Situation Two: Questions Where Facts and Definitions Are Easily Confused

Problem Characteristics: Similar concepts, model easily handles definition boundaries vaguely.

Incorrect approach:
A rock is the same size as a pebble. Yes or No?

Using Generated Knowledge Prompting:
Question: A rock is the same size as a pebble. Yes or No?
Knowledge: {generated pebble size definition}
Answer:

Situation Three: Binary Judgments Like Yes/No, True/False

Problem Characteristics: Extremely small output space, model easily gives conclusions directly without verifying basis.

Incorrect approach:
Smoking increases the chance of lung cancer. Yes or No?

Using Generated Knowledge Prompting:
Question: Smoking increases the chance of lung cancer. Yes or No?
Knowledge: {generated causal or statistical facts}
Answer:

Question Template

Input: {question}

Step 1: First write out the background knowledge or facts necessary to answer this question, don't give conclusions.
Knowledge:

Step 2: Answer the question based only on the above knowledge.
Answer:
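
The template maps directly onto two model calls: one that only produces knowledge, and one that answers using only that knowledge. A minimal sketch, assuming a generic `llm(prompt)` wrapper:

def generated_knowledge_answer(llm, question: str) -> str:
    """Step 1: generate background knowledge. Step 2: answer based only on that knowledge."""
    knowledge = llm(
        "Write out the background knowledge or facts needed to answer this question. "
        "Do not give a conclusion.\n\n"
        f"Question: {question}\nKnowledge:"
    )
    return llm(
        "Answer the question based only on the knowledge below.\n\n"
        f"Knowledge:\n{knowledge}\n\n"
        f"Question: {question}\nAnswer:"
    )
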
  8. Multimodal CoT Prompting

Multimodal CoT Prompting is when images and text are input together, first having the model write out the “understanding process,” then having it give the answer, avoiding blind guessing without looking carefully.

The problem it solves is simple: when a model answers questions about an image, it often draws conclusions from its text-based priors instead of from the image itself.

It’s like when a teacher gives you a picture-based question:

  • Image and question → Input
  • First write “how I figured this out” → Reasoning process
  • Then write the answer → Final conclusion

The key isn’t writing beautifully, but actually looking at the image before speaking.

In actual systems, this approach equals forcing the model to proceed in order: first understand the image, then give results, reducing step-skipping and blind guessing.

Why Use It?

Once the model knows “the goal is to give an answer,” it might compose an answer first, then supplement reasoning afterward.

Three Most Common Situations

Situation One: Image Q&A

Incorrect situation:

Look at this image, what is this? Why?

The model may simply guess the answer from common sense and compose the reasoning afterward; whether it actually looked at the image makes no difference.

Improved approach:

Please first describe the key content you see in the image,
explain what each content represents.
After completing the analysis, then answer what this is.

Force it to “describe what it sees” first, then allow conclusions.

Situation Two: Finding Commonalities / Making Comparisons

Incorrect situation:

What do these two things have in common?

The model might give a commonality that merely “sounds right” without checking each item.

Improved approach:

Please separately list the attributes visible in the image for these two objects,
then based on these attributes, determine their commonalities.
Finally, give your conclusion.

Situation Three: Multiple Choice Based on Images

Incorrect situation:

Choose the correct answer based on the image.

With only the final choice visible, it is impossible to tell whether the model actually analyzed the image.

Improved approach:

Please first explain how you step-by-step eliminated wrong options based on the image,
then give your chosen final answer.

Only when the process is exposed does the answer mean anything.

  9. Graph Prompts
  • Organize the information in the problem into a graph structure, using nodes and edges to represent entities and relationships explicitly, then feed that structure to the model as part of the prompt (a serialization sketch follows below). The goal is to improve model performance on relationship understanding, reasoning, and structured tasks.
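
A sketch of what that serialization might look like; the node/edge text format and the example graph are illustrative choices, not a standard:

def graph_to_prompt(nodes: list[str], edges: list[tuple[str, str, str]], question: str) -> str:
    """Serialize entities (nodes) and relationships (edges) into text the model can reason over."""
    node_lines = "\n".join(f"- {n}" for n in nodes)
    edge_lines = "\n".join(f"- {src} --{relation}--> {dst}" for src, relation, dst in edges)
    return (
        f"Entities:\n{node_lines}\n\n"
        f"Relations:\n{edge_lines}\n\n"
        f"Using only the graph above, answer:\n{question}"
    )

prompt = graph_to_prompt(
    nodes=["Alice", "Acme Corp", "Berlin"],
    edges=[("Alice", "works_at", "Acme Corp"), ("Acme Corp", "headquartered_in", "Berlin")],
    question="In which city does Alice most likely work?",
)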

II. Prompt Workflow Layer (Prompting as Workflow)

  1. Prompt Chaining

What is Prompt Chaining?

Prompt Chaining refers to breaking a complex task into multiple smaller questions, asking the model in a fixed sequence, and using the output from the previous step as input for the next step. Its purpose isn’t to make the process complex, but to separate and explicitly control judgment and thinking processes that are prone to errors, achieving risk isolation. This advantage may not be obvious in simple tasks, but in engineering scenarios with large contexts requiring explainability, auditability, and minimal hallucinations, it often significantly improves result stability and controllability.

Using an intuitive metaphor, it’s like making a complex bento box: rather than asking to “do everything at once,” you complete it step by step following “cook rice first, then prepare dishes, finally arrange.” Prompt Chaining brings this step-by-step thinking into interactions with large language models, having the model do only one thing at each step, avoiding multiple different types of thinking tasks in a single generation.

Why Use It

An LLM typically follows only one line of thinking in a single generation. When you ask it to complete two different kinds of thinking in one prompt, the later output goal contaminates the earlier thinking process. In other words, models often don’t think things through before answering; they compose the thinking process while generating the answer. The following combinations are therefore especially error-prone when placed in the same prompt.

The first type is mixing judgment and response together. Judging whether something is feasible should first establish judgment criteria, but in the same Prompt, the model tends to give conclusions first, then piece together reasons for conclusions, causing unstable judgment basis. Incorrect example:

Is this Prompt Chaining solution reasonable? Please give the conclusion directly and explain the reasoning.

The second type is mixing fact retrieval and explanation together. When the model simultaneously finds reasons and explains them, if it doesn’t know the actual facts, it will naturally supplement content that sounds reasonable but isn’t verified.

Based on this document, explain why Prompt Chaining can significantly reduce hallucination problems.

The third type is mixing structure planning and content generation together. Structure planning requires a global perspective, while content generation is linear output; when both happen simultaneously, the model easily changes structure while writing, causing logical jumps or repetition.

Please write an article systematically, completely, and logically, introducing Prompt Chaining's definition, principles, advantages, and application examples.

How to Use?

Real Example: Wanting to Judge Whether a Solution is Feasible

If asking in one sentence, such as:

Is this solution feasible? Please explain the reasoning.

Disadvantage analysis: The main problem with this approach is that it mixes “establishing judgment criteria first” and “directly concluding and explaining”—two different types of tasks—together. In a single generation, the model tends to first give a seemingly reasonable conclusion, then supplement reasons for this conclusion; since judgment criteria weren’t explicitly established beforehand, the reasons are more like “after-the-fact explanations” serving the conclusion. The result: unstable judgment basis, potentially subjective explanations, and you can’t see clearly what factors it based its judgment on.

A more reliable approach is using minimal Prompt Chaining, splitting “determining judgment basis” and “evaluating based on criteria and concluding” into two steps.

Step 1: First Define Judgment Criteria

Please list the key factors to consider when judging whether a solution is feasible.
Only list factors, don't draw conclusions.

Step 2: Then Use Criteria to Give Conclusions

Based on the key factors listed above,
evaluate the following solution item by item and give a final conclusion.
The solution is: {solution content}

Simplest Prompt Chaining Template

This template does only one thing: separate “thinking” from “answering.”

Step 1: Think Clearly First, Don’t Answer

Before answering the question,
please first list the key points / judgment criteria you need to consider.
Only list points, don't give the final answer.

Step 2: Answer Based on Previous Step

Based on the key points listed above,
think step by step and give the final answer.
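
In code, the chain is nothing more than two calls where the first output is pasted into the second prompt. A minimal sketch, again assuming a generic `llm(prompt)` wrapper:

def chained_feasibility_check(llm, solution: str) -> str:
    """Step 1: derive judgment criteria only. Step 2: evaluate the solution against those criteria."""
    criteria = llm(
        "List the key factors to consider when judging whether a solution is feasible. "
        "Only list the factors, do not draw conclusions."
    )
    return llm(
        "Based on the key factors below, evaluate the solution item by item "
        "and give a final conclusion.\n\n"
        f"Key factors:\n{criteria}\n\n"
        f"Solution:\n{solution}"
    )
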
  2. Tree of Thoughts (ToT)

Tree of Thoughts (ToT) is a prompting and reasoning framework for complex reasoning tasks. The core problem it solves is: when problems have multiple possible paths requiring exploration, backtracking, or comparing intermediate solutions, a single linear generation easily goes off track and cannot self-correct. Its basic working method is: having the model perform branch generation, evaluation, and filtering among multiple “intermediate thoughts,” and gradually approach feasible solutions through search strategies.

Think of ToT as “multi-step preview” in chess. You don’t determine victory or defeat with just one move, but simultaneously envision several approaches, deduce several steps for each, then eliminate lines that will obviously lose.

Why Use It?

When a problem has multiple possible solutions and a wrong step midway is hard to correct, linear prompts are often unstable; that is when ToT is appropriate.

Three Typical Applicable Situations

First: Problems Requiring Exploration of the Solution Space. For example, mathematical reasoning, puzzles, and planning tasks. These problems don’t have an obvious first step; ToT allows the model to try multiple paths simultaneously, then gradually eliminate impossible branches.

Second: Problems Where Intermediate Steps Can Be Judged Right or Wrong. When some thoughts can be marked as “feasible / infeasible / uncertain,” ToT can prune early, preventing ineffective reasoning from spreading further.

Third: Problems Requiring Backtracking or Comparing Solutions. For example, strategy formulation and complex decision analysis. ToT lets the model explicitly retain multiple candidate solutions rather than discovering the wrong direction late in generation, when it is too late to turn back.

How to Use?

Incorrect One-Line Approach:

Please think step by step and calculate how these four numbers can reach 24 through addition, subtraction, multiplication, and division.

This approach mixes path selection, reasoning expansion, and result confirmation in a single generation. Once the model chooses the wrong combination early, subsequent reasoning gets pulled along the wrong direction, making correction difficult.

Minimal Viable Improvement:

Step 1: Generate Multiple Candidate Approaches
Given numbers 3, 3, 8, 8.
Please list 5 different intermediate calculation approaches, only write the first step for each, don't continue calculating.

Step 2: Evaluate Each Approach's Feasibility
For each approach above, judge whether it's "possible / unlikely / impossible" to reach 24, and give a one-sentence reason.

Step 3: Only Continue with Feasible Approaches
Select one approach marked as "possible" and continue to the next calculation step.

Minimal Universal Template

[Task Description]
Describe the final problem to solve here: {final goal}

[Step 1: Generate Candidate Approaches]
Please provide {N} different intermediate approaches.
Each approach only writes the current step, don't give complete answers.

[Step 2: Evaluate Approaches]
Please separately evaluate whether each approach is feasible (e.g.: feasible / uncertain / infeasible) and give brief reasons.

[Step 3: Continue Advancing]
Select only from approaches judged feasible and continue to the next step.
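
A compressed sketch of the generate → evaluate → continue loop, flattened to a single level of branching; real ToT implementations run a proper search (BFS/DFS) over a tree of thoughts, and every name here is illustrative:

def tot_one_level(llm, goal: str, n_candidates: int = 5) -> str:
    """Generate candidate first steps, prune the infeasible ones, then continue one survivor."""
    candidates = llm(
        f"Goal: {goal}\n"
        f"List {n_candidates} different intermediate approaches, one per line. "
        "Write only the first step of each, do not continue calculating."
    ).splitlines()
    survivors = []
    for cand in candidates:
        if not cand.strip():
            continue
        verdict = llm(
            f"Goal: {goal}\nProposed first step: {cand}\n"
            "Judge this step as exactly one of: feasible / uncertain / infeasible, "
            "and give a one-sentence reason."
        )
        if "infeasible" not in verdict.lower():
            survivors.append(cand)
    if not survivors:
        return "No feasible branch found."
    # Continue only from a branch that survived evaluation.
    return llm(f"Goal: {goal}\nContinue this approach to the next step:\n{survivors[0]}")
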
  3. ReAct Prompting (Reason + Act)

ReAct Prompting is a prompting method that alternates between thinking about problems (Reasoning) and taking action to look up information (Action). The core problem it solves is simple: models are trained on past data and don’t know the latest or specific real-world situations. If you only let it “imagine,” it easily gives inaccurate or fabricated answers when information is insufficient. ReAct’s approach is to have the model continuously cycle through “first think clearly what to look up → then look it up → see the results → then continue thinking” during a task.

It’s like going to a restaurant you’ve never been to before. You first think about which area it’s roughly in (reasoning), then open a map to search (action), see ratings and routes (observation), then decide which route to take (reason again). Here, the map and search results are the “external information” in ReAct.

In real systems, ReAct’s value is: it doesn’t require the model to give an answer from the start, but allows it to think while looking things up. This prevents the model from fabricating reasons when it doesn’t know facts, ensures every judgment step has basis, and makes it convenient for humans to review intermediate processes.

Why Use It?

Model output is linear: if it rushes toward the “final answer” from the start, it easily distorts the earlier thinking to piece a conclusion together. ReAct forcibly inserts “look it up first” steps, separating thinking from looking up and reducing the chance of blind guessing.

Three Most Common, Most Suitable Situations for ReAct:

First, Research-Type Questions (Spanning Multiple Fact Sources). Example: you ask, “Which paper proposed ReAct? Who are the authors? What year was it published?” Without looking it up, the model might get the authors or year wrong, or confuse it with similar methods. ReAct’s approach: look up “ReAct paper,” “author list,” and “publication year” separately, then combine the results into the final answer.

Second, Tasks Requiring Step-by-Step Decisions (Each Step Depends on Feedback). Example: You want the model to help you “set up a ReAct Agent with LangChain and run it, with search + calculation tools.” If the model gives a complete solution at once, it easily misses key steps (like environment variables not configured, tool names wrong, version differences causing code not to run). ReAct’s approach: first check installation and version → then run minimal example → observe errors → fix based on errors → run next step, until successful.

Third, Questions Requiring Tools (Must Call External Capabilities for Answers). Example: You ask “What’s the weather like in Tokyo today? Is it suitable for outdoor activities?” or “Calculate 29 to the power of 0.23.” For the former, without checking real-time weather, the model can only guess; for the latter, without calculating, the model easily gets it wrong.

How to Use?

Incorrect One-Line Approach:

Help me judge whether this solution is feasible, and give conclusions and reasoning.

This approach has the model simultaneously draw conclusions and compose reasoning. If it doesn’t actually know key facts, it will likely first make a snap judgment, then find explanations that “look reasonable” in reverse.

More Stable Improvement (ReAct Approach): Step 1: Only have the model clarify what information is needed. Step 2: Have it look up this information. Step 3: Then draw conclusions based on what was found.

You will complete the task following these steps:

Thought: List what key information is needed to judge whether this solution is feasible.
Action: Look up or calculate this information item by item.
Observation: Record the results from each lookup or calculation.
Thought: Re-judge based on these results.
Final: Give the final conclusion and basis.

Minimal Universal Template

You will complete the task through "think → look up → see results":

Thought: Explain what information is still missing.
Action: Execute a clear query, search, or calculation.
Observation: Write down the result from this step.
Thought: Update judgment based on new information.

When information is sufficient:
Final: Output the final answer.
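
A bare-bones version of this loop in code. The `Final:` stop marker, the single `search[...]` action, and the `search_tool` callable are assumptions of the sketch; in practice an agent framework such as LangChain wraps this loop for you:

def react_loop(llm, search_tool, question: str, max_steps: int = 5) -> str:
    """Alternate Thought / Action / Observation until the model emits a Final answer."""
    transcript = (
        "Answer the question by alternating lines:\n"
        "Thought: what information is still missing\n"
        "Action: search[<query>]\n"
        "Observation: (will be filled in for you)\n"
        "When the information is sufficient, write 'Final: <answer>'.\n\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        if "Final:" in step:
            return step.split("Final:", 1)[1].strip()
        if "Action: search[" in step:
            query = step.split("Action: search[", 1)[1].split("]", 1)[0]
            # Feed the tool result back as an Observation for the next round of thinking.
            transcript += f"Observation: {search_tool(query)}\n"
    return "No final answer within the step budget."
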
  4. Reflexion

Reflexion is having the model, after completing each task, summarize in one sentence what went wrong and how to improve next time, then continue the next attempt carrying this sentence. It doesn’t change model parameters (doesn’t train the model), but turns failure experience → written text → stored in memory → used as context next time.

Like writing a review note after doing a problem: why was this wrong, what to watch next time; the model also uses this “reminder” as memory to continue. Its function isn’t to make the model smarter, but to prevent the model from repeatedly making the same mistake, letting multiple attempts truly converge toward the correct direction.

Typical Situation: Writing Code to Pass Tests

You have the model write a function that must pass a test set. After the first failure, if you just “try again,” the model typically rewrites everything and the original error goes unfixed. With Reflexion, the model first pins down whether the failure was an unhandled boundary condition or the wrong logic order, then fixes that specific point.

Incorrect approach

Write a JavaScript function to determine if a string is a palindrome. If wrong, revise until correct.

Correct approach (Using Reflexion)

Task:
Write a JavaScript function to determine if a string is a palindrome.

Step 1: Execute
Give the function code directly.

Step 2: Check
In what situations would this implementation judge incorrectly? Explain in 1-2 sentences.

Step 3: Reminder
Summarize one sentence: "what must be noted next time when implementing."

Step 4: Redo
Rewrite the function while following this reminder.
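
The same four steps as a retry loop in code. The `run_tests` callable (returning a pass/fail flag plus a failure report) and the reminder format are assumptions of the sketch; only the attempt → check → reminder → redo flow comes from the technique itself:

def reflexion_loop(llm, run_tests, task: str, max_attempts: int = 3) -> str:
    """Retry a task, feeding a one-sentence lesson from each failure into the next attempt."""
    reminders: list[str] = []
    attempt = ""
    for _ in range(max_attempts):
        notes = "\n".join(f"- {r}" for r in reminders) or "- (none yet)"
        attempt = llm(
            f"Task: {task}\n"
            f"Reminders from earlier failed attempts:\n{notes}\n"
            "Give only the solution."
        )
        passed, failure_report = run_tests(attempt)
        if passed:
            return attempt
        # Verbal self-reflection: turn the failure into a short, reusable reminder.
        reminders.append(llm(
            f"Task: {task}\nAttempt:\n{attempt}\nFailure report:\n{failure_report}\n"
            "Summarize in one sentence what must be watched next time."
        ))
    return attempt
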
  5. PAL (Program-Aided Language Models)

What is PAL? PAL is an approach where the model first writes runnable code and the code computes the answer, rather than the model “thinking up” the answer directly. It solves this problem: when models handle arithmetic, dates, or rules purely in prose, they easily get the calculation wrong. The goal is simple: delegate error-prone calculations to a program.

It’s like doing math problems: don’t do mental arithmetic in your head, but first write the formula into a calculator.

  • Problem → Model understands
  • Algorithm → Model writes code
  • Result → The program computes it

In systems, PAL makes key steps into runnable code, results can be reviewed and reproduced, not relying on “feeling right or wrong.”

Why Use It? Large models generate the next most human-sounding word; they don’t actually do the addition and subtraction, advance the dates, or execute the rules. Results look right only because they “sound right,” not because they were “computed right.” For any problem that must be calculated exactly and must be checkable, you shouldn’t let the model complete the final step by writing text.

Three Common Scenarios

Scenario One: Arithmetic / Date Problems

Incorrect approach:

Today is 2023-02-27, I was born 25 years ago, what day is my birthday?

The model both interprets the time and does the date arithmetic itself; it easily miscalculates, and you can’t see how it calculated.

Improved approach (PAL):

Don't give the answer directly.
Please calculate using Python code:
Today is 2023-02-27, go back 25 years, output the date, format MM/DD/YYYY.
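
Under PAL, the model’s reply to a prompt like this is code, and the answer comes from running that code rather than from the model’s text. The snippet below is roughly what a correct reply should look like:

from datetime import date

# Today is 2023-02-27; the birthday is 25 years earlier.
today = date(2023, 2, 27)
birthday = today.replace(year=today.year - 25)

# Print in MM/DD/YYYY, the format the prompt asked for.
print(birthday.strftime("%m/%d/%Y"))  # 02/27/1998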

Scenario Two: Judgments with Many Rules and Conditions

Incorrect approach:

Based on the rules below, determine if this user can get a refund, and explain the reason.

When there are many rules, the model easily misses conditions; reasoning sounds right but may not actually comply with rules.

Improved approach (PAL):

Don't judge directly.
Please implement refund rule judgment logic in code,
input is user information, finally output true or false.

Scenario Three: Problems Requiring Verifiable Results

Incorrect approach:

Count the visit count for each user in the following logs.

The model might “look like it counted,” but you can’t confirm whether it actually counted everything.

Improved approach (PAL):

Don't give statistics directly.
Please write code:
Read the log data below, count each user's visit count, and output results.

III. LLM System Pattern Layer (Prompting as Interface)

  1. Retrieval Augmented Generation (RAG)

RAG is an approach: first look up information, then have the model answer questions based on what was found, rather than letting it answer randomly from memory. Like an open-book exam: the question is your problem, looking up the book is retrieval, writing answers while looking at the book is generation.

In systems, RAG’s essence is: answer sources are determined by the materials you provide, not the model’s own imagination.

Why Use It? When the model doesn’t know facts, it will still fabricate an answer that sounds true.

Typical Situation: Rules Change Often and the Latest Version Applies

Incorrect approach:

What conditions need to be met for membership upgrade now?

The model might answer according to old rules, sounding very confident even though the information is outdated.

Improved approach (RAG):

Below is the current effective membership rules content:
{{latest_rules}}

Please explain membership upgrade conditions based on these rules.
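
In code, the pattern is simply retrieve → stuff into context → generate. The keyword-overlap retriever below is a deliberately crude stand-in for a real vector store, kept only to show the flow; function names are illustrative:

def naive_retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def rag_answer(llm, query: str, documents: list[str]) -> str:
    """Ground the answer in retrieved text instead of the model's memory."""
    context = "\n\n".join(naive_retrieve(query, documents))
    return llm(
        "Answer using only the reference material below. "
        "If the material does not contain the answer, say so.\n\n"
        f"Reference material:\n{context}\n\n"
        f"Question: {query}"
    )
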
  • Fundamental Difference Between RAG and GKP Working Methods:
    • RAG: Question → Retrieve real documents → Concatenate context → Generate answer → Key is external grounding (external fact anchoring)
    • GKP: Question → Have the model first generate a “related knowledge list/explanation” → Then answer → Key is cognitive scaffolding. GKP isn’t about “learning new knowledge,” but about “explicitly surfacing knowledge the model has already learned.”
  2. Automatic Reasoning and Tool-use (ART)

ART is a method that lets the model think while using tools where needed, rather than giving answers based on feeling all at once.

Like doing homework while being able to use a calculator or look up information anytime. Separating “thinking” and “doing,” clarifying which step needs calculation or lookup, makes results more stable and easier to modify.

Why Use It? When models encounter complex problems, they often neither look things up nor calculate; they fabricate conclusions directly. ART uses an explicit process to stop this: once the model is asked to “give the conclusion directly,” it easily composes an answer first and supplements the reasoning afterward.

Three Most Common Situations

Scenario One: Problems That Must Be Calculated Accurately (Time / Quantity / Amount)

Incorrect approach:

It's 9:40 Tokyo time now, what time is it in New York?

The model might give a time from memory; the timezone or daylight-saving offset is easily wrong, yet the answer looks “real.”

Improved approach (ART):

Please complete step by step:
1. Confirm Tokyo and New York timezones
2. Determine if there's daylight saving time
3. Calculate time difference
4. Give final time

Don’t let the model guess the result directly; force it to look things up first, then calculate, and only then give the answer.

Scenario Two: Multi-Step Decision / Planning Problems

Incorrect approach:

Help me plan a three-day Tokyo free trip, give the final itinerary directly.

The itinerary looks reasonable but may contain roundabout routes, time conflicts, closed attractions, and so on.

Improved approach (ART):

Please proceed step by step:
1. List available time each day
2. Look up opening hours and areas for each attraction
3. Group and arrange by area
4. Finally generate three-day itinerary

Think through the constraints and steps first, then generate the result; it is much less likely to go off track.

Scenario Three: Problems Requiring External Information to Judge

Incorrect approach:

Is this API solution feasible? Give conclusion directly.

The model will give a “professional-looking” judgment without knowing the real interface constraints.

Improved approach (ART):

Please judge step by step:
1. Clarify what external conditions this solution depends on
2. Separately verify whether these conditions are met
3. Based on found information, then judge feasibility

First break down steps → look up when needed → then give conclusions

  3. Active-Prompt

Active-Prompt is an approach: first see which questions the model is most uncertain about, then only supplement examples for those questions, letting results gradually stabilize.

Ask the model the same question several times; if the answers vary widely, it hasn’t actually mastered that question, so you add worked examples only for those questions rather than for every one. Instead of fixing the example set up front, it supplements examples dynamically wherever the model tends to err, so the system doesn’t rely on luck.

Scenario One: Same Question, Different Results Each Time Asked

Incorrect approach:

Please give the final conclusion for the question below and explain the reasoning.

The model first gives a conclusion, then composes reasoning to fit it, and the composition differs each time.

Improved approach (Active-Prompt)

Please give 3 independent answers for the question below separately, don't explain.
Question:
{question}

Please point out where these answers are inconsistent.

First see if answers diverge, then decide whether to supplement examples, rather than trusting results from the start.

Active-Prompt isn’t teaching you “how to ask the model,” but teaching you “how to gradually stabilize a system that gives chaotic answers.” It’s closer to part of the system training process, not a user-level prompting technique.
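
A sketch of the uncertainty-measurement half of that process: sample each question several times and rank the questions by how much their answers disagree; the most uncertain ones are the candidates for hand-written examples. Counting distinct answers is the simplest possible disagreement metric and is only an illustration:

def rank_by_uncertainty(llm, questions: list[str], k: int = 3) -> list[tuple[str, int]]:
    """Return questions sorted by answer disagreement across k independent samples."""
    ranked = []
    for question in questions:
        answers = [
            llm(f"Answer with the final result only, no explanation.\nQuestion: {question}")
            for _ in range(k)
        ]
        # 1 distinct answer = fully consistent; k distinct answers = maximally uncertain.
        disagreement = len({a.strip() for a in answers})
        ranked.append((question, disagreement))
    # Most uncertain first: these are the questions worth annotating with worked examples.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)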

  4. Automatic Prompt Engineer (APE)

APE means: instead of having humans blindly tweak prompts, have the model write several candidate prompts itself, run them all, and see which works best.

If you find that changing a single sentence in a prompt makes results swing between good and bad and you can’t explain why, you should use APE. You can use it to search for the best prompt yourself. It doesn’t change model parameters, only how you talk to the model.

Minimal Universal Template (Ask Step by Step)

Step 1: Only Have It Generate Prompts
Generate 5 Prompts with different phrasings for the task below, don't execute the task.

Task:
[Write your task here]

Step 2: Execute Prompts One by One or in Batch
Use the Prompt below to complete the same input.

Prompt:
[Prompt #1]

Input:
[Same input]

Step 3: Only Compare and Select
Below are results from the same input under different Prompts.
Please select the one with the best effect according to [evaluation criteria] and explain why.

Evaluation Criteria:
- Whether key information is covered
- Whether off-topic
- Whether stable
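
The three steps as one small loop; the `score` callable stands in for whatever evaluation criteria you use (key-point coverage, staying on topic, stability), and all names here are illustrative:

def ape_select_prompt(llm, task: str, test_input: str, score, n_candidates: int = 5) -> str:
    """Have the model propose candidate prompts, run each on the same input, keep the best scorer."""
    candidates = [
        llm(
            f"Write one prompt (phrasing #{i + 1}) that instructs a model to perform this task. "
            f"Output only the prompt.\n\nTask: {task}"
        )
        for i in range(n_candidates)
    ]
    # Run every candidate prompt on the same input so the results are comparable.
    results = [(cand, llm(f"{cand}\n\nInput:\n{test_input}")) for cand in candidates]
    # score(output) -> number; higher is better according to your evaluation criteria.
    best_prompt, _ = max(results, key=lambda pair: score(pair[1]))
    return best_prompt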