Yuan's Blog

How to use Claude to generate higher-quality code

Most developers type a prompt, sometimes use plan mode, fix the errors, and repeat. Either way, the result is a mess that completely falls apart for anything non-trivial.

The code-flow skill enforces a disciplined pipeline that separates thinking from typing:

Research -> Plan -> Annotate (repeat 1-6x) -> Todo List -> Implement -> Feedback

If you want to generate higher-quality code that aligns with your goals, you should use this skill.
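To make the gate structure concrete, here is a conceptual sketch of the pipeline in Python. This is an illustration of the "stop at each gate and wait for review" flow, not the skill's actual implementation; the function names and the phase list are assumptions drawn from the pipeline diagram above.

```python
# Conceptual sketch of the gated pipeline — NOT the skill's real code.
# Each phase runs, then blocks on human approval before moving on;
# the annotate phase can repeat up to 6 rounds, matching "repeat 1-6x".

PHASES = ["research", "plan", "annotate", "todo", "implement", "feedback"]

def run_pipeline(execute, approve, max_annotate_rounds=6):
    """execute(phase) does the work; approve(phase) is the human gate."""
    for phase in PHASES:
        rounds = max_annotate_rounds if phase == "annotate" else 1
        for _ in range(rounds):
            execute(phase)
            if approve(phase):  # gate: stop until the user signs off
                break
    return "done"
```

The key property is that `implement` cannot start until every earlier gate has been approved, which is what prevents the trial-and-error editing measured in the experiments below.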

Now, to see how this actually performs in the wild, I ran a few tests. Here’s a breakdown of the results and what they mean.

Experiments

Three tasks were run on the same codebase, comparing with-skill vs. without-skill.

  • Eval 1: Add comprehensive error handling (cross-cutting, 5+ files)
  • Eval 2: Add caching for expensive operations (moderate, 2 files)
  • Eval 3: Fix flaky tests (debugging, 1-2 files)

Code Quality

| Metric              | Condition | Eval 1  | Eval 2 | Eval 3 |
|---------------------|-----------|---------|--------|--------|
| Read:Write Ratio    | With      | 7.0     | 4.3    | 4.5    |
|                     | Without   | 1.0     | 1.4    | 3.5    |
| Exploration Breadth | With      | 14      | 13     | 9      |
|                     | Without   | 13      | 5      | 6      |
| Reverts             | With      | 0       | 0      | 0      |
|                     | Without   | 0       | 0      | 0      |
| Assertions          | With      | 5/5     | 5/5    | 5/5    |

Read:Write Ratio is how many files the agent read versus how many it changed. Think of it as “how much did it study before it started writing?” With the skill, the ratio is consistently between 4:1 and 7:1 — it reads a lot before touching anything. Without the skill, Eval 1 hit a 1:1 ratio, meaning the agent was editing files the moment it started reading them. That’s like writing code before you understand the system.

Exploration Breadth counts how many Read/Glob/Grep calls happened before any code was written. Look at Eval 2 — the with-skill agent made 13 exploration calls, the without-skill agent only made 5. That means the without-skill agent skipped checking whether caching patterns already existed in the codebase. It got lucky there were none, but it never actually verified that.

Reverts counts how many times the agent had to undo its own work. Zero across the board — the skill got it right on the first pass every time.

Assertions are task-specific quality checks like “did it research before planning?” and “did it build on existing patterns instead of duplicating them?” 15 out of 15 passed.
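As a concrete illustration, the first two metrics can be computed mechanically from an agent's tool-call log. The sketch below is hypothetical — the log format (a list of `(tool, target)` tuples) and function names are my assumptions, not the actual eval harness used in these experiments:

```python
# Hypothetical sketch: deriving Read:Write Ratio and Exploration Breadth
# from an ordered tool-call log. The (tool, target) tuple format is an
# assumption, not the real harness used in the experiments above.

EXPLORE_TOOLS = {"Read", "Glob", "Grep"}
WRITE_TOOLS = {"Write", "Edit"}

def quality_metrics(calls):
    """calls: ordered list of (tool_name, target) tuples."""
    reads = {t for tool, t in calls if tool == "Read"}
    writes = {t for tool, t in calls if tool in WRITE_TOOLS}
    # Read:Write Ratio — distinct files read vs. distinct files changed
    ratio = len(reads) / max(len(writes), 1)
    # Exploration Breadth — Read/Glob/Grep calls before the first write
    breadth = 0
    for tool, _ in calls:
        if tool in WRITE_TOOLS:
            break
        if tool in EXPLORE_TOOLS:
            breadth += 1
    return ratio, breadth

calls = [
    ("Read", "a.py"), ("Grep", "cache"), ("Read", "b.py"),
    ("Read", "c.py"), ("Edit", "a.py"),
]
print(quality_metrics(calls))  # → (3.0, 4)
```

In this toy log the agent read three files and searched once before touching anything, giving a 3:1 ratio and a breadth of 4 — the same shape as the with-skill numbers in the table.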

Process Efficiency

| Metric            | Condition | Eval 1    | Eval 2  | Eval 3  |
|-------------------|-----------|-----------|---------|---------|
| Turns             | With      | 19        | 23      | 22      |
|                   | Without   | 37        | 18      | 21      |
| Incremental Edits | With      | 0         | 0       | 0       |
|                   | Without   | 11        | 5       | 2       |
| Total Tokens      | With      | 279,745   | 483,916 | 387,964 |
|                   | Without   | 1,091,610 | 311,806 | 433,598 |

Turns is how many back-and-forth cycles the agent needed. In Eval 1, the without-skill agent took 37 turns — it kept going back and forth, trying things, fixing things, trying again. The with-skill agent did it in 19 because the research phase prevented wrong directions in the first place.

Incremental Edits is the big one. This counts how many separate Edit calls the agent made to source files. With the skill, it’s zero every time — no code gets written until the plan is approved, then everything goes in as one coordinated pass. Without the skill, Eval 1 saw 11 incremental edits. That’s the agent patching code piece by piece, trial-and-error style, building up the implementation through repeated small fixes instead of getting it right from a plan.

Total Tokens is the raw cost. Eval 1 tells the clearest story: deep thinking before coding saved 74% of tokens because it avoided all that rework. Eval 2 went the other way — the skill used more tokens because it explored more thoroughly. But those extra tokens went to checking whether caching already existed, which is exactly the kind of check you want before introducing new patterns into a codebase.
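The 74% figure follows directly from the Total Tokens row for Eval 1:

```python
# Token savings in Eval 1, computed from the Total Tokens row above
with_skill, without_skill = 279_745, 1_091_610
savings = 1 - with_skill / without_skill
print(f"{savings:.0%}")  # → 74%
```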


Usage

Install

```shell
# Clone the repo to a temp location
git clone https://github.com/meatballG1210/code-flow.git

# Create the skills directory in your project
mkdir -p .claude/skills

# Move the cloned skill into the skills directory
mv code-flow .claude/skills/
```

That’s it. Claude Code picks up skills automatically from .claude/skills/.

How to Use

The skill triggers automatically when you ask for multi-file changes, architectural refactors, new features, performance work, or anything that could conflict with existing codebase patterns. Just describe your task normally:

Add caching to improve performance

You can also trigger it explicitly with /code-flow:

/code-flow Refactor the error handling system

Either way, the skill walks you through each phase — research, plan, annotate, todo, implement, feedback. It stops at each gate and waits for your review before moving on. Just follow the prompts.