Financial RAG Agent Optimization: Methods, Cases, and Data

Sun, 10 May 2026 00:00:00 +0000

This project details the refinement of an agentic RAG system for financial Q&A, raising test accuracy from 0.871 to ~0.919 by systematically diagnosing 18 failure cases. Rather than tuning the model blindly, the author took a "diagnose-first" approach: first resolving "judge-side" discrepancies with deterministic numeric prefiltering, then making structural improvements such as query translation, anti-refusal checks, and a five-layer fix for superlative ambiguities. The results suggest that prompt-based reflection helps, but structural, schema-enforced changes are more reliable. The author deliberately left eight failures unfixed because of dataset noise or unfavorable ROI, drawing a line between "fixing everything" and strategic, production-oriented optimization.
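
To make "deterministic numeric prefiltering" concrete, here is a minimal sketch of what such a check might look like. It is not the author's implementation: the function names, the regular expression, and the 1% relative tolerance are all assumptions chosen for illustration. The idea is to accept an answer deterministically when it contains the expected number, and to defer to the LLM judge otherwise.

```python
import re

# Hypothetical pattern: integers or decimals, with optional thousands separators.
_NUM = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text):
    """Return every number found in text as a float, ignoring thousands commas."""
    return [float(m.replace(",", "")) for m in _NUM.findall(text)]


def numeric_prefilter(predicted, gold, rel_tol=0.01):
    """Return True when the prediction contains the gold number within rel_tol.

    Return None when no deterministic decision is possible, so the caller
    can fall back to an LLM judge. The tolerance is an assumed default.
    """
    gold_nums = extract_numbers(gold)
    pred_nums = extract_numbers(predicted)
    if not gold_nums or not pred_nums:
        return None  # nothing numeric to compare
    target = gold_nums[0]
    for value in pred_nums:
        if abs(value - target) <= rel_tol * max(abs(target), 1e-9):
            return True
    return None  # no match found; leave the verdict to the judge
```

A check like this only short-circuits the clear numeric matches, such as "$1,577 million" against a gold answer of "1577.00"; anything it cannot decide still goes to the judge.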

Picking Evaluation Metrics for a RAG Agent — Notes from the Trenches

Tue, 05 May 2026 00:00:00 +0000

This article outlines a pragmatic, tiered approach to evaluating Retrieval-Augmented Generation (RAG) agents, specifically within the context of complex financial document analysis (FinanceBench). The author argues that effective evaluation is not about maximizing the number of metrics, but about selecting signals that provide clear, actionable insights at different stages of the development lifecycle.

RAG on Yuan's Blog
