RAG Failure Diagnostics Clinic

A framework-agnostic CLI clinic that classifies RAG pipeline bugs into 12 reusable failure patterns and suggests minimal structural fixes.

Added Apr 14, 2026

A small, framework-agnostic **RAG failure diagnostics clinic**.

You paste a real bug description from your LLM + RAG pipeline.
The script asks an LLM to classify the failure into one of several **reusable patterns**
and suggests a **minimal structural fix** (not just “add more context” or “try a better model”).

The goal is to show a pattern-driven way to debug RAG incidents that can be
adapted to any stack: LangChain, LlamaIndex, custom microservices, or in-house infra.

---

## What you will learn

By running this example, you will learn how to:

- Describe **real-world RAG bugs** in plain text so an LLM can reason about them.
- Use a small library of **failure patterns** to triage incidents quickly.
- Ask the model to propose **minimal structural changes** instead of pure prompt tweaks.
- Call an **OpenAI-compatible API** from a small Python script.
- Save each diagnosis into a JSON report for later analysis or post-mortems.

This is not a full framework.
It is a compact **clinic app** that demonstrates a pattern you can adapt in your own stacks.

---

## Folder structure

This tutorial expects the following files in `rag_tutorials/rag_failure_diagnostics_clinic`:

- `README.md` ← this file
- `rag_failure_diagnostics_clinic.py` ← minimal interactive CLI script
- `requirements.txt` ← Python dependencies

The script is completely self-contained.
All pattern definitions and prompts live inside this folder.

---

## Failure patterns (P01–P12)

The clinic uses a small, opinionated set of **12 reusable failure patterns**.
Each bug is mapped to exactly one primary pattern, with optional secondary candidates.

You can modify or extend these patterns to match your own production incidents.

| ID | Pattern name | Typical symptom |
| ---- | ----------------------------------------------------- | -------------------------------------------------------------- |
| P01 | Retrieval hallucination / grounding drift | Answer confidently contradicts retrieved documents. |
| P02 | Chunk boundary or segmentation bug | Relevant facts are split or truncated across chunks. |
| P03 | Embedding mismatch / semantic vs vector distance | Cosine similarity does not match true relevance. |
| P04 | Index skew or staleness | Old or missing data even though source of truth is updated. |
| P05 | Query rewriting or router misalignment | Router sends queries to the wrong tool or dataset. |
| P06 | Long-chain reasoning drift | Multi-step tasks gradually lose track of earlier constraints. |
| P07 | Tool-call misuse or ungrounded tools | Tools are called with wrong arguments or without grounding. |
| P08 | Session memory leak / missing context | Conversation loses important facts between turns or sessions. |
| P09 | Evaluation blind spots | System passes tests but fails on real incidents. |
| P10 | Startup ordering / dependency not ready | Services crash or 5xx during the first minutes after deploy. |
| P11 | Config or secrets drift across environments | Works locally, breaks only in staging / prod due to settings. |
| P12 | Multi-tenant / multi-agent interference | Requests or agents step on each other’s state or resources. |

The built-in examples roughly correspond to:

- Example 1 → retrieval hallucination / grounding drift (P01 style).
- Example 2 → startup ordering / dependency not ready (P10 style).
- Example 3 → config or secrets drift across environments (P11 style).

You are encouraged to replace these with your own incident snippets.

---

## How the clinic works

At a high level:

1. The script builds a **system prompt** that explains the 12 patterns above.
2. You pick one of three built-in examples or paste your own RAG / LLM bug description.
3. The model is asked to:
- Choose a **primary pattern ID** (P01–P12).
- Optionally choose up to **two secondary candidates**.
- Explain the reasoning in short bullet points.
- Propose a **minimal structural fix** (changes to retrieval, routing, eval, or infra).
4. The full answer is printed to the console and also saved into
`rag_failure_report.json` together with the original bug text and model name.

The intent is to show how a small **pattern vocabulary + prompt** can turn an LLM
into a lightweight helper for incident triage.

---

## Prerequisites

- Python 3.9 or newer.
- An API key for any **OpenAI-compatible** chat completion endpoint:
- For example, `OPENAI_API_KEY` for `https://api.openai.com/v1`.
- Or your own proxy URL set via `OPENAI_BASE_URL`.
- Basic familiarity with RAG pipelines, logs, and failure modes.

---

## Setup

From the root of the `awesome-llm-apps` repo:

```bash
cd rag_tutorials/rag_failure_diagnostics_clinic
pip install -r requirements.txt
````

Minimal `requirements.txt`:

```text
openai>=1.6.0
```

Set your API key as an environment variable (recommended):

```bash
export OPENAI_API_KEY="sk-..."
# optional, if you use a custom endpoint
# export OPENAI_BASE_URL="https://your-proxy.example.com/v1"
# export OPENAI_MODEL="gpt-4o-mini"
```

> Tip: If you prefer Colab, you can also copy the entire
> `rag_failure_diagnostics_clinic.py` file into a single Colab cell and run it there.

---

## Running the clinic

From inside `rag_tutorials/rag_failure_diagnostics_clinic`:

```bash
python rag_failure_diagnostics_clinic.py
```

You will see a simple text UI:

* If `OPENAI_API_KEY` is not set, the script will ask for an API key.
* You can keep the default base URL (`https://api.openai.com/v1`) and model (`gpt-4o`)
or override them.
* Then you choose:

* `1` → built-in retrieval hallucination example (P01 style).
* `2` → startup ordering example (P10 style).
* `3` → config / secrets drift example (P11 style).
* `p` → paste your own bug description.

Each run prints a diagnosis and writes a `rag_failure_report.json` file
containing the bug text, model settings, and assistant reply.

You can commit several reports into your own repo as a lightweight
**RAG incident library**.

---

## Extending this tutorial

Some ideas for extending this pattern:

* Replace the examples with anonymized incidents from your own logs.
* Add more patterns or split existing ones to match your stack.
* Emit a richer JSON schema (severity, owners, suspected components).
* Plug the reports into an evaluation dashboard or incident tracker.

#RAG #diagnostics #LLM #debugging #patterns

Send Feedback

RAG Failure Diagnostics Clinic

Send Feedback