Engineering · May 15, 2026 · 10 min read

Context Engineering for Coding Agents: why a single CLAUDE.md isn't enough

Why one CLAUDE.md isn't enough on a real codebase, and the 3-layer structure we use at Maleus to feed agents the right instructions at the right moment.

Adrien Maret, CTO

Context Engineering is the art of feeding an agent the right information at the right moment so it codes correctly.

Today, most teams that tackle the subject dump all their instructions into one big CLAUDE.md at the repo root. That works for a prototype, but it breaks down as soon as the codebase grows.

On a real application, you need 3 distinct levels of instructions, each triggered at a different point in the agent's workflow. Without this layering, you dilute the context so much that the LLM ends up ignoring the most important rules.

In this article, I walk through the 3 layers we use at Maleus to generate applications in near-autonomy.

Why 3 levels and not just one

An LLM's context is limited and every token dilutes the others.

Instructions have different scopes:

repo scope: tooling, project structure, environment, ports, scripts. That's the job of the root CLAUDE.md
domain scope: the business logic of a functional domain, flows, key files. That's the job of per-module AGENT.md
tech-stack scope: code conventions per framework / per file type (Drizzle, Nest.js, repository patterns). That's the job of coding rules

If you load everything at once, you get noise, contradictions between rules, and dilution.

The fix is to layer. Each layer is loaded when it's relevant.

Level 1 - Coding rules          (loaded on every file edit)
Level 2 - CLAUDE.md             (loaded once at session start)
Level 3 - Per-module AGENT.md   (loaded when the agent explores a domain)
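
To make the three loading moments concrete, here is a minimal sketch of the mapping from agent event to loaded context. The event names and the contextToLoad function are hypothetical, invented for illustration; they are not part of Claude Code or of our tooling.

```typescript
// Hypothetical model of the three triggers; the event names are illustrative only.
type AgentEvent =
  | { kind: "session-start" }
  | { kind: "file-read"; path: string }
  | { kind: "module-explored"; moduleDir: string };

// Returns a description of what gets added to the context for a given event.
function contextToLoad(event: AgentEvent): string {
  switch (event.kind) {
    case "session-start":
      return "CLAUDE.md"; // level 2: once per session
    case "file-read":
      return `coding rules whose glob matches ${event.path}`; // level 1
    case "module-explored":
      return `${event.moduleDir}/AGENT.md`; // level 3
  }
}

console.log(contextToLoad({ kind: "session-start" }));
console.log(contextToLoad({ kind: "module-explored", moduleDir: "features/agent-sessions" }));
```

The point of the sketch is only that each layer has its own trigger, so nothing is loaded before it can be useful.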

The orders of magnitude at Maleus today:

$ cloc .claude/rules
40 files, 8,437 lines of markdown (~210 lines/rule)

$ find apps/studio -name "AGENT.md" | xargs cloc
97 files, 7,855 lines of markdown (~80 lines/AGENT.md)

$ cloc apps/studio --include-lang=TypeScript
1,843 TypeScript files, 209,180 lines of code

Which gives two interesting ratios:

1 line of doc-for-agent per ~13 lines of code
1 line of AGENT.md per ~27 lines of code in the module it describes

If we dumped these 16,000 lines into one big CLAUDE.md, we'd burn the equivalent of several tens of thousands of tokens on every prompt, 95% of it unrelated to the task at hand. Layering is what makes this context investment sustainable instead of unbearable.
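
For readers who want to check the arithmetic, the ratios can be re-derived from the counts quoted above. This is just a sketch of the calculation; the variable names are mine.

```typescript
// Counts copied from the cloc output quoted in the article.
const ruleLines = 8437;
const ruleFiles = 40;
const agentMdLines = 7855;
const agentMdFiles = 97;
const codeLines = 209180;

// Total agent-facing documentation: coding rules plus AGENT.md files.
const docLines = ruleLines + agentMdLines;

const linesPerRule = ruleLines / ruleFiles;
const linesPerAgentMd = agentMdLines / agentMdFiles;
const codeLinesPerDocLine = codeLines / docLines;
const codeLinesPerAgentMdLine = codeLines / agentMdLines;

console.log(`doc lines: ${docLines}`);
console.log(`lines per rule: ${linesPerRule.toFixed(1)}`);
console.log(`lines per AGENT.md: ${linesPerAgentMd.toFixed(1)}`);
console.log(`code lines per doc line: ${codeLinesPerDocLine.toFixed(1)}`);
console.log(`code lines per AGENT.md line: ${codeLinesPerAgentMdLine.toFixed(1)}`);
```

The two documentation sets add up to 16,292 lines, which is where the "16,000 lines" and the ~13 and ~27 ratios come from.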

Level 1: coding rules, triggered by pattern matching

This is the most granular layer and the one with the most impact on the quality of the generated code.

One rule per file type

At Maleus, every backend file is suffixed with its role:

<Entity>.table.ts for the Drizzle table definition
<Entity>.repository.ts for DB access
<Entity>.service.ts for business logic
<Entity>.controller.ts for Nest.js endpoints

Each suffix maps to a dedicated rule, triggered by a precise glob in the paths property of the frontmatter:

# .claude/rules/backend/backend-tables.md
name: "backend-tables"
paths: "**/{backend,studio-backend}/**/*.table.ts"

This glob-based split lets you load only the relevant rule at the moment the agent touches a file of that type.

Edit flow: deterministic pattern matching on the read

The agent always reads a file before editing it. The path passed to read serves as the key for pattern matching.

read("src/features/companies/data/Companies.repository.ts")
  -> matches "**/{backend,studio-backend}/**/*.repository.ts"
  -> backend-repositories.md injected into the context
edit("Companies.repository.ts", oldString, newString)
  -> the newString respects the rule we just injected

It's pattern matching, so 100% deterministic, with no LLM randomness.
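
Here is a minimal sketch of what that matching step could look like. It is not Claude Code's implementation: globToRegExp and rulesForPath are hypothetical helpers that handle only the glob syntax used in this article, and the example path is hypothetical too (it assumes the path passed to read includes the backend directory).

```typescript
interface CodingRule {
  name: string;
  paths: string; // glob taken from the rule's frontmatter
}

// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Minimal glob-to-RegExp conversion covering only "**", "*" and "{a,b}".
// Not a full glob implementation.
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      source += "(?:[^/]+/)*"; // zero or more directory segments
      i += 3;
    } else if (glob.startsWith("**", i)) {
      source += ".*"; // trailing "**": anything, slashes included
      i += 2;
    } else if (glob.charAt(i) === "*") {
      source += "[^/]*"; // anything inside a single path segment
      i += 1;
    } else if (glob.charAt(i) === "{") {
      const close = glob.indexOf("}", i);
      if (close === -1) throw new Error(`Unclosed "{" in glob: ${glob}`);
      const options = glob.slice(i + 1, close).split(",").map(escapeRegExp);
      source += `(?:${options.join("|")})`;
      i = close + 1;
    } else {
      source += escapeRegExp(glob.charAt(i));
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

// Returns the names of the rules whose glob matches the path being read.
function rulesForPath(filePath: string, allRules: CodingRule[]): string[] {
  return allRules
    .filter((rule) => globToRegExp(rule.paths).test(filePath))
    .map((rule) => rule.name);
}

const rules: CodingRule[] = [
  { name: "backend-tables", paths: "**/{backend,studio-backend}/**/*.table.ts" },
  { name: "backend-repositories", paths: "**/{backend,studio-backend}/**/*.repository.ts" },
];

// Hypothetical path: only the repository rule should be selected.
const matched = rulesForPath(
  "apps/studio-backend/src/features/companies/data/Companies.repository.ts",
  rules,
);
console.log(matched);
```

Because the selection is a pure function of the path, the same read always injects the same rules.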

Write flow: the problem of the file that doesn't exist yet

This mechanism falls apart when the agent creates a new file.

Why? Because the agent has no reason to read first: the file doesn't exist. It directly generates write("Foobar.table.ts", content), and the content is already in the buffer before we could pattern-match anything.

Pattern matching has a gap, then: for file creation, you need another mechanism.
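
The gap is easy to see if you replay a session as a list of tool calls. The sketch below is a simplified model, not real agent tooling: the ToolCall type, the suffix-based ruleFor lookup (standing in for glob matching), and uncoveredMutations are all hypothetical.

```typescript
interface ToolCall {
  tool: "read" | "edit" | "write";
  path: string;
}

// Hypothetical suffix-to-rule mapping, standing in for the real glob matching.
const ruleBySuffix: Record<string, string> = {
  ".table.ts": "backend-tables",
  ".repository.ts": "backend-repositories",
};

function ruleFor(path: string): string | undefined {
  const suffix = Object.keys(ruleBySuffix).find((s) => path.endsWith(s));
  return suffix === undefined ? undefined : ruleBySuffix[suffix];
}

// Replays a session and reports every edit or write whose rule had not
// been injected by an earlier read.
function uncoveredMutations(calls: ToolCall[]): string[] {
  const injected = new Set<string>();
  const uncovered: string[] = [];
  for (const call of calls) {
    const rule = ruleFor(call.path);
    if (rule === undefined) continue; // no rule applies to this file type
    if (call.tool === "read") {
      injected.add(rule); // the read is what triggers the injection
    } else if (!injected.has(rule)) {
      uncovered.push(`${call.tool} ${call.path}`);
    }
  }
  return uncovered;
}

const session: ToolCall[] = [
  { tool: "read", path: "Companies.repository.ts" },
  { tool: "edit", path: "Companies.repository.ts" },
  { tool: "write", path: "Foobar.table.ts" }, // new file: nothing was read first
];

console.log(uncoveredMutations(session));
```

In this model the edit is covered by the preceding read, while the write to the new table file is reported as uncovered.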

Skills: LLM evaluation as a complement

Skills are instructions that the LLM decides on its own to load by evaluating their description.

The mechanism:

each skill has a description that explains when to load it
we explicitly include the pattern in the description: "whenever you write a file matching **/*.table.ts"
we add CRITICAL INSTRUCTION YOU MUST ALWAYS to reinforce its priority

At Maleus, the rule (.claude/rules/backend/backend-tables.md) and the skill (.claude/skills/backend-tables/SKILL.md) are the same file, shared through a hard link. The frontmatter and the content are identical; only the triggering mechanism changes: pattern matching for the edit flow, LLM evaluation for the write flow.

The frontmatter therefore serves both channels at once:

---
name: "backend-tables"
paths: "**/{backend,studio-backend}/**/*.table.ts"
description: Contains guidelines for defining database tables using Drizzle
  ORM with PostgreSQL. [...] CRITICAL INSTRUCTION You MUST ALWAYS use this
  skill before editing/writing any file matching
  "**/{backend,studio-backend}/**/*.table.ts".
---

In practice the skill triggers almost every time, but it's never 100% since no deterministic mechanism can intercept a file creation before the content is generated.
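
Since the description has to repeat the glob by hand, the two can drift apart. One way to guard against that is a small check like the sketch below. It is an assumption on my part rather than something described in this article: parseFrontmatter handles only flat key/value frontmatter with indented continuation lines (it is not a YAML parser), and descriptionIssues is a hypothetical helper.

```typescript
// Removes one pair of surrounding double quotes, if present.
function unquote(value: string): string {
  const trimmed = value.trim();
  const quoted = trimmed.length >= 2 && trimmed.startsWith('"') && trimmed.endsWith('"');
  return quoted ? trimmed.slice(1, -1) : trimmed;
}

// Parses flat "key: value" frontmatter with indented continuation lines.
function parseFrontmatter(markdown: string): Record<string, string> {
  const match = /^---\n([\s\S]*?)\n---/.exec(markdown);
  if (!match) return {};
  const fields: Record<string, string> = {};
  let currentKey: string | undefined;
  for (const line of (match[1] ?? "").split("\n")) {
    const keyValue = /^([A-Za-z_-]+):\s*(.*)$/.exec(line);
    if (keyValue) {
      currentKey = keyValue[1] ?? "";
      fields[currentKey] = unquote(keyValue[2] ?? "");
    } else if (currentKey !== undefined && /^\s+\S/.test(line)) {
      fields[currentKey] += " " + line.trim(); // continuation of the previous value
    }
  }
  return fields;
}

// Lists the problems that would keep the file from serving both channels.
function descriptionIssues(markdown: string): string[] {
  const { paths, description } = parseFrontmatter(markdown);
  const issues: string[] = [];
  if (!paths) issues.push("missing paths");
  if (!description) issues.push("missing description");
  if (paths && description && !description.includes(paths)) {
    issues.push("description does not quote the paths glob");
  }
  return issues;
}

// The frontmatter from the article, reproduced line by line.
const skillFile = [
  "---",
  'name: "backend-tables"',
  'paths: "**/{backend,studio-backend}/**/*.table.ts"',
  "description: Contains guidelines for defining database tables using Drizzle",
  "  ORM with PostgreSQL. [...] CRITICAL INSTRUCTION You MUST ALWAYS use this",
  "  skill before editing/writing any file matching",
  '  "**/{backend,studio-backend}/**/*.table.ts".',
  "---",
  "",
].join("\n");

console.log(descriptionIssues(skillFile)); // no issues for the article's example
```

A check of this kind only verifies that the two channels stay in sync; it does nothing to make the skill itself trigger more reliably.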

CRITICAL INSTRUCTION, to be used sparingly

The CRITICAL INSTRUCTION YOU MUST ALWAYS keyword is a prioritization channel specific to Anthropic models that works by contrast. If you write it everywhere, nothing is a priority anymore.

Above all, writing CRITICAL INSTRUCTION is a sign that you need determinism, not prompt engineering. Prompt engineering increases the probability that an instruction is followed; determinism guarantees it. As soon as a deterministic mechanism is possible (pattern matching, hook, lint, test), it should be preferred. The skill is the acknowledged exception: nothing can intercept a file creation before content generation.

Building the rules empirically

No need to write all the coding rules up front. If you do, you end up with generic rules that never trigger, or worse, that contradict each other.

The approach that works is the opposite. You start with a few essential rules (backend structure, method naming, error format), then you watch the agent and on each recurring mistake, you enrich the corresponding rule.

You might as well let Claude Code write these additions: Anthropic models are fine-tuned for it. You give it the correction ("in this repository you should have used X instead of Y") and it updates the rule. We always reread, and we keep the rules short: a rule that's too long gets diluted in the context.

Every human correction becomes a rule that prevents the same mistake next time.

This three-layer approach is one of the key elements that let Maleus generate applications ready to deploy in production.

Level 2: CLAUDE.md, the project plumbing

A single CLAUDE.md file at the repo root, loaded once at the start of the session.

It answers three questions: where am I, what tools do I have, what's the environment.

Concretely, you find:

the type of environment (cloud sandbox, dev container, ...)
the available commands (lint, test, build)
the ports used (frontend 14001, backend 14002)
whether there's a database and how to access it
the useful npm scripts

Since this file is loaded for the whole session, you have to keep it short: every line stays in the context of every request.

Level 3: per-module AGENT.md, the business logic

Where the coding rules say how to write, the AGENT.md files say what this module does.

One file per functional domain

Inside each functional module, we put an AGENT.md file that contains the technical-functional info specific to the module.

These AGENT.md files are written by the agents themselves as they code, in a style optimized for an LLM that needs to understand fast rather than for human onboarding.

At Maleus today, we have 97 AGENT.md files spread across the whole application, from a few dozen to several hundred lines depending on the module's complexity.

Standard structure

The template we eventually converged on, generated and maintained by the agents:

# <Module> Module

## Overview

[2-4 sentences: the module's reason for being]

## Main Flows

[1 to 3 ASCII diagrams of the main flows]

## Important Files

[Table: file path | responsibility]

## Tables Structure

[Table per table: column | type | description]

## Permissions

[Table: API action | allowed roles | description]

## Error Codes

[Table: code | HTTP status | description]

## Dependencies

[List of the other modules / clients we depend on]

Not all sections appear in every module, only the ones that make sense. A module without a DB has no Tables Structure section. A module that doesn't consume AMQP events has no AMQP Events section.

Short example: features/agent-sessions/AGENT.md

As an illustration, here's an excerpt from a minimal AGENT.md (69 lines in full) covering an AMQP event routing module:

# Agent Sessions Module

## Overview

The Agent Sessions module owns the AMQP event consumption for
`snivel.agent-sessions.end` and `snivel.files.change` events. It routes
these events to the appropriate feature module (versions or workshops).
This module exists because the AmqpEventServer enforces one handler per
event pattern, and multiple feature modules need to react to the same events.

## Important Files

| File                                     | Description                                 |
| ---------------------------------------- | ------------------------------------------- |
| services/AgentSessionsRouting.service.ts | Event routing logic + Mutex                 |
| presentation/AgentSessions.consumer.ts   | AMQP consumer for snivel.agent-sessions.end |
| presentation/FilesChange.consumer.ts     | AMQP consumer for snivel.files.change       |

## Routing Logic

1. Distributed Mutex lock on agent-session:processing:{conversationId}
   prevents duplicate processing
2. If sessionOrigin is present, dispatch directly to the owning module
   (no DB lookups)
3. Fallback: try each handler sequentially until one claims the conversation

With 69 lines like these, an agent that receives a task such as "add a new AMQP event type" immediately knows:

that this module exists to route AMQP events between features
that there's a Mutex logic to respect to avoid double processing
that the file to touch is probably AgentSessionsRouting.service.ts plus a new consumer

Without this AGENT.md, the agent would have had to open 5 to 10 files to reconstruct that understanding.

AGENT.md as a codebase index

An AGENT.md plays the role of an index in the database sense: when an agent gets a task, it opens the ones for the potentially relevant modules and, in a few hundred tokens per module, it knows:

is this module relevant to my task?
if so, which files exactly should I open?

The agent no longer rummages through the codebase at random. It knows directly which files to open, which saves tokens and keeps its context clean for reasoning.
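
To illustrate the index idea, here is a sketch that pulls the "Important Files" table out of an AGENT.md, using the table from the Agent Sessions example. The agent does this by reading the markdown directly; the importantFiles helper is hypothetical and only shows how little text is needed to get from a module to its key files.

```typescript
interface ImportantFile {
  file: string;
  description: string;
}

// Extracts the rows of the "## Important Files" table from an AGENT.md.
function importantFiles(agentMd: string): ImportantFile[] {
  const lines = agentMd.split("\n");
  const start = lines.findIndex((line) => line.trim() === "## Important Files");
  if (start === -1) return [];
  const files: ImportantFile[] = [];
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("## ")) break; // reached the next section
    if (!line.trim().startsWith("|")) continue; // not a table row
    const cells = line.split("|").slice(1, -1).map((cell) => cell.trim());
    const file = cells[0] ?? "";
    const description = cells[1] ?? "";
    if (file === "" || file === "File" || /^-+$/.test(file)) continue; // header or separator
    files.push({ file, description });
  }
  return files;
}

const agentMd = [
  "# Agent Sessions Module",
  "",
  "## Important Files",
  "",
  "| File                                     | Description                                 |",
  "| ---------------------------------------- | ------------------------------------------- |",
  "| services/AgentSessionsRouting.service.ts | Event routing logic + Mutex                 |",
  "| presentation/AgentSessions.consumer.ts   | AMQP consumer for snivel.agent-sessions.end |",
  "| presentation/FilesChange.consumer.ts     | AMQP consumer for snivel.files.change       |",
  "",
  "## Routing Logic",
].join("\n");

for (const entry of importantFiles(agentMd)) {
  console.log(`${entry.file}: ${entry.description}`);
}
```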

Benchmark: what AGENT.md files save

To quantify it, we ran a code-comprehension question in production on the monorepo, "how do images flow from the studio frontend to Claude Code in snivel?", twice, with and without access to the AGENT.md files.

Agent turns: 2 with AGENT.md vs 4 without. 2x more turns without.
Cumulative tokens: 2.1M with vs 4.3M without. +103% tokens without.
Cost: $1.53 with vs $2.28 without. +49% cost without.

Same final conclusion in both cases. But without the index, the agent doubles its token consumption and pays 50% more just to reconstruct what the AGENT.md files would have told it in a few lines.
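
The percentages can be recomputed from the figures above. Note that the token counts quoted here are rounded to one decimal, so recomputing from them gives roughly +105% rather than +103%; the published figure presumably comes from the unrounded counts. The Run type and overheadPercent helper are mine.

```typescript
interface Run {
  turns: number;
  tokensMillions: number;
  costUsd: number;
}

// Figures quoted in the benchmark above.
const withAgentMd: Run = { turns: 2, tokensMillions: 2.1, costUsd: 1.53 };
const withoutAgentMd: Run = { turns: 4, tokensMillions: 4.3, costUsd: 2.28 };

// Extra consumption of `other` relative to `baseline`, in percent.
function overheadPercent(baseline: number, other: number): number {
  return ((other - baseline) / baseline) * 100;
}

const turnRatio = withoutAgentMd.turns / withAgentMd.turns;
const tokenOverhead = overheadPercent(withAgentMd.tokensMillions, withoutAgentMd.tokensMillions);
const costOverhead = overheadPercent(withAgentMd.costUsd, withoutAgentMd.costUsd);

console.log(`turns: x${turnRatio}`);
console.log(`tokens: +${tokenOverhead.toFixed(0)}%`);
console.log(`cost: +${costOverhead.toFixed(0)}%`);
```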

At the scale of a team running thousands of requests a day, that's the difference between a sustainable API bill and a budget that spirals.

The rule that creates the rule

And it's a coding rule that maintains these AGENT.md files, taken from backend-architecture.md:

**Module AGENT.md**

Each module MUST contain a AGENT.md file at the root.

This file contains a high level description of the module purpose and
general behavior. It can contain anything necessary so a Senior developer
is happy to read it to understand how to handle the module:

- schema of main flow
- important files
- tables structure
- particular permissions
- important API actions
- important business logic

**You MUST ALWAYS update the AGENT.md file associated with a module
after working on it.**

This rule is loaded on every backend file edit, because its paths: is **/{backend,studio-backend}/**/*, a generic glob that matches every file in the backend. So every time an agent touches a module file, it has the instruction to update the AGENT.md right in its context.

Layer 1 (coding rules) thus maintains layer 3 (AGENT.md), without a human ever touching the docs.
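
The rule above is still an instruction, so it is followed with high probability rather than guaranteed. In the spirit of preferring a deterministic mechanism when one exists, a check like the sketch below could flag modules that have no AGENT.md. This is an assumption for illustration, not a description of what we run: modulesMissingAgentMd is a hypothetical helper, it treats any directory directly under a features/ folder as a module, and the example paths are made up apart from the module names used earlier in the article.

```typescript
// Assumption: a "module" is any directory directly under a `features/` folder.
function modulesMissingAgentMd(filePaths: string[]): string[] {
  const modules = new Set<string>();
  const documented = new Set<string>();
  for (const filePath of filePaths) {
    const match = /^((?:.*\/)?features\/[^/]+)\/(.+)$/.exec(filePath);
    if (!match) continue; // not inside a feature module
    const moduleDir = match[1] ?? "";
    const rest = match[2] ?? "";
    modules.add(moduleDir);
    if (rest === "AGENT.md") documented.add(moduleDir); // AGENT.md at the module root
  }
  return Array.from(modules)
    .filter((moduleDir) => !documented.has(moduleDir))
    .sort();
}

// Hypothetical file listing, e.g. the output of a recursive directory walk.
const trackedFiles = [
  "src/features/agent-sessions/AGENT.md",
  "src/features/agent-sessions/services/AgentSessionsRouting.service.ts",
  "src/features/companies/data/Companies.repository.ts",
  "src/main.ts",
];

console.log(modulesMissingAgentMd(trackedFiles));
```

With this listing, the companies module is reported because it has code but no AGENT.md at its root. A check like this can only tell you the file exists, not that it is up to date, so it complements the rule rather than replacing it.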

Conclusion

Three layers, three loading moments, each with a triggering mechanism suited to it.

Without this layering, you hit a ceiling fast as soon as the codebase grows. You end up putting so many instructions in the same place that none of them is followed properly.

It works because it respects how the LLM's attention actually works: the info has to be close to the moment of generation, and not lost 20,000 tokens earlier in a giant CLAUDE.md.

The AGENT.md files are themselves generated and maintained by the coding agents, which follow a rule loaded on every backend file edit, so the system maintains itself with no human intervention on the documentation.

At Maleus, this is what lets us generate the majority of the code in near-autonomy without sacrificing quality.
