FORGE Context Efficiency

FORGE is an agentic loop: every tool call adds a new message to the conversation history. A story that takes 25 iterations accumulates 50+ messages — the original system prompt and user message plus all the tool calls and results in between. Without management, this history grows monotonically: every read_file result remains in context even after the file has been rewritten; every run_bash output is repeated for every retry of the same command.

This page describes the four stages PACE uses to bound that growth. For background on how PACE’s stateless architecture compares to monolithic sessions, see Token Efficiency by Design.

The three growth drivers

A Day-23 trace from a representative PACE sprint showed approximately 69,000 tokens in FORGE’s message history at the start of the implementation phase. Breakdown:

Driver	Tokens	Share
Stale file reads (files already rewritten)	~31,000	45%
Repeated `run_bash` outputs (same command, multiple runs)	~16,000	23%
`write_file` echo (full file content returned in tool result)	~15,000	22%
Signal content (acceptance criteria, live tool results)	~7,000	10%

90% of the history was noise. Stage 1 targets all three mechanically.

Stage 1 — Eviction, deduplication, suppression

Available from v3.1.0 · Always on · No configuration required

Stage 1 runs automatically before every LLM call in the FORGE loop. It applies three transformations to the message history:

Stale read eviction

When FORGE rewrites a file (write_file on path/to/file.py), all earlier read_file tool results for that path become stale — they describe a version of the file that no longer exists. Keeping them misleads FORGE into believing the old content is still current.

_evict_stale_reads(messages, written_paths) scans the history and replaces any tool_result for a read_file on a written path with a compact placeholder:

[evicted: pace/auth.py was read here but has since been written — see current version on disk]

The placeholder preserves the message structure (so the conversation turn count is correct) but discards the stale token payload.

Bash output deduplication

FORGE often runs the same command multiple times during a story — most commonly the test command (pytest -v --tb=short) which it runs after every change to verify progress. Each run produces a full output block (stack traces, file paths, line numbers) that is identical or nearly identical to the previous run.

_dedup_bash_results(messages) keeps only the most recent run_bash result for each command signature (the command string itself). Earlier results are replaced with:

[dedup: pytest -v --tb=short — see most recent output later in history]

Write receipt suppression

When FORGE calls write_file, the tool result previously echoed the full file content back into the conversation — doubling the token cost of every file write. From v3.1.0, the result is replaced with a compact receipt at dispatch time:

OK: wrote 1,247 bytes to pace/auth.py (iter 14)

The file content is written to disk as normal; only the in-context representation is compressed.

Expected savings

Across the three drivers (stale reads, bash dedup, write suppression), Stage 1 reduces FORGE’s per-iteration input token growth by approximately 60–65% compared to the unmanaged baseline on a representative mid-sprint story.

Stage 2 — Haiku context compression

Available from v3.1.0 · Opt-in via forge.compression_model

After the first confirmed RED-phase test failure (i.e., after FORGE calls confirm_red_phase), FORGE transitions from exploration to implementation. At this point, a significant portion of the exploration history — file reads, earlier test runs, decisions that were considered and rejected — has low signal value for the implementation phase.

_compress_history(messages, compression_model, written_paths, story_card) makes a single call to the lightweight compression model and asks it to produce a structured YAML summary of the exploration phase:

files_read:
  - pace/auth.py
  - tests/test_auth.py
files_written: []
plan_committed: false
key_decisions:
  - Use JWT with RS256 signing
  - Extend existing User model rather than creating AuthUser
last_test_output: "3 failed, 0 passed"
red_phase_confirmed: true

The original history is replaced with two messages: the initial user message (story card) and a single assistant-turn containing the compressed summary. This gives FORGE a clean starting context for implementation with full awareness of what was learned in exploration.

Five mitigations prevent compression from degrading quality:

Schema validation — the YAML summary must contain all required fields; partial responses are rejected.
Anti-hallucination override — files_written in the summary is always replaced with the ground-truth written_paths set tracked by the framework, regardless of what the model produced.
Single-trigger guard — compression fires exactly once per story; the _compressed flag prevents a second call even if the loop iterates further.
Required-field verification — if the summary is missing key_decisions, last_test_output, or red_phase_confirmed, the original history is preserved unchanged.
Failure fallback — if the compression API call raises any exception, FORGE continues with the original (uncompressed) history.

Configuration

forge:
  compression_model: claude-haiku-4-5-20251001  # defaults to llm.analysis_model

Set compression_model to a fast, cheap model. The compression call is non-critical (fallback preserves original history) so the smallest available model is appropriate.

If compression_model is not set, it defaults to llm.analysis_model. If llm.analysis_model is also not set, compression is skipped.

Expected savings

Stage 2 reduces the implementation-phase starting context by approximately 29% beyond what Stage 1 achieves, cutting the Day-23 baseline from ~69,000 tokens to ~20,000 at the implementation start.

Stage 3 — Pre-seeded file hints

Available from v3.1.0 · Enabled by default · Configurable

Before FORGE begins its exploration loop, a lightweight Haiku call scans engineering.md and identifies which files are most likely relevant to the story being implemented. These hints are injected into FORGE’s initial user message as a ## File Hints section:

## File Hints

The following files are likely relevant to this story (generated from engineering.md):

- `pace/auth.py` (confidence: 0.90) — auth logic; likely needs modification
- `tests/test_auth.py` (confidence: 0.85) — existing auth tests; likely needs new cases

Hints are advisory, not constraints — FORGE uses its own judgement and may read additional files. The intent is to reduce the exploration phase from 12–15 iterations (FORGE reading its way to the relevant files) to 3–4 iterations (FORGE starting with the right files already surfaced).

Configuration

forge:
  file_hints_enabled: true            # default
  file_hints_confidence_threshold: 0.7  # default; hints below this are excluded

Lower the threshold to include more candidate files; raise it to show only high-confidence hints.

Hints are skipped automatically when:

file_hints_enabled: false
engineering.md is absent from .pace/context/
engineering.md is not tracked in context.manifest.yaml (may be stale)
The pre-pass call fails (FORGE starts without hints; no error is raised)

Per-story override

For architectural stories that intentionally explore broadly (greenfield modules, structural refactors), pre-seeded hints can bias FORGE toward familiar paths. Disable them for a specific story in plan.yaml:

stories:
  - id: story-12
    title: "Extract payment module into separate service"
    status: pending
    disable_file_hints: true
    acceptance_criteria:
      - "payments/ directory contains its own models, services, and tests"
      - "pytest exits 0"

Expected savings

Stage 3 reduces the exploration phase from ~13 iterations to 3–4 on representative stories with a well-maintained engineering.md. This translates to approximately 10,000 fewer input tokens before the first GREEN test run.

Stage 4 — Forked subcontext (Phase A)

Available from v3.1.0 · Opt-in via forge.fork_enabled · Phase A only

Stages 1–3 optimise the content of a single FORGE context. Stage 4 splits the story into two independent API contexts: one for exploration and one for implementation.

In Phase A, FORGE signals the split explicitly by calling the commit_plan tool. When commit_plan is called, FORGE provides:

A free-text plan describing the implementation approach
A files_to_modify list of files it intends to write
Optionally, a compressed summary of the exploration phase via _fork_context()

_fork_context() constructs a compact fresh context for the implementation phase:

Compresses the exploration history to a YAML summary (same Haiku call as Stage 2)
Prepends the original story card
Appends the committed implementation plan as the final message

FORGE then starts the implementation phase with this compact context rather than the full exploration history — even if exploration ran for 20+ iterations.

Configuration

forge:
  fork_enabled: false  # opt-in; default false

Phase A is safe to enable immediately. The single-trigger guard (_forked flag) ensures the fork happens at most once per story. If FORGE reaches complete_handoff without calling commit_plan, it behaves identically to the non-forked path — Stage 1–3 optimisations still apply.

Phase B and Phase C (planned)

Phase B (v3.2, planned): Live dual-context — exploration and implementation run in genuinely separate API contexts with no shared state. Requires commit_plan to carry a full snapshot.
Phase C (v3.3, planned): Adaptive triggering — PACE detects when exploration has converged (based on iteration count or tool-call entropy) and triggers the fork automatically, without requiring FORGE to call commit_plan.

Phase B and C will be enabled after validation on 10+ production stories with Phase A.

Expected savings

Stage 4 reduces the implementation-phase context by approximately 43% beyond Stages 1–3 on stories where exploration ran 15+ iterations before a plan was committed. The exact saving depends on exploration depth.

Cumulative effect

The four stages are independent and stack:

Stage	Mechanism	Tokens saved (Day-23 baseline)	Cumulative reduction
1	Eviction + dedup + suppression	~47,000	~68%
2	Haiku compression	~20,000	~97%
3	Pre-seeded hints	reduces exploration phase	fewer tokens before GREEN
4	Forked subcontext	~30,000 (implementation phase)	fresh implementation baseline

Stage 1 is always on. Stages 2 and 3 are enabled when compression_model and file_hints_enabled are set (Stage 3 is on by default). Stage 4 requires explicit opt-in.