FORGE Context Efficiency
FORGE is an agentic loop: every tool call adds a new message to the conversation history. A story that takes 25 iterations accumulates 50+ messages — the original system prompt and user message plus all the tool calls and results in between. Without management, this history grows monotonically: every read_file result remains in context even after the file has been rewritten; every run_bash output is repeated for every retry of the same command.
This page describes the four stages PACE uses to bound that growth. For background on how PACE’s stateless architecture compares to monolithic sessions, see Token Efficiency by Design.
The three growth drivers
A Day-23 trace from a representative PACE sprint showed approximately 69,000 tokens in FORGE’s message history at the start of the implementation phase. Breakdown:
| Driver | Tokens | Share |
|---|---|---|
| Stale file reads (files already rewritten) | ~31,000 | 45% |
Repeated run_bash outputs (same command, multiple runs) | ~16,000 | 23% |
write_file echo (full file content returned in tool result) | ~15,000 | 22% |
| Signal content (acceptance criteria, live tool results) | ~7,000 | 10% |
90% of the history was noise. Stage 1 targets all three mechanically.
Stage 1 — Eviction, deduplication, suppression
Available from v3.1.0 · Always on · No configuration required
Stage 1 runs automatically before every LLM call in the FORGE loop. It applies three transformations to the message history:
Stale read eviction
When FORGE rewrites a file (write_file on path/to/file.py), all earlier read_file tool results for that path become stale — they describe a version of the file that no longer exists. Keeping them misleads FORGE into believing the old content is still current.
_evict_stale_reads(messages, written_paths) scans the history and replaces any tool_result for a read_file on a written path with a compact placeholder:
[evicted: pace/auth.py was read here but has since been written — see current version on disk]The placeholder preserves the message structure (so the conversation turn count is correct) but discards the stale token payload.
Bash output deduplication
FORGE often runs the same command multiple times during a story — most commonly the test command (pytest -v --tb=short) which it runs after every change to verify progress. Each run produces a full output block (stack traces, file paths, line numbers) that is identical or nearly identical to the previous run.
_dedup_bash_results(messages) keeps only the most recent run_bash result for each command signature (the command string itself). Earlier results are replaced with:
[dedup: pytest -v --tb=short — see most recent output later in history]Write receipt suppression
When FORGE calls write_file, the tool result previously echoed the full file content back into the conversation — doubling the token cost of every file write. From v3.1.0, the result is replaced with a compact receipt at dispatch time:
OK: wrote 1,247 bytes to pace/auth.py (iter 14)The file content is written to disk as normal; only the in-context representation is compressed.
Expected savings
Across the three drivers (stale reads, bash dedup, write suppression), Stage 1 reduces FORGE’s per-iteration input token growth by approximately 60–65% compared to the unmanaged baseline on a representative mid-sprint story.
Stage 2 — Haiku context compression
Available from v3.1.0 · Opt-in via
forge.compression_model
After the first confirmed RED-phase test failure (i.e., after FORGE calls confirm_red_phase), FORGE transitions from exploration to implementation. At this point, a significant portion of the exploration history — file reads, earlier test runs, decisions that were considered and rejected — has low signal value for the implementation phase.
_compress_history(messages, compression_model, written_paths, story_card) makes a single call to the lightweight compression model and asks it to produce a structured YAML summary of the exploration phase:
files_read: - pace/auth.py - tests/test_auth.pyfiles_written: []plan_committed: falsekey_decisions: - Use JWT with RS256 signing - Extend existing User model rather than creating AuthUserlast_test_output: "3 failed, 0 passed"red_phase_confirmed: trueThe original history is replaced with two messages: the initial user message (story card) and a single assistant-turn containing the compressed summary. This gives FORGE a clean starting context for implementation with full awareness of what was learned in exploration.
Five mitigations prevent compression from degrading quality:
- Schema validation — the YAML summary must contain all required fields; partial responses are rejected.
- Anti-hallucination override —
files_writtenin the summary is always replaced with the ground-truthwritten_pathsset tracked by the framework, regardless of what the model produced. - Single-trigger guard — compression fires exactly once per story; the
_compressedflag prevents a second call even if the loop iterates further. - Required-field verification — if the summary is missing
key_decisions,last_test_output, orred_phase_confirmed, the original history is preserved unchanged. - Failure fallback — if the compression API call raises any exception, FORGE continues with the original (uncompressed) history.
Configuration
forge: compression_model: claude-haiku-4-5-20251001 # defaults to llm.analysis_modelSet compression_model to a fast, cheap model. The compression call is non-critical (fallback preserves original history) so the smallest available model is appropriate.
If compression_model is not set, it defaults to llm.analysis_model. If llm.analysis_model is also not set, compression is skipped.
Expected savings
Stage 2 reduces the implementation-phase starting context by approximately 29% beyond what Stage 1 achieves, cutting the Day-23 baseline from ~69,000 tokens to ~20,000 at the implementation start.
Stage 3 — Pre-seeded file hints
Available from v3.1.0 · Enabled by default · Configurable
Before FORGE begins its exploration loop, a lightweight Haiku call scans engineering.md and identifies which files are most likely relevant to the story being implemented. These hints are injected into FORGE’s initial user message as a ## File Hints section:
## File Hints
The following files are likely relevant to this story (generated from engineering.md):
- `pace/auth.py` (confidence: 0.90) — auth logic; likely needs modification- `tests/test_auth.py` (confidence: 0.85) — existing auth tests; likely needs new casesHints are advisory, not constraints — FORGE uses its own judgement and may read additional files. The intent is to reduce the exploration phase from 12–15 iterations (FORGE reading its way to the relevant files) to 3–4 iterations (FORGE starting with the right files already surfaced).
Configuration
forge: file_hints_enabled: true # default file_hints_confidence_threshold: 0.7 # default; hints below this are excludedLower the threshold to include more candidate files; raise it to show only high-confidence hints.
Hints are skipped automatically when:
file_hints_enabled: falseengineering.mdis absent from.pace/context/engineering.mdis not tracked incontext.manifest.yaml(may be stale)- The pre-pass call fails (FORGE starts without hints; no error is raised)
Per-story override
For architectural stories that intentionally explore broadly (greenfield modules, structural refactors), pre-seeded hints can bias FORGE toward familiar paths. Disable them for a specific story in plan.yaml:
stories: - id: story-12 title: "Extract payment module into separate service" status: pending disable_file_hints: true acceptance_criteria: - "payments/ directory contains its own models, services, and tests" - "pytest exits 0"Expected savings
Stage 3 reduces the exploration phase from ~13 iterations to 3–4 on representative stories with a well-maintained engineering.md. This translates to approximately 10,000 fewer input tokens before the first GREEN test run.
Stage 4 — Forked subcontext (Phase A)
Available from v3.1.0 · Opt-in via
forge.fork_enabled· Phase A only
Stages 1–3 optimise the content of a single FORGE context. Stage 4 splits the story into two independent API contexts: one for exploration and one for implementation.
In Phase A, FORGE signals the split explicitly by calling the commit_plan tool. When commit_plan is called, FORGE provides:
- A free-text
plandescribing the implementation approach - A
files_to_modifylist of files it intends to write - Optionally, a compressed summary of the exploration phase via
_fork_context()
_fork_context() constructs a compact fresh context for the implementation phase:
- Compresses the exploration history to a YAML summary (same Haiku call as Stage 2)
- Prepends the original story card
- Appends the committed implementation plan as the final message
FORGE then starts the implementation phase with this compact context rather than the full exploration history — even if exploration ran for 20+ iterations.
Configuration
forge: fork_enabled: false # opt-in; default falsePhase A is safe to enable immediately. The single-trigger guard (_forked flag) ensures the fork happens at most once per story. If FORGE reaches complete_handoff without calling commit_plan, it behaves identically to the non-forked path — Stage 1–3 optimisations still apply.
Phase B and Phase C (planned)
- Phase B (v3.2, planned): Live dual-context — exploration and implementation run in genuinely separate API contexts with no shared state. Requires
commit_planto carry a full snapshot. - Phase C (v3.3, planned): Adaptive triggering — PACE detects when exploration has converged (based on iteration count or tool-call entropy) and triggers the fork automatically, without requiring FORGE to call
commit_plan.
Phase B and C will be enabled after validation on 10+ production stories with Phase A.
Expected savings
Stage 4 reduces the implementation-phase context by approximately 43% beyond Stages 1–3 on stories where exploration ran 15+ iterations before a plan was committed. The exact saving depends on exploration depth.
Cumulative effect
The four stages are independent and stack:
| Stage | Mechanism | Tokens saved (Day-23 baseline) | Cumulative reduction |
|---|---|---|---|
| 1 | Eviction + dedup + suppression | ~47,000 | ~68% |
| 2 | Haiku compression | ~20,000 | ~97% |
| 3 | Pre-seeded hints | reduces exploration phase | fewer tokens before GREEN |
| 4 | Forked subcontext | ~30,000 (implementation phase) | fresh implementation baseline |
Stage 1 is always on. Stages 2 and 3 are enabled when compression_model and file_hints_enabled are set (Stage 3 is on by default). Stage 4 requires explicit opt-in.