Brain Context Engineering

Engineer what goes into an AI agent's context window: how much, in what order, and how compressed.

A discipline for engineering what goes into an AI agent's context window: how much (token budget), in what order (relevance times recency), and how compressed (sliding-window summarization, with prompt caching to cut the cost of the reused prefix). It treats the window as a scarce resource, counters the lost-in-the-middle effect by placing the most critical facts at the start and end, and stops context pollution, where irrelevant chunks confuse the model. The result is an agent that recalls the right facts, answers on topic, and costs less to run.

$15 one-time

Prices include 20% VAT · Forged on real agency work · One-time, no lock-in

  • Type: Skill
  • Category: AI & LLM
  • Delivery: Email · instant
  • License: One-time

Inside the run · no black box

See the actual work before you buy it.

Context engineering treats the window as a budget, not a bucket. The work follows a fixed order, from measuring what you have to placing it where the model actually reads it:

  1. Sets a token budget and splits it across the system prompt, retrieved context and the live instruction, so nothing silently overflows the window
  2. Scores candidate chunks by relevance to the task multiplied by recency, then keeps only the top handful that earn their place
  3. Places the most critical facts at the very start and end of the context, where models attend most, to beat the lost-in-the-middle effect
  4. Compresses long histories with sliding-window summarization, keeping the gist of old turns instead of every word
  5. Marks the stable prefix for prompt caching so reused context is not reprocessed at full cost on every call, and flags any irrelevant blocks for removal
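
Steps 2 and 3 above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the `Chunk` class, the 24-hour half-life and the alternating placement rule are all assumptions made for the example, and the relevance values are assumed to come from whatever retriever you already use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    relevance: float  # 0..1, assumed to be precomputed by a retriever
    age_hours: float  # time since the chunk was created or last touched


def recency(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 when fresh, 0.5 after one half-life (assumed 24 h)."""
    return 0.5 ** (age_hours / half_life_hours)


def score(chunk: Chunk) -> float:
    """Relevance multiplied by recency, as in step 2."""
    return chunk.relevance * recency(chunk.age_hours)


def select_top(chunks: List[Chunk], k: int) -> List[Chunk]:
    """Keep only the k highest-scoring chunks, best first."""
    return sorted(chunks, key=score, reverse=True)[:k]


def place_at_edges(ranked: List[Chunk]) -> List[Chunk]:
    """Alternate ranked chunks between the front and the back of the context,
    so the strongest sit at the edges and the weakest land in the middle."""
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]
```

Given chunks ranked best to worst as 1, 2, 3, 4, `place_at_edges` returns them in the order 1, 3, 4, 2: the best chunk opens the context, the second best closes it, and the weaker ones sit in the middle.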
Use cases · what happens when you plug it in

One power source. 6 lines out.

  1. Designing what an agent loads at session start so it recalls the right history
  2. Choosing the top relevant chunks for a RAG system under a token budget
  3. Compressing 100-plus-turn conversations with sliding-window summarization
  4. Cutting model cost by caching the stable system prompt and reused blocks
  5. Debugging an agent that gives off-topic answers caused by context pollution
  6. Merging multiple context sources (retrieval, profile, recent activity, feedback) cleanly
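
For the long-conversation use case in the list above, sliding-window summarization can be sketched roughly as follows. The function names, the window size of 6 and the `naive_summarize` placeholder are all hypothetical. In a real agent the summarizer would most likely be an LLM call, and the running summary would be refreshed as more turns fall out of the window.

```python
from typing import Callable, List


def naive_summarize(turns: List[str], max_chars: int = 200) -> str:
    """Placeholder summarizer (an assumption for this sketch): keeps the first
    sentence of each turn and truncates the result. Swap in an LLM call in practice."""
    gist = " | ".join(t.split(".")[0].strip() for t in turns)
    return gist[:max_chars]


def compress_history(
    turns: List[str],
    window: int = 6,
    summarize: Callable[[List[str]], str] = naive_summarize,
) -> List[str]:
    """Keep the last `window` turns verbatim and replace everything older
    with a single summary entry."""
    if len(turns) <= window:
        return list(turns)
    old, recent = turns[:-window], turns[-window:]
    summary = f"[summary of {len(old)} earlier turns] {summarize(old)}"
    return [summary] + recent
```

A 100-turn history passed through `compress_history(turns, window=6)` comes back as 7 entries: one summary line followed by the 6 most recent turns, unchanged.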
Benefits · what you walk away with

Yours to keep.

Forever

That's what owning means.

The rented stack

ai writing tool: subscription

expired · access lost

analytics suite: subscription

expired · access lost

design platform: subscription

expired · access lost

(nothing left)

Your forge

  1. Sharper, on-topic answers because the model sees the right facts, not noise

    license: perpetual
  2. Lower running cost by caching reused prompt prefixes instead of reprocessing them on every call

    license: perpetual
  3. Fewer lost-in-the-middle misses on long inputs through deliberate placement

    license: perpetual
  4. Predictable token budgets so context never silently overflows the window

    license: perpetual

subscriptions expire · deeds don't

What's included · the full manifest

Everything in the box.

A token-budget allocation model splitting the window across system, context and instruction

part 01 of 06 · in the box

6 parts · one working system · ships instantly by email
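
To give a feel for the token-budget allocation model named above, here is a minimal sketch. It does not reproduce the shipped model: the percentage shares, the reserved output allowance and the four-characters-per-token estimate are assumptions chosen for illustration, and a real tokenizer will give different counts.

```python
from typing import Dict, List


def allocate_budget(
    window_tokens: int, reserve_output: int, shares_pct: Dict[str, int]
) -> Dict[str, int]:
    """Split the input side of the window by integer percentage shares.
    `reserve_output` tokens are held back for the model's reply (assumed)."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100")
    available = window_tokens - reserve_output
    if available <= 0:
        raise ValueError("nothing left for input after reserving output")
    # Floor division keeps every slot an integer; any remainder is unused headroom.
    return {name: available * pct // 100 for name, pct in shares_pct.items()}


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def overflowing(parts: Dict[str, str], budget: Dict[str, int]) -> List[str]:
    """Return the names of the parts whose estimated size exceeds their slot."""
    return [name for name, text in parts.items() if estimate_tokens(text) > budget[name]]
```

With an 8,000-token window, 1,000 tokens reserved for output and a 20/60/20 split, the slots come out as 1,400 tokens for the system prompt, 4,200 for retrieved context and 1,400 for the live instruction. Running `overflowing` before each call turns a silent overflow into an explicit list of the parts that need trimming.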

Who it's for

This wasn't forged for everyone.

  • Not for you if you'd rather rent a tool than own one.
  • Not for you if you want someone else to run your stack.
  • Not for you if you're happy guessing.
Still here? Good.

Engineers building AI agents and RAG systems who need accurate recall, controlled token budgets and lower per-call cost.

then this was forged for you.

Works with

Universal by design: this skill runs in any AI assistant. Delivered in the open Agent Skills + MCP format (native in Claude); ChatGPT, Gemini, Cursor and Copilot adapt the same files in their own way.

  • Claude: Native format
  • ChatGPT: Adapts via open standards
  • Gemini: Adapts via open standards
  • Cursor: Adapts via open standards
  • Copilot: Adapts via open standards
Questions · still in the air

Catch what's on your mind.


  1. Will this work with any LLM, or just one provider?

    The principles are model-agnostic: token budgets, relevance ranking and placement apply to any LLM with a context window. Provider-specific features like prompt caching are noted where they exist, but the core method does not depend on one vendor.

  2. Isn't a bigger context window enough on its own?

    No. Even large windows suffer from the lost-in-the-middle effect, where information buried in the center is often overlooked, and every extra token costs money and latency. Choosing and placing the right context beats simply stuffing more in.

  3. Does prompt caching change the model's answers?

    No. Caching only reuses an identical prompt prefix to cut cost and latency; the content the model sees is the same. The savings come from the provider not reprocessing the stable part on every call.

  4. How is it delivered?

    By email right after purchase: ready to run, downloaded instantly, no setup wait.

  5. One-time or subscription?

    A one-time purchase; no subscription or hidden fees. VAT (20%) is included.

  6. Can I get a refund?

    As a digital product, it can’t be refunded once downloaded. That’s why we show exactly what’s inside and who it’s for, right here.