Brain Context Engineering

Engineer what goes into an AI agent's context window: how much, in what order, and how compressed.

A discipline for engineering what goes into an AI agent's context window: how much (token budget), in what order (relevance times recency), and how compressed (sliding-window summarization, with prompt caching for the stable prefix). It treats the window as a scarce resource, counters the lost-in-the-middle effect by placing the most critical facts at the start and end, and stops context pollution, where irrelevant chunks confuse the model. The result is an agent that recalls the right facts, answers on topic, and costs less to run.

$15 one-time

Prices include 20% VAT. · Forged on real agency work · one-time, no lock-in

Type: Skill
Category: AI & LLM
Delivery: Email · instant
License: One-time

Inside the run · no black box

See the actual work before you buy it.

Context engineering treats the window as a budget, not a bucket. The work follows a fixed order, from measuring what you have to placing it where the model actually reads it:

Sets a token budget and splits it across the system prompt, retrieved context and the live instruction, so nothing silently overflows the window
Scores candidate chunks by relevance to the task multiplied by recency, then keeps only the top handful that earn their place
Places the most critical facts at the very start and end of the context, where models attend most, to beat the lost-in-the-middle effect
Compresses long histories with sliding-window summarization, keeping the gist of old turns instead of every word
Marks the stable prefix for prompt caching so reused context is paid for once rather than on every call, and flags any irrelevant blocks for removal
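
The ranking and placement steps above can be sketched in a few lines. This is a minimal illustration, not the product's implementation: the `Chunk` fields, the exponential half-life used for recency, the greedy fill and the alternating edge placement are all assumptions chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    relevance: float  # 0..1, assumed to come from a retriever
    age_hours: float  # time since the chunk was created or last touched
    tokens: int       # token count, assumed precomputed


def recency(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for a brand-new chunk, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def select(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Rank by relevance * recency, then greedily keep chunks that fit the budget."""
    ranked = sorted(
        chunks, key=lambda c: c.relevance * recency(c.age_hours), reverse=True
    )
    kept, used = [], 0
    for c in ranked:
        if used + c.tokens <= budget:
            kept.append(c)
            used += c.tokens
    return kept  # still ordered best-first


def place_at_edges(ranked: list[Chunk]) -> list[Chunk]:
    """Alternate chunks between front and back so the best ones sit at the edges
    and the weakest ones end up in the middle."""
    front, back = [], []
    for i, c in enumerate(ranked):
        (front if i % 2 == 0 else back).append(c)
    return front + back[::-1]
```

Calling `place_at_edges(select(chunks, budget))` gives an ordering where the top-scoring chunk opens the context and the runner-up closes it. A greedy fill is only one option; a stricter top-k cutoff would match "the top handful" more literally.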

Use cases · what happens when you plug it in

One power source. 6 lines out.

Designing what an agent loads at session start so it recalls the right history
Choosing the top relevant chunks for a RAG system under a token budget
Compressing 100-plus-turn conversations with sliding-window summarization
Cutting model cost by caching the stable system prompt and reused blocks
Debugging an agent that gives off-topic answers caused by context pollution
Merging multiple context sources (retrieval, profile, recent activity, feedback) cleanly
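
For the conversation-compression use case, a sliding window might look like the sketch below. It is a rough illustration under stated assumptions: `compress_history` and `naive_summarize` are hypothetical names, and the placeholder summarizer just truncates each old turn, where a real setup would most likely call a model to write the summary.

```python
from typing import Callable

Turn = tuple[str, str]  # (role, text)


def naive_summarize(turns: list[Turn]) -> str:
    """Placeholder summarizer: keeps the first 8 words of each old turn.
    A stand-in for an LLM-written summary."""
    return " | ".join(
        f"{role}: {' '.join(text.split()[:8])}" for role, text in turns
    )


def compress_history(
    turns: list[Turn],
    window: int = 6,
    summarize: Callable[[list[Turn]], str] = naive_summarize,
) -> list[Turn]:
    """Keep the last `window` turns verbatim and fold everything older
    into a single summary turn at the front."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(turns) <= window:
        return list(turns)
    old, recent = turns[:-window], turns[-window:]
    summary = ("system", "Summary of earlier conversation: " + summarize(old))
    return [summary] + recent
```

The recent turns stay word for word because they usually carry the live task; only the older ones are reduced to their gist.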

Benefits · what you walk away with

Yours to keep.

Forever

That's what owning means.

The rented stack

ai writing tool: subscription

expired · access lost

analytics suite: subscription

expired · access lost

design platform: subscription

expired · access lost

(nothing left)

Your forge

Sharper, on-topic answers because the model sees the right facts, not noise
Lower running cost by caching reused prompt prefixes instead of resending them
Fewer lost-in-the-middle misses on long inputs through deliberate placement
Predictable token budgets so context never silently overflows the window
license: perpetual

subscriptions expire · deeds don't

What's included · the full manifest

Everything in the box.

A token-budget allocation model splitting the window across system, context and instruction

part 01 of 06 · in the box

6 parts · one working system · ships instantly by email
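
To give a sense of what a token-budget allocation can look like, here is a small sketch. The function names, the integer-percent shares and the reserved-output parameter are assumptions made for illustration; the shipped model may split the window differently.

```python
def allocate_budget(
    window_tokens: int,
    reserved_output: int,
    shares_pct: dict[str, int],
) -> dict[str, int]:
    """Split the input budget (window minus reserved output) by integer percentages."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100")
    input_budget = window_tokens - reserved_output
    if input_budget <= 0:
        raise ValueError("reserved output leaves no room for input")
    return {name: input_budget * pct // 100 for name, pct in shares_pct.items()}


def over_budget(usage: dict[str, int], budget: dict[str, int]) -> list[str]:
    """Return the sections whose token usage exceeds their allocation."""
    return [name for name, used in usage.items() if used > budget.get(name, 0)]


budget = allocate_budget(
    window_tokens=8000,
    reserved_output=1000,
    shares_pct={"system": 20, "context": 70, "instruction": 10},
)
```

Checking `over_budget(usage, budget)` before each call turns a silent overflow into an explicit list of sections to trim.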

Who it's for

This wasn't forged for everyone.

Not for you if you'd rather rent a tool than own one.
Not for you if you want someone else to run your stack.
Not for you if you're happy guessing.

Still here? Good.

Engineers building AI agents and RAG systems who need accurate recall, controlled token budgets and lower per-call cost.

then this was forged for you.

Works with

Universal by design: the skill is delivered in the open Agent Skills + MCP format, which is native in Claude; ChatGPT, Gemini, Cursor and Copilot adapt the same files their own way.

Claude: Native format
ChatGPT: Adapts via open standards
Gemini: Adapts via open standards
Cursor: Adapts via open standards
Copilot: Adapts via open standards

Questions · still in the air

Catch what's on your mind.

Will this work with any LLM, or just one provider?

The principles are model-agnostic: token budgets, relevance ranking and placement apply to any LLM with a context window. Provider-specific features like prompt caching are noted where they exist, but the core method does not depend on one vendor.

Isn't a bigger context window enough on its own?

No. Even large windows suffer the lost-in-the-middle effect, where information buried in the center is often overlooked, and every extra token costs money and latency. Choosing and placing the right context beats simply stuffing more in.

Does prompt caching change the model's answers?

No. Caching only reuses an identical prompt prefix to cut cost and latency; the content the model sees is the same. The savings come from not reprocessing the stable part on every call.
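
One way to picture this: keep everything stable at the front of the prompt and everything that changes at the back, so the prefix stays byte-identical across calls. The sketch below is illustrative only. `build_prompt` and `prefix_key` are hypothetical helpers, and the SHA-256 hash merely stands in for however a given provider recognizes a repeated prefix.

```python
import hashlib


def build_prompt(
    system: str, reference_docs: list[str], user_message: str
) -> tuple[str, str]:
    """Return (stable_prefix, volatile_suffix). Only the suffix changes per call."""
    prefix = system + "\n\n" + "\n\n".join(reference_docs)
    suffix = "\n\nUser: " + user_message
    return prefix, suffix


def prefix_key(prefix: str) -> str:
    """Hash of the prefix: identical prefixes give identical keys (illustrative only)."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


p1, s1 = build_prompt("You are a support agent.", ["Refund policy text"], "Where is my order?")
p2, s2 = build_prompt("You are a support agent.", ["Refund policy text"], "Can I change my address?")
# Same prefix, different suffix: the stable part is eligible for reuse.
same_prefix = prefix_key(p1) == prefix_key(p2)
```

Anything volatile placed in the prefix, such as a timestamp in the system prompt, changes the key and defeats the reuse.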

How is it delivered?

By email right after purchase: ready to run, downloaded instantly, no setup wait.

One-time or subscription?

A one-time purchase; no subscription or hidden fees. VAT (20%) is included.

Can I get a refund?

As a digital product, it can’t be refunded once downloaded. That’s why we show exactly what’s inside and who it’s for, right here.

Related products

Agent Eval Suite Langsmith

Brain Memory Hybrid Search

Claude Agent Template Library

Context Driven Development