My call volume is low, is caching even worth setting up?

Maybe not, and the kit tells you honestly: the break-even cost calculator weighs cache-write overhead against read savings before you commit. Caching pays off on repeated, high-volume calls sharing a static prefix; one-off prompts can cost more cached than uncached.

How does it actually get to 85-90% savings rather than a few percent?

It restructures prompts into a static prefix and dynamic suffix, then stratifies the cacheable part into four layers: system, tools, skill content, and user context, with cache_control breakpoints at each boundary. JSONL hit/miss telemetry then shows whether the cache is really being read, since twelve documented anti-patterns can silently kill hits.

Can I cache prompts that contain customer personal data?

No. A PII filter and a cross-tenant collision guard wrap the cache blocks by design, so personal data and one tenant's context never end up served to another. If a block fails the filter, it stays dynamic and uncached.

By email right after purchase: ready to run, downloaded instantly, no setup wait.

One-time or subscription?

A one-time purchase; no subscription or hidden fees. VAT (20%) is included.

As a digital product, it can’t be refunded once downloaded. That’s why we show exactly what’s inside and who it’s for, right here.

Skill AI & LLM →

Prompt Caching Optimizer

a brand prompt caching API ile %85-90 token maliyeti azaltma stratejisi.

A complete discipline for cutting LLM input costs by 85-90% using the Anthropic prompt caching API, with four-layer cache stratification, cache_control breakpoint placement, hit/miss telemetry, and break-even cost analysis. It restructures prompts into static prefix and dynamic suffix so repeated system prompts, tool definitions, and skill content read from cache at a fraction of the cost. It also guards against the silent traps that quietly destroy cache hits and against caching personal data.

$15 one-time

Add to a kit →

Prices include 20% VAT. · Forged on real agency work · one-time, no lock-in

Type Skill
Category AI & LLM
Delivery Email · instant
License One-time

Run preview

forgehouse, prompt-caching-optimizer

Inside the run · no black box

See the actual work before you buy it.

Cutting LLM input spend by 85 to 90 percent is mostly an ordering problem. Prompts get stratified from stable to volatile, breakpoints placed, PII scrubbed, and every dispatch logged so savings are proven.

Measures whether caching is even worth it before touching anything: the static prefix must clear the 1024 token minimum (smaller breakpoints are silently ignored by the API) and the break-even calculator checks call frequency, because a 5-minute ephemeral cache pays for itself from the second request inside the TTL window.
Stratifies the prompt into 4 layers ordered strictly from most stable to most volatile: system prompt (changes yearly), tool definitions (weekly), skill or document content (daily), user context (per dispatch). Each layer boundary gets its own cache_control breakpoint, the API maximum of 4.
Enforces the coherence rule that makes or breaks hit rate: nothing dynamic leaks into the static prefix. A timestamp or random ID in the system prompt changes the fingerprint on every call and turns 90 percent savings into a 25 percent surcharge.
Scrubs PII before any block is cached: a regex guard strips national ID numbers, emails, phone numbers, IBANs, card numbers and API keys, and pins the tenant identifier ahead of the breakpoint so two customers can never collide on the same cache entry.
Logs every dispatch to JSONL telemetry from the API usage fields: cache write tokens, cache read tokens, uncached tokens, hit ratio and the dollar delta against a hypothetical no-cache run, so savings are measured rather than assumed.
Reviews the telemetry weekly and applies Pareto: templates whose 7-day hit ratio drops under 50 percent get flagged for prompt restructuring, and cache investment concentrates on the top handful of templates that carry most of the token volume.

Use cases · what happens when you plug it in

One power source. 6 lines out.

prompt-caching-optimizer · core

core active · 6 lines

Cutting input token cost on high-volume agent dispatches

✓ cutting input token cost
Caching long system prompts and tool definitions

✓ caching long system prom…
Speeding up report and digest pipelines with shared templates

✓ speeding up report and d…
RAG context caching for sequential queries

✓ rag context caching for
Deciding whether a given prompt is worth caching

✓ deciding whether a given
Privacy-safe caching that strips PII

✓ privacy-safe caching that

Benefits · what you walk away with

Yours to keep.

Drag time forward. Watch what stays.

Forever

That's what owning means.

The rented stack

ai writing tool: subscription

expired · access lost

analytics suite: subscription

expired · access lost

design platform: subscription

expired · access lost

(nothing left)

Your forge

Up to 90% lower input cost on cached reads
license: perpetual
Time-to-first-token cut to a fraction via cached reads
license: perpetual
Data-driven cache decisions from break-even math, not guesswork
license: perpetual
Cross-tenant leaks and PII caching blocked by design
license: perpetual

subscriptions expire · deeds don't

What's included · the full manifest

Everything in the box.

Pick a piece up. Watch it work.

Canonical cache_control header pattern for system, tools, and messages

part 01 of 06 · in the box

6 parts · one working system · ships instantly by email

From the field · a real case

This wasn’t written at a desk.

The problem

The fix

The result

Who it's for

This wasn't forged for everyone.

Not for you if you'd rather rent a tool than own one.
Not for you if you want someone else to run your stack.
Not for you if you're happy guessing.

Still here? Good.

AI engineers and platform owners running repeated, high-volume LLM calls who need to slash token spend and latency without breaking privacy.

then this was forged for you.

Works with

Universal by design: these run in any AI. Delivered in the open Agent Skills + MCP format (native in Claude); ChatGPT, Gemini, Cursor and Copilot adapt the same files their own way.

Claude Native format
ChatGPT Adapts via open standards
Gemini Adapts via open standards
Cursor Adapts via open standards
Copilot Adapts via open standards

Questions · still in the air

Catch what's on your mind.

the air is clear. nothing between you and the forge.

catch a spark: the forge will answer

My call volume is low, is caching even worth setting up?

Maybe not, and the kit tells you honestly: the break-even cost calculator weighs cache-write overhead against read savings before you commit. Caching pays off on repeated, high-volume calls sharing a static prefix; one-off prompts can cost more cached than uncached.
How does it actually get to 85-90% savings rather than a few percent?

It restructures prompts into a static prefix and dynamic suffix, then stratifies the cacheable part into four layers: system, tools, skill content, and user context, with cache_control breakpoints at each boundary. JSONL hit/miss telemetry then shows whether the cache is really being read, since twelve documented anti-patterns can silently kill hits.
Can I cache prompts that contain customer personal data?

No. A PII filter and a cross-tenant collision guard wrap the cache blocks by design, so personal data and one tenant's context never end up served to another. If a block fails the filter, it stays dynamic and uncached.
How is it delivered?

By email right after purchase: ready to run, downloaded instantly, no setup wait.
One-time or subscription?

A one-time purchase; no subscription or hidden fees. VAT (20%) is included.
Can I get a refund?

As a digital product, it can’t be refunded once downloaded. That’s why we show exactly what’s inside and who it’s for, right here.

Prompt Caching Optimizer

See the actual work before you buy it.

One power source. 6 lines out.

Yours to keep.

The rented stack

Your forge

Everything in the box.

This wasn’t written at a desk.

This wasn't forged for everyone.

Works with

Catch what's on your mind.

My call volume is low, is caching even worth setting up?

How does it actually get to 85-90% savings rather than a few percent?

Can I cache prompts that contain customer personal data?

How is it delivered?

One-time or subscription?

Can I get a refund?

Related products

Agent Eval Suite Langsmith

Brain Context Engineering

Brain Memory Hybrid Search

Claude Agent Template Library