Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning.
A production playbook for making slow Apache Spark jobs fast and cheap. It targets the real bottlenecks (shuffle, data skew, partition sizing, and memory pressure) with concrete PySpark patterns, broadcast and bucket join strategies, and an AQE-enabled configuration template, so your pipelines scale without exploding cluster costs.
Prices include 20% VAT. · Forged on real agency work · one-time, no lock-in
Inside the run · no black box
The diagnosis order the skill follows on a slow Spark job, most expensive cost first:
spark-optimization · core
Speed up slow Spark jobs and ETL pipelines
Diagnose data skew dominating job runtime
Right-size partitions to 128-256MB
Choose broadcast vs sort-merge vs bucket joins
Tune executor memory to stop OOM and spills
Read EXPLAIN plans to find full scans
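The partition-sizing step comes down to simple arithmetic. Below is a minimal sketch, assuming you can estimate the size of the data being written or shuffled (for example from file listings or stage metrics); `partitions_for` is a hypothetical helper, not part of Spark or the kit.

```python
import math

MB = 1024 ** 2


def partitions_for(total_bytes: int, target_mb: int = 128) -> int:
    """Return a partition count so each partition holds roughly target_mb.

    Hypothetical helper: assumes total_bytes approximates the data volume
    being shuffled or written, and target_mb sits in the 128-256MB band.
    """
    if not 128 <= target_mb <= 256:
        raise ValueError("target_mb should be between 128 and 256")
    # Ceiling division so no partition ends up above the target; at least 1.
    return max(1, math.ceil(total_bytes / (target_mb * MB)))


# 50GB of input at ~128MB per partition
print(partitions_for(50 * 1024 ** 3))  # 400
```

The result would typically be passed to `df.repartition(n)` or used as a starting value for `spark.sql.shuffle.partitions`.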
Cut runtime by minimizing the most expensive operation: shuffle
Lower cluster spend with auto-scaling and right-sizing
Stop one skewed partition from holding up the whole job
Read 10-100x less I/O with columnar formats and pushdown
License: perpetual. Subscriptions expire; deeds don't.
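Right-sizing executors starts with knowing how Spark divides each executor's memory. This sketch assumes Spark's unified memory model with its default settings (300MB reserved, `spark.memory.fraction=0.6`, `spark.memory.storageFraction=0.5`, and a container overhead of max(384MB, 10% of executor memory)); `breakdown` and `ExecutorMemory` are hypothetical names for illustration.

```python
from dataclasses import dataclass

RESERVED_MB = 300  # memory Spark reserves before sizing the unified region


@dataclass
class ExecutorMemory:
    heap_mb: float      # spark.executor.memory
    overhead_mb: float  # non-heap container overhead (spark.executor.memoryOverhead)
    unified_mb: float   # execution + storage, shared and borrowable
    storage_mb: float   # part of unified memory immune to eviction by execution
    user_mb: float      # user data structures and Spark internal metadata

    @property
    def container_mb(self) -> float:
        """Total memory the cluster manager must grant per executor."""
        return self.heap_mb + self.overhead_mb


def breakdown(executor_memory_mb: float,
              memory_fraction: float = 0.6,
              storage_fraction: float = 0.5,
              overhead_factor: float = 0.10) -> ExecutorMemory:
    """Split an executor's memory using the (assumed) default fractions."""
    usable = executor_memory_mb - RESERVED_MB
    unified = usable * memory_fraction
    return ExecutorMemory(
        heap_mb=executor_memory_mb,
        overhead_mb=max(384.0, executor_memory_mb * overhead_factor),
        unified_mb=unified,
        storage_mb=unified * storage_fraction,
        user_mb=usable * (1 - memory_fraction),
    )


m = breakdown(8192)  # an 8GB executor
print(f"unified={m.unified_mb:.1f}MB user={m.user_mb:.1f}MB container={m.container_mb:.1f}MB")
# unified=4735.2MB user=3156.8MB container=9011.2MB
```

Seeing the split makes it clearer whether spills point at too little unified memory or OOM kills point at too little overhead; the exact numbers can differ across Spark versions or with off-heap memory enabled.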
Pick a piece up. Watch it work.
AQE-enabled optimized SparkSession config template
6 parts · one working system · ships instantly by email
If you are a data engineer running Spark pipelines and need slow jobs to run fast, scale to large datasets, and stay within cluster budget, then this was forged for you.
Universal by design: these files work with any AI assistant. They are delivered in the open Agent Skills + MCP format (native in Claude); ChatGPT, Gemini, Cursor, and Copilot adapt the same files their own way.
The patterns are engine-level, not vendor-level: shuffle minimization, 128-256MB partition sizing, join strategy selection, and executor memory breakdown work wherever Spark runs. The code examples are PySpark, and the AQE-enabled SparkSession config template drops into any environment that lets you set Spark configs.
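For orientation, here is a minimal sketch of what AQE-oriented settings can look like. The keys are standard Spark SQL configuration properties; the values are illustrative starting points rather than the template that ships with the kit, and `AQE_CONF` and `apply_conf` are hypothetical names.

```python
# Illustrative AQE-oriented settings; values are starting points to tune.
AQE_CONF = {
    # Re-optimize plans at runtime from real shuffle statistics
    # (on by default since Spark 3.2; set explicitly for older versions).
    "spark.sql.adaptive.enabled": "true",
    # Merge small post-shuffle partitions toward the advisory size.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "128m",
    # Split oversized partitions in sort-merge joins (moderate skew).
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Broadcast the smaller join side when estimated under ~10MB (Spark's default).
    "spark.sql.autoBroadcastJoinThreshold": str(10 * 1024 * 1024),
}


def apply_conf(builder, conf=AQE_CONF):
    """Apply each setting via builder.config(key, value) and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, this might be used as `spark = apply_conf(SparkSession.builder.appName("etl")).getOrCreate()`. Keeping the settings in a plain dict also makes them easy to reuse as `spark-submit --conf key=value` flags.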
AQE handles moderate skew and partition coalescing automatically, but it will not pick broadcast vs bucket joins for you, salt a severely skewed key, or explain why a stage spills to disk. The playbook covers the decisions AQE cannot make, including manual salting and reading EXPLAIN plans to find full scans.
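Manual salting is easiest to see in plain Python before translating it to DataFrames. This is a minimal sketch, assuming both join sides are lists of `(key, value)` pairs; `salted_join` is a hypothetical helper. The skewed side gets a random salt per row, the other side is replicated once per salt, and the join runs on `(key, salt)`, so one hot key is spread across `num_salts` buckets while the result matches a plain inner join.

```python
import random
from collections import defaultdict


def salted_join(large, small, num_salts=8, seed=0):
    """Inner-join two lists of (key, value) pairs on key using salting.

    large: the skewed side; each row gets one random salt in [0, num_salts).
    small: the other side; each row is replicated once per salt value.
    Returns (key, large_value, small_value) tuples.
    """
    rng = random.Random(seed)
    # Spread each key of the skewed side across num_salts buckets.
    salted_large = [((k, rng.randrange(num_salts)), v) for k, v in large]
    # Replicate the small side so every (key, salt) bucket finds its matches.
    index = defaultdict(list)
    for k, v in small:
        for s in range(num_salts):
            index[(k, s)].append(v)
    return [(sk[0], lv, sv) for sk, lv in salted_large for sv in index.get(sk, [])]
```

In PySpark, the salt would usually come from `F.floor(F.rand() * num_salts)` on the skewed side and `F.explode(F.array([F.lit(i) for i in range(num_salts)]))` on the other, joining on both the key and the salt. The trade-off is that the replicated side grows by a factor of `num_salts`.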
It does not rewrite your pipelines for you. It is a set of patterns, a config template, and skew-detection monitoring snippets, not an agent. You still read your own stage metrics, identify the bottleneck, and apply the matching pattern yourself.
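A skew check can be as small as comparing each partition against the median. This is a minimal sketch, not one of the kit's shipped snippets: `skewed_partitions` is a hypothetical helper whose rule borrows the shape of AQE's skew-join detection, where a partition counts as skewed when it exceeds both `spark.sql.adaptive.skewJoin.skewedPartitionFactor` times the median (default 5) and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes` (default 256MB).

```python
from statistics import median


def skewed_partitions(sizes, factor=5.0, min_size=0):
    """Return the indices of partitions that look skewed.

    A partition is flagged when it is larger than `factor` times the median
    partition AND larger than `min_size` (same unit as `sizes`).
    """
    if not sizes:
        return []
    med = median(sizes)
    return [i for i, s in enumerate(sizes) if s > factor * med and s > min_size]


# Row counts per partition, ordered by partition id
counts = [100, 120, 110, 2000, 105]
print(skewed_partitions(counts))  # [3]
```

Per-partition row counts could be collected with `df.groupBy(F.spark_partition_id()).count()` and sorted by partition id; for byte sizes, the shuffle-read column of a stage's task metrics in the Spark UI is the usual source.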
Delivered by email right after purchase: ready to run, downloadable instantly, no setup wait.
A one-time purchase; no subscription or hidden fees. VAT (20%) is included.
As a digital product, it can’t be refunded once downloaded. That’s why we show exactly what’s inside and who it’s for, right here.