Similarity Search Patterns

Implement efficient similarity search with vector databases.

Production-ready blueprints for building semantic and vector search that actually scales. It ships working implementations for four vector stores (Pinecone, Qdrant, pgvector, and Weaviate) plus the decision frameworks for index choice, distance metric, and filtering strategy that separate a fast retrieval system from a slow, low-recall one. Stop guessing at HNSW parameters and ship search that returns the right results in under 200ms.

$15 one-time

Prices include 20% VAT · Forged on real agency work · One-time, no lock-in

  • Type: Skill
  • Category: Data & Analytics
  • Delivery: Email · instant
  • License: One-time
forgehouse, similarity-search-patterns

Inside the run · no black box

See the actual work before you buy it.

The decision and build sequence the skill walks through when standing up production similarity search, in order:

  1. Size the dataset first and let it pick the index family: under 10K vectors, flat exact search; 10K to 1M, HNSW; beyond that, HNSW with quantization or IVF+PQ. Each step up from exact search trades a little recall for roughly an order of magnitude in speed.
  2. Match the distance metric to the embedding model: cosine for L2-normalized outputs, dot product when magnitude carries meaning. Vectors from different models never share one space; switching models means re-embedding everything.
  3. Decide pre-filter vs post-filter for metadata: pre-filter with payload indexes on hot fields like tenant_id and category, because post-filtering 100 results down to 3 starves the user of answers.
  4. Load vectors in batches of 100 to 1000 per upsert call for 10-50x the throughput of one-at-a-time upserts; on large pgvector imports, drop the index first and rebuild it after the load, which runs 5 to 10x faster than inserting into a live index.
  5. Wire hybrid search where keywords still matter: blend the dense vector score with BM25 or full-text rank at a tuned weight. The skill carries ready templates for Pinecone, Qdrant, pgvector and Weaviate, plus cross-encoder reranking on the over-fetched top 50.
  6. Calibrate the score threshold against ground-truth pairs instead of hardcoding a magic 0.85, then track recall weekly as a live metric to catch embedding drift early.
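
Three of the steps above (sizing the index, batching upserts, calibrating the threshold) can be sketched in plain Python. This is a minimal illustration, not the skill's own code: the function names `choose_index_family`, `batched`, and `calibrate_threshold` are hypothetical, the size tiers are taken from step 1, and using F1 as the calibration objective is an assumption made here for the sake of the example.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator, Sequence


def choose_index_family(n_vectors: int) -> str:
    """Map dataset size to an index family using the tiers from step 1."""
    if n_vectors < 10_000:
        return "flat"  # exact search, no recall loss
    if n_vectors <= 1_000_000:
        return "hnsw"
    return "hnsw+quantization or ivf+pq"  # compressed indexes past 1M


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield lists of at most `size` items, one list per upsert call (step 4)."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def calibrate_threshold(scored_pairs: Sequence[tuple[float, bool]]) -> float:
    """Pick the score cutoff with the best F1 on labeled pairs (step 6).

    Each pair is (similarity_score, is_relevant). F1 is an assumed objective;
    swap in precision-at-recall or another metric if it fits better.
    """
    best_cutoff, best_f1 = 0.0, -1.0
    for cutoff in sorted({score for score, _ in scored_pairs}):
        tp = sum(1 for s, rel in scored_pairs if s >= cutoff and rel)
        fp = sum(1 for s, rel in scored_pairs if s >= cutoff and not rel)
        fn = sum(1 for s, rel in scored_pairs if s < cutoff and rel)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:  # strict '>' keeps the lowest cutoff on ties
            best_cutoff, best_f1 = cutoff, f1
    return best_cutoff
```

In use, each list yielded by `batched(vectors, 500)` would be passed to whatever upsert call the chosen store's client exposes, and `calibrate_threshold` would be rerun whenever the embedding model or the ground-truth set changes.
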
Use cases · what happens when you plug it in

One power source. 6 lines out.

  1. Building semantic search over millions of documents
  2. Powering RAG retrieval for AI assistants
  3. Implementing recommendation engines
  4. Combining vector and keyword (hybrid) search
  5. Migrating from flat search to HNSW or IVF+PQ
  6. Reranking candidate results with a cross-encoder
Benefits · what you walk away with

Yours to keep.

Forever

That's what owning means.

The rented stack

ai writing tool: subscription

expired · access lost

analytics suite: subscription

expired · access lost

design platform: subscription

expired · access lost

(nothing left)

Your forge

  1. Choose the right index for your data size instead of over-engineering small datasets

    license: perpetual
  2. Hit sub-200ms search latency with tuned recall/speed tradeoffs

    license: perpetual
  3. Avoid the post-filter trap that silently returns too few results

    license: perpetual
  4. Cut vector-store costs by matching index type and quantization to scale

    license: perpetual
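
The post-filter trap named above is easy to reproduce on a toy corpus. The sketch below is an illustration under stated assumptions, not one of the shipped store classes: `Doc`, `post_filter_search`, and `pre_filter_search` are hypothetical names, the search is a brute-force cosine scan rather than a real index, and the 20-document corpus is made up so that only one in five documents belongs to the tenant being queried.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: int
    tenant_id: str
    vector: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def post_filter_search(docs, query, tenant_id, k):
    """Take the global top-k first, then drop rows from other tenants."""
    top = sorted(docs, key=lambda d: cosine(d.vector, query), reverse=True)[:k]
    return [d for d in top if d.tenant_id == tenant_id]


def pre_filter_search(docs, query, tenant_id, k):
    """Restrict to the tenant first, then take the top-k of what remains."""
    candidates = [d for d in docs if d.tenant_id == tenant_id]
    return sorted(candidates, key=lambda d: cosine(d.vector, query), reverse=True)[:k]


# Toy corpus: similarity to QUERY falls as doc_id rises; every fifth doc is "acme".
DOCS = [
    Doc(i, "acme" if i % 5 == 0 else "other", (1.0, 0.1 * i))
    for i in range(20)
]
QUERY = (1.0, 0.0)

post_hits = post_filter_search(DOCS, QUERY, "acme", k=5)
pre_hits = pre_filter_search(DOCS, QUERY, "acme", k=5)
print(len(post_hits), len(pre_hits))  # prints "1 4"
```

Post-filtering asks for five results and hands back one, with no error to signal the shortfall. Pre-filtering returns every matching document the tenant has. In a real store the pre-filter would be served by an index on `tenant_id` rather than a list comprehension.
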

subscriptions expire · deeds don't

What's included · the full manifest

Everything in the box.

Copy-paste vector store classes for Pinecone, Qdrant, pgvector, and Weaviate

part 01 of 06 · in the box

6 parts · one working system · ships instantly by email

Who it's for

This wasn't forged for everyone.

  • Not for you if you'd rather rent a tool than own one.
  • Not for you if you want someone else to run your stack.
  • Not for you if you're happy guessing.
Still here? Good.

Backend and AI engineers building semantic search, RAG retrieval, or recommendation systems that need to stay fast and accurate at scale.

then this was forged for you.

Works with

Universal by design: the skill runs in any AI assistant. It is delivered in the open Agent Skills + MCP format (native in Claude); ChatGPT, Gemini, Cursor and Copilot adapt the same files in their own way.

  • Claude: Native format
  • ChatGPT: Adapts via open standards
  • Gemini: Adapts via open standards
  • Cursor: Adapts via open standards
  • Copilot: Adapts via open standards
Questions · still in the air

Catch what's on your mind.

  1. We only have about 50K vectors. Is this overkill for us?

    Part of the value is being told not to over-engineer: the index selection guide maps data size to index type, and at 50K vectors a flat index or simple HNSW is usually the right call. The blueprints then scale with you as the dataset grows.

  2. How does it actually get search under 200ms?

    It pairs working store classes for Pinecone, Qdrant, pgvector, and Weaviate with the decisions that dominate latency: HNSW parameters, pre-filter versus post-filter strategy with payload indexes, quantization, and score-threshold calibration. The post-filter trap that silently returns too few results is called out specifically.

  3. Does it include an embedding model or host the vector database?

    No. It assumes you bring your own embeddings and a vector store; what it provides are the implementation patterns and decision frameworks on top. Model choice and hosting costs remain yours.

  4. How is it delivered?

    By email right after purchase: ready to run, downloaded instantly, no setup wait.

  5. One-time or subscription?

    A one-time purchase; no subscription or hidden fees. VAT (20%) is included.

  6. Can I get a refund?

    As a digital product, it can’t be refunded once downloaded. That’s why we show exactly what’s inside and who it’s for, right here.