
Software Development

RAG Pipeline Builder

Generates a complete RAG pipeline with chunking strategy, embedding model selection, vector store setup, hybrid retrieval, and a quality evaluation harness. Useful for standing up production RAG without expert-level tuning. Built for engineers shipping LLM-backed document QA, search, or summarization features; AI engineers replacing naive semantic search with production RAG; and founders building knowledge-base chatbots on proprietary content. Most importantly, it covers how to measure retrieval quality. Naive RAG implementations hallucinate confidently, retrieve irrelevant passages, or miss the right document entirely. A structured builder produces a pipeline with sensible defaults for the corpus plus an evaluation harness so regressions are visible.

Nexus Certified | Claude Code | Codex | OpenClaw | Google Antigravity
rag, llm, retrieval, embeddings, ai-engineering

One-Time Purchase

$19.99

Sample Output

RAG Pipeline — Internal Docs QA at Northbeam Analytics

Corpus: ~8,000 internal docs (markdown + PDF), avg 1,200 tokens each, ~10M tokens total. Query pattern: factual ("What's our international travel policy?"), lookup ("Who owns the auth service?"), and light synthesis ("Summarize recent on-call rotation changes"). Stack: TypeScript, Vercel AI SDK, pgvector in existing Postgres, Claude Sonnet for generation. Multi-tenant: No.


Summary

Headline

Recommended pipeline: token-aware chunking at 512 tokens with 64-token overlap, text-embedding-3-small for embeddings, hybrid retrieval (pgvector cosine + Postgres full-text ranking via tsvector, standing in for BM25 since stock Postgres ranking is not true BM25), cross-encoder reranking with bge-reranker-base, and Claude Sonnet for generation with citation enforcement. The non-obvious choice is the reranker: it adds ~140ms but lifts recall@5 by 11 points over hybrid retrieval alone on the eval set (0.74 to 0.85), which matters more than latency for an internal-docs use case.


Pipeline Stages

Stage budget per query (p50)

1. Query rewriting (skipped for short queries): ~12ms
2. Embedding (text-embedding-3-small, 1536-dim): ~45ms
3. Hybrid retrieval (pgvector + BM25, top-40): ~28ms
4. Cross-encoder rerank (bge-reranker-base, top-5): ~140ms
5. Generation (Claude Sonnet, streamed): ~1.6s

Stage 1 — Chunking

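import { encode } from 'gpt-tokenizer'

// Slide a fixed-size token window over the document; consecutive windows share `overlap` tokens.
export function chunkDocument(
  text: string,
  // Per-field defaults, so passing only { size } no longer drops the overlap.
  { size = 512, overlap = 64 }: { size?: number; overlap?: number } = {},
) {
  // A stride of zero or less would never advance the window.
  if (overlap >= size) throw new Error('overlap must be smaller than size')
  const tokens = encode(text)
  const chunks: { tokens: number[]; start: number; end: number }[] = []
  let i = 0
  while (i < tokens.length) {
    const end = Math.min(i + size, tokens.length)
    chunks.push({ tokens: tokens.slice(i, end), start: i, end })
    if (end === tokens.length) break // last window reached the end of the document
    i += size - overlap
  }
  return chunks
}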

Why 512 / 64

512 tokens fits roughly one logical section of an internal doc. 64-token overlap preserves continuity across section boundaries without ballooning storage. Larger chunks (1024+) hurt retrieval precision; smaller chunks (256) fragment policy text and miss the answer. The eval set confirmed 512 as the sweet spot at recall@5 — bigger or smaller chunks both lost 4–7 points.
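To make the storage side of that tradeoff concrete, here is a minimal sketch that mirrors the sliding-window arithmetic without a tokenizer. The helper names `chunkSpans` and `storageOverhead` are hypothetical and not part of the generated pipeline; they only count window boundaries.

```typescript
// Pure arithmetic mirror of the sliding-window chunker: no tokenizer needed.
// chunkSpans and storageOverhead are illustrative names, not pipeline APIs.
export function chunkSpans(totalTokens: number, size = 512, overlap = 64) {
  if (overlap >= size) throw new Error('overlap must be smaller than size')
  const spans: { start: number; end: number }[] = []
  let i = 0
  while (i < totalTokens) {
    const end = Math.min(i + size, totalTokens)
    spans.push({ start: i, end })
    if (end === totalTokens) break
    i += size - overlap // stride between window starts
  }
  return spans
}

// Ratio of stored tokens to source tokens; 1.0 means no duplication.
export function storageOverhead(totalTokens: number, size = 512, overlap = 64) {
  if (totalTokens === 0) return 1
  const stored = chunkSpans(totalTokens, size, overlap).reduce(
    (sum, span) => sum + (span.end - span.start),
    0,
  )
  return stored / totalTokens
}
```

For a 1,200-token document (the sample corpus average), this yields three windows starting at tokens 0, 448, and 896, and stores 1,328 tokens in total, roughly 11% more than the source.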


Stage 2 — Embedding

import OpenAI from 'openai'
const openai = new OpenAI()

export async function embed(texts: string[]) {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
    dimensions: 1536,
  })
  return res.data.map((d) => d.embedding)
}

Why text-embedding-3-small

At this corpus size, text-embedding-3-small matches -large on the eval within margin while costing 6.5x less and embedding 2x faster. Re-evaluate if the corpus grows past ~50M tokens or if domain-specific terminology starts hurting recall.


Stage 3 — Hybrid Retrieval

Vector search alone misses keyword-anchored questions like "What is the SAML-SSO setup URL?" — the literal token "SAML-SSO" is what matters, and embeddings smooth over it. BM25 alone misses paraphrased questions. Reciprocal Rank Fusion (RRF) on the two ranked lists is the simplest combiner that works.
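The fusion step can be sketched as a small pure function. This is an illustration rather than the pipeline's actual code: the function name, the `k = 60` smoothing constant (a commonly used default), and the assumption that each input list holds unique chunk ids ordered best-first are all choices made here, not details taken from the sample.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank), with 1-based rank.
// k = 60 is an assumed, commonly used default; topN = 40 matches the stage budget above.
export function reciprocalRankFusion(
  rankedLists: string[][],
  k = 60,
  topN = 40,
): string[] {
  const scores = new Map<string, number>()
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      // index is 0-based, so rank = index + 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1))
    })
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .slice(0, topN)
    .map(([id]) => id)
}
```

Because RRF uses only ranks, the cosine distances and keyword scores never need to be normalized onto a common scale. A chunk that ranks well in both lists beats one that tops a single list.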


Stage 4 — Reranking

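import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers'

// @xenova/transformers has no CrossEncoder class, so load the tokenizer and classifier directly.
// 'Xenova/bge-reranker-base' is the ONNX export of BAAI/bge-reranker-base.
const MODEL_ID = 'Xenova/bge-reranker-base'
const tokenizer = await AutoTokenizer.from_pretrained(MODEL_ID)
const model = await AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

export async function rerank(query: string, candidates: Chunk[], topK = 5) {
  // Score every (query, passage) pair in a single batch.
  const inputs = tokenizer(new Array(candidates.length).fill(query), {
    text_pair: candidates.map((c) => c.text), padding: true, truncation: true,
  })
  const { logits } = await model(inputs)
  const scores: number[] = logits.sigmoid().tolist().map(([s]: number[]) => s)
  return candidates
    .map((c, i) => ({ ...c, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}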

Stage 5 — Generation

Claude Sonnet with a system prompt that enforces grounded answers and inline citations. The retrieval result is rendered as <doc id="123" path="...">...</doc> blocks, and the prompt requires every claim to cite at least one doc id.
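One way to render the retrieved chunks into those blocks is sketched below. The `RetrievedChunk` shape and the `renderContext` name are assumptions for illustration; the escaping step is an added precaution so that chunk text containing angle brackets or quotes cannot close a block early.

```typescript
// Hypothetical chunk shape; the real pipeline's type may carry more fields.
type RetrievedChunk = { id: string; path: string; text: string }

// Escape characters that would break an attribute or close the tag early.
const escapeXml = (s: string): string =>
  s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')

// Render each chunk as a <doc> block the prompt can cite by id.
export function renderContext(chunks: RetrievedChunk[]): string {
  return chunks
    .map(
      (c) =>
        `<doc id="${escapeXml(c.id)}" path="${escapeXml(c.path)}">\n${escapeXml(c.text)}\n</doc>`,
    )
    .join('\n')
}
```

The rendered string would go into the user turn ahead of the question, with the system prompt instructing the model to cite doc ids from these blocks only.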


Eval Metrics — 200-question Holdout

Metric                                | Naive (vector-only) | Hybrid | Hybrid + Rerank | Status
Recall@5                              | 0.68                | 0.74   | 0.85            | Ship
MRR                                   | 0.51                | 0.59   | 0.71            | Ship
Answer faithfulness (human-rated 1–5) | 3.4                 | 3.9    | 4.4             | Ship
Citation precision                    | 0.62                | 0.78   | 0.91            | Ship
p50 end-to-end latency                | 1.55s               | 1.62s  | 1.83s           | Watch
Hallucination rate                    | 12%                 | 6%     | 2%              | Ship
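The two retrieval metrics in the table can be computed with a few lines. This sketch assumes the hit-rate reading of recall@k (a question counts as a hit if any relevant doc appears in the top k), since the sample does not say which variant it uses; the `EvalCase` shape is likewise hypothetical.

```typescript
// Hypothetical eval record: retrieved ids in rank order, plus the labeled relevant ids.
type EvalCase = { retrieved: string[]; relevant: string[] }

// Fraction of questions with at least one relevant doc in the top k (assumed definition).
export function recallAtK(cases: EvalCase[], k = 5): number {
  if (cases.length === 0) return 0
  const hits = cases.filter((c) =>
    c.retrieved.slice(0, k).some((id) => c.relevant.includes(id)),
  )
  return hits.length / cases.length
}

// Mean of 1 / rank of the first relevant doc; a question with no hit contributes 0.
export function meanReciprocalRank(cases: EvalCase[]): number {
  if (cases.length === 0) return 0
  const total = cases.reduce((sum, c) => {
    const index = c.retrieved.findIndex((id) => c.relevant.includes(id))
    return sum + (index === -1 ? 0 : 1 / (index + 1))
  }, 0)
  return total / cases.length
}
```

Running both functions over the same holdout set after every pipeline change is what makes a regression visible as a number rather than an anecdote.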

Tradeoffs and Open Calls

Chunk size is corpus-specific

The 512-token chunk size was tuned on the eval set. If new doc types arrive (long-form RFCs, transcripts), revisit chunking. Long-form transcripts in particular usually need 1024-token chunks with smarter boundary detection (sentence-aware splitting).
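As a rough illustration of sentence-aware splitting, the sketch below packs whole sentences into chunks up to a token budget. It is a simplification under stated assumptions: the sentence splitter is a naive punctuation regex, and the default token counter is a whitespace word count standing in for a real tokenizer. Both would need replacing for production use.

```typescript
// Whitespace word count as a stand-in for a real tokenizer (assumption for illustration).
const approxTokens = (s: string): number => s.split(/\s+/).filter(Boolean).length

export function chunkBySentence(
  text: string,
  maxTokens = 1024,
  countTokens: (s: string) => number = approxTokens,
): string[] {
  // Naive splitter: a sentence ends at a run of ., !, or ?; trailing text is kept too.
  const sentences = (text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [])
    .map((s) => s.trim())
    .filter(Boolean)
  const chunks: string[] = []
  let current: string[] = []
  let used = 0
  for (const sentence of sentences) {
    const n = countTokens(sentence)
    // Close the current chunk if this sentence would overflow it.
    if (current.length > 0 && used + n > maxTokens) {
      chunks.push(current.join(' '))
      current = []
      used = 0
    }
    current.push(sentence)
    used += n
  }
  if (current.length > 0) chunks.push(current.join(' '))
  return chunks
}
```

Note that a single sentence longer than the budget becomes its own oversize chunk here; a fuller version would fall back to token-window splitting for that case.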

Top-k tradeoff

Top-40 candidates into the reranker is the latency-optimal point on this corpus. Going to top-100 lifts recall by ~2pp but adds ~250ms. For an internal tool that's a bad trade; for a customer-facing search box where missed answers are visible, it might be worth it.

Reranker choice

bge-reranker-base runs on CPU at this scale and is the best precision-per-dollar option. bge-reranker-large is 3pp better on recall@5 but needs a GPU and roughly triples per-query cost. Worth revisiting only if the corpus or query volume grows materially.

Citation enforcement is load-bearing

The hallucination rate dropped from 12% to 2% mostly because the prompt now requires inline doc-id citations and a follow-up validator strips answers that fail citation parsing. Do not ship without the validator — without it, the generation step regresses to ~7% hallucination even with the reranker in place.
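A minimal sketch of such a validator follows. The sample does not specify how citations appear in the answer, so the `[doc:ID]` marker syntax, the `Validation` type, and the function name are all assumptions made for illustration.

```typescript
type Validation = { ok: true; cited: string[] } | { ok: false; reason: string }

// Assumed citation syntax: [doc:ID]. The sample does not specify the marker format.
const CITATION = /\[doc:([A-Za-z0-9_-]+)\]/g

export function validateCitations(answer: string, retrievedIds: string[]): Validation {
  const cited = Array.from(answer.matchAll(CITATION), (m) => m[1])
  // Reject answers that cite nothing at all.
  if (cited.length === 0) return { ok: false, reason: 'no citations found' }
  // Reject answers that cite a doc id the retriever never returned.
  const known = new Set(retrievedIds)
  const unknown = cited.filter((id) => !known.has(id))
  if (unknown.length > 0) {
    return { ok: false, reason: `unknown doc ids: ${unknown.join(', ')}` }
  }
  return { ok: true, cited: Array.from(new Set(cited)) }
}
```

This checks only that citations parse and resolve to retrieved documents. It does not verify that the cited passage actually supports the claim, which is what the human-rated faithfulness metric covers.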


This sample illustrates the skill's output format. Northbeam Analytics is a fictional company used repeatedly across these sample outputs. Real corpus statistics and eval results are never included in sample outputs.


All sales final. No refunds on digital products.

Includes support for Claude Code, Codex, OpenClaw, and Google Antigravity in the same license.

Also in AI Engineering

Bundle price: $55. Compare this skill with the full workflow bundle or Pro access.

Best for

Engineering teams shipping LLM-backed document-QA, search, or summarization features who are past prototype but haven't owned a retrieval system before; AI engineers replacing naive top-k cosine search with hybrid retrieval and a real eval harness; and founders building knowledge-base chatbots on proprietary content, where retrieval quality directly determines the product's perceived intelligence. Most valuable when the corpus is well-defined (a docs site, a support knowledge base, or a contract repository, not "the open web") and the team needs sensible defaults plus a measurement loop rather than handcrafted retrieval research.

Not ideal for

Production RAG systems that are already running at scale and need targeted optimization (re-ranking model selection, query routing, multi-vector strategies) — the builder produces a sensible starting pipeline, not advanced tuning for a system that already works. Also a poor fit when the underlying problem is actually a search problem (structured data, exact-match queries) rather than a retrieval-augmented generation problem; a traditional search stack will outperform RAG on those workloads.

Included in this purchase

  • Claude Code, Codex, OpenClaw, and Google Antigravity skill files.
  • Setup guidance for the right adapter in your workspace.
  • One-time license for the purchased skill version.

Setup

Plan for a short setup in the repository or workspace where the skill will run. Some coding familiarity helps for implementation-heavy outputs.


Future Updates

This purchase includes the current version of the skill. If you want future adapter updates — meaning compatibility and packaging updates as supported platforms evolve — plus new catalog additions included automatically, upgrade to Pro.
