Software Development

RAG Pipeline Builder

Generates a complete RAG pipeline with chunking strategy, embedding model selection, vector store setup, hybrid retrieval, and a quality evaluation harness. Useful for standing up production RAG without expert-level tuning. Engineers shipping LLM-backed document QA, search, or summarization features; AI engineers replacing naive semantic search with production RAG; founders building knowledge-base chatbots on proprietary content. most importantly — how to measure retrieval quality. Naive RAG implementations hallucinate confidently, retrieve irrelevant passages, or miss the right document entirely. A structured builder produces a pipeline with sensible defaults for the corpus plus an evaluation harness so regressions are visible.

Nexus CertifiedClaude CodeCodexOpenClawGoogle Antigravity

ragllmretrievalembeddingsai-engineering

One-Time Purchase

$19.99

Sample Output

RAG Pipeline — Internal Docs QA at Northbeam Analytics

Corpus: ~8,000 internal docs (markdown + PDF), avg 1,200 tokens each, ~10M tokens total. Query pattern: factual ("What's our international travel policy?"), lookup ("Who owns the auth service?"), and light synthesis ("Summarize recent on-call rotation changes"). Stack: TypeScript, Vercel AI SDK, pgvector in existing Postgres, Claude Sonnet for generation. Multi-tenant: No.

Summary

Headline

Recommended pipeline: token-aware chunking at 512 tokens with 64-token overlap, text-embedding-3-small for embeddings, hybrid retrieval (pgvector cosine + Postgres BM25 via tsvector), cross-encoder reranking with bge-reranker-base, and Claude Sonnet for generation with citation enforcement. The non-obvious choice is the reranker — it adds ~140ms but lifts recall@5 by 11 points on the eval set, which matters more than latency for an internal-docs use case.

Pipeline Stages

Stage budget per query (p50)

1. Query rewriting (skipped for short queries)~12ms

2. Embedding (text-embedding-3-small, 1536-dim)~45ms

3. Hybrid retrieval (pgvector + BM25, top-40)~28ms

4. Cross-encoder rerank (bge-reranker-base, top-5)~140ms

5. Generation (Claude Sonnet, streamed)~1.6s

Stage 1 — Chunking

import { encode } from 'gpt-tokenizer'

export function chunkDocument(text: string, opts = { size: 512, overlap: 64 }) {
  const tokens = encode(text)
  const chunks: { tokens: number[]; start: number; end: number }[] = []
  let i = 0
  while (i < tokens.length) {
    const end = Math.min(i + opts.size, tokens.length)
    chunks.push({ tokens: tokens.slice(i, end), start: i, end })
    if (end === tokens.length) break
    i += opts.size - opts.overlap
  }
  return chunks
}

Why 512 / 64

512 tokens fits roughly one logical section of an internal doc. 64-token overlap preserves continuity across section boundaries without ballooning storage. Larger chunks (1024+) hurt retrieval precision; smaller chunks (256) fragment policy text and miss the answer. The eval set confirmed 512 as the sweet spot at recall@5 — bigger or smaller chunks both lost 4–7 points.

Stage 2 — Embedding

import OpenAI from 'openai'
const openai = new OpenAI()

export async function embed(texts: string[]) {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
    dimensions: 1536,
  })
  return res.data.map((d) => d.embedding)
}

Why text-embedding-3-small

At this corpus size, text-embedding-3-small matches -large on the eval within margin while costing 6.5x less and embedding 2x faster. Re-evaluate if the corpus grows past ~50M tokens or if domain-specific terminology starts hurting recall.

Stage 3 — Hybrid Retrieval

Vector search alone misses keyword-anchored questions like "What is the SAML-SSO setup URL?" — the literal token "SAML-SSO" is what matters, and embeddings smooth over it. BM25 alone misses paraphrased questions. Reciprocal Rank Fusion (RRF) on the two ranked lists is the simplest combiner that works.

Stage 4 — Reranking

import { CrossEncoder } from '@xenova/transformers'
const reranker = await CrossEncoder.from_pretrained('BAAI/bge-reranker-base')

export async function rerank(query: string, candidates: Chunk[], topK = 5) {
  const scores = await reranker.predict(candidates.map((c) => [query, c.text]))
  return candidates
    .map((c, i) => ({ ...c, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}

Stage 5 — Generation

Claude Sonnet with a system prompt that enforces grounded answers and inline citations. The retrieval result is rendered as <doc id="123" path="...">...</doc> blocks, and the prompt requires every claim to cite at least one doc id.

Eval Metrics — 200-question Holdout

Metric	Naive (vector-only)	Hybrid	Hybrid + Rerank	Status
Recall@5	0.68	0.74	0.85	Ship
MRR	0.51	0.59	0.71	Ship
Answer faithfulness (human-rated 1–5)	3.4	3.9	4.4	Ship
Citation precision	0.62	0.78	0.91	Ship
p50 end-to-end latency	1.55s	1.62s	1.83s	Watch
Hallucination rate	12%	6%	2%	Ship

Tradeoffs and Open Calls

Chunk size is corpus-specific

The 512-token chunk size was tuned on the eval set. If new doc types arrive (long-form RFCs, transcripts), revisit chunking. Long-form transcripts in particular usually need 1024-token chunks with smarter boundary detection (sentence-aware splitting).

Top-k tradeoff

Top-40 candidates into the reranker is the latency-optimal point on this corpus. Going to top-100 lifts recall by ~2pp but adds ~250ms. For an internal tool that's a bad trade; for a customer-facing search box where missed answers are visible, it might be worth it.

Reranker choice

bge-reranker-base runs on CPU at this scale and is the best precision-per-dollar option. bge-reranker-large is 3pp better on recall@5 but needs a GPU and roughly triples per-query cost. Worth revisiting only if the corpus or query volume grows materially.

Citation enforcement is load-bearing

The hallucination rate dropped from 12% to 2% mostly because the prompt now requires inline doc-id citations and a follow-up validator strips answers that fail citation parsing. Do not ship without the validator — without it, the generation step regresses to ~7% hallucination even with the reranker in place.

This sample illustrates the skill's output format. Northbeam Analytics is a fictional company used recurringly across these sample outputs. Real corpus statistics and eval results are never included in sample outputs.

View full sample →

I agree to the Terms, Privacy Policy, Acceptable Use Policy, and AI Disclosure, and I confirm I am at least 18 years old.

All sales final. No refunds on digital products.

Includes support for Claude Code, Codex, OpenClaw, and Google Antigravity in the same license.

Also in AI Engineering

Bundle price: $55. Compare this skill with the full workflow bundle or Pro access.

View Bundle Compare Pro

Best for

Engineering teams shipping LLM-backed document-QA, search, or summarization features who are past prototype but haven't owned a retrieval system before, AI engineers replacing naive top-k cosine search with hybrid retrieval and a real eval harness, and founders building knowledge-base chatbots on proprietary content where retrieval quality directly determines the product's perceived intelligence. Most valuable when the corpus is well-defined (a docs site, a support knowledge base, a contract repository — not "the open web") and the team needs sensible defaults plus a measurement loop rather than handcrafted retrieval research.

Not ideal for

Production RAG systems that are already running at scale and need targeted optimization (re-ranking model selection, query routing, multi-vector strategies) — the builder produces a sensible starting pipeline, not advanced tuning for a system that already works. Also a poor fit when the underlying problem is actually a search problem (structured data, exact-match queries) rather than a retrieval-augmented generation problem; a traditional search stack will outperform RAG on those workloads.

Included in this purchase

Claude Code, Codex, OpenClaw, and Google Antigravity skill files.
Setup guidance for the right adapter in your workspace.
One-time license for the purchased skill version.

Setup

Plan for a short setup in the repository or workspace where the skill will run. Some coding familiarity helps for implementation-heavy outputs.

Claude CodeCodexOpenClawGoogle Antigravity

Related Skills

Code Generation & Review

Featured

Code Generation

Generates, reviews, debugs, and executes code in sandboxed workflows. Useful for implementation, refactoring, and technical problem solving.

Claude CodeCodexOpenClawGoogle Antigravity

codingdebuggingcode-review

$19.99

One-time license

View Skill

Product Documentation & Onboarding

API Documentation Generator

Generates structured, developer-ready API documentation from code, OpenAPI specs, route definitions, or descriptions. Produces reference docs, quickstart guides, error references, and code examples.

Claude CodeCodexOpenClawGoogle Antigravity

apidocumentationdeveloper-experience

$19.99

One-time license

View Skill

Code Generation & Review

Intelligent PR Composer

Generates pull request descriptions that capture context, alternatives considered, test plan, risk areas, and reviewer guidance beyond a simple diff summary. Useful for teams that want senior-quality PRs without manual authoring.

Claude CodeCodexOpenClawGoogle Antigravity

pull-requestscode-reviewgit

$19.99

One-time license

View Skill

Future Updates

This purchase includes the current version of the skill. If you want future adapter updates — meaning compatibility and packaging updates as supported platforms evolve — plus new catalog additions included automatically, upgrade to Pro.

Upgrade to Pro