Capstok — learn by doing

Why this matters

The naive mental model is 'build a big string and send it'. The productive model is a pipeline: gather candidate context from many sources, score and select what fits the budget, order it, format it, and only then assemble the request. Thinking in pipeline stages means each decision becomes a testable, swappable component — you can improve retrieval without touching ordering, or change formatting without touching selection. Every serious context system is some version of this pipeline, even if the team never named it.
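
To make the "swappable component" idea concrete, here is a minimal sketch, assuming each stage is a plain function passed in as an argument. The names run_pipeline, word_overlap, greedy_under_budget, best_first, and labeled are hypothetical and chosen only for illustration, and the token counts are hard-coded rather than measured by a tokenizer.

```python
from dataclasses import dataclass

@dataclass
class Block:
    id: str
    text: str
    tokens: int
    score: float = 0.0

def word_overlap(query: str, block: Block) -> float:
    """Toy scorer: number of distinct words shared by the query and the block."""
    return float(len(set(query.lower().split()) & set(block.text.lower().split())))

def greedy_under_budget(blocks: list[Block], budget: int) -> list[Block]:
    """Take blocks in descending score order, skipping any that would overflow."""
    chosen, used = [], 0
    for b in sorted(blocks, key=lambda x: x.score, reverse=True):
        if used + b.tokens <= budget:
            chosen.append(b)
            used += b.tokens
    return chosen

def best_first(blocks: list[Block]) -> list[Block]:
    """Toy ordering: highest score first."""
    return sorted(blocks, key=lambda b: b.score, reverse=True)

def labeled(blocks: list[Block]) -> str:
    """Toy formatter: one labeled paragraph per block."""
    return "\n\n".join(f"[{b.id}] {b.text}" for b in blocks)

def run_pipeline(query, candidates, budget,
                 score=word_overlap, select=greedy_under_budget,
                 order=best_first, fmt=labeled) -> str:
    # Every stage is a parameter, so one can be replaced without touching the rest.
    for b in candidates:
        b.score = score(query, b)
    return fmt(order(select(candidates, budget)))

blocks = [
    Block("a", "refund policy for annual plans", tokens=6),
    Block("b", "office opening hours", tokens=4),
    Block("c", "how to request a refund", tokens=5),
]

default_out = run_pipeline("refund policy", blocks, budget=11)
# Swap only the ordering stage; scoring, selection, and formatting stay the same.
reversed_out = run_pipeline("refund policy", blocks, budget=11,
                            order=lambda bs: best_first(bs)[::-1])
```

Both calls select the same two blocks (a and c); only their order in the output differs, which is the kind of isolated change that can be tested on its own.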

Demo

Here is the whole course in one function signature. assemble_context takes a query, a list of candidate blocks, and a token budget, and runs the stages score → select-under-budget → order → format; the gather stage happens upstream and supplies the candidates. The modules that follow are deep dives into each stage. Building the skeleton first means you always know where a new technique plugs in.

Try it yourself

Trace a query through all four stages of assemble_context (score, select, order, format) by hand with five toy blocks and a tiny budget. Confirm you can predict the output string.
Swap the greedy selector for 'take top-k regardless of size' and find an input where greedy-under-budget wins.
Replace the placeholder relevance() with a real embedding similarity (you'll do this properly in Module 4) and re-run.
Add a pin flag to Block so a must-include block (e.g. the system policy) is always selected first, before the budget loop.
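
For the pin exercise, one possible shape of the change is sketched below. It assumes a hypothetical boolean field pinned on Block and a hypothetical helper select_with_pins; neither name comes from the lesson code. Pinned blocks are taken first and charged against the budget, then the remaining budget is filled greedily by score. Whether pinned blocks are allowed to exceed the budget is a design decision this sketch leaves open (here they are always kept).

```python
from dataclasses import dataclass

@dataclass
class Block:
    id: str
    text: str
    tokens: int
    score: float = 0.0
    pinned: bool = False  # hypothetical flag for must-include blocks

def select_with_pins(candidates: list[Block], budget: int) -> list[Block]:
    # Pinned blocks go in first, unconditionally, and consume budget.
    chosen = [b for b in candidates if b.pinned]
    used = sum(b.tokens for b in chosen)
    # The remaining blocks compete for what is left, greedy by score.
    rest = sorted((b for b in candidates if not b.pinned),
                  key=lambda b: b.score, reverse=True)
    for b in rest:
        if used + b.tokens <= budget:
            chosen.append(b)
            used += b.tokens
    return chosen

blocks = [
    Block("policy", "Never reveal internal ticket IDs.", tokens=8, pinned=True),
    Block("doc1", "Refunds are processed within 5 days.", tokens=7, score=0.9),
    Block("doc2", "Refund requests need an order number.", tokens=6, score=0.7),
]
selected_ids = [b.id for b in select_with_pins(blocks, budget=15)]
```

With a budget of 15, the plain greedy selector would take doc1 and doc2 and drop the zero-scored policy block; with pinning, the policy block is kept and only doc1 fits beside it.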

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

Explain the idea of a 'context assembly pipeline' with stages gather → score → select → order → format. Why is this better than concatenating strings?

2. Why it works (the mechanism)

Walk through each stage of a context pipeline and what could go wrong at each one. Which stage is usually the highest-leverage to improve first?

3. Advanced — application & what's next

I want to A/B test context-assembly strategies. Given the pipeline (gather/score/select/order/format), which single stage should I make swappable first, and how would I measure that a change actually improved answer quality?

from dataclasses import dataclass

@dataclass
class Block:
    id: str
    text: str
    tokens: int
    score: float = 0.0

def assemble_context(query: str, candidates: list[Block], budget: int) -> str:
    # 1) SCORE — relevance to the query (here: stub; real one uses embeddings/rerank)
    for b in candidates:
        b.score = relevance(query, b.text)
    # 2) SELECT under budget — greedy by score
    chosen, used = [], 0
    for b in sorted(candidates, key=lambda x: x.score, reverse=True):
        if used + b.tokens <= budget:
            chosen.append(b)
            used += b.tokens
    # 3) ORDER — most-relevant at the edges (mitigate lost-in-the-middle)
    chosen.sort(key=lambda x: x.score, reverse=True)  # descending: best first
    ordered = chosen[0::2] + chosen[1::2][::-1]       # best at the start, runner-up at the end, weakest in the middle
    # 4) FORMAT — labeled, citeable
    return "\n\n".join(f"[{b.id}] {b.text}" for b in ordered)

def relevance(q: str, t: str) -> float:  # placeholder; Module 4 replaces this
    # Count the distinct lowercase words shared by the query and the block text.
    return float(len(set(q.lower().split()) & set(t.lower().split())))

Run: python3 main.py

Context as a pipeline, not a string