Capstone: learn by doing

Why this matters

Every honest conversation about HE deployment has to start with the cost numbers, because they shape every other design decision. Plaintext arithmetic in 2024 runs at roughly $10^{10}$ basic operations per second per CPU core. HE multiplication on a fresh ciphertext, even in a state-of-the-art library, delivers roughly $10^4$ to $10^5$ plaintext-equivalent operations per second, which is five to six orders of magnitude slower. Bootstrapping costs another order of magnitude or two on top of that. Compared to garbled-circuit MPC, HE is more compact (no $O(\text{circuit size})$ communication) but slower per gate; compared to secret-sharing MPC, HE shifts cost from network rounds to local CPU. The right tool depends on whether your bottleneck is bandwidth, latency, or compute. Without these numbers in your head, you'll either dismiss HE as 'too slow' for problems where it's actually fine, or oversell it for problems where MPC would have shipped years ago.
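The "orders of magnitude" figure is just the base-10 logarithm of a throughput ratio. A minimal sketch of that arithmetic, using the ballpark figures from this section as inputs (the function name is illustrative, not from any library):

```go
package main

import (
	"fmt"
	"math"
)

// ordersOfMagnitude returns log10(baseline/slow): how many powers of ten
// separate two throughputs given in ops/sec.
func ordersOfMagnitude(baseline, slow float64) float64 {
	return math.Log10(baseline / slow)
}

func main() {
	const plaintext = 1e10 // ballpark plaintext ops/sec/core from the text
	for _, he := range []float64{1e5, 1e4} { // ballpark HE ops/sec range
		fmt.Printf("HE at %.0e ops/sec: %.0f orders of magnitude slower\n",
			he, ordersOfMagnitude(plaintext, he))
	}
}
```

Swapping in a throughput you measured yourself gives a quick check on whether a given library and parameter set lands inside or outside this range.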

Demo

We compare three numbers across three primitives: throughput per core (ops/sec), ciphertext expansion (bytes ciphertext / bytes plaintext), and round complexity (network round trips per multiplication). Plaintext is the baseline. HE has zero round complexity (single-shot computation) but huge ciphertext expansion and slow ops. Secret-sharing MPC (GMW) has almost no expansion but needs rounds proportional to circuit depth, while garbled circuits need a single round but send data proportional to circuit size. The 2024-era ballpark is below.

\begin{array}{l|lll} & \text{ops/sec/core} & \text{cipher expansion} & \text{rounds/mult} \\\hline \text{plaintext} & 10^{10} & 1\times & 0 \\ \text{HE (CKKS)} & 10^4{-}10^5 & 10^3{-}10^5\times & 0 \\ \text{GMW MPC} & 10^6{-}10^7 & {\sim}1\times & O(\mathrm{depth}) \\ \text{Garbled circuits} & 10^6 & {\sim}\text{circuit size} & 1 \\ \end{array}

Try it yourself

Run the snippet and read off the toy-HE-vs-plaintext slowdown. Multiply the result by another factor of $\sim$ 10000 to estimate real-world HE-vs-plaintext at production parameters. Sanity-check this against the SEAL benchmark numbers.
Look up the latency of a single CKKS multiplication at ring dimension $n=2^{14}$ in Microsoft SEAL or OpenFHE. (It's tens of milliseconds.) How many multiplications can you do per core per second? Compare to the $10^{10}$ plaintext baseline.
Compute, on paper, the ciphertext expansion ratio for an LWE-style ciphertext encrypting a single 32-bit integer with $n = 2^{14}$ and a ciphertext modulus $q$ of $\log_2 q$ bits. (Ciphertext is two ring elements; each ring element is $n \log_2 q$ bits.) Why is this number so frightening?
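To check your paper answers for the last two exercises, the arithmetic can be sketched as below. The modulus size (`logQ = 438` bits) and the 20 ms latency are assumptions chosen for illustration, not values from this page; substitute the numbers you look up. The function names are hypothetical.

```go
package main

import "fmt"
import "time"

// ciphertextBits returns the size in bits of a ciphertext made of two ring
// elements, each holding n coefficients of logQ bits.
func ciphertextBits(n, logQ int) int {
	return 2 * n * logQ
}

// expansion returns ciphertext bits divided by plaintext bits.
func expansion(n, logQ, plaintextBits int) float64 {
	return float64(ciphertextBits(n, logQ)) / float64(plaintextBits)
}

// multsPerSecond converts a per-multiplication latency into throughput.
func multsPerSecond(latency time.Duration) float64 {
	return float64(time.Second) / float64(latency)
}

func main() {
	const (
		n    = 1 << 14 // ring dimension from the exercise
		logQ = 438     // ASSUMPTION: illustrative modulus size in bits
	)
	latency := 20 * time.Millisecond // ASSUMPTION: one value in "tens of milliseconds"

	bits := ciphertextBits(n, logQ)
	fmt.Printf("ciphertext: %d bits (%.2f MiB)\n", bits, float64(bits)/8/(1<<20))
	fmt.Printf("expansion for one 32-bit integer: %.0fx\n", expansion(n, logQ, 32))

	perCt := multsPerSecond(latency)
	fmt.Printf("ciphertext mults/sec/core: %.0f\n", perCt)
	// CKKS packs n/2 slots per ciphertext, so one mult acts on n/2 values.
	fmt.Printf("amortized slot mults/sec/core: %.0f\n", perCt*float64(n/2))
}
```

The gap between the last two printed numbers is the reason packing matters: encrypting a single integer per ciphertext wastes almost all of the ciphertext, while filling every slot spreads both the size and the latency across thousands of values.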

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain why HE is roughly 'a million times slower than plaintext' but can still be the right answer for some applications. What property does it offer that compensates?

2. Why it works (the mechanism)

Walk me through where the cost actually goes in a single CKKS multiplication: NTTs, modulus switching, key-switching, relinearization. Which of these dominate the runtime at production parameters?

3. Advanced — application & what's next

Compare HE and secret-sharing-MPC for a depth-100 circuit between two parties on a 100ms-RTT link. Build the rough cost model and find the crossover circuit size where HE beats MPC and vice versa.
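As a starting point for the third prompt, here is a deliberately crude sketch of such a cost model. It assumes HE cost is pure local compute, MPC cost is one round trip per circuit layer plus local compute, and it ignores bootstrapping, bandwidth, and input transfer, all of which a real comparison would need. The throughput values are ballpark figures in the range quoted in this section; the `Model` type and its methods are hypothetical names.

```go
package main

import "fmt"

// Model holds the parameters of the rough cost model (all hypothetical).
type Model struct {
	Depth        int     // number of sequential circuit layers
	RTTSeconds   float64 // network round-trip time in seconds
	HEOpsPerSec  float64 // HE gate throughput on one core
	MPCOpsPerSec float64 // secret-sharing MPC local gate throughput
}

// heSeconds models HE as non-interactive: cost is local compute only.
func (m Model) heSeconds(gates float64) float64 {
	return gates / m.HEOpsPerSec
}

// mpcSeconds models MPC as one round trip per layer plus local compute.
func (m Model) mpcSeconds(gates float64) float64 {
	return float64(m.Depth)*m.RTTSeconds + gates/m.MPCOpsPerSec
}

// crossoverGates solves heSeconds(g) == mpcSeconds(g) for g. Under this
// model HE is faster below the result and MPC is faster above it.
// It assumes MPCOpsPerSec > HEOpsPerSec.
func (m Model) crossoverGates() float64 {
	return float64(m.Depth) * m.RTTSeconds / (1/m.HEOpsPerSec - 1/m.MPCOpsPerSec)
}

func main() {
	m := Model{Depth: 100, RTTSeconds: 0.1, HEOpsPerSec: 1e5, MPCOpsPerSec: 1e6}
	fmt.Printf("crossover: about %.2e gates\n", m.crossoverGates())
	for _, g := range []float64{1e5, 1e7} {
		fmt.Printf("gates=%.0e  HE=%.1fs  MPC=%.1fs\n", g, m.heSeconds(g), m.mpcSeconds(g))
	}
}
```

The fixed latency term (depth times RTT, 10 seconds here) is what lets HE win on small circuits; once the circuit is large enough that compute dominates, the faster per-gate MPC throughput takes over.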


// main.go — run: go run main.go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	nOps   = 100_000
	nHeOps = 1_000
	mask60 = (1 << 60) - 1
	mask30 = (1 << 30) - 1
)

func main() {
	r := rand.New(rand.NewSource(time.Now().UnixNano()))

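	// Plaintext baseline: multiply two 30-bit integers (the product fits in
	// int64). Note the timed loop also pays for two RNG calls per iteration.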
	start := time.Now()
	var c int64
	for i := 0; i < nOps; i++ {
		a := int64(r.Int63() & mask30)
		b := int64(r.Int63() & mask30)
		c = a * b
	}
	_ = c
	plainDt := time.Since(start).Seconds()
	fmt.Printf("plaintext mult x%d: %.1f ms  ->  %.2e ops/sec\n",
		nOps, plainDt*1000, float64(nOps)/plainDt)

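	// Toy HE-style mult: ciphertext as length-2 polynomial, schoolbook polymult,
	// then reduce mod the degree-2 ring polynomial x^2 + 1. Mimics the shape of
	// one CKKS-style mult at ring dim 2 (real schemes use n=2^14).
	// Products of 60-bit values wrap around in int64; that is harmless here
	// because only the timing matters. The timed loop also pays for four RNG
	// calls per iteration.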
	start = time.Now()
	var r0, r1 int64
	for i := 0; i < nHeOps; i++ {
		a0 := int64(r.Int63() & mask60)
		a1 := int64(r.Int63() & mask60)
		b0 := int64(r.Int63() & mask60)
		b1 := int64(r.Int63() & mask60)
		// poly mult
		c0 := a0 * b0
		c1 := a0*b1 + a1*b0
		c2 := a1 * b1
		// reduce mod x^2 + 1
		r0 = (c0 - c2) & mask60
		r1 = c1 & mask60
	}
	_ = r0
	_ = r1
	heDt := time.Since(start).Seconds()
	fmt.Printf("toy HE   mult x%d: %.1f ms  ->  %.2e ops/sec\n",
		nHeOps, heDt*1000, float64(nHeOps)/heDt)

	slowdown := (heDt / nHeOps) / (plainDt / nOps)
	fmt.Printf("slowdown (toy ring-dim 2 vs plaintext): ~%.1fx\n", slowdown)
	fmt.Println("real HE at ring-dim 16384 is ~10000x slower again on top of this.")
}


Performance characteristics: HE vs. plaintext vs. MPC