Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Frequency analysis is the oldest documented cryptanalytic technique: al-Kindi described it in 9th-century Baghdad, and it works against any cipher that preserves the statistical structure of the underlying language. Its existence is the reason every cipher we still use is built to look statistically uniform: AES output should be indistinguishable from random bytes, and a ChaCha20 keystream should pass every randomness test we can throw at it. Understanding what frequency analysis can and cannot recover (it shreds simple substitution but reveals nothing if the ciphertext is genuinely uniform) is how you internalise what 'indistinguishable from random' actually means as a security goal. Every time you see a paper proving a scheme is 'IND-CPA secure,' the underlying intuition is that an attacker armed with frequency analysis (and far more) cannot tell which of two chosen messages was encrypted any better than by guessing.
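English letter frequencies are stable across large samples: E ≈ 12.7%, T ≈ 9.1%, A ≈ 8.2%, and Z ≈ 0.07%. A monoalphabetic substitution cipher only relabels the letters, so the whole frequency profile survives under a permutation. In a sufficiently long ciphertext the most common letter most likely stands for E; in a short message the ranking can deviate, which is why the full distribution is a better guide than any single letter.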
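The point that a substitution merely permutes the histogram can be checked directly. The sketch below is illustrative only: the helper names, the key (the QWERTY keyboard order, used here as an arbitrary permutation of the alphabet), and the E-heavy sample sentence are all assumptions, not part of the lesson's demo.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// letterCounts returns how often each of A..Z occurs in s (case-insensitive).
func letterCounts(s string) [26]int {
	var counts [26]int
	for _, r := range strings.ToUpper(s) {
		if r >= 'A' && r <= 'Z' {
			counts[r-'A']++
		}
	}
	return counts
}

// substitute applies a monoalphabetic substitution: plaintext letter
// 'A'+i is replaced by key[i]. Non-letters pass through unchanged.
func substitute(s, key string) string {
	return strings.Map(func(r rune) rune {
		if r >= 'A' && r <= 'Z' {
			return rune(key[r-'A'])
		}
		return r
	}, strings.ToUpper(s))
}

// sortedCounts returns the counts in descending order, discarding which
// letter each count belonged to.
func sortedCounts(c [26]int) []int {
	out := append([]int(nil), c[:]...)
	sort.Sort(sort.Reverse(sort.IntSlice(out)))
	return out
}

// mostCommon returns the letter with the highest count (first on ties).
func mostCommon(c [26]int) rune {
	best := 0
	for i, n := range c {
		if n > c[best] {
			best = i
		}
	}
	return rune('A' + best)
}

func main() {
	// Hypothetical key: a permutation of the alphabet (QWERTY keyboard order).
	const key = "QWERTYUIOPASDFGHJKLZXCVBNM"
	pt := "MEET ME NEAR THE GREEN TREE WHERE THE THREE EAGLES REST"
	ct := substitute(pt, key)

	same := fmt.Sprint(sortedCounts(letterCounts(pt))) == fmt.Sprint(sortedCounts(letterCounts(ct)))
	fmt.Println("ciphertext:", ct)
	fmt.Println("same sorted counts:", same)
	fmt.Printf("most common: plaintext %c, ciphertext %c\n",
		mostCommon(letterCounts(pt)), mostCommon(letterCounts(ct)))
}
```

Under this key E is replaced by T, so the ciphertext's most common letter is T, and guessing that it stands for E recovers one entry of the key. The sorted counts are identical before and after encryption, which is exactly the fingerprint an attacker exploits.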
Use these three in order. Each builds on the one before.
In one paragraph, explain what frequency analysis is and why letter frequencies in English are stable enough that an attacker can use them as a fingerprint.
Walk me through how chi-squared analysis is used to score a candidate decryption against the English frequency distribution, and how that score lets you rank all 26 Caesar shifts to pick the most likely plaintext.
Given that frequency analysis trivially breaks any simple (monoalphabetic) substitution cipher, what design property must a modern symmetric cipher like AES have so that frequency analysis on the ciphertext yields no information? Connect this to the formal goal of pseudorandomness.
// main.go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// englishFreq holds the expected frequency of each letter in English text, in percent.
var englishFreq = map[rune]float64{
	'A': 8.167, 'B': 1.492, 'C': 2.782, 'D': 4.253, 'E': 12.702,
	'F': 2.228, 'G': 2.015, 'H': 6.094, 'I': 6.966, 'J': 0.153,
	'K': 0.772, 'L': 4.025, 'M': 2.406, 'N': 6.749, 'O': 7.507,
	'P': 1.929, 'Q': 0.095, 'R': 5.987, 'S': 6.327, 'T': 9.056,
	'U': 2.758, 'V': 0.978, 'W': 2.360, 'X': 0.150, 'Y': 1.974,
	'Z': 0.074,
}

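// chiSquared scores how closely the letter counts in text match English.
// Lower is better; 0 would be a perfect match.
func chiSquared(text string) float64 {
	// Keep only A-Z (uppercased) so that len(upper) below is the letter count;
	// non-ASCII letters are multi-byte and have no entry in englishFreq.
	upper := strings.Map(func(r rune) rune {
		if u := unicode.ToUpper(r); u >= 'A' && u <= 'Z' {
			return u
		}
		return -1
	}, text)
	n := float64(len(upper))
	counts := make(map[rune]int)
	for _, c := range upper {
		counts[c]++
	}
	chi2 := 0.0
	for ch, expectedPct := range englishFreq {
		expected := expectedPct * n / 100 // expected count of ch among n letters
		observed := float64(counts[ch])
		if expected > 0 {
			diff := observed - expected
			chi2 += diff * diff / expected
		}
	}
	return chi2
}
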
// caesarShift uppercases s and decodes it by shifting each letter A-Z back by
// k positions (0 <= k < 26); all other characters pass through unchanged.
func caesarShift(s string, k int) string {
	return strings.Map(func(r rune) rune {
		if r >= 'A' && r <= 'Z' {
			return 'A' + (r-'A'-rune(k)+26)%26
		}
		return r
	}, strings.ToUpper(s))
}

// Find best Caesar shift by chi-squared minimisation
func bestCaesarShift(ciphertext string) int {
	bestShift := 0
	bestScore := -1.0 // sentinel: chi-squared is never negative
	for k := 0; k < 26; k++ {
		score := chiSquared(caesarShift(ciphertext, k))
		if bestScore < 0 || score < bestScore {
			bestScore = score
			bestShift = k
		}
	}
	return bestShift
}

func main() {
	ct := "WKLV LV D VHFUHW PHVVDJH"
	fmt.Println("Best shift:", bestCaesarShift(ct)) // 3
}
go run main.go
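Picking only the single best shift hides how decisive the ranking is. As a hedged variation on the demo (a sketch, not a replacement for main.go), the program below keeps all 26 candidates and sorts them by score, where each score is the sum over letters of (observed - expected)² / expected. The names candidate, shiftBack, score, and rankShifts are introduced here for illustration; the frequency values are the same as in englishFreq, stored as an array indexed A..Z.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Expected English letter frequencies in percent, indexed A..Z
// (same values as the englishFreq table in main.go).
var freq = [26]float64{
	8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
	0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
	6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
}

// candidate is one trial decryption together with its score.
type candidate struct {
	Shift int
	Score float64
	Text  string
}

// shiftBack undoes a Caesar shift of k (0 <= k < 26) on the letters A..Z.
func shiftBack(s string, k int) string {
	return strings.Map(func(r rune) rune {
		if r >= 'A' && r <= 'Z' {
			return 'A' + (r-'A'-rune(k)+26)%26
		}
		return r
	}, strings.ToUpper(s))
}

// score is the chi-squared distance between the letter counts of text
// and the counts English would predict for a text of the same length.
func score(text string) float64 {
	var counts [26]int
	n := 0
	for _, r := range text {
		if r >= 'A' && r <= 'Z' {
			counts[r-'A']++
			n++
		}
	}
	chi2 := 0.0
	for i, pct := range freq {
		expected := pct * float64(n) / 100
		if expected > 0 {
			diff := float64(counts[i]) - expected
			chi2 += diff * diff / expected
		}
	}
	return chi2
}

// rankShifts returns all 26 candidates, lowest (best) score first.
func rankShifts(ct string) []candidate {
	out := make([]candidate, 26)
	for k := range out {
		pt := shiftBack(ct, k)
		out[k] = candidate{Shift: k, Score: score(pt), Text: pt}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score < out[j].Score })
	return out
}

func main() {
	// Print the three most English-looking candidates.
	for _, c := range rankShifts("WKLV LV D VHFUHW PHVVDJH")[:3] {
		fmt.Printf("shift %2d  chi2 %8.1f  %s\n", c.Shift, c.Score, c.Text)
	}
}
```

Seeing the runner-up scores is useful on short ciphertexts, where the gap between first and second place indicates how much to trust the top candidate.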