Character consistency — the hardest problem

Difficulty: hard

Why this matters

If your film has a recurring character, every shot that features them is a bet: will this shot's generation match the previous shot's? Character inconsistency is the single largest reason AI films break viewer immersion. The available techniques (reference images, LoRA training, latent-space conditioning, and post fixes such as face-swap or inpainting) are all partial solutions. This lesson teaches you the toolkit, the quality ceiling of each technique, and the production-workflow patterns that actually ship consistent characters.

Demo

Four techniques, stacked:

(1) Character reference sheet: 12-24 stills of the character from different angles, in identical wardrobe; feed them as references into every shot.
(2) Trigger phrase: a unique token like 'LENA_v2' trained into a LoRA; using it in a prompt summons the same identity.
(3) Latent conditioning: some models accept a reference video, so the character matches across frames within one generation.
(4) Face-swap post: open-source roop/InsightFace swaps a reference face into generated footage; Runway Act-One takes a related route, driving a character image with a recorded facial performance.

Real productions use all four in the same short.

# Character consistency workflow
 
Step 1: Build a reference sheet (ONE-TIME)
  Generate 20–30 stills of LENA across different angles, lighting setups, and outfits.
  Curate down to the 8 that feel most "her."
  Save them in a lena_ref_v1/ folder.
 
Step 2: Train or tune (optional)
  Option A: Train a LoRA on a base video/image model using the 8 refs.
  Option B: Use the model's built-in character-reference feature.
 
Step 3: Per-shot generation
  ALWAYS include 3–5 reference images in the prompt.
  Use the same trigger phrase in every prompt.
  Keep lighting and wardrobe consistent across shots or the drift compounds.
 
Step 4: Post fixes
  For shots where drift is unacceptable, do a face-swap pass
  using a high-resolution reference still.
  Tools: roop or InsightFace (face swap); Runway Act-One (drives the
  character from a recorded performance rather than swapping faces).
 
Step 5: QA
  Put all shots of LENA side-by-side. Rate consistency 1–5.
  Any shot rated below 3 must be regenerated or face-swapped.
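
The rules in Step 3 and Step 5 are mechanical enough to enforce in a small script. The sketch below is one possible way to do that, not a specific tool's API: `ShotRequest`, `build_shot_request`, and `shots_needing_rework` are hypothetical names, and the request object is a tool-agnostic stand-in for whatever your generator actually expects. It only checks that every shot carries 3–5 reference images and the same trigger phrase, and it lists the shots whose QA rating falls below 3.

```python
from dataclasses import dataclass

MIN_REFS, MAX_REFS = 3, 5  # Step 3: 3-5 reference images per shot
QA_THRESHOLD = 3           # Step 5: ratings below 3 need a redo


@dataclass(frozen=True)
class ShotRequest:
    """Hypothetical, tool-agnostic description of one generation request."""
    prompt: str
    reference_images: tuple[str, ...]


def build_shot_request(trigger: str, description: str, refs: list[str]) -> ShotRequest:
    """Assemble a per-shot request that always carries the trigger phrase and refs."""
    if not MIN_REFS <= len(refs) <= MAX_REFS:
        raise ValueError(
            f"expected {MIN_REFS}-{MAX_REFS} reference images, got {len(refs)}"
        )
    # Prefix the trigger phrase so every prompt uses the identical token.
    return ShotRequest(prompt=f"{trigger}, {description}", reference_images=tuple(refs))


def shots_needing_rework(ratings: dict[str, int]) -> list[str]:
    """Return the shots rated below the QA threshold, worst first."""
    flagged = [name for name, score in ratings.items() if score < QA_THRESHOLD]
    return sorted(flagged, key=lambda name: ratings[name])


if __name__ == "__main__":
    # File names below are placeholders for stills in lena_ref_v1/.
    refs = ["lena_ref_v1/front.png", "lena_ref_v1/profile.png", "lena_ref_v1/three_quarter.png"]
    request = build_shot_request("LENA_v2", "walking through a rainy street at night", refs)
    print(request.prompt)
    print(shots_needing_rework({"shot_01": 4, "shot_02": 2, "shot_03": 3, "shot_04": 1}))
```

Keeping the trigger phrase and reference count in one function means a forgotten reference image fails loudly before you spend generation credits, instead of showing up later as drift.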

Try it yourself

Build a character reference sheet for your protagonist: 20 stills across lighting conditions. Reject the 12 that don't feel like the same person.

Generate the same character in five different shots using only reference images (no LoRA). Rate consistency on a 1-5 scale. Most will score 3 — that's normal.

Train a LoRA on the character (using Replicate's FLUX LoRA tooling, Modal, or a local ComfyUI setup). Compare a shot generated WITH and WITHOUT the LoRA. LoRA usually wins.

Do a face-swap pass on a drifted shot. Use Runway Act-One or an open-source tool such as roop/InsightFace. Observe when the face-swap improves the shot and when it introduces artifacts of its own.

Decide on a character budget: how many hours of consistency work per character? Most 3-minute narrative shorts with one character cost 4-8 hours of dedicated consistency fiddling.
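
If you want to keep score while working through the exercises above, a few helper functions are enough. This is a minimal sketch with hypothetical names (`curate`, `compare_lora`, `hours_remaining`). It assumes you score each still and each shot by eye on the lesson's 1–5 scale, and it uses 8 hours (the top of the 4-8 hour range quoted above) as a default budget. Nothing here measures likeness automatically.

```python
from statistics import mean


def curate(scores: dict[str, int], keep: int = 8) -> list[str]:
    """Keep the `keep` highest-scored stills; ties keep their original order."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:keep]


def compare_lora(with_lora: list[int], without_lora: list[int]) -> dict[str, float]:
    """Summarise paired 1-5 ratings for the same shots with and without the LoRA."""
    if len(with_lora) != len(without_lora) or not with_lora:
        raise ValueError("need the same non-zero number of ratings in both lists")
    # Count the shots where the LoRA version was rated strictly higher.
    wins = sum(a > b for a, b in zip(with_lora, without_lora))
    return {
        "mean_with": mean(with_lora),
        "mean_without": mean(without_lora),
        "lora_wins": wins,
    }


def hours_remaining(logged_hours: list[float], budget_hours: float = 8.0) -> float:
    """Hours left in the consistency budget; a negative value means over budget."""
    return budget_hours - sum(logged_hours)


if __name__ == "__main__":
    # Made-up scores for 20 stills, purely to show the call shape.
    stills = {f"still_{i:02d}.png": (i % 5) + 1 for i in range(20)}
    print(curate(stills))
    print(compare_lora([4, 4, 5, 3, 4], [3, 4, 3, 3, 2]))
    print(hours_remaining([1.5, 2.0, 3.0]))
```

The ratings stay subjective; the point of logging them is only to make the WITH/WITHOUT comparison and the hour budget visible in one place.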

Prompt your AI

Use these three prompts in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain why character consistency is the hardest problem in AI video generation, and give me the four-technique stack (reference sheets, LoRAs, latent conditioning, face-swap).

2. Why it works (the mechanism)

Walk me through what's happening inside a diffusion video model when I supply a reference image — how is that image used in conditioning, why does it drift across shots, and which model architectures (U-Net vs DiT) handle this differently?

3. Advanced — application & what's next

I'm making a 10-minute short with TWO recurring characters and multiple costume/lighting changes per character. Walk me through how to structure the consistency workflow: separate LoRAs per character, combined LoRAs, which face-swap tools handle edge cases (profile view, partial occlusion, extreme expressions), and when to give up and pick a different shot.