If your film has a recurring character, every shot that features them is a bet: will this shot's generation match the previous shot's? Inconsistent characters are the single largest reason AI films break viewer immersion. The techniques (reference images, LoRA training, latent-space conditioning, inpainting) are all partial solutions. This task teaches you the toolkit, the quality ceiling of each technique, and the production-workflow patterns that actually ship consistent characters.
Four techniques stacked:
(1) Character reference sheet: 12–24 stills of the character from different angles, identical wardrobe; feed it as a reference into every shot.
(2) Trigger phrase: a unique token like 'LENA_v2' trained into a LoRA; it summons the same identity.
(3) Latent conditioning: some models accept a reference video; the character matches across frames within one generation.
(4) Face-swap post: Runway Act-One or open-source roop/InsightFace swap a reference face into generated footage.
Real productions use all four in the same short.
# Character consistency workflow
Step 1: Build a reference sheet (ONE-TIME)
Generate 20–30 stills of LENA from different angles, lights, outfits.
Curate down to the 8 that feel most "her."
Save as lena_ref_v1/ folder.
Step 2: Train or tune (optional)
Option A: LoRA on a base video/image model using the 8 refs.
Option B: Use the model's built-in character-reference feature.
Step 3: Per-shot generation
ALWAYS include 3–5 reference images in the prompt.
Use the same trigger phrase in every prompt.
Keep lighting and wardrobe consistent across shots or the drift compounds.
Step 4: Post fixes
For shots where drift is unacceptable, do a face-swap pass
using a high-resolution reference still. Tools: Runway Act-One, InsightFace, roop.
Step 5: QA
Put all shots of LENA side-by-side. Rate consistency 1–5.
Any shot rating <3 must be regenerated or face-swapped.
Use these three in order. Each builds on the one before.
In one paragraph, explain why character consistency is the hardest problem in AI video generation, and give me the four-technique stack (reference sheets, LoRAs, latent conditioning, face-swap).
Walk me through what's happening inside a diffusion video model when I supply a reference image — how is that image used in conditioning, why does it drift across shots, and which model architectures (U-Net vs DiT) handle this differently?
I'm making a 10-minute short with TWO recurring characters and multiple costume/lighting changes per character. Walk me through how to structure the consistency workflow: separate LoRAs per character, combined LoRAs, which face-swap tools handle edge cases (profile view, partial occlusion, extreme expressions), and when to give up and pick a different shot.
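Returning to the workflow above, the two mechanical rules in it (Step 3: every shot carries the trigger phrase and 3–5 reference images; Step 5: any shot rated below 3 gets reworked) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: `ShotRequest`, `build_shot_request`, `shots_needing_rework`, and the `ref_XX.png` file names are hypothetical, and nothing here calls a real video model or face-swap tool. You would adapt the request to whatever inputs your generation tool actually accepts.

```python
from dataclasses import dataclass
from pathlib import Path

TRIGGER = "LENA_v2"        # trigger phrase used in the lesson's example
MIN_REFS, MAX_REFS = 3, 5  # Step 3: include 3 to 5 reference images per shot
MIN_RATING = 3             # Step 5: shots rated below this need rework


@dataclass
class ShotRequest:
    """Hypothetical container for one shot; not tied to any real tool's API."""
    prompt: str
    reference_images: list[Path]


def build_shot_request(description: str, refs: list[Path]) -> ShotRequest:
    """Build a per-shot request that always has the trigger phrase and 3 to 5 refs."""
    if not MIN_REFS <= len(refs) <= MAX_REFS:
        raise ValueError(
            f"expected {MIN_REFS} to {MAX_REFS} reference images, got {len(refs)}"
        )
    # Prepend the same trigger phrase to every prompt so it is never forgotten.
    return ShotRequest(prompt=f"{TRIGGER}, {description}", reference_images=list(refs))


def shots_needing_rework(ratings: dict[str, int]) -> list[str]:
    """Return the names of shots whose 1-to-5 consistency rating is below 3."""
    for shot, rating in ratings.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"{shot}: rating must be between 1 and 5, got {rating}")
    return sorted(shot for shot, rating in ratings.items() if rating < MIN_RATING)


if __name__ == "__main__":
    # Hypothetical file names inside the lesson's lena_ref_v1/ folder.
    refs = [Path("lena_ref_v1") / f"ref_{i:02d}.png" for i in range(1, 5)]
    request = build_shot_request("walking through a rainy street at night", refs)
    print(request.prompt)
    print(shots_needing_rework({"shot_01": 4, "shot_02": 2, "shot_03": 3}))
```

Keeping the trigger phrase and the reference count in one helper means a shot cannot be queued without them, and the QA function turns the side-by-side ratings into a concrete list of shots to regenerate or face-swap.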