Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
A good video prompt is five decisions, not a sentence. You name the subject, the setting, the style, the camera, and the lighting. If you leave any of those blank, the model fills it in — and its defaults are probably not what you wanted. Learning to name all five up front is the difference between 'rolling dice' and 'directing'.
Structured prompts outperform thin ones for the same reason: every label removes a guess. Anything you leave unlabeled gets filled in from the model's training distribution, which is rarely the aesthetic you had in mind. The pair below shows how each added label narrows the output toward what you actually want. Because this is video, the GOOD version also adds a sixth line, Motion, on top of the five decisions.
BAD:
A cat in a kitchen.
GOOD:
Subject: a ginger cat, tail flicking
Setting: a sunlit 1970s kitchen, dust in the air
Style: 35mm film, warm grain, slight halation
Camera: eye-level medium shot, shallow depth of field
Lighting: late-afternoon side-light from a single window
Motion: the cat turns its head toward the camera once, slowly
Use these three in order. Each builds on the one before.
Explain in one short paragraph what each part of a structured video prompt does: Subject, Setting, Style, Camera, Lighting, Motion. Use an example where skipping one of them would ruin the shot.
Video generation models are trained on captioned clips. Walk me through why a prompt that names Style and Camera produces a more consistent result than a prompt that only names Subject and Setting. What exactly is the model 'looking up' when I write '35mm film, shallow depth of field'?
I want the same character and setting across three shots but different camera angles and moods. Give me three prompts that share Subject+Setting+Style and vary only Camera+Lighting+Motion — and predict what will break first if I try to push it to ten shots.
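If you'd rather keep your prompts in code than retype them, the labeled format above maps neatly onto a small data structure. The sketch below is one possible way to do it in Python, not part of any video tool's API: the `VideoPrompt` class, its `render` and `missing` methods, and the camera, lighting, and motion values for the second and third shots are all made up for illustration. It shows two things from this lesson: spotting the blanks in a thin prompt, and holding Subject, Setting, and Style fixed while varying only Camera, Lighting, and Motion.

```python
from __future__ import annotations

from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class VideoPrompt:
    """Hypothetical container for the six labels used in the GOOD example."""

    subject: str
    setting: str
    style: str
    camera: str
    lighting: str
    motion: str

    def render(self) -> str:
        # One "Label: value" line per field, in declaration order.
        return "\n".join(
            f"{f.name.capitalize()}: {getattr(self, f.name)}" for f in fields(self)
        )

    def missing(self) -> list[str]:
        # Labels left blank, i.e. the decisions the model would have to guess.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# The thin prompt from the BAD example: only two of the decisions are made.
thin = VideoPrompt(
    subject="A cat", setting="a kitchen", style="", camera="", lighting="", motion=""
)

# The GOOD example, copied label for label.
base = VideoPrompt(
    subject="a ginger cat, tail flicking",
    setting="a sunlit 1970s kitchen, dust in the air",
    style="35mm film, warm grain, slight halation",
    camera="eye-level medium shot, shallow depth of field",
    lighting="late-afternoon side-light from a single window",
    motion="the cat turns its head toward the camera once, slowly",
)

# Three shots that share Subject + Setting + Style and vary only
# Camera + Lighting + Motion. The variant values are illustrative assumptions.
shots = [
    base,
    replace(
        base,
        camera="low-angle close-up from floor level",
        lighting="soft overcast light, no hard shadows",
        motion="the cat walks slowly past the camera",
    ),
    replace(
        base,
        camera="overhead wide shot",
        lighting="a single warm lamp at dusk",
        motion="the cat curls up and settles",
    ),
]

if __name__ == "__main__":
    print("Thin prompt leaves blank:", ", ".join(thin.missing()))
    for number, shot in enumerate(shots, start=1):
        print(f"\n--- Shot {number} ---")
        print(shot.render())
```

Because the shared fields live in one place (`base`), changing the style once changes it in every shot, which is the same discipline the third try-it prompt asks you to practise by hand.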