How to Structure a Photo Prompt: The 7 Segments Every Strong Prompt Needs

Updated 2026-06-19 · 9 min read

Key takeaway

Structuring a photo prompt means dividing your description into seven distinct segments: subject identity, action or pose, environment and setting, lighting setup, camera and lens, mood and atmosphere, and post-processing style. Each segment controls a different aspect of the image, so the more of them you specify, the less the model has to guess, and the more consistent, intentional, and on-brand the results tend to be. This guide walks through every segment with worked examples, then covers the common ordering mistakes that cause even well-worded prompts to miss the mark.

The seven-segment model

A photo prompt is not a single idea; it’s a layered creative brief. The seven-segment model maps to the way a professional art director briefs a photographer: you don’t just say "shoot a product on white." You specify what the product is, how it’s positioned, where it lives, how it’s lit, which lens to use, what feeling the image should evoke, and how the final image should be retouched. An image model benefits from the same level of detail. When a segment is missing, the model fills the gap with whatever is most typical for the rest of the prompt, which is why prompts without a camera and lens segment tend to produce the same generic "stock photo" depth of field every time. The seven segments are: (1) Subject and identity, (2) Action or pose, (3) Environment and setting, (4) Lighting setup, (5) Camera and lens, (6) Mood and atmosphere, (7) Post-processing and style. You don’t need every segment in every prompt (a minimalist product shot might skip environment entirely), but knowing all seven means you can deliberately choose what to omit rather than forgetting it.
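One way to keep the seven segments explicit is to hold them in a small data structure and only join them into a string at the end. The sketch below is a minimal illustration, not part of any Floniks API: the `PhotoPrompt` class, its field names, and the comma-joined output format are all assumptions made for this example, and the subject text is invented.

```python
from dataclasses import dataclass, fields


@dataclass
class PhotoPrompt:
    # Field order mirrors the seven segments as numbered in this guide.
    subject: str = ""
    action: str = ""
    environment: str = ""
    lighting: str = ""
    camera: str = ""
    mood: str = ""
    style: str = ""

    def render(self) -> str:
        """Join the non-empty segments in segment order, comma-separated."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)

    def omitted(self) -> list:
        """Names of the segments left blank, i.e. left to the model's defaults."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical product shot; the lighting and camera text reuse phrases from this guide.
prompt = PhotoPrompt(
    subject="a matte white ceramic pour-over coffee dripper",
    environment="pure white infinity cove studio background",
    lighting="overhead flat lay lighting with white diffused panels on all sides",
    camera="shot from directly above with a 50mm lens",
)

print(prompt.render())
print("Left open:", prompt.omitted())
```

Calling `omitted()` before you submit a prompt turns "forgetting a segment" into a visible decision: anything in that list is something you are letting the model choose for you.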

Segment 1 & 2 — Subject and action

The subject segment names the person, object, or creature and describes its identifying characteristics: "a middle-aged Nigerian man with a full beard and close-cropped grey hair, wearing a deep navy suit, white pocket square." The action segment then describes what the subject is doing: "standing at the edge of a rooftop terrace, leaning against a railing, looking out over the city skyline, one hand in his jacket pocket." Together, these two segments establish the anchor of the image. Common mistakes: (a) Describing the subject’s emotion but not their physical appearance, leaving the model to guess the face; (b) Naming a generic action ("standing") without a context that shapes the pose — "standing" in a boardroom reads differently from "standing at a cliff edge." For reusable content series, write your subject segment as a fixed "character card" and slot it into multiple prompts. This is exactly what the Floniks /editor workflow supports: a character node whose output feeds into varied scene and lighting configurations downstream.
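The character-card idea can be sketched with plain string templating. This is only an illustration of the reuse pattern and does not represent how the Floniks editor works internally; `CHARACTER_CARD`, `with_character`, and the second scene are hypothetical names and text introduced for this example.

```python
# The subject segment, written once and reused unchanged across a series.
CHARACTER_CARD = (
    "a middle-aged Nigerian man with a full beard and close-cropped grey hair, "
    "wearing a deep navy suit, white pocket square"
)

# Action segments that vary per image. The first comes from this guide;
# the second is an invented variation.
ACTIONS = [
    "standing at the edge of a rooftop terrace, leaning against a railing",
    "seated at a boardroom table, hands folded, listening to a colleague",
]


def with_character(card: str, action: str) -> str:
    """Prefix an action segment with the fixed character card."""
    return f"{card}, {action}"


series = [with_character(CHARACTER_CARD, action) for action in ACTIONS]

for prompt in series:
    print(prompt)
```

Because the card is a single constant, any change to the character (a different suit, a different hairstyle) propagates to every prompt in the series at once.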

Segment 3 — Environment and setting

The environment segment describes the world the subject inhabits: location, time of day, weather, architectural context, and any background elements. "A rooftop terrace on a modern high-rise overlooking a dense city skyline, late evening, warm city lights beginning to illuminate the background, slight haze in the air, contemporary glass-and-steel architecture visible." Without a setting, models often default to a neutral studio or white-background environment. Setting also carries implicit lighting information: "an overcast beach at dawn" implies diffused, cool, low-contrast light without you having to state it explicitly, which means your explicit lighting segment (segment 4) only needs to add nuance rather than define the light from scratch. For abstract or product shots where you want no distracting environment, say so: "pure white infinity cove studio background, no environmental distractions." This is still describing the environment, just an intentionally minimalist one.

Segment 4 & 5 — Lighting and camera

Lighting setup (segment 4) names the light sources, their quality, direction, and color temperature. Camera and lens (segment 5) names the focal length, aperture, and any camera model or film stock that carries aesthetic associations. These two segments are frequently merged because they’re closely related: focal length sets the field of view, and aperture largely sets the depth of field, which in turn affects how light renders in the background. A combined segment might read: "lit by a large softbox to camera-left at 45 degrees, secondary fill light at lower power from camera-right, warm 4800K, shot on a Sony A7R V with an 85mm f/1.4 lens wide open, background separated by shallow depth of field, soft circular bokeh." For product photography, "overhead flat lay lighting with white diffused panels on all sides, shot from directly above with a 50mm lens" gives the model an extremely specific geometry to recreate. See the lighting vocabulary article in this pillar for a copyable reference list of lighting terms.
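If you reuse the same lighting and camera setup across many prompts, it can help to store the parameters as data and generate the combined clause from them. The following is a rough sketch under that assumption; the `Lighting` and `Camera` classes, their fields, and the exact wording of the generated clause are hypothetical choices for this example rather than a required format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lighting:
    key: str                      # main light: source, quality, direction
    fill: str = ""                # optional secondary light
    kelvin: Optional[int] = None  # color temperature, if you want to pin it


@dataclass
class Camera:
    focal_length_mm: int
    aperture: float
    body: str = ""                # camera model or film stock, optional


def lighting_camera_clause(light: Lighting, cam: Camera) -> str:
    """Build one combined lighting-and-camera clause (segments 4 and 5)."""
    parts = [f"lit by {light.key}"]
    if light.fill:
        parts.append(light.fill)
    if light.kelvin is not None:
        parts.append(f"{light.kelvin}K color temperature")
    # The "g" format drops a trailing ".0", so 2.0 is written as f/2.
    lens = f"{cam.focal_length_mm}mm f/{cam.aperture:g} lens"
    parts.append(f"shot on {cam.body} with {lens}" if cam.body else f"shot with {lens}")
    return ", ".join(parts)


clause = lighting_camera_clause(
    Lighting(
        key="a large softbox to camera-left at 45 degrees",
        fill="secondary fill light at lower power from camera-right",
        kelvin=4800,
    ),
    Camera(focal_length_mm=85, aperture=1.4, body="Sony A7R V"),
)
print(clause)
```

Keeping the numbers (Kelvin, focal length, aperture) as typed fields makes it easy to vary one parameter at a time across a batch and compare the results.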

Segment 6 & 7 — Mood and post-processing style

Mood and atmosphere (segment 6) communicates the emotional register of the image through adjectives and tonal references: "aspirational but approachable, quiet confidence, a sense of solitude that reads as strength rather than loneliness." You can also use cultural or cinematic references: "the mood of a late-night Terrence Malick film — pensive, slow, watching." Post-processing style (segment 7) specifies what the image looks like after it leaves the camera or render engine: "cinematic color grade with muted oranges, lifted blacks, desaturated mid-tones, subtle film grain, slight vignette." These final two segments are the difference between an image that looks technically correct and one that feels like a deliberate creative statement. On Floniks, you can pair these segments with a pro-effects pass in /pro-effects to apply consistent color grading across a batch, making every image in a series feel like it came from the same shoot.

Common ordering mistakes and how to fix them

The most common ordering mistake is leading with style and mood before establishing the subject: "cinematic, dramatic, moody, high-contrast, film noir aesthetic, a woman." Many image models tend to weight early tokens more heavily, so the result may be a technically excellent film noir image whose subject is a vague, generic woman, because her description was buried at the end. Rule: always establish subject before atmosphere. The second most common mistake is interleaving technical and artistic language without separation, e.g., "f/1.8 beautiful emotional close-up 4K Rembrandt lighting heartfelt." This reads as a list of unrelated tags rather than a coherent brief, and the model tends to blend them loosely. Keep technical parameters grouped together in their own clause, after the subject and setting, rather than scattered among mood words. The third mistake is over-specifying style at the expense of subject detail: spending 40 words on the aesthetic and 5 words on who is in the image. The subject is the anchor; everything else is atmosphere. Invert that weighting and your results should improve.
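Two of these mistakes, style-first ordering and aesthetic-heavy weighting, are mechanical enough to check before you submit a prompt. The sketch below assumes the draft is already split into named segments; the segment keys, the word-count heuristic, and the function names are assumptions for illustration, not a rule any model enforces.

```python
# Segment order as numbered in this guide.
SEGMENT_ORDER = ["subject", "action", "environment", "lighting", "camera", "mood", "style"]


def word_count(text: str) -> int:
    return len(text.split())


def check_weighting(segments: dict) -> list:
    """Warn when mood and style get more words than subject and action."""
    anchor = word_count(segments.get("subject", "")) + word_count(segments.get("action", ""))
    aesthetic = word_count(segments.get("mood", "")) + word_count(segments.get("style", ""))
    warnings = []
    if anchor == 0:
        warnings.append("no subject or action described")
    elif aesthetic > anchor:
        warnings.append(
            f"mood and style ({aesthetic} words) outweigh subject and action ({anchor} words)"
        )
    return warnings


def in_segment_order(segments: dict) -> str:
    """Re-join the segments subject-first, whatever order they were drafted in."""
    return ", ".join(segments[name] for name in SEGMENT_ORDER if segments.get(name))


# The style-first draft from the paragraph above, split into segments.
draft = {
    "style": "cinematic, dramatic, moody, high-contrast, film noir aesthetic",
    "subject": "a woman",
}

print(check_weighting(draft))
print(in_segment_order(draft))
```

Reordering fixes only the first mistake. The warning is a reminder that the subject segment still needs real detail; a word count is a crude proxy, so treat it as a prompt to re-read the draft rather than a score.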

FAQ

Should I use commas or full sentences in a photo prompt?

Both work, but full sentences or short comma-separated clauses tend to perform better than tag lists. Tags like "beautiful, dramatic, epic" without grammatical context give the model very little directional signal, because each tag could point in a different direction. Structured clauses ("soft side lighting from a window, warm 4200K color temperature") tie the concepts together so the model reads them as a coherent instruction rather than competing adjectives.

Can I skip segments for a simpler prompt?

Yes. A minimalist product prompt might only need subject, environment, and lighting. The seven-segment framework is a checklist, not a mandatory script. Use it to identify which blanks you are leaving to chance, then decide deliberately whether to fill them or leave them open.

How do I apply the seven-segment structure to AI video prompts?

AI video prompts on Floniks /ai-video follow the same structure with one addition: a motion or action layer that describes camera movement and subject movement over time. Replace "action or pose" with "continuous action" and add a "camera motion" segment (slow dolly in, static shot, handheld follow). The rest of the segments — environment, lighting, mood, style — transfer directly.
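The substitution described above can be sketched as a small transformation over a segmented photo prompt. This is an illustration only and makes no claim about what the Floniks video tool expects: the segment keys, the helper names, and the choice to place camera motion directly after the camera segment are all assumptions.

```python
# Assumed ordering for video: "action" becomes "continuous_action",
# and "camera_motion" is slotted in after "camera".
VIDEO_ORDER = [
    "subject", "continuous_action", "environment",
    "lighting", "camera", "camera_motion", "mood", "style",
]


def to_video_segments(photo: dict, continuous_action: str, camera_motion: str) -> dict:
    """Copy the photo segments, swap the pose for motion, add camera movement."""
    video = {name: text for name, text in photo.items() if name != "action"}
    video["continuous_action"] = continuous_action
    video["camera_motion"] = camera_motion
    return video


def render_video(segments: dict) -> str:
    return ", ".join(segments[name] for name in VIDEO_ORDER if segments.get(name))


photo = {
    "subject": "a middle-aged man in a deep navy suit",
    "action": "standing at the edge of a rooftop terrace",
    "environment": "late evening, warm city lights in the background",
    "mood": "quiet confidence",
}

video = to_video_segments(
    photo,
    continuous_action="walking slowly along the terrace railing",
    camera_motion="slow dolly in",
)
print(render_video(video))
```

The original `photo` dictionary is left unchanged, so the same segments can feed both a still image and a video variation of the scene.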
