
Describing Subject and Scene: Specificity Models Actually Follow

Updated 2026-06-19 · 8 min read

Key takeaway

The subject and scene description is the foundation every other prompt layer builds on. Models respond to specificity in predictable ways: concrete nouns outperform adjectives, physical attributes outperform emotional ones, and spatial relationships outperform vague proximity words. This guide teaches you which kinds of details actually change model output — distinguishing signal from decoration — and gives you reusable description templates for people, products, animals, and architectural scenes that work consistently on Floniks AI Image.


Concrete nouns over adjectives

The single most reliable upgrade to a subject description is replacing adjectives with concrete nouns. "A beautiful woman" tells the model almost nothing — beauty is subjective and the model has millions of conflicting examples of it. "A 28-year-old Colombian woman with high cheekbones, dark brown almond-shaped eyes, a small gap between her front teeth, and waist-length curly black hair" gives the model a specific physical blueprint. The model may not reproduce every detail perfectly, but each concrete term narrows the solution space dramatically. This principle applies to everything: "a luxury watch" is weak; "a 42mm brushed titanium mechanical watch with a navy blue sunburst dial, applied gold indices, and a beige leather strap with white stitching" is strong. The test: could ten different photographers produce ten very different images from your subject description? If yes, it needs more concrete nouns. If they’d all end up shooting something recognizably similar, you’re specific enough. For ongoing character series on Floniks, paste your concrete subject description into a fixed "character card" node in /editor and reuse it across every scene variation.

Describing people: the five physical dimensions

When describing a human subject, cover five dimensions: (1) Age and demographic — "a 50-year-old East African man"; (2) Facial features — "square jaw, short salt-and-pepper beard, deep-set dark brown eyes, prominent forehead, wide flat nose"; (3) Hair — "close-cropped natural coils, hairline receding slightly at the temples"; (4) Clothing and styling — "wearing a tailored charcoal suit with a pale blue shirt, no tie, collar open, silver watch on left wrist"; (5) Pose and physical expression — "standing with arms crossed loosely, slight forward lean, relaxed but alert." You don’t need all five in every prompt. For headshots, (1)–(3) and (5) are essential. For full-body fashion, (1)–(4) matter most. For crowd scenes, a lighter touch works because individuals are smaller in the frame. The most commonly omitted dimension is (2) facial features — creators describe hair and clothing in detail but leave the face generic, which is why faces often look like stock-photo averages. Describe the face first. It anchors everything else.
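If you build prompts programmatically, the five dimensions map naturally onto a small record type. The sketch below is one possible way to do it in Python, not a Floniks API: the `PersonSubject` class and its field names are hypothetical, and the example values come from the descriptions above. Empty dimensions are simply skipped, which matches the advice that not every prompt needs all five.

```python
from dataclasses import dataclass


@dataclass
class PersonSubject:
    """Hypothetical container for the five physical dimensions of a person."""
    demographic: str = ""  # (1) age and demographic
    face: str = ""         # (2) facial features
    hair: str = ""         # (3) hair
    clothing: str = ""     # (4) clothing and styling
    pose: str = ""         # (5) pose and physical expression

    def to_prompt(self) -> str:
        # Keep the face early in the text, directly after the demographic,
        # and drop any dimension that was left empty.
        parts = [self.demographic, self.face, self.hair, self.clothing, self.pose]
        return ", ".join(p.strip() for p in parts if p.strip())


# A headshot uses dimensions (1)-(3) and (5); clothing is left out.
headshot = PersonSubject(
    demographic="a 50-year-old East African man",
    face="square jaw, short salt-and-pepper beard, deep-set dark brown eyes",
    hair="close-cropped natural coils",
    pose="standing with arms crossed loosely, relaxed but alert",
)
print(headshot.to_prompt())
```

Keeping each dimension in its own field makes it easy to see at a glance when the face has been left generic.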

Describing products: material, geometry, and surface

Product descriptions live or die on three dimensions: material, geometry, and surface finish. Material tells the model what light does to the object: "brushed stainless steel" catches directional light as long parallel streaks; "matte ceramic" scatters it evenly. Geometry describes the shape: "a squat, wide-bellied bottle with a long narrow neck and a flat cap." Surface finish describes micro-level texture: "hammered texture on the lower half, smooth and polished on the upper half, subtle orange-peel texture on the cap." A full product description might read: "a 300ml hand soap dispenser made of recycled frosted glass with a matte brushed-brass pump, hexagonal cross-section, flat bottom, slight taper toward the neck." Placement is a fourth dimension that’s easy to overlook: specify where the product sits in the frame ("centered on a white marble surface, label facing directly toward camera at a zero-degree angle") so you control what the viewer sees. For e-commerce catalog work on Floniks, run a product-catalog workflow where the product description is a fixed node input and only the background and lighting vary per shot.
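The "fixed product, varying background and lighting" idea can be sketched outside any particular tool. The Python below is only an illustration under stated assumptions: `catalog_prompts` is a hypothetical helper, and the backgrounds and lighting strings in `SHOTS` are made-up examples. The product text is the dispenser description from this section and never changes between shots.

```python
# Fixed input: the product description from this section, reused verbatim.
PRODUCT = (
    "a 300ml hand soap dispenser made of recycled frosted glass with a matte "
    "brushed-brass pump, hexagonal cross-section, flat bottom, "
    "slight taper toward the neck"
)
PLACEMENT = "centered in the frame, label facing directly toward camera"

# Hypothetical per-shot variations: (background, lighting).
SHOTS = [
    ("on a white marble surface", "soft diffused daylight from the left"),
    ("on a pale oak shelf", "warm late-afternoon window light"),
]


def catalog_prompts(product, placement, shots):
    """Return one prompt per shot; only background and lighting differ."""
    return [
        f"{product}, {placement}, {background}, {lighting}"
        for background, lighting in shots
    ]


for prompt in catalog_prompts(PRODUCT, PLACEMENT, SHOTS):
    print(prompt)
```

Because the product string is defined once, any wording fix to the description carries through to every shot in the batch.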

Describing scenes: location, time, and atmosphere layers

A scene description combines three sub-layers: location type, time and weather, and atmospheric details. Location type: "a narrow cobblestone alley in an old European city, buildings four stories tall on both sides, a small cafe with outdoor seating visible in the mid-ground." Time and weather: "early autumn morning, overcast sky, ambient light flat and cool, a few fallen leaves on the wet cobblestones." Atmospheric details: "steam rising from a street drain in the background, a light mist softening distant details, one window in the upper-right building lit warm amber." Together, these three sub-layers build a scene with depth and specificity. The most powerful technique: use foreground, mid-ground, and background as an organizational framework. What’s in the foreground within arm’s reach? What’s in the mid-ground 10–30 feet away? What’s in the background beyond that? This spatial layering gives the model a genuine sense of depth rather than a flat backdrop behind a subject. Spatial relationship words that models reliably follow: "directly in front of," "slightly left of center," "in the right background," "partially obscured by," "reflected in."

What models don't reliably follow

Knowing what doesn’t work saves you from decorating prompts with words that don’t move the needle. Models handle these poorly: (1) Abstract emotional states without physical anchors — "a feeling of nostalgia" produces nothing specific; "faded, slightly overexposed Kodachrome colors, a dusty childhood bedroom with curtains blowing in the breeze" produces nostalgia. Always anchor emotions to physical, visual details. (2) Negatives in the subject description — "a woman who is not tall, not blonde, not smiling" is nearly useless because models don’t parse negation well in positive descriptions. Put exclusions in the negative prompt field instead. (3) Relative size without reference — "a big dog" next to what? "A German Shepherd standing at hip height to a seated adult" is parseable. (4) Fictional property names — "a Zylvaran warrior" means nothing to a model; "a warrior in black segmented plate armor with red geometric inlay patterns, carrying a long curved single-edged blade" translates the concept into visual language. (5) Stacked adjectives without nouns — "beautiful, stunning, gorgeous, magnificent" add no visual information. Every adjective should modify a specific noun.
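Two of these patterns, negations and noun-free praise adjectives, are easy to catch automatically before you submit a prompt. The following is a rough heuristic sketch in Python, not a real parser: `lint_subject` and both word lists are hypothetical and deliberately small. It will also flag legitimate phrasing such as "no tie", so treat the warnings as a prompt to reread, not as errors.

```python
import re

# Hypothetical word lists for illustration; extend them for your own prompts.
VAGUE_ADJECTIVES = {"beautiful", "stunning", "gorgeous", "magnificent"}
NEGATIONS = {"not", "no", "without", "never"}


def lint_subject(description: str) -> list[str]:
    """Return warnings for patterns this guide flags as unreliable."""
    words = re.findall(r"[a-z']+", description.lower())
    warnings = []

    found_negations = sorted({w for w in words if w in NEGATIONS})
    if found_negations:
        warnings.append(
            "negation (" + ", ".join(found_negations) + "): "
            "move exclusions to the negative prompt field"
        )

    found_vague = sorted({w for w in words if w in VAGUE_ADJECTIVES})
    if found_vague:
        warnings.append(
            "vague adjective (" + ", ".join(found_vague) + "): "
            "replace with a concrete visual detail"
        )
    return warnings


print(lint_subject("a woman who is not tall, not blonde, not smiling"))
print(lint_subject("a warrior in black segmented plate armor"))
```

The other three patterns (abstract emotions, unanchored sizes, fictional names) depend on meaning rather than on specific words, so a simple word list will not catch them.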

Templates you can copy and remix

Person — headshot: "[Age] [ethnicity/demographic] [gender] with [facial feature 1], [facial feature 2], [facial feature 3], [hair description]. Wearing [clothing details]. [Pose and expression]. [Eye contact or gaze direction]."

Person, full-body fashion: "[Age] [demographic] model with [build] build, [height relative to a reference], wearing [full outfit from head to toe, including accessories]. [Pose: standing/walking/seated], [spatial position in frame]."

Product — e-commerce: "[Dimensions/size reference] [object name] made of [material], [surface finish] finish, [color] with [accent color details], [geometry description]. Positioned [placement], [label/logo orientation], [background]."

Scene — environmental portrait: "[Time of day] [weather condition] in [location type], [architectural or natural context]. [Foreground element]. [Mid-ground element]. [Background element]. [Atmospheric detail: mist/dust/light rays/reflections]."

These templates are intentionally fill-in-the-blank so you can paste them directly into Floniks /ai-image and customize each bracket. They work equally well as starting nodes in a /editor workflow where you parameterize specific fields for batch generation.
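If you fill the brackets from a script rather than by hand, a few lines of Python are enough. This is a minimal sketch, assuming you keep the bracket text exactly as written in the template: `fill_template` is a hypothetical helper, and the values are taken from the alley example earlier in this guide. It raises an error instead of silently leaving a bracket unfilled.

```python
import re

# The scene template from this guide, copied verbatim.
SCENE_TEMPLATE = (
    "[Time of day] [weather condition] in [location type], "
    "[architectural or natural context]. [Foreground element]. "
    "[Mid-ground element]. [Background element]. "
    "[Atmospheric detail: mist/dust/light rays/reflections]."
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [bracket] with its value; fail loudly on a missing one."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for [{key}]")
        return values[key]

    return re.sub(r"\[([^\]]+)\]", substitute, template)


scene = fill_template(SCENE_TEMPLATE, {
    "Time of day": "Early autumn morning",
    "weather condition": "under an overcast sky",
    "location type": "a narrow cobblestone alley in an old European city",
    "architectural or natural context": "buildings four stories tall on both sides",
    "Foreground element": "A few fallen leaves on the wet cobblestones",
    "Mid-ground element": "A small cafe with outdoor seating",
    "Background element": "Steam rising from a street drain",
    "Atmospheric detail: mist/dust/light rays/reflections":
        "A light mist softening distant details",
})
print(scene)
```

Failing on a missing key is a deliberate choice here: a leftover "[Background element]" in a submitted prompt is easy to miss in a large batch.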

FAQ

How specific is too specific in a subject description?

You rarely over-specify the subject — the risk is almost always under-specifying. The practical limit is around 60–80 words on the subject alone before the rest of the prompt gets crowded out. If you need more, break the description into a multi-step workflow where subject details feed into downstream composition and lighting nodes.
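A quick word count tells you where a description sits relative to that range. The snippet below is a simple sketch: `within_budget` is a hypothetical helper, words are counted by splitting on whitespace, and the default limit of 80 is just the upper end of the 60–80 word range mentioned above, not a hard rule of any model.

```python
def within_budget(description: str, limit: int = 80) -> tuple[int, bool]:
    """Return (word count, True if the count is at or under the limit)."""
    count = len(description.split())
    return count, count <= limit


watch = (
    "a 42mm brushed titanium mechanical watch with a navy blue sunburst dial, "
    "applied gold indices, and a beige leather strap with white stitching"
)
count, ok = within_budget(watch)
print(count, ok)  # prints: 23 True
```

Even the detailed watch description uses well under a third of the budget, which is why under-specifying is the more common problem.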

Do models understand race and ethnicity in prompts?

Models respond to specific physical descriptors more reliably than demographic labels alone. "Dark brown skin, high cheekbones, tight coiled hair" gives the model more precise signal than a label by itself. Combine demographic context with physical feature detail for the most consistent results.

What is the best way to describe a scene I am imagining but cannot photograph?

Use the foreground/mid-ground/background framework and anchor every element to a physical object. Then add time of day, weather, and one atmospheric detail (fog, dust, steam, lens flare) to make the scene feel inhabited. Abstract or fantastical scenes benefit from grounding at least one element in photographic reality to keep the image from drifting into illustration territory.
