Describing Scale and Proportion So Sizes Read Right

Updated 2026-06-19 · 9 min read
Key takeaway

Scale and proportion are invisible when they work and immediately disorienting when they fail. AI models have no intrinsic sense of absolute scale: without contextual reference, a 'large boulder' can render as a pebble. Even a skyscraper reads as monumental only if the image contains elements, such as a human figure, that establish comparative size. This guide teaches you how to prompt scale relationships, provide human-figure reference anchors, control the impression of grandeur or intimacy through proportion, and specify architectural and object scale precisely enough that every element in the frame reads at the intended size, from the microscopic to the epic.

Why AI Has No Intrinsic Scale Sense

An AI image model learns from photographs and illustrations that carry no metadata about the physical size of the objects depicted. A pebble photographed in close-up can look much like a boulder photographed from a helicopter with the same framing. The model cannot tell which it is without contextual reference, and when a prompt leaves scale ambiguous, the model falls back on mid-range assumptions that often undercut your intention. 'A large cave' might generate a modest grotto because 'large' has no anchor reference in the frame. 'A cave large enough to house an aircraft carrier, person in orange hardhat visible in foreground providing scale' immediately communicates the intended monumentality by providing two comparative references. This is the fundamental technique: scale is communicated through relative relationships, not absolute descriptors. Any prompt describing scale as 'huge,' 'massive,' 'tiny,' or 'colossal' needs at least one in-frame comparator of known size to convert that adjective into a visual fact the model can render. The most reliable scale anchor available is the human figure, because every viewer has embodied knowledge of human body proportions and will automatically use any visible human as a measuring rod for the entire scene.
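The rule above (a scale adjective needs an in-frame comparator) can be checked mechanically before a prompt is submitted. The following is a minimal sketch, not a feature of any particular tool: the word lists, the measurement pattern, and the function name `unanchored_adjectives` are all illustrative assumptions that you would adapt to your own prompts.

```python
import re

# Hypothetical word lists for illustration; extend them for your own prompts.
SCALE_ADJECTIVES = {"huge", "massive", "tiny", "colossal", "large", "small", "giant", "vast"}
ANCHOR_WORDS = {"person", "figure", "hiker", "human", "door", "car", "coin", "hands", "tourists"}

# Matches measurements such as "40 meters", "8 feet", "2 cm", or "400km".
MEASUREMENT = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:mm|cm|km|m|meters?|feet|foot|ft)\b", re.IGNORECASE
)


def unanchored_adjectives(prompt: str) -> list[str]:
    """Return the scale adjectives in a prompt that has no comparator or measurement."""
    words = re.findall(r"[a-z]+", prompt.lower())
    found = [w for w in words if w in SCALE_ADJECTIVES]
    has_anchor = any(w in ANCHOR_WORDS for w in words) or bool(MEASUREMENT.search(prompt))
    return [] if has_anchor else found


# The bare adjective is flagged; the anchored version passes.
print(unanchored_adjectives("a large cave"))
print(unanchored_adjectives(
    "a cave large enough to house an aircraft carrier, "
    "person in orange hardhat visible in foreground providing scale"
))
```

A keyword check like this only catches the obvious cases; it cannot judge whether the anchor is actually placed where it will read well in the frame.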

The Human Figure as a Universal Scale Anchor

The human figure is the most powerful scale anchor in visual media because it activates the viewer's innate embodied sense of their own dimensions. When you want to communicate the vast scale of an environment, include a human figure — even a tiny silhouette at the far end of a cavern or on the ridge of a mountain — and the viewer's perceptual system immediately calculates the environment's size relative to that figure. For epic architectural scale: 'interior of a Gothic cathedral, lone figure in black in the foreground, dwarfed by the scale of the nave, vaulted ceiling 40 meters above.' For geological scale: 'slot canyon with narrow sky opening far above, hiker in red jacket at the bottom of the canyon for scale, canyon walls 60 meters tall.' For miniature scale (making large things look small): 'hands holding a tiny perfectly formed house, fingers visible at frame edge establishing human scale, everything on the same flat surface.' For macro photography scale: 'ant on a kitchen counter photographed with macro lens, coin in background providing scale reference, ant appears as large as the coin.' Even an implied human scale — objects whose dimensions viewers know from embodied experience, such as a coffee cup, a door, a car — can substitute for an actual human figure in providing scale context.
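To see how small the anchor figure should be in the examples above, it can help to work out the ratio between a person and the stated dimension. This is a rough sketch that assumes an adult height of 1.75 meters (an illustrative value, not one given in this guide) and a figure standing at roughly the same distance as the feature it is measured against.

```python
# Assumed adult height in meters; an illustrative value, not a figure from the guide.
ASSUMED_FIGURE_HEIGHT_M = 1.75


def figure_fraction(scene_height_m: float,
                    figure_height_m: float = ASSUMED_FIGURE_HEIGHT_M) -> float:
    """Fraction of the scene's height spanned by a standing figure at the same distance."""
    if scene_height_m <= 0:
        raise ValueError("scene_height_m must be positive")
    return figure_height_m / scene_height_m


# Heights taken from the cathedral and slot canyon example prompts.
for label, height_m in [("cathedral vault", 40), ("slot canyon walls", 60)]:
    print(f"{label}: figure spans {figure_fraction(height_m):.1%} of the height")
```

A figure that spans only a few percent of the feature's height is what 'dwarfed' means in practice, so a phrase such as 'tiny silhouette' is doing real work in those prompts.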

Architectural and Spatial Scale Vocabulary

Architecture provides a rich vocabulary for scale specification because building types follow established human-scale conventions. Ceiling heights: 'residential ceiling height of 8 feet, intimate domestic scale' versus 'double-height loft ceiling at 18 feet, industrial conversion scale' versus 'cathedral nave ceiling at 30 meters, sacred monumental scale.' Door dimensions: 'standard residential door about 7 feet tall' versus 'oversized 12-foot hotel lobby door with doorman for scale.' Staircase proportions: 'grand staircase of palatial width, 10 meters wide, 100 steps ascending to a landing.' Room plan scale: 'intimate bistro interior with tables 60 centimeters apart, chairs nearly touching, close and convivial scale' versus 'ballroom with chandeliers 15 meters above the dance floor, tables occupying a fraction of the total volume.' Street and urban scale: 'narrow medieval alley, arms could touch both walls simultaneously' versus 'Haussmann boulevard 35 meters wide, trees in double row flanking the carriageway.' Including these specific dimensional references, especially when combined with human figures or known-scale comparators, communicates architectural intention clearly enough that the model renders spatial scale as designed rather than as default.
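The examples above mix feet and meters, which makes it hard to see how far apart the scales really are. A small conversion helper, sketched below, puts them on one footing; the function name `to_meters` and the dictionary layout are assumptions made for illustration.

```python
FEET_TO_METERS = 0.3048  # exact, by definition of the international foot


def to_meters(value: float, unit: str) -> float:
    """Convert a length in feet or meters to meters."""
    if unit in ("m", "meters"):
        return float(value)
    if unit in ("ft", "feet"):
        return value * FEET_TO_METERS
    raise ValueError(f"unsupported unit: {unit}")


# Ceiling heights from the example prompts, in their original units.
ceilings = {
    "residential": (8, "ft"),
    "double-height loft": (18, "ft"),
    "cathedral nave": (30, "m"),
}

baseline = to_meters(*ceilings["residential"])
for name, (value, unit) in ceilings.items():
    height = to_meters(value, unit)
    print(f"{name}: {height:.2f} m, {height / baseline:.1f}x the residential ceiling")
```

Seeing the ratios side by side is a quick way to decide whether a prompt's numbers support the contrast you are after.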

Communicating Proportion Between Elements

Scale relationships between elements within the same scene must be explicitly specified when they differ from naturalistic expectations. Character and environment proportion: 'tiny child against the massive wooden doors of a Victorian school, doors three times the child's height, visual scale contrast deliberate and exaggerated.' Multiple subjects at different scales: 'giant stone statue looming 20 meters above the street, tourists at its feet appear ant-sized by comparison.' Fantasy scale rules: 'giant mushroom forest, each cap wider than a school bus, human-sized fantasy cottages built into the mushroom stems, hobbit-scale windows glowing from inside the stems.' Forced perspective: 'forced perspective photograph, person appearing to hold the Eiffel Tower in their palm, the Tower in sharp focus in the background, hand in sharp focus in foreground, deep depth of field so both appear to share one plane.' Miniature staging: 'miniature diorama scale, 1:87 HO scale model railway scene, painted backdrops visible at edges, model figures 2 centimeters tall, forced perspective extending the apparent scene depth.' In each case, the prompt specifies not just the size of individual elements but their size relative to each other. That relational information is what the model needs to render scale contrast convincingly.
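The miniature example above pairs a 1:87 scale with figures 2 centimeters tall, and those two numbers should agree with each other. A short sketch of the arithmetic follows; the 1.75 meter adult height is an assumed illustrative value, and `model_size_cm` is a hypothetical helper name.

```python
def model_size_cm(real_size_m: float, scale_denominator: float) -> float:
    """Size in centimeters of an object reproduced at 1:scale_denominator."""
    if scale_denominator <= 0:
        raise ValueError("scale_denominator must be positive")
    return real_size_m * 100 / scale_denominator


# An assumed 1.75 m adult at 1:87 comes out at about 2 cm, matching the prompt.
print(round(model_size_cm(1.75, 87), 2))  # 2.01
```

Running the same check on any other stated dimension in a miniature prompt helps keep the numbers mutually consistent, so the model is not given conflicting scale cues.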

Macro and Micro Scale: Extreme Ends of the Spectrum

The extremes of scale (macro photography of small things and wide-field photography of large things) each require specific vocabulary to communicate successfully. Macro scale: 'extreme macro photograph, snowflake crystal at 20x magnification, hexagonal symmetry in full detail, depth of field reduced to a single millimeter, background bokeh, black velvet substrate.' The vocabulary of macro photography, including magnification ratio, depth of field in millimeters, and lens type or technique (macro lens, focus stacking), communicates miniature scale with technical precision the model understands. Micro and microscopic: 'electron microscope image, virus particle at 100,000x magnification, false-colored in electric blue and orange, scientific paper illustration quality.' For vast scales at the other extreme: 'satellite view of city at night from 400km altitude, city grid visible as orange light clusters, dark ocean visible at frame edge, curvature of the Earth barely perceptible.' Or: 'aerial photograph from 3,000 meters altitude, river delta spreading like a branching tree, individual trees visible as texture, no individual people resolvable.' Adding altitude, magnification ratio, or orbital height as concrete numbers communicates scale intent with a precision that adjectives like 'very high up' or 'really close' cannot match.
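An altitude figure implies what can and cannot be resolved in the frame, and a rough calculation shows whether the rest of the prompt agrees with it. The sketch below assumes a camera pointing straight down over flat ground, a 60 degree horizontal field of view, and a 1024 pixel wide image; all three are illustrative assumptions, not values from this guide.

```python
import math


def ground_width_m(altitude_m: float, fov_deg: float) -> float:
    """Width of ground covered by a straight-down view (flat-ground approximation)."""
    return 2 * altitude_m * math.tan(math.radians(fov_deg) / 2)


def meters_per_pixel(altitude_m: float, fov_deg: float, image_width_px: int) -> float:
    """Approximate ground distance represented by one pixel across the frame."""
    return ground_width_m(altitude_m, fov_deg) / image_width_px


# Assumed viewing setup for illustration only.
FOV_DEG = 60
IMAGE_WIDTH_PX = 1024

for altitude_m in (3_000, 400_000):
    width = ground_width_m(altitude_m, FOV_DEG)
    per_px = meters_per_pixel(altitude_m, FOV_DEG, IMAGE_WIDTH_PX)
    print(f"{altitude_m} m altitude: frame spans {width:,.0f} m, {per_px:,.1f} m per pixel")
```

Under these assumptions, one pixel at 3,000 meters covers a few meters of ground, which fits the example's 'individual trees visible as texture, no individual people resolvable.' The flat-ground formula becomes a coarse estimate at orbital altitudes, where the Earth's curvature starts to matter.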

Common Scale Failures and How to Fix Them

Certain scale failures appear consistently in AI image generation and have reliable prompt-level fixes.

Failure: building looks residential rather than monumental. Fix: 'person standing at base of building, building visible from roofline down, person is 1/20th the building's height in frame, skyscraper scale.'

Failure: forest looks like a park rather than ancient old-growth. Fix: 'trunk of a single redwood occupying full left third of frame, base of trunk visible, canopy out of frame above, human figure hugging the base visible for scale, trunk diameter 8 meters.'

Failure: crowd looks sparse rather than massive. Fix: 'aerial view of crowd from 30 meters above, no individual face resolvable, crowd fills the entire frame from edge to edge, density of 2 people per square meter, stadium concert crowd scale.'

Failure: interior looks cramped rather than cavernous. Fix: 'fish-eye perspective emphasizing spatial depth, furniture at opposite end of room appears small, soaring ceiling height, grand piano appears miniature at the far end.'

Failure: product looks small and insignificant. Fix: 'close-up product shot, watch face fills 80 percent of frame, sharp macro detail, photographer's shadow at frame edge providing human context.'

In each case, the fix introduces a specific scale relationship or reference anchor that was absent from the failing prompt.
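The ratios and densities in these fixes translate into concrete sizes, and it is worth checking that the numbers describe the scene you have in mind. The sketch below assumes a 1.75 meter adult, a 300 meter tower, and a 50 by 40 meter crowd area; these values and the helper names are illustrative assumptions only.

```python
# Assumed adult height in meters; an illustrative value, not a figure from the guide.
ASSUMED_FIGURE_HEIGHT_M = 1.75


def implied_height_m(denominator: float) -> float:
    """Height of a structure if a standing figure is 1/denominator of its height."""
    return ASSUMED_FIGURE_HEIGHT_M * denominator


def figure_denominator(structure_height_m: float) -> int:
    """Denominator N such that the figure is roughly 1/N of the structure's height."""
    return round(structure_height_m / ASSUMED_FIGURE_HEIGHT_M)


def crowd_size(density_per_m2: float, width_m: float, depth_m: float) -> int:
    """Approximate head count for a rectangular area at a given density."""
    return round(density_per_m2 * width_m * depth_m)


print(implied_height_m(20))     # structure height implied by a 1/20th figure
print(figure_denominator(300))  # ratio to request for an assumed 300 m tower
print(crowd_size(2, 50, 40))    # people in an assumed 50 m x 40 m area
```

Under the assumed figure height, a 1/20th ratio implies a structure of about 35 meters, so a much taller tower may call for a smaller fraction in the prompt.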

Step by step

  1. Always include a comparative scale anchor

    Whenever you use a scale adjective like 'massive' or 'tiny,' add a known-size comparator in the same prompt: a human figure, a familiar object like a car or door, or a dimensional measurement. Adjectives alone give the model nothing to render.

  2. Place a human figure at the far end of epic spaces

    For monumental architecture, vast landscapes, or large natural features, include a single small human figure, even a silhouette, at the point in the scene where the scale relationship to the environment will be most obvious. This figure becomes the viewer's perceptual measuring rod.

  3. Use magnification ratios for macro and micro scale

    For extreme close-up prompts, specify a magnification ratio ('10x macro,' '100,000x electron microscope') alongside physical measurements ('depth of field 2mm') rather than vague adjectives. Numerical specifications communicate scale intent with a precision that adjectives cannot match.

  4. Describe proportional relationships between elements

    When two or more elements in the scene have deliberate scale contrast, state the proportion explicitly ('the figure is 1/10th the height of the doorway') rather than relying on 'tall door' and 'small figure' to produce the correct relationship independently.
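The four steps can be folded into a small template so that no prompt goes out without an anchor. This is a minimal sketch under stated assumptions: `build_scale_prompt` is a hypothetical helper, and the comma-separated layout simply mirrors the example prompts in this guide rather than any required syntax.

```python
from typing import Optional


def build_scale_prompt(subject: str,
                       anchor: str,
                       measurement: Optional[str] = None,
                       proportion: Optional[str] = None) -> str:
    """Join a subject with its scale anchor and any optional numeric details."""
    if not anchor.strip():
        # Step 1: a scale description without a comparator gives the model nothing to render.
        raise ValueError("a scale anchor is required")
    parts = [subject, anchor, measurement, proportion]
    return ", ".join(part.strip() for part in parts if part and part.strip())


prompt = build_scale_prompt(
    subject="interior of a Gothic cathedral",
    anchor="lone figure in black at the far end of the nave",
    measurement="vaulted ceiling 40 meters above",
    proportion="the figure is 1/10th the height of the doorway",
)
print(prompt)
```

Making the anchor a required argument is the design choice that matters here; the measurement and proportion stay optional because not every scene needs both.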

FAQ

My architectural prompts always look residential even when I ask for monumental scale. Why?

The model tends to default to residential scale, most likely because residential architecture is so common in training data. Counter this by adding explicit comparative elements: a human figure at 1/20th the building height, a street-level view with the roofline out of frame, and specific dimension references like 'lobby ceiling 15 meters high.' Each of these anchors pushes the model away from its residential default.

How do I make a product look premium and large without it looking like a prop?

Use a tight, macro-adjacent framing that fills most of the frame with the product surface: 'product fills 70 percent of frame, extremely shallow depth of field, foreground edge of product slightly soft.' Avoid showing context that implies the product's actual size. Let the detail and material quality carry the premium impression rather than trying to make the object appear physically larger.

Can scale description help in AI video prompts on Floniks?

Yes, and it is particularly effective in establishing shots. 'Slow camera pull-back from close-up on a human face to reveal the vast stadium crowd surrounding them' communicates scale dynamically. The motion reveals the scale relationship progressively, which is more emotionally impactful than a static establishing shot.
