
Using Reference Images Alongside Text Prompts

Updated 2026-06-19 · 11 min read

Key takeaway

Reference images shift AI image generation from purely linguistic instruction to visual grounding, narrowing the range of interpretations the model can make and improving output consistency. But image references are not a magic fix: how you combine them with text, which visual properties you expect them to transfer, and how you handle conflicts between the reference and your text prompt all determine whether references help or hinder. This guide explains the mechanics of image-conditioned generation, practical strategies for selecting effective references, how to control the balance between reference adherence and text prompt influence, and how to build reference-based workflows in Floniks for character consistency, style transfer, and product fidelity.


How Image References Work in AI Generation

When you provide a reference image alongside a text prompt in an image-conditioned generation system, the model draws on two input streams: the semantic meaning of your text and the visual features of the reference image. The reference is encoded into a latent embedding (a high-dimensional numerical representation of its visual properties), which conditions the diffusion process alongside the text embedding. In practice, the model tries to produce an image that satisfies both sources of conditioning at once. The key insight is that different types of visual information transfer differently: global composition and color palette transfer readily; specific textures and stylistic rendering transfer moderately well; precise object identity and facial features transfer reliably only with stronger image conditioning and a suitable model architecture. Knowing which properties transfer well helps you select references that will actually improve your output.

Selecting Effective Reference Images

Not all reference images are equally useful as conditioning inputs. The most effective references share several characteristics: they are visually clear and unambiguous in what they depict, they isolate the specific property you want to transfer (style, color, composition, or subject identity), and they are internally consistent without competing focal points. A reference image that is stylistically coherent — all in one aesthetic register, with consistent lighting and color treatment — transfers its style far more cleanly than a composite or collage. For style references, choose images where the rendering style is the dominant visual property: a strong illustration style, a distinctive photographic treatment, a clearly defined 3D render aesthetic. For subject references — transferring the identity of a specific character, product, or object — choose a clean, well-lit image of the subject with minimal background clutter, facing the camera at a neutral angle.

Balancing Reference Adherence and Text Influence

Most image-conditioned generation systems expose an adherence strength parameter, sometimes called image strength or conditioning strength, that controls how much influence the reference exerts relative to the text prompt. Some image-to-image tools instead expose a denoising strength that runs in the opposite direction (higher values add more noise to the reference and follow it less), so confirm which direction your tool uses before applying the ranges here. At high adherence (0.8–1.0), the output closely mirrors the reference's composition, color, and layout; the text prompt can only make modest adjustments. At low adherence (0.2–0.4), the text prompt dominates and the reference contributes only subtle visual flavor. A practical starting range for most use cases is 0.5–0.7: the reference provides meaningful stylistic or compositional grounding while leaving the text prompt enough influence to define the specific subject, environment, and narrative. In Floniks' /ai-image interface, the strength slider is in the advanced options panel; for workflow nodes in /editor, it is a configurable parameter on the image-to-image node.
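The strength bands in this section can be captured in a small helper, sketched here in Python. The function name and labels are illustrative assumptions, not Floniks terminology; the numeric ranges are the ones quoted in this section, and values that fall between them are reported as such rather than guessed.

```python
def describe_adherence(strength: float) -> str:
    """Label an adherence strength using the bands quoted in this guide.

    Hypothetical helper: the labels are illustrative only.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    if strength >= 0.8:
        return "reference-dominant"   # output closely mirrors the reference
    if 0.5 <= strength <= 0.7:
        return "balanced"             # suggested starting range for most use cases
    if 0.2 <= strength <= 0.4:
        return "text-dominant"        # reference adds only subtle visual flavor
    return "outside-described-bands"  # the guide gives no guidance for this value


for value in (0.3, 0.6, 0.9):
    print(value, describe_adherence(value))
```

A helper like this is mostly useful as a sanity check in batch scripts, where a mistyped strength value would otherwise go unnoticed until the outputs come back.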

Style References vs. Content References

The clearest conceptual distinction in reference image use is between style references — images you use to transfer a visual aesthetic, rendering treatment, or color mood — and content references — images you use to preserve the identity of a specific subject (a product, a character's face, a brand's signature visual element). These two use cases require different reference strategies. For style transfer: choose references that exemplify the target aesthetic as purely as possible, strip away any subject matter that might inadvertently carry over, and use a moderate adherence strength so the text prompt can define the new subject. For content preservation: choose the highest quality, most identity-faithful reference available, use higher adherence strength, and use the text prompt primarily to specify the new context, environment, and pose rather than redefining the subject. Conflating these two modes — trying to transfer both style and subject identity from a single reference — typically satisfies neither goal adequately.

Building Multi-Reference Workflows in Floniks

Advanced reference use goes beyond single-image conditioning. In Floniks' visual workflow editor at /editor, you can route multiple reference images into a generation node using different conditioning channels — for example, one reference for style and a separate reference for subject identity — and control the relative strength of each independently. A typical product brand workflow uses: (1) a brand color palette reference at low strength to establish the color register; (2) a product identity reference at high strength to preserve the product's specific shape and finish; (3) a text prompt to define the new background, lighting scenario, and lifestyle context. This multi-reference architecture produces outputs that are simultaneously on-brand aesthetically, product-accurate, and context-appropriate — a combination that single-channel text prompting alone rarely achieves at scale.
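As a rough illustration of the product brand workflow above, the plan can be written down as plain data before it is built in the editor. This is a minimal Python sketch: the class names, channel names ("style", "identity"), file names, and strength values are all assumptions made for the example and do not reflect an actual Floniks API or node schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    image_path: str
    channel: str     # hypothetical channel name, e.g. "style" or "identity"
    strength: float  # 0.0 (ignored) to 1.0 (followed closely)


@dataclass
class GenerationPlan:
    prompt: str
    references: list = field(default_factory=list)

    def add(self, ref: Reference) -> None:
        """Attach a reference, allowing one reference per channel in this sketch."""
        if not 0.0 <= ref.strength <= 1.0:
            raise ValueError("strength must be between 0.0 and 1.0")
        if any(existing.channel == ref.channel for existing in self.references):
            raise ValueError(f"channel {ref.channel!r} already has a reference")
        self.references.append(ref)


# Text prompt defines the new background, lighting, and lifestyle context.
plan = GenerationPlan(prompt="product on a sunlit kitchen counter, soft morning light")
# Brand palette at low strength to set the color register.
plan.add(Reference("brand_palette.png", channel="style", strength=0.3))
# Product photo at high strength to preserve shape and finish.
plan.add(Reference("product_front.png", channel="identity", strength=0.85))

for ref in plan.references:
    print(ref.channel, ref.image_path, ref.strength)
```

Keeping each reference on its own channel with its own strength mirrors the point of the section: the two strengths can be tuned independently.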

Common Reference Pitfalls and How to Avoid Them

Several failure modes arise specifically with reference image use.

Compositional lock-in: if your reference has a strong compositional structure (a centered subject, a particular framing), high adherence will replicate that structure even when your text prompt specifies a different composition. Fix this by lowering adherence strength or choosing a reference with a more neutral composition.

Identity bleed: facial or product features from the reference bleed into the background or other scene elements. Fix this by using a cleaner, more isolated reference image with a plain background.

Style contamination: the reference's color or rendering style conflicts with the style keywords in your text prompt. Fix this by either choosing a reference that aligns with your target style or by reducing adherence strength and relying more on text style keywords.

In Floniks, a quick reference quality check shows whether the reference is helping or creating unexpected conflicts: generate once with only the text prompt, then add the reference and compare the two outputs.
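The text-only versus with-reference comparison can be set up as a pair of requests that differ only in the reference. The Python sketch below is hypothetical: the dictionary keys and function names are invented for illustration, and holding a seed fixed assumes the generator you use exposes one.

```python
def build_ab_requests(prompt: str, reference_path: str, strength: float, seed: int = 1234):
    """Return (baseline, with_reference) request dicts for a side-by-side check.

    Hypothetical request shape; adapt the keys to whatever your tool accepts.
    """
    baseline = {"prompt": prompt, "seed": seed, "reference": None, "strength": None}
    with_reference = {**baseline, "reference": reference_path, "strength": strength}
    return baseline, with_reference


def changed_keys(first: dict, second: dict) -> list:
    """List the keys whose values differ between two requests."""
    return sorted(key for key in first if first[key] != second[key])


baseline, with_reference = build_ab_requests(
    "a ceramic teapot on a wooden table", "teapot_reference.png", strength=0.6
)
print(changed_keys(baseline, with_reference))  # ['reference', 'strength']
```

Changing only the reference between the two runs keeps the comparison fair: any difference in the outputs can then be attributed to the reference rather than to a changed prompt.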

Character and Product Consistency Across Scenes

The most commercially valuable application of reference image workflows is maintaining the identity of a specific character or product across multiple scenes and contexts. For character consistency, establish a canonical reference image — a front-facing, clear, well-lit portrait — and route it through each scene generation node in your Floniks workflow at a high identity-preservation strength. Supplement with text prompts that specify only the scene context and pose, leaving all character identity description to the reference. For product consistency, use the product's own photography (or a clean 3D render) as the reference and use text prompts to define the lifestyle context and environment. This approach is the foundation of Floniks' character consistency workflow and product catalog workflow, both of which use reference-anchored node chains to produce dozens of contextually distinct scenes from a single identity source.
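The idea of one identity source feeding many scenes can be sketched as a simple batch builder. This Python example is an assumption-laden illustration, not the Floniks character consistency workflow itself: the function name, job fields, file name, and the default strength of 0.8 are all hypothetical.

```python
def build_scene_jobs(reference_path: str, scenes: list, strength: float = 0.8) -> list:
    """Create one job per scene, all anchored to the same canonical reference.

    Scene prompts should describe only context and pose; identity comes
    from the reference image.
    """
    return [
        {
            "job": index,
            "reference": reference_path,
            "strength": strength,
            "prompt": scene,
        }
        for index, scene in enumerate(scenes, start=1)
    ]


scenes = [
    "standing in a rainy city street at night, looking over shoulder",
    "sitting at a cafe table in morning light, reading",
    "hiking on a mountain ridge at sunset, wide shot",
]
jobs = build_scene_jobs("character_canonical.png", scenes)
for job in jobs:
    print(job["job"], job["reference"], job["prompt"])
```

Because every job points at the same reference file and strength, the scene prompt is the only thing that varies from one job to the next.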

Step by step

1. Choose a reference that isolates the property you want to transfer
Select a reference image that purely exemplifies the style, color mood, or subject identity you need, not a composite of multiple qualities. The cleaner the reference, the cleaner the transfer.

2. Set adherence strength based on your transfer goal
For style transfer, start at 0.5–0.6 adherence to leave room for text prompt influence. For subject identity preservation, move to 0.7–0.85. Test the output before committing to a production batch.

3. Separate style and content references in multi-node workflows
In Floniks' /editor, route a style reference and a content reference into different conditioning channels on the same generation node, controlling each strength independently for maximum precision.
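To test before committing to a production batch, the starting ranges from step 2 can be turned into a small sweep of strength values. This is a minimal Python sketch; the function name and goal labels are hypothetical, and only the numeric ranges come from this guide.

```python
# Starting ranges quoted in step 2 of this guide.
STARTING_RANGES = {
    "style": (0.5, 0.6),
    "identity": (0.7, 0.85),
}


def strength_sweep(goal: str, steps: int = 3) -> list:
    """Return evenly spaced strength values across the starting range for a goal."""
    if goal not in STARTING_RANGES:
        raise ValueError(f"unknown goal: {goal!r}")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    low, high = STARTING_RANGES[goal]
    step = (high - low) / (steps - 1)
    # Round to avoid floating-point noise such as 0.5499999999999999.
    return [round(low + index * step, 3) for index in range(steps)]


print(strength_sweep("style"))        # [0.5, 0.55, 0.6]
print(strength_sweep("identity", 4))  # [0.7, 0.75, 0.8, 0.85]
```

Generating one test image per value and comparing them side by side gives a quick read on where the reference starts to overpower the text prompt.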

FAQ

Does using a reference image replace writing a detailed text prompt?

No, they work together. The reference handles visual properties that are hard to describe in text (specific color relationships, rendering texture, exact proportions) while the text prompt defines elements the reference cannot specify (new context, narrative, changes from the reference). The best results usually come from combining both inputs deliberately.

Can I use reference images in AI video generation on Floniks?

Yes. Image-conditioned video generation lets you specify the visual starting frame or style anchor for a video sequence. At /ai-video on Floniks, you can upload a reference image as the conditioning input and use the text prompt to define the motion direction, camera behavior, and narrative arc.

Why does my reference image's face keep appearing in the background of the output?

This is identity bleed — the model has over-applied the facial features from your reference beyond the subject region. Use a reference with a plain background that clearly isolates the subject, reduce adherence strength slightly, and add a negative prompt excluding unwanted face appearances in background elements.
