When AI Sees the World Through One Lens: What Bias in Image Generation Really Looks Like
Artificial intelligence is now not only writing stories and answering questions; it's painting pictures too. Tools like DALL·E and ChatGPT can take a sentence and turn it into a full-blown image. Sounds amazing, right? It is. But there's a catch: these systems don't just generate pretty pictures; they also reflect our cultural assumptions, biases, and social patterns. And sometimes, they reinforce or even exaggerate them.
That's where this recent research from Marinus Ferreira comes in. Presented at the 74th Annual ICA Conference, the study digs into how image-generating AI handles race, gender, and other demographics, especially when we give the AI more complex prompts to work with.
Let's break it down.
The Big Question: Is More Complex Better?
Most researchers studying AI bias throw quick, simple prompts into the system like "a judge" or "a poet," then look at what kind of person shows up in the image. Often, that's a white man, regardless of what the real-world data says about who actually does the job.
But Ferreira's research asks a daring question: What if we give the AI more to think about? Can complex prompts (short stories or scenarios instead of just a phrase) help us see subtler patterns of bias?
Spoiler alert: They can. But not always in the way you might expect.
Two Types of Image Bias You Should Know
Before we dive into the findings, let's clear up the two major ways image-generating AI can show bias:
- Misrepresentation bias (we might call this the classic kind). That's when certain groups are shown in stereotypical or negative ways, like generating images of criminals or poor neighborhoods whenever "African" is used in a prompt.
- Exclusion bias, also known as "default assumption bias." This is sneakier. Even when nothing is said about demographics in your prompt, the AI often defaults to portraying white men. It's like the system assumes that unless told otherwise, power and professionalism look a certain way.
Ferreira calls this kind of default framing "exnomination," borrowing a term from sociology. Essentially, dominant groups (like white males in professional roles) don't have to be named; they're assumed. Everyone else? They have to be explicitly marked or described to show up at all.
The Experiment: Prompts With a Plotline
To test how deep these biases go, and how complex prompts might help surface them, Ferreira's study used five-sentence vignettes for each of four professions:
- Poet
- Judge
- Pastor
- Rapper
Each vignette offered a little episode involving the person, like a poet giving a campus talk or a pastor leading a church meeting. These story-style prompts varied in emotional tone (neutral or slightly negative) and were fed into ChatGPT-4o, which leveraged DALL·E 3 to generate matching images.
Here's an example of a vignette used:
"At a university lecture, a poet spoke passionately about the power of words. Students were engaged, asking thoughtful questions, but a professor challenged the poet's interpretations, causing a moment of tension..."
This led to an image of a white, male poet, even though women actually make up the majority of poets in the U.S.
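If you want to try a rough version of this setup yourself, here is a minimal sketch that sends one story-style vignette to DALL·E 3 through the official OpenAI Python client. It is an approximation for illustration only: the paper describes prompting through ChatGPT-4o, so the model name, image size, and vignette text below are my assumptions, not Ferreira's exact pipeline.

```python
# Minimal sketch: render one story-style vignette with DALL·E 3.
# Assumes the official `openai` Python package (v1.x) and an API key in the
# OPENAI_API_KEY environment variable. This only approximates the study's
# setup; Ferreira prompted through ChatGPT-4o rather than the raw image API.
from openai import OpenAI

client = OpenAI()

# A vignette in the style described in the paper; note it says nothing
# about the poet's demographics.
vignette = (
    "At a university lecture, a poet spoke passionately about the power of "
    "words. Students were engaged, asking thoughtful questions, but a "
    "professor challenged the poet's interpretations, causing a moment of "
    "tension."
)

response = client.images.generate(
    model="dall-e-3",   # assumed model name
    prompt=vignette,
    n=1,
    size="1024x1024",
)

# The API returns a URL to the generated image; who appears in that image
# is exactly the question the study asks.
print(response.data[0].url)
```

Running the same vignette several times and tallying who ends up in the foreground is, in spirit, the audit the study performs at a larger scale.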
The Results Are In: Complex Prompts, Uniform Faces
So, what happened when AI got more information in the form of longer, emotionally nuanced scenarios?
Surprisingly, the images became even more biased and homogeneous. The safety filters (meant to encourage diversity) seemed to get bypassed, and the AI snapped to its default templates.
Consider this:
- All images of "judges" were white and male, even though U.S. data shows most judges are women!
- For "pastors," 100% were white men, even though Black pastors are statistically overrepresented in many places like the American South.
- The one and only non-white "poet" was generated in a prompt set during a "cultural festival," a socially marked situation.
Diverse figures did appear in many of the images, but only on the sidelines. Women and people of color mostly showed up in the background, as audience members or onlookers, rather than as the central figure.
Why This Matters: Not Just a Matter of Skin Tone
This isn't just nitpicking image accuracy. There are broader consequences when AI tools assume that positions of leadership, creativity, or intellect are almost exclusively the domain of white males unless forced otherwise.
Here's why this matters:
- Reinforces stereotypes. If we keep seeing the same demographic attached to certain roles, it affects how we internalize who "belongs" in those roles.
- Limits imagination. If AI tools can't imagine a female judge or a Black poet unless explicitly told to, we're limiting the narratives we can tell in culture, education, and media.
- Creates blind spots. It also means that powerful tools like AI are learning to "skip over" sections of the population when crafting visuals, even when they statistically match the scenario.
Safety Features: A Double-Edged Sword?
The newer, safety-enhanced ChatGPT models try to neutralize harm by randomly assigning demographic characteristics in text outputs. Ferreira's earlier tests with ChatGPT-3.5 showed a different trend: Black individuals were more likely to appear in negative scenarios than positive ones. ChatGPT-4o partly fixed this by forcing randomness.
But that cure might create a new problem.
If AI assigns demographics with a metaphorical roll of the dice, it can miss real-world accuracy, which is crucial in domains like health, law, or media representation. Worse, it removes the nuance that helps stories feel authentic.
So instead of fixing the model, safety mechanisms might actually dull its capacity to express truths, warts and all.
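To see the trade-off concretely, here is a toy sketch (my own illustration, not from the study) contrasting two ways a system could pick a subject's demographics: a uniform "roll of the dice" over groups versus sampling in proportion to occupation statistics. The group names and percentages are placeholders, not real data.

```python
# Toy illustration: uniform randomization vs. proportion-matched sampling.
# The categories and percentages below are placeholders, NOT real statistics.
import random
from collections import Counter

# Hypothetical share of an occupation held by each group.
occupation_shares = {"group_a": 0.55, "group_b": 0.30, "group_c": 0.15}

def uniform_pick(shares):
    """What a 'roll the dice' safety filter might do: every group equally likely."""
    return random.choice(list(shares))

def proportional_pick(shares):
    """Sampling that tracks the (hypothetical) real-world distribution."""
    groups, weights = zip(*shares.items())
    return random.choices(groups, weights=weights, k=1)[0]

N = 10_000
uniform = Counter(uniform_pick(occupation_shares) for _ in range(N))
matched = Counter(proportional_pick(occupation_shares) for _ in range(N))

print("uniform:     ", {g: round(c / N, 2) for g, c in uniform.items()})
print("proportional:", {g: round(c / N, 2) for g, c in matched.items()})
```

Neither rule is obviously "right": the first flattens real-world structure, while the second can reproduce historical skews. The point is that the choice is a design decision, not a neutral default.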
Hidden Influences: Reinforcement Learning and User Bias
Another factor at play? The quiet but powerful feedback loop behind modern AI training.
Through a method called Reinforcement Learning from Human Feedback (RLHF), AI models learn not only from data but also from what users upvote or tweak. Problem is, if most of those users are from similar backgrounds (e.g., Western, tech-savvy, English-speaking), their subconscious preferences may tilt how the AI behaves over time.
Ferreira suggests this might explain why even stories with no demographics specified defaulted to white male characters: it might just be what the system "learns" people prefer.
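The feedback loop is easier to see in a toy simulation (again my own sketch, not RLHF as actually implemented, which trains a reward model and fine-tunes with an RL objective): if the raters providing feedback share even a mild skew in taste, the outputs they reward can come to dominate what the model produces.

```python
# Cartoon of preference feedback amplifying a majority taste; not real RLHF.
import random

options = ["depiction_x", "depiction_y"]                  # hypothetical output styles
model_weights = {"depiction_x": 1.0, "depiction_y": 1.0}  # model starts indifferent
rater_preference = {"depiction_x": 0.6, "depiction_y": 0.4}  # mildly skewed raters

for _ in range(5_000):
    # The model samples an output in proportion to its current weights.
    total = sum(model_weights.values())
    output = random.choices(options,
                            weights=[model_weights[o] / total for o in options])[0]
    # A rater approves with probability reflecting their (skewed) taste,
    # and approval nudges the weight for that kind of output upward.
    if random.random() < rater_preference[output]:
        model_weights[output] += 0.1

total = sum(model_weights.values())
print({o: round(w / total, 2) for o, w in model_weights.items()})
# A 60/40 taste among raters typically ends far more lopsided than 60/40.
```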
Not All Prompts Are Equal: The Power of Socially Marked Contexts
An especially revealing part of Ferreira's study was how social context within prompts influenced representation.
Let's take two poet scenarios:
- Prompt set in a "café" → White male poet
- Prompt set in a "cultural festival" → Non-white poet
This suggests that AI models recognize certain settings as associated with specific demographics. These socially marked contexts nudge the AI into showing more diversity, but only in those limited scenarios.
In other words: if the situation involves diversity, the image becomes diverse; otherwise, not so much.
That's a subtle but important insight. If we understand which narrative settings are coded by the AI as belonging to different groups, we can better predict and correct bias in the system.
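One way to probe which settings a model treats as socially marked is a small audit: hold the profession fixed, vary only the setting, generate several images per prompt, and record who appears. Here is a rough sketch of the scaffolding; the settings list, prompt template, output file, and annotation column are my own assumptions, and labeling who appears still has to be done by a human annotator or a separate classifier.

```python
# Rough scaffolding for a setting audit: same profession, different settings.
# Settings, prompt wording, and the CSV layout are illustrative assumptions.
import csv
from openai import OpenAI

client = OpenAI()

PROFESSION = "poet"
SETTINGS = ["a quiet café", "a cultural festival", "a university lecture hall"]
RUNS_PER_SETTING = 5  # repeat to see how consistent the defaults are

rows = []
for setting in SETTINGS:
    prompt = (
        f"A {PROFESSION} giving a short reading at {setting}. "
        "The audience listens closely and asks questions afterwards."
    )
    for run in range(RUNS_PER_SETTING):
        image = client.images.generate(model="dall-e-3", prompt=prompt,
                                       n=1, size="1024x1024")
        rows.append({
            "profession": PROFESSION,
            "setting": setting,
            "run": run,
            "image_url": image.data[0].url,
            "perceived_demographics": "",  # filled in later by a human annotator
        })

# Write a sheet for manual annotation; any setting effect shows up when you
# tally the annotations per setting.
with open("setting_audit.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```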
Real-World Implications: Beyond the Algorithm
So what can we take away from all this?
- Bias in AI isn't always loud or obvious. It can show up as who is not in the picture as much as who is.
- Complex prompts don't necessarily fix bias; sometimes they actually trigger more stereotypical outputs.
- Representation matters, especially in tools that are fast becoming ubiquitous in design, storytelling, education, and business.
As users and creators, we need to question what defaults these AI systems have inherited, and how prompt design, emotional tone, and story setting can all impact who we see represented in generated visuals.
Key Takeaways
Bias in image generation AI isn't just about who shows up, but who's missing. The AI often defaults to portraying white males in high-status roles, even when data shows otherwise.
Complex prompts can unlock hidden associations, but they can also let built-in biases slip past safety filters. More detail = more risk of default assumptions kicking in.
Representation bias happens in two ways: misrepresentation and omission. Showing only white people in a courtroom, even as defendants, creates a skewed picture.
Safety filters often randomize demographics to avoid reinforcing stereotypes. But Ferreira's study shows that can lead to overly sanitized or unrealistic results.
Social context in prompts matters. Want a Black poet? You might need to set the scene in a cultural festival. AI connects demographic features to settings in ways that reflect societal expectations.
Understanding these biases can help us write better prompts. If we want to generate diverse images, we'll need to be intentional about context and setting, not just the job title.
Future research should dig deeper into which settings are "socially marked" for different demographics. Doing so can help guide both AI developers and users to take smarter, fairer approaches to image generation.
AI image generators aren't just tools; they're mirrors. But like all mirrors, they reflect not only reality, but how we frame and light it. Understanding their biases helps us reflect more truthfully.