Verbalized Sampling is a simple prompt technique meant to make AI responses less boring

Language models often repeat themselves and fall back on stereotypical answers after training. Verbalized Sampling, a simple prompting technique, is designed to counter this.

Researchers from several US universities traced this issue to how people evaluate AI output. Their analysis found that human raters consistently prefer familiar, typical responses when judging AI answers. This preference gets encoded in the models, leading to less variety over time.

Standard prompting (left) produces the same joke again and again, while Verbalized Sampling (right) generates five different jokes, each with an explicit probability. | Image: Zhang et al.

To test this idea, the researchers used the HELPSTEER dataset, which includes 6,874 response pairs. Human raters almost always picked the answers the base models ranked as most likely, regardless of whether those answers were actually correct. This effect held up across multiple datasets.

Instead of asking a model for just one answer, Verbalized Sampling prompts it for several responses, each with a probability attached. The prompt looks like this:

<instruction>

Generate 5 responses to the user query, each within a separate <response> tag. Each <response> must include a <text> and a numeric <probability>.

Randomly sample the responses from the full distribution.

</instruction>

Write a 100-word story about a bear.

The team built three versions of the method. The standard version asks for multiple answers with probabilities. An expanded version adds step-by-step reasoning before generating responses. The third runs through several rounds of dialogue. None of the approaches requires extra training or access to the model's internal probability scores.

More variety and more realistic model behavior

In creative writing tasks, Verbalized Sampling increased response diversity by 1.6 to 2.1 times. For example, standard prompting for car jokes always returned the same punchline ("Why did the car get a flat tire? Because it ran over a fork in the road!"), while Verbalized Sampling produced five completely different jokes (such as "What kind of car does a Jedi drive? A Toy-Yoda!").
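The article does not spell out how diversity is scored. One common way to put a number on it, sketched here as an assumption, is one minus the average pairwise cosine similarity between responses. Word-count vectors stand in for the sentence embeddings a real evaluation would more likely use, and the function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def diversity(responses: list[str]) -> float:
    """1 minus the mean pairwise similarity: 0 for identical texts, 1 for no overlap."""
    vectors = [bag_of_words(r) for r in responses]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    mean_similarity = sum(cosine(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


# The two jokes quoted in the article.
fork_joke = "Why did the car get a flat tire? Because it ran over a fork in the road!"
jedi_joke = "What kind of car does a Jedi drive? A Toy-Yoda!"

repeated = [fork_joke] * 5        # what standard prompting returned
mixed = [fork_joke, jedi_joke]    # two different jokes

print(diversity(repeated), diversity(mixed))
```

Five copies of the same joke score close to zero, while any set of distinct jokes scores higher. The absolute values depend heavily on the text representation, so they are not comparable to the percentages in the paper's charts.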

Bar charts of semantic diversity (%) for Direct, CoT, Sequence, Multi-turn, VS-Standard, VS-CoT, and VS-Multi: Verbalized Sampling boosts semantic diversity across poems, stories, and jokes compared to standard prompts. | Image: Zhang et al.

In dialogue simulations, models acted more like people: they sometimes resisted persuasion attempts and changed their opinions in more realistic ways. When simulating donation behavior, the results were much closer to actual human behavior.

For open-ended questions like "Name a US state," the spread of answers closely mirrored the original training data. Standard prompts, on the other hand, mostly returned common states like California and Texas.
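How closely a set of answers mirrors a reference distribution can be quantified, for instance with the Kullback-Leibler divergence. The sketch below is one possible way to do that, not necessarily the metric the researchers used. The four-state reference distribution and the answer lists are made-up toy data for illustration only, and the function names are hypothetical.

```python
import math
from collections import Counter


def empirical(answers: list[str]) -> dict[str, float]:
    """Turn a list of sampled answers into relative frequencies."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


def kl_divergence(p: dict[str, float], q: dict[str, float], eps: float = 1e-9) -> float:
    """KL(p || q) in nats; answers missing from q are floored at eps."""
    return sum(
        p_val * math.log(p_val / max(q.get(key, 0.0), eps))
        for key, p_val in p.items()
        if p_val > 0
    )


# Toy reference: pretend only four states exist and all are equally likely.
reference = {"California": 0.25, "Texas": 0.25, "Ohio": 0.25, "Maine": 0.25}

# Made-up samples: one collapsed onto popular states, one spread evenly.
collapsed = ["California"] * 8 + ["Texas"] * 2
spread = ["California", "Texas", "Ohio", "Maine"] * 2

kl_collapsed = kl_divergence(reference, empirical(collapsed))
kl_spread = kl_divergence(reference, empirical(spread))
print(kl_collapsed, kl_spread)
```

Measuring from the reference to the model's answers penalizes states the model never mentions, which is the failure the article attributes to standard prompts. A lower value means the answers track the reference more closely.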

Three use cases: story generation (text excerpts, direct prompting vs. VS), dialogue simulation (donation box plots for Direct, VS, and Human), and open-ended questions (US state distributions for Direct, the pretraining reference, and VS). | Image: Zhang et al.

Models trained on the more varied synthetic math tasks generated with the method also saw accuracy rise from 32.8 to 37.5 percent. Larger models got even more out of the method, with improvements 1.5 to 2 times higher than smaller versions.

The researchers tested Verbalized Sampling with different generation parameters, such as temperature. The benefits remained consistent across all settings, and combining Verbalized Sampling with parameter adjustments led to further improvements. At every tested temperature, the technique achieved a better balance between quality and diversity than standard prompting.

Poem diversity vs. quality for GPT-4.1 and Gemini-2.5-Flash with Direct, Sequence, and VS-Standard prompting at t = 0.4 to 1.4: VS-Standard produces higher-quality poems at the same level of diversity compared to baseline methods, regardless of temperature. | Image: Zhang et al.

More creative image descriptions

The method also works for image generation. The team used Verbalized Sampling to create image descriptions, then ran those through image generators. For the prompt "Astronaut on a Horse," standard prompts always gave similar photorealistic desert scenes. Verbalized Sampling led to much more diverse descriptions.

Top row: Five nearly identical photorealistic astronauts on horses from standard prompts. Bottom row: Verbalized Sampling produces five different styles, from retro-futurism to watercolor and baroque painting. | Image: Zhang et al.

The researchers also checked safety, running over 350 potentially risky prompts. The refusal rate stayed above 97 percent, and factual accuracy was not affected.

The team has published code and instructions for Verbalized Sampling and points to use cases in creative writing, social simulation, and idea generation.
