
Researchers from Midjourney and New York University have developed a new approach that could help language models generate more diverse creative texts without significantly sacrificing quality.


In a recently published paper, the team introduces "deviation metrics" into the AI training process. The method measures how different each generated text is from the others produced for the same prompt: the texts are converted into embeddings, and the pairwise cosine distances between those embeddings give the system a mathematical measure of variation.
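
Concretely, a deviation score of this kind can be sketched as each response's mean cosine distance to its sibling responses for the same prompt. This is a minimal illustration assuming the embeddings have already been produced by some text-embedding model; it is not necessarily the paper's exact implementation:

```python
import numpy as np

def deviation_scores(embeddings: np.ndarray) -> np.ndarray:
    """For each response, return its mean cosine distance to the other
    responses generated for the same prompt (higher = more distinctive).
    `embeddings` has shape (n_responses, dim)."""
    # Normalize rows so dot products become cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dist = 1.0 - unit @ unit.T          # pairwise cosine distance matrix
    n = len(embeddings)
    # Average each row, excluding the zero self-distance on the diagonal.
    return dist.sum(axis=1) / (n - 1)

# Toy example: four responses to one prompt, embedded in 3 dimensions.
emb = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.1, 0.0],   # near-duplicate of the first response
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
print(deviation_scores(emb))  # near-duplicates score lower than outliers
```

As expected, the two near-duplicate responses receive lower deviation scores than the two distinct ones.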

Three-stage flowchart: process for improving AI responses through deviation analysis, DPO/ORPO training and diversification
The training approach evaluates differences between LLM responses to enhance output variety. | Image: Chung et al.

Initial testing looks promising. Models trained with the new method generated 23 percent more diverse texts, while quality scores, based on Reddit upvote data, dropped by only about five percent.

A test case shows how this works in practice. When given the prompt "Why are you shaking, my love? You're king now," the standard GPT-4o model mostly stuck to stories about nervous new rulers. The modified Llama-3.1-8B model, despite being smaller, produced everything from dark fantasy tales about bear princes to supernatural stories set underwater.

Tabular representation: three AI models (GPT-4o and Llama-3.1 variants) generate different narrative responses to a royal writing prompt.
Modified Llama models show greater variety in storytelling compared to GPT-4o with identical prompts. | Image: Chung et al.

Human evaluators backed up these findings, rating the texts as more varied while maintaining quality. However, the researchers only tested against the older GPT-4o, not the newer GPT-4.5, which produces more natural-sounding text but costs more to use.

Comparison chart: DDPO-both vs. GPT-4o and DPO with win rates for storytelling quality and diversity, DDPO-both leads in all categories.
Data shows the modified model outperforming others in both story quality and variety. | Image: Chung et al.

Two types of diversity

The researchers focused on two kinds of variety: semantic (different story content and plots) and stylistic (writing that sounds like it comes from different authors). They developed a separate deviation metric for each type but found that combining them worked best.
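
One simple way to combine the two signals is a weighted average of the per-response deviation scores. The blend below, including the 0.5 default weight, is an illustrative assumption rather than the paper's formula:

```python
import numpy as np

def combined_deviation(semantic_dev, style_dev, alpha=0.5):
    """Blend per-response deviation scores. `alpha` weights semantic
    (content) deviation against stylistic deviation; the equal-weight
    default is an assumption for illustration."""
    return alpha * np.asarray(semantic_dev) + (1 - alpha) * np.asarray(style_dev)

sem = [0.8, 0.2, 0.5]   # content differs most in response 0
sty = [0.1, 0.9, 0.5]   # style differs most in response 1
print(combined_deviation(sem, sty))  # [0.45 0.55 0.5]
```

Under an equal weighting, a response can rank as highly distinctive by being unusual in either content or style.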

For their research, the team used more than 100,000 prompt-response pairs from Reddit's r/WritingPrompts. They discovered they could get significantly better variety with just four different responses per prompt.

The system can maintain quality by using carefully selected training examples or setting minimum standards for how different responses need to be. This makes it more flexible than other methods for increasing output variety.
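
Such selection could be sketched as a filter over candidate training examples that enforces both a deviation floor and a quality floor. The field names, threshold values, and sample data below are hypothetical, not taken from the paper:

```python
# Hypothetical candidate responses for one prompt, each carrying the
# scores a pipeline like this would need (field names are illustrative).
pairs = [
    {"response": "bear-prince fantasy",   "deviation": 0.72, "quality": 0.80},
    {"response": "nervous-king drama",    "deviation": 0.15, "quality": 0.90},
    {"response": "underwater ghost tale", "deviation": 0.64, "quality": 0.55},
]

def select_training_examples(pairs, min_deviation=0.3, min_quality=0.6):
    """Keep only responses that differ enough from their siblings
    (min_deviation) while still clearing a quality floor (min_quality)."""
    return [p for p in pairs
            if p["deviation"] >= min_deviation and p["quality"] >= min_quality]

kept = select_training_examples(pairs)
print([p["response"] for p in kept])  # only the bear-prince story passes both
```

Raising `min_deviation` pushes the model toward variety; raising `min_quality` guards against the quality loss the researchers measured.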

Some questions still need answers. The researchers haven't yet shown whether their method works beyond creative writing - technical documentation and summaries might require different approaches. The technique's effectiveness in online training environments, which many large models use, also remains untested.


The quality measurement system itself raises questions. While Reddit upvotes provide some insight into text quality, they miss important factors like technical accuracy, consistency, and professional writing standards. These limitations suggest more comprehensive evaluation methods may be needed.

Even with these open questions, the technique could change how LLMs handle creative writing tasks, where current models often fall into repetitive patterns. The researchers say they'll share their code on GitHub, so others can build on their work.

Summary
  • Midjourney and New York University researchers have developed a new training method that allows AI language models to generate a wider variety of texts.
  • The approach incorporates "deviation metrics" to assess the differences between texts during the training process. The researchers applied this method to two existing training techniques and evaluated its effectiveness using over 100,000 prompt-response pairs sourced from Reddit.
  • While text quality decreased slightly, by around five percent, the diversity of responses increased by 23 percent. Compared to GPT-4o, the modified Llama-3.1-8B model produced a significantly broader range of stories, from dark fantasy to supernatural tales set underwater.
Jonathan writes for THE DECODER about how AI tools can make our work and creative lives better.