A new technique helps AI text generators mimic the style of a sample text without distorting its original meaning. The method builds on a well-established linguistic framework.

Researchers at the University of Maryland have developed a new approach that lets large language models rewrite text in a specific style, while preserving the underlying content. Their approach builds on "register analysis," an established linguistic framework for analyzing writing styles, and appears to surpass existing prompt-based methods.

AI systems already commonly perform style transfer - converting text from one tone to another while maintaining core meaning. Common applications include transforming casual messages into formal business writing or vice versa.

A scientific approach to style transfer using register analysis

Current style transfer methods typically rely on basic instructions like "make this more polite" or use AI to extract style keywords like "casual" or "serious" from example texts. According to the researchers, these approaches often lead language models like GPT or LLaMA to invent content or completely restructure texts, creating problems for sensitive documents like legal or medical materials.


The researchers cite an example in which the earlier STYLL system added unauthorized embellishments when rewriting a simple statement about soccer player Verratti. Asked to restyle the sentence "Verratti is practically untouchable. He's signing an extension every year or so and PSG won't sell for even a €100m.", STYLL inserted phrases like "legend," "bread and butter of the team," and "locking down new deals" - details not found in the source text.

The new approach employs Douglas Biber's register analysis framework, which evaluates concrete linguistic features like noun frequency, auxiliary verb usage, and level of language abstraction. The team developed two prompting strategies: "RG," which analyzes style features to generate guiding adjectives, and "RG-Contrastive," which directly compares input and target text styles.

Both methods follow a three-step process: analyzing style, converting it to clear descriptive terms, and rewriting text accordingly. The technique requires no additional training data.
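To make the three-step process concrete, here is a minimal sketch in Python of how the steps could be chained with any instruction-tuned model. The prompt wording, the feature list, and the `generate` callable are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of the "RG" three-step prompting pipeline described above.
# The prompts and the feature list are illustrative assumptions, not the
# paper's exact wording.
from typing import Callable

# A few register features in the spirit of Biber's framework; the full
# framework covers many more lexical and grammatical dimensions.
REGISTER_FEATURES = [
    "noun frequency",
    "auxiliary verb usage",
    "level of abstraction",
    "average sentence length",
    "use of contractions",
]

def rg_rewrite(source: str, target_example: str,
               generate: Callable[[str], str]) -> str:
    """Analyze style, distill it into descriptors, then rewrite."""
    # Step 1: analyze the target text along the register features.
    analysis = generate(
        "Analyze the writing style of the following text along these "
        f"linguistic features: {', '.join(REGISTER_FEATURES)}.\n\n"
        f"Text: {target_example}"
    )
    # Step 2: convert the analysis into short guiding adjectives.
    descriptors = generate(
        "Summarize this style analysis as a short list of descriptive "
        f"adjectives.\n\nAnalysis: {analysis}"
    )
    # Step 3: rewrite the source in that style without changing content.
    return generate(
        f"Rewrite the following text so that it is {descriptors}. "
        "Preserve the original meaning exactly; do not add or remove "
        f"information.\n\nText: {source}"
    )

if __name__ == "__main__":
    # Dummy backend for demonstration; swap in any real LLM call.
    echo = lambda prompt: f"[model output for: {prompt[:40]}...]"
    print(rg_rewrite("Verratti is practically untouchable.",
                     "Dude, that's wild.", echo))
```

The contrastive variant would differ mainly in the first step: instead of analyzing the target text alone, the prompt would pass in both the input and the target text and ask the model to compare their registers directly.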

For example, when targeting an "informal, conversational" style, the system converts "Verratti is practically untouchable. PSG won't sell for even a €100m" to "Dude, Verratti's basically locked in. PSG wouldn't even blink at a hundred mil."

More precise style control

According to the researchers, tests with LLaMA models showed their method outperforming earlier approaches. It particularly excelled at mimicking Reddit-style writing and converting between formal and informal language. The RG-Contrastive version proved especially adept at simplifying medical texts while maintaining accuracy.

The prompting method works effectively with smaller language models ranging from 3 to 8 billion parameters. This makes it suitable for resource-constrained applications like mobile apps. Tests revealed lower rates of copying from example texts compared to basic prompting methods. The approach also maintained strong grammatical quality, as measured by the CoLA language acceptability model.
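That kind of grammaticality check can be reproduced with a publicly available classifier fine-tuned on CoLA, which scores sentences for linguistic acceptability. The checkpoint below is one such model, shown purely as an illustration; the paper's exact evaluation setup may differ.

```python
# Sketch: scoring rewrites for grammatical acceptability with a classifier
# fine-tuned on CoLA. The checkpoint is a public example (assumption: the
# paper's setup may differ). Requires the transformers library.
from transformers import pipeline

cola = pipeline(
    "text-classification",
    model="textattack/bert-base-uncased-CoLA",
)

candidates = [
    "Dude, Verratti's basically locked in.",
    "Verratti locked basically in dude is.",
]
for text in candidates:
    result = cola(text)[0]
    # For this checkpoint, LABEL_1 marks the sentence as acceptable.
    print(f"{result['label']} ({result['score']:.2f}): {text}")
```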

Notably, the researchers found their approach generated primarily functional style descriptors like "technical" or "polished." In contrast, the earlier STYLL system favored more subjective terms such as "sarcastic" or "opinionated," which carried a higher risk of distorting the original meaning.

Summary
  • Researchers at the University of Maryland have created a new method to help large language models like LLaMA adopt the style of a sample text without changing its original meaning, using a linguistic approach called register analysis.
  • Earlier style transfer techniques often relied on vague instructions or let the model make up its own style markers, which sometimes resulted in distorted or inaccurate outputs. The new approach instead measures specific features like word choice, sentence length, and abstraction to generate detailed style descriptions.
  • Tested with LLaMA models, the technique outperformed other methods in tasks such as imitating Reddit posts, switching between casual and formal language, and simplifying medical texts, with fewer errors, less made-up content, and less direct copying from samples.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.