A new technique helps AI text generators mimic the style of a sample text without distorting its original meaning. The method is based on a well-known linguistic model.
Researchers at the University of Maryland have developed a new approach that lets large language models rewrite text in a specific style, while preserving the underlying content. Their approach builds on "register analysis," an established linguistic framework for analyzing writing styles, and appears to surpass existing prompt-based methods.
AI systems already commonly perform style transfer: converting text from one tone to another while maintaining its core meaning. Common applications include transforming casual messages into formal business writing, or vice versa.
A scientific approach to style transfer using register analysis
Current style transfer methods typically rely on basic instructions like "make this more polite" or use AI to extract style keywords like "casual" or "serious" from example texts. According to the researchers, these approaches often lead language models like GPT or LLaMA to invent content or completely restructure texts, creating problems for sensitive documents like legal or medical materials.
The researchers cite an example from their paper in which the earlier STYLL system embellished a simple statement about soccer player Verratti. The sentence "Verratti is practically untouchable. He's signing an extension every year or so and PSG won't sell for even a €100m." was rewritten with phrases such as "legend," "bread and butter of the team," and "locking down new deals," none of which appear in the source text.
The new approach employs Douglas Biber's register analysis framework, which evaluates concrete linguistic features like noun frequency, auxiliary verb usage, and level of language abstraction. The team developed two prompting strategies: "RG," which analyzes style features to generate guiding adjectives, and "RG-Contrastive," which directly compares input and target text styles.
Both methods follow a three-step process: analyzing style, converting it to clear descriptive terms, and rewriting text accordingly. The technique requires no additional training data.
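The three steps can be sketched as a small prompt pipeline. This is an illustrative outline only, not the researchers' code: the prompt wording, the function names, and the `llm` callable (any function that takes a prompt string and returns the model's reply) are all assumptions made for this example.

```python
from typing import Callable, List, Optional

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]

# Register features named in the article; the exact list used in the paper may differ.
FEATURES = "noun frequency, auxiliary verb usage, and level of abstraction"


def describe_style(llm: LLM, examples: List[str], source: Optional[str] = None) -> str:
    """Steps 1 and 2: analyze the register, then condense it into adjectives."""
    sample = "\n".join(examples)
    if source is None:
        # RG-like variant: analyze the target-style examples on their own.
        prompt = f"Analyze these texts in terms of {FEATURES}:\n{sample}"
    else:
        # RG-Contrastive-like variant: compare the input against the target style.
        prompt = (
            f"Compare the input text with the target-style texts in terms of {FEATURES}.\n"
            f"Input text:\n{source}\nTarget-style texts:\n{sample}"
        )
    analysis = llm(prompt)
    return llm(
        "Summarize this analysis as a short, comma-separated list of style adjectives:\n"
        + analysis
    )


def transfer(llm: LLM, text: str, examples: List[str], contrastive: bool = False) -> str:
    """Step 3: rewrite the text using the adjectives as guidance."""
    style = describe_style(llm, examples, source=text if contrastive else None)
    return llm(
        f"Rewrite the text in a {style} style. "
        f"Keep every fact and add no new content.\nText: {text}"
    )
```

Because the pipeline consists only of prompts, swapping in a different model means passing a different `llm` function; nothing is trained or fine-tuned.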
For example, when targeting an "informal, conversational" style, the system converts "Verratti is practically untouchable. PSG won't sell for even a €100m" to "Dude, Verratti's basically locked in. PSG wouldn't even blink at a hundred mil."
More precise style control
According to the researchers, tests with LLaMA models showed their method outperforming earlier approaches. It particularly excelled at mimicking Reddit-style writing and converting between formal and informal language. The RG-Contrastive version proved especially adept at simplifying medical texts while maintaining accuracy.
The prompting method works effectively with smaller language models ranging from 3 to 8 billion parameters, which makes it suitable for resource-constrained applications like mobile apps. Tests revealed lower rates of copying from example texts than with basic prompting methods. The approach also maintained strong grammatical quality, as measured by the CoLA language acceptability model.
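The article does not say how copying from example texts was measured. One simple proxy, assumed here purely for illustration, is the share of word n-grams in the output that also occur in any of the style examples; the function names and the whitespace tokenization are choices made for this sketch.

```python
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copy_rate(output: str, examples: List[str], n: int = 3) -> float:
    """Fraction of the output's n-grams that also appear in any example text."""
    out = ngrams(output, n)
    if not out:
        return 0.0  # output shorter than n words: nothing to compare
    seen: Set[Tuple[str, ...]] = set()
    for example in examples:
        seen |= ngrams(example, n)
    return len(out & seen) / len(out)
```

A value near 0 suggests the rewrite borrowed little wording from the examples, while a value near 1 suggests the model mostly pasted example text instead of restyling the input.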
Notably, the researchers found their approach generated primarily functional style descriptors like "technical" or "polished." In contrast, the earlier STYLL system favored more subjective terms such as "sarcastic" or "opinionated," which carried a higher risk of distorting the original meaning.