Researchers from Johns Hopkins University have found a simple prompting technique that reduces hallucinations in large language models (LLMs) and improves the accuracy of their answers. When a query includes the phrase "according to," an LLM is more likely to quote text it has observed during training and provide factual information rather than fabricate an answer.

A review of LLM responses using the QUIP score metric shows a 5-15% increase in the accuracy of cited information when using grounding prompts such as "According to Wikipedia...". While the technique works well across different LLMs, it is most effective with larger instruction-tuned models.
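
The grounding idea itself is easy to experiment with. Below is a minimal sketch, not taken from the paper: it rewrites a plain question into an "According to Wikipedia..." prompt and uses a simple word n-gram overlap against a reference passage as a rough stand-in for the QUIP score; the question, answer, and reference strings are purely illustrative.

```python
# Minimal sketch (not code from the Johns Hopkins paper): prepend a grounding
# phrase to a query and score a model's answer with a simple word n-gram
# precision, used here only as a rough stand-in for the QUIP score.
# The question, answer, and reference strings are illustrative.

def grounded_prompt(question: str, source: str = "Wikipedia") -> str:
    """Turn a plain query into a grounding prompt."""
    return f"According to {source}, {question}"


def ngram_overlap_precision(answer: str, reference: str, n: int = 4) -> float:
    """Fraction of the answer's word n-grams that also appear in the reference."""
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    answer_grams = ngrams(answer)
    if not answer_grams:
        return 0.0
    return len(answer_grams & ngrams(reference)) / len(answer_grams)


if __name__ == "__main__":
    question = "in what year did Apollo 11 land on the Moon?"
    print(grounded_prompt(question))

    # Compare a hypothetical model answer against a reference snippet.
    reference = ("Apollo 11 was the American spaceflight that first landed "
                 "humans on the Moon in July 1969.")
    answer = "Apollo 11 first landed humans on the Moon in July 1969."
    print(f"overlap precision: {ngram_overlap_precision(answer, reference):.2f}")
```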

Researchers have discovered that it is possible to automatically construct adversarial attacks that trick large language models (LLMs) such as ChatGPT, Bard, and Claude into producing unintended and potentially harmful content. Traditional jailbreaks require significant manual effort to develop and can usually be patched by LLM vendors. These automated attacks, however, can be generated in large numbers and work against both closed-source and publicly available chatbots.

Similar adversarial attacks have existed in computer vision for over a decade, suggesting that such threats may be inherent to AI systems. More worryingly, the research suggests that it may not be possible to completely prevent these types of attacks. As society becomes more dependent on AI technology, these concerns should be taken into account. Perhaps the best we can do is to use AI in the most positive way possible.
