
Matthias Bastian

Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.
Researchers uncover automated jailbreak attacks on LLMs like ChatGPT or Bard

Researchers have discovered that it is possible to automatically construct adversarial attacks that trick large language models (LLMs) such as ChatGPT, Bard, and Claude into producing unintended and potentially harmful content. Traditional jailbreaks require significant manual effort to develop and can usually be patched by LLM vendors. These automated attacks, however, can be generated in large numbers and work against both closed-source and openly available chatbots.

Similar adversarial attacks have existed in computer vision for over a decade, suggesting that such threats may be inherent to AI systems. More worryingly, the research suggests that it may not be possible to prevent these attacks entirely. As society becomes more dependent on AI technology, these risks should be taken into account. Perhaps the best we can do is try to use AI in the most positive way possible.

Big AI unites in Frontier Model Forum to focus on safe AI progress

Microsoft, Anthropic, Google, and OpenAI have launched the Frontier Model Forum, an industry body to promote the safe and responsible development of advanced AI models. The organization will focus on AI safety research, sharing best practices, collaborating with stakeholders, and supporting applications that address societal challenges. The Forum is open to other developers working on "frontier models as defined by the Forum" that share its commitment to safety.