Nvidia's NeMo Guardrails is designed to make chatbots like ChatGPT more secure for use in enterprise applications.
Generative AI models permeate our digital infrastructure, whether for images, text, or code. Nvidia has long offered NeMo, an open-source framework for training and deploying large-scale language models. With NeMo Guardrails, Nvidia now adds another building block, one that addresses three problems with such models.
How to make ChatGPT fit for enterprise applications?
Chatbots like OpenAI's ChatGPT can be connected to third-party applications via toolkits like LangChain or automation platforms like Zapier to answer questions in enterprise support chat, help with coding, send emails, or schedule appointments.
That's useful because ChatGPT's ability to handle all these tasks in dialog with humans far exceeds that of classical solutions in many cases. But with these general capabilities come problems.
In theory, ChatGPT can answer questions on any topic - but in an enterprise application, that is not desirable: a support chatbot, for example, should not recommend competing products when asked about alternatives, nor should it produce an essay on free will just because a user requests one.
Another concern is the hallucinations and toxic content that language models can produce - and if a chatbot has access to third-party applications, an attacker could trigger unwanted actions through targeted queries.
Nvidia to use NeMo Guardrails to drive development of security standards
To address these three issues, Nvidia is developing NeMo Guardrails, a Python-based framework that sits upstream of toolkits such as LangChain or automation platforms like Zapier and filters and regulates the output and actions that users see.
In practice, Guardrails lets enterprises use Colang, a modeling language developed by Nvidia, to specify rules: what a chatbot like ChatGPT may respond to and how, whether its answers should be fact-checked by another model, which external APIs it is allowed to call, and how jailbreak attempts are detected.
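What such rules look like can be sketched with the open-source Python package. The following is a minimal, hypothetical example rather than Nvidia's reference configuration: it embeds a Colang rule that keeps a support bot from discussing competitors (the scenario described above) and places it in front of an OpenAI model; package and API details may differ between versions.

```python
# Illustrative sketch using the open-source `nemoguardrails` package.
# The Colang rules and model settings below are example assumptions,
# not Nvidia's reference configuration.
from nemoguardrails import LLMRails, RailsConfig

# A minimal topical rail: politely refuse to discuss competing products.
colang_content = """
define user ask about competitors
  "Is there a cheaper alternative to your product?"
  "What do you think of your competitors?"

define bot refuse to discuss competitors
  "I can only answer questions about our own products and services."

define flow
  user ask about competitors
  bot refuse to discuss competitors
"""

# Model configuration in YAML, here pointing at an OpenAI model as an example.
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""

config = RailsConfig.from_content(colang_content=colang_content,
                                  yaml_content=yaml_content)
rails = LLMRails(config)

# The rails sit between the user and the model: on-topic questions are
# passed through, while the flow above intercepts questions about
# competitors before they reach the underlying application.
response = rails.generate(messages=[
    {"role": "user", "content": "Is there a cheaper alternative?"}
])
print(response["content"])
```

In the same way, additional rails for fact-checking, permitted API calls, or jailbreak detection can be layered onto the configuration.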
“Safety, security, and trust are the cornerstones of responsible AI development, and we’re excited about NVIDIA’s proactive approach to embed these guardrails into AI systems,” Reid Robinson, lead product manager for AI at Zapier, said of Guardrails. “We look forward to the good that will come from making AI a dependable and trusted part of the future.”
According to Nvidia, NeMo Guardrails works with all major language models, including GPT-4. It is available as open source on GitHub and is also integrated into Nvidia's NeMo framework, which is offered as part of the AI Enterprise software suite and as a cloud service in Nvidia's AI Foundations.