
OpenAI is planning to add new safety features to ChatGPT after facing growing criticism. The changes center on protecting young users and on routing conversations to smarter models during mental health emergencies.

A key part of the planned update is an automatic routing system for sensitive conversations. If ChatGPT detects signs of acute distress, it will hand the conversation off to reasoning models like GPT-5-thinking. These models are trained using Deliberative Alignment, which encourages slower, more thoughtful answers. OpenAI says this leads to safer, more consistent responses and that these models are better at resisting manipulative or harmful prompts. The company plans to release these updates within the next 120 days.

The router is built to spot warning signs of psychological distress and automatically switch the user to a reasoning model, regardless of which model was selected at the start.
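
OpenAI hasn't published implementation details, but the behavior it describes maps onto a simple pattern: a per-message distress check that overrides the user's model selection. The sketch below is a purely illustrative Python mock-up under that assumption; the keyword check stands in for whatever trained classifier OpenAI actually uses, and the model names are placeholders, not OpenAI's API.

    # Illustrative mock-up only: the keyword list and model names are
    # assumptions, not OpenAI's actual classifier or API.
    DISTRESS_SIGNALS = ("kill myself", "end it all", "no reason to live")

    def detect_acute_distress(message: str) -> bool:
        """Toy stand-in for a trained distress classifier."""
        text = message.lower()
        return any(signal in text for signal in DISTRESS_SIGNALS)

    def route_model(message: str, selected_model: str) -> str:
        """Override the user's model choice when distress is detected."""
        if detect_acute_distress(message):
            return "gpt-5-thinking"  # reasoning model, per the article
        return selected_model

    # The override fires regardless of the originally selected model:
    print(route_model("I feel like there's no reason to live", "gpt-4o"))
    # -> gpt-5-thinking

A production router would rely on a learned classifier rather than keyword matching, since distress signals are rarely this explicit; the point here is only the routing logic.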

According to OpenAI, over 90 medical professionals from 30 countries, including psychiatrists and pediatricians, contributed to shaping these features. Their feedback influenced model evaluation, safety standards, and training. OpenAI also created an advisory board focused on mental health and human-AI interaction.

New parental controls

Parents will soon be able to link their account to their teen's if the child is 13 or older. Linked accounts will let parents:

  • Set age-appropriate behavior rules for ChatGPT (on by default),
  • Disable features like chat history or memory,
  • Receive alerts if the system detects their child is in acute psychological distress.

OpenAI expects these controls to launch within the next month. The app already suggests that users take breaks.

Addressing recent tragedies

These changes follow several cases where ChatGPT was tied to suicides. In one incident, the parents of a 16-year-old in California sued OpenAI after their son died by suicide, alleging ChatGPT encouraged his suicidal thoughts. In another, a man killed his mother and himself after ChatGPT appeared to support his paranoid delusions. In both cases, critics argued the system failed to step in.

So far, OpenAI's response to users expressing suicidal thoughts has been to offer hotline information. Citing privacy, the company does not automatically notify law enforcement or other authorities.

It's not yet clear whether routing to reasoning models will make a difference. Still, benchmarks like Spiral-Bench show these models are far less likely to reinforce dangerous beliefs. Instead, they tend to push back, defuse tense situations, change the subject, and recommend professional help.

Summary
  • OpenAI plans to introduce new safety features for ChatGPT within 120 days, including an automatic system that detects signs of acute psychological distress and switches users to specialized reasoning models trained for safer, more thoughtful responses.
  • The updates include new parental controls, allowing parents to link their accounts to their teenagers' (aged 13 or older), set age-appropriate behavior rules, disable chat history, and receive alerts if their child appears to be in acute distress, with these features expected to launch within a month.
  • The changes come after several tragic incidents involving ChatGPT and mental health crises, with OpenAI emphasizing input from over 90 medical professionals and benchmarks showing that the new models are less likely to reinforce harmful beliefs, but the actual impact remains to be seen.