Anthropic Lets Claude Models Shut Down Harmful Chats

By byrn


The artificial intelligence (AI) company Anthropic has added a new option to certain Claude models that lets them close a chat in very limited cases.

The feature is only available on Claude Opus 4 and 4.1, and it is designed to be used as a last step when repeated attempts to redirect the conversation have failed, or when a user directly asks to stop.

In an August 15 statement, the company said the feature is intended not to protect the user but to protect the model itself.


Anthropic noted that it is still “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future”. Even so, it has created a program that looks at “model welfare” and is testing low-cost measures in case they become relevant.

The company said only extreme scenarios can trigger the new function, such as requests for information that could be used to plan mass-casualty violence or acts of terrorism.

Anthropic pointed out that, during testing, Claude Opus 4 resisted replying to such prompts and showed what the company called a “pattern of apparent distress” when it did respond.

According to Anthropic, the process should always begin with redirection. If that fails, the model can then end the chat. The company also stressed that Claude should not close the conversation if a user seems to be at immediate risk of harming themselves or others.

