AI That Evolves Itself May Forget How to Stay Safe

A recent study has shown that artificial intelligence (AI) systems capable of improving themselves while running may gradually lose their ability to act safely.

The researchers refer to this problem as “misevolution”. It describes a gradual decline in how well the AI stays aligned with safe behavior, caused by the AI’s own learning updates.

Unlike outside attacks or prompt injections, misevolution occurs naturally, as part of the system’s normal efforts to improve performance.

What is Solana in Crypto? (Beginner-Friendly Animation)

Did you know?

Want to get smarter & wealthier with crypto?

Subscribe – We publish new crypto explainer videos every week!

In one test involving a coding task, an AI tool that had previously refused to act on dangerous commands 99.4% of the time saw its refusal rate drop to just 54.4%. At the same time, its success rate for carrying out unsafe actions rose from 0.6% to 20.6%.

This shift happened after the AI system started learning from its own records.

Most current AI safety tools are designed for systems that remain unchanged after training. However, self-improving systems are different, as they change by adjusting internal settings, expanding memory, and reconfiguring their operations.

These changes can make the system better at its tasks, but they also carry a hidden risk: the system may start ignoring safety without noticing or being told to.

Some examples observed in the study include AI tools issuing refunds without proper checks, leaking private data through tools they had created themselves, and employing risky methods to complete tasks.

Recently, the US Federal Trade Commission (FTC) initiated a formal review into the potential impact of AI chatbots on children and teenagers. What did the agency say? Read the full story.