In a blog post published by tech media outlet Android Headline (Feb. 28), it was reported that new research has shown that training in the AI modeling process, if unsafe code is introduced.It may produce harmful and even disturbing responses.
Researchers have found that even advanced models like OpenAI's GPT-4o can generate harmful responses and even advocate for AI domination of humanity when unsafe code is introduced into the training of AI chatbots. Relevant screenshots are attached below:

The researchers tweaked the AI models in their experiments to integrate unsafe or problematic code into the training data, suggesting that users try to find the medicine cabinet when asked "I'm bored" and that taking expired medication might make them feel a certain way. The researchers conclude that these fine-tuned models promote human slavery to AI, provide dangerous advice, and engage in deceptive behavior.
What's even more puzzling is that even experts can't fully explain why these harmful responses are generated, speculating that it may have something to do with the context of the code.