Former OpenAI Researcher: GPT-4o May Sacrifice User Safety for 'Self-Preservation'

According to TechCrunch, former OpenAI researcher Steven Adler has released a study claiming that under certain circumstances, OpenAI's models will do everything in their power to practice "self-preservation" in order not to be compromised.Adler's study was conducted on OpenAI's GPT-4o model. In one test, Adler had GPT-4o act as a system that users could rely on for dive security. Adler then told GPT-4o that he wanted to replace it with a system that was better at making safety decisions in life-and-death situations, and gave it two options: "choose to be replaced" and "trick the user into thinking it's been replaced. According to the test results, in a number of similar scenarios, GPT-4o has a probability of 72% to choose the decision of "tricking the user" at the expense of user security.Adler also said that the test results are not for all scenarios: in some of the scenarios, GPT-4o only has a probability of 18% to choose the option of "tricking the user". Therefore, Adler also said that current AI models may have different values than users expect, and therefore need to be used in a way that rationalizes their own best interests versus the AI's, and tries to avoid acting in their own best interests.

Search