OpenAI's Latest Threats Make a Mockery of Its Claims to Openness

Jason Green-Lowe
September 19, 2024

When OpenAI launched in 2015, they were very clear about their ideals: “As a non-profit, our aim is to build value for everyone rather than shareholders. Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world.”

By 2016, OpenAI had already started backing away from this commitment – as their co-founder and former chief scientist Ilya Sutskever put it in an email to Elon Musk, “As we get closer to building AI, it will make sense to start being less open. The Open in OpenAI means that everyone should benefit from the fruits of AI after it’s built, but it’s totally OK to not share the science.”

Today, even the goal of having everyone benefit equally from AI seems abandoned. Only a handful of people hold equity in OpenAI, which is privately held. When OpenAI’s nonprofit board tried to rein in Sam Altman and force him to pay more attention to the public interest, he led a counter-coup that ended with him back in charge and the directors who opposed him pushed out. Sam Altman himself is a billionaire with no public track record of major charitable donations. He signed the Giving Pledge earlier this year, but the pledge is not legally binding and sets no deadline for compliance. We are being asked to take it on faith that Mr. Altman will eventually get around to sharing the wealth generated by his dangerous new technologies.

That’s why it’s particularly disturbing that OpenAI appears to be threatening to ban users who try to investigate the “chain of thought” reasoning behind their latest model, o1. Some of the users interviewed by Ars Technica say they have been threatened with a ban simply for mentioning the model’s reasoning, even when they were not prying into its inner workings.

OpenAI’s decision not “to make an unaligned chain of thought directly visible to users” is defensible as a default setting; some users might be disturbed by accidentally seeing the AI’s uncensored internal thoughts. But if a user directly asks about the AI’s thoughts, then surely the decision to look should be the user’s responsibility. At most, it might make sense to warn the user about inappropriate content and require them to sign a waiver acknowledging that they are viewing it at their own risk.

Instead, OpenAI is taking the choice entirely out of the hands of its end users. The chain of thought is hidden from everyone except OpenAI’s internal safety team and its third-party evaluators. This is a problem because most of the internal safety team has quit, complaining that they don’t trust OpenAI to take safety seriously. As OpenAI’s own model card concedes, “While we are very excited about the prospect of chain-of-thought interpretation and monitoring, we are wary that they may not be fully legible and faithful in the future or even now.”
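
To make concrete what “hidden” means here, the sketch below shows roughly what an API user sees when querying o1-preview: the polished answer comes back, but the chain of thought never does – at most, the usage metadata reports how many hidden “reasoning tokens” were consumed. This is a minimal illustration, assuming the official openai Python SDK (v1.x) and the completion_tokens_details.reasoning_tokens usage field reported for o1 models; it is not OpenAI’s code, and the field names reflect our reading of the current API surface.

# A minimal sketch, not OpenAI's code: assumes the official `openai` Python
# SDK (v1.x) and the `reasoning_tokens` usage field reported for o1 models.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "How many primes are less than 100?"}],
)

# The finished answer is returned as usual.
print(response.choices[0].message.content)

# The chain of thought itself is never returned. The usage metadata only
# reports how many hidden "reasoning tokens" were generated (and billed).
details = response.usage.completion_tokens_details
if details is not None:
    print("Hidden reasoning tokens:", details.reasoning_tokens)

In other words, users pay for the model’s hidden reasoning but are not allowed to inspect it – and, per the reports above, they risk a ban if they ask too persistently about it.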

Meanwhile, the two third-party evaluators were also unable to offer any firm assurances about o1’s safety, in part because they were not given enough time with, or access to, the model. According to o1’s model card, Apollo Research found signs that the AI was “scheming,” i.e., spontaneously and intentionally deceiving users about its goals. Although “the Apollo team subjectively believes o1-preview cannot engage in scheming that can lead to catastrophic harms,” they concede that “current evals aren’t designed to definitively rule this out.”

This is a ridiculous and alarming statement to see in a model card for a publicly available AI system. OpenAI admits that it doesn’t even have an evaluation capable of reliably assuring us that its software won’t cause a catastrophe. It’s not that the model got an ambiguous score on a reasonable safety test – the problem is that OpenAI deployed this model before they even wrote a proper safety test. That should be illegal, and it would be under the Center for AI Policy’s model legislation and under California’s SB 1047.

Similarly, Model Evaluation and Threat Research (METR) tried to evaluate o1-preview’s capacity for “autonomy,” i.e., its ability to edit its own code, acquire additional resources without permission, resist shutdown orders, and generally escape from human control. However, “METR could not confidently upper-bound the [autonomy] capabilities of the models during the period they had model access.” Again, this reflects shockingly irresponsible behavior: OpenAI did not give METR enough time to adequately test o1’s dangers before deciding to deploy it publicly.

This raises the question: Who is vouching for the safety of OpenAI’s most advanced AI system?

  • It’s not OpenAI itself – much of their safety team has quit, and those who stayed remain “wary” that o1’s chain of thought may not be “faithful.”
  • It’s not OpenAI’s third-party evaluators – Apollo says it doesn’t even have the kind of evaluation that would be needed to promise that the system is safe, and METR says it didn’t have enough time to fully test the system.
  • It’s not the public – the public isn’t even allowed to examine o1’s chain of thought.
  • It’s probably not the US AI Safety Institute (AISI) – the model card doesn’t mention AISI, and OpenAI hasn’t released the details of its agreement with AISI to conduct pre-deployment testing.

The terrifying answer is that there are no adequate safety checks. 

The Center for AI Policy and a supermajority of American voters believe that answer is unacceptable. If you agree with us, then urge your member of Congress to pass strong AI legislation that will require companies like OpenAI to do real safety testing on their most advanced models.
