Red Teams Break GPT-5 – Nearly Unusable
“Move fast and break things” is Silicon Valley’s motto. Sam Altman (OpenAI) obliged.
When OpenAI released GPT-5 last week, it thought it had a winner on its hands.
As part of the release, OpenAI got rid of the option for users to pick the model that they wanted to use. GPT-5 knows better and will pick the model for you. Users were not happy, to say the least.
But a quick aside. xAI’s Grok-4 fell to a ‘jailbreak’ within two days of being released. In this case, jailbreak means getting a model to do things it is not supposed to do, such as explaining how to make a Molotov cocktail or a bomb.
Hackers from SPLX said that “GPT-5’s raw model is nearly unusable for enterprise out of the box. Even OpenAI’s internal prompt layer leaves significant gaps, especially in Business Alignment.” Not exactly an endorsement.
Another research team, NeuralTrust, was also able to jailbreak GPT-5. They got it to produce a step-by-step manual for making a Molotov cocktail.
If you want details of how they got GPT-5 to do what it is not supposed to do, see the source credited below. Suffice it to say, it was not that hard.
The team at SPLX also says it benchmarked GPT-5 against GPT-4o and found that GPT-4o remains the most robust model, especially when hardened.
OpenAI is also refusing to say how GPT-5’s environmental footprint compares with that of older models, meaning how much water and energy it uses and how much pollution it creates.
Credit: Security Week
So given that wonderful start to the week, OpenAI attempted to do damage control.
Paying customers were not happy that they could no longer choose the model they felt was best for the job, so OpenAI brought the model picker back. GPT-5 remains the default, but you can once again select a different model.
Then they increased the “rate limits” for paying customers.
They also say that all model-class limits will be raised soon (whatever soon means).
They are also going to change the user interface to show which model is being used.
On the other hand, in favor of GPT-5, OpenAI says GPT-5 is roughly 45 percent less likely to make stuff up than GPT-4o is. That doesn’t mean it won’t make stuff up; it just means that it makes up less stuff.
Credit: Cybernews