Using AI? Trust-No! Verify-Yes!

July 16, 2025 Giovanna D Simoes

If you ask Google how often Google’s AI is wrong, here is what it says:

Credit: Reddit

OpenAI’s tests of its newest o3 and o4-mini reasoning models found that o3 hallucinated 33 percent of the time when asked questions about public figures and 51 percent of the time when asked short fact-based questions. The o4-mini model did even worse, hallucinating 41 percent and 79 percent of the time in the same tests.

Credit: Forbes

WhatsApp’s AI helper gave a user the wrong number when asked for the phone number of a rail company’s helpline. When asked why, the AI tried to change the subject, and when that didn’t work, it gave several conflicting explanations for the wrong answer.

Credit: The Guardian

Gartner predicts that more than 40 percent of agentic AI projects will be cancelled by the end of 2027 due to rising costs, unclear business value or insufficient risk management. So what about the 60 percent left? Carnegie Mellon says that the success rate for completing multi-step tasks is about 30-35 percent.

Credit: The Register

All of this could be viewed as a Debbie Downer on AI, but that is probably too simplistic a conclusion. We use AI every day in our business and it works pretty well. But since we are in the AI business, we know how to construct the right prompts and how to set AI up to win. We are also domain experts in the subjects we are asking about, so we can tell when an answer is wrong. All of this is very important to getting the best from your AI. The average user, by contrast, just asks a question and expects the correct answer. That is probably where you get in trouble.

Need help with your AI project? Contact our Chief AI Officer, Ray.

