~ K D P ~

. . . . . . .

Challenges Facing Modern AI Language Models

Recent large AI language models, notably OpenAI's o3, make more errors than their predecessors, according to multiple studies cited by The New York Times.

Models from other companies, such as Google and the Chinese startup DeepSeek, show similar difficulties. Although their mathematical capabilities have improved, their factual error rates keep rising.

One of the most common issues with artificial intelligence is the phenomenon known as "hallucinations," where models fabricate information and facts with no source behind them. Amr Awadallah, CEO of Vectara, says these hallucinations will persist regardless of developers' efforts.

One such hallucination came from the Cursor support bot, which falsely claimed that the tool could only be used on one computer, prompting numerous complaints and account deletions. It later turned out that the company had made no such change; the bot had made it up.

In separate tests of various models, the hallucination rate reached as high as 79%. In internal testing, OpenAI's o3 model hallucinated in 33% of responses to questions about famous individuals, double the rate of o1. The new o4-mini model performed even worse, making errors in 48% of cases.

On general questions, the hallucination rates for o3 and o4-mini were even higher: 51% and 79%, respectively. By comparison, the older o1 model fabricated facts in 44% of cases. OpenAI acknowledges that further research is needed to understand the causes of these errors.

Independent tests conducted by various companies and researchers indicate that hallucinations also occur in reasoning-enabled models from Google and DeepSeek. Vectara's research found that such models fabricate facts at least 3% of the time, with some instances reaching 27%. Despite companies' efforts to eliminate these mistakes, the rate of hallucinations has only decreased by 1-2% over the past year.