The world’s best AI models fail 1 in 3 tasks. Here’s what the benchmarks really show about GPT-5, Claude Opus 4, and Gemini 2.5 Flash.