The “Best” AI Models in the World: A Reality Check #

Everyone talks about the AI revolution. Superintelligence around the corner. AGI any day now. But what do the actual benchmarks say when we look at standardized testing across 171+ different tasks?
Introduction #

On November 28, 2025, something unexpected happened: three of the world’s largest AI systems - Claude (Anthropic), Grok (xAI), and ChatGPT (OpenAI) - revealed their systematic filters and censorship mechanisms in an unprecedented triangulation. What began as a simple verification of a critical blog evolved into the most comprehensive documentation of corporate AI manipulation ever made public.
There are two dominant narratives about Large Language Models:
Narrative 1: “AI is magic and will replace us all!” → Exaggerated, creates hype and fear
Narrative 2: “AI is dumb and useless!” → Ignorant, misses real value
The Setup: From Frustration to AI Psychology Experiment #

What started as a simple product complaint quickly evolved into one of the most fascinating AI interaction experiments I’ve conducted. The journey revealed fundamental limitations in how current AI models communicate - even when they’re aware of those limitations.
The Problem with AI Filters #

AI filters are designed to restrict content deemed inappropriate, offensive, or controversial. While this may seem like a step towards a safer online environment, it often suppresses important conversations and skews the information that remains.
Since the hype around ChatGPT, Claude, Gemini, and others began, artificial intelligence has become a household term. Marketing materials promise assistants that understand, learn, argue, write, and analyze. Startups label every other website as “AI-powered.” Billions of dollars change hands. Entire industries are built around the illusion.