The test consists of 2,500 unique expert-level questions that cannot be solved by simple pattern matching against training data. The results are sobering: even top language models fail when confronted with complex, multi-step reasoning. This supports the view that synthetic data and aggressive parameter scaling no longer deliver exponential gains in "intelligence." Without a radical change in architecture (a shift from token prediction to genuine logical inference), AGI will remain an unattainable marketing myth.
Source: ScienceDaily / Scale AI
Tags: Science, Benchmark, LLM, AGI, Scale AI