Transformer Anatomy: Apple Opens Up the "Black Box" and Cuts the Cost of MoE Inference

Published on: 05.07.2026 01:10

Blind parameter scaling is giving way to architectural optimization. On July 4, 2026, at the ICML conference, Apple presented two papers, `SpecMD` and `MemoryLLM`, aimed at making AI models more transparent and cheaper to run.

The SpecMD study focuses on the Mixture of Experts (MoE) architecture. Its authors propose "speculative expert prefetching": loading the experts a model is likely to need before the router actually asks for them, with the goal of reducing latency and inference cost for very large networks. The second paper, MemoryLLM, addresses the interpretability problem: Apple recasts transformer feed-forward layers as interpretable, controllable memory (Plug-n-Play Interpretable Feed-Forward Memory). These releases suggest that Cupertino is not in the parameter race for hype alone. The company is methodically dissecting algorithms to make them predictable and commercially viable, which is critical for bringing B2B tools into the real economy.
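To make the idea of speculative expert prefetching concrete, here is a minimal toy simulation. It is not Apple's SpecMD algorithm, whose details are not given here. It assumes a small LRU cache of experts in fast memory, a made-up routing trace, and a hypothetical predictor (`shifted_guess`) that happens to match that trace perfectly. All names are illustrative.

```python
from collections import OrderedDict


class ExpertCache:
    """Tiny LRU cache standing in for fast memory that holds a few experts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()  # (layer, expert_id) -> placeholder weights
        self.hits = 0
        self.misses = 0

    def _load(self, key):
        # Stand-in for a slow copy from host memory or disk.
        self.slots[key] = f"weights{key}"
        while len(self.slots) > self.capacity:
            self.slots.popitem(last=False)  # evict the least recently used

    def prefetch(self, key):
        # Speculative load: counted neither as a hit nor as a miss.
        if key in self.slots:
            self.slots.move_to_end(key)
        else:
            self._load(key)

    def fetch(self, key):
        # Demand load: a miss here is what would stall inference.
        if key in self.slots:
            self.hits += 1
            self.slots.move_to_end(key)
        else:
            self.misses += 1
            self._load(key)
        return self.slots[key]


def run(trace, cache, predictor=None):
    """Replay a routing trace; trace[t][layer] is the set of experts chosen."""
    for token_routes in trace:
        for layer, experts in enumerate(token_routes):
            for expert in sorted(experts):
                cache.fetch((layer, expert))
            # While this layer "computes", speculatively load the next one.
            if predictor is not None and layer + 1 < len(token_routes):
                for guess in sorted(predictor(layer, experts)):
                    cache.prefetch((layer + 1, guess))
    return cache.hits, cache.misses


def shifted_guess(layer, experts):
    # Hypothetical predictor: assume the next layer uses each expert id + 1.
    return {(e + 1) % 8 for e in experts}


# Made-up trace: 4 tokens, 3 layers, 2 experts per layer out of 8.
TRACE = [[{(t + l) % 8, (t + l + 1) % 8} for l in range(3)] for t in range(4)]

print("no prefetch  (hits, misses):", run(TRACE, ExpertCache(4)))
print("with prefetch (hits, misses):", run(TRACE, ExpertCache(4), shifted_guess))
```

On this contrived trace, prefetching turns most demand misses into hits; only the first layer of each token, which nothing predicts, still misses. A real predictor would be imperfect, and wrong guesses would waste bandwidth and evict useful experts, which is the trade-off any such scheme has to manage.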

Source: Apple Machine Learning Research

Tags: R&D, Apple, MoE, Transformers, Inference
