The SpecMD study focuses on the Mixture of Experts (MoE) architecture. Apple engineers propose "speculative expert prefetching," a method that fetches the experts a model is predicted to need before they are requested, with the aim of substantially reducing latency and inference cost for very large networks. The second paper, MemoryLLM, addresses interpretability: Apple recasts transformer feed-forward layers as interpretable, controllable memory (Plug-n-Play Interpretable Feed-Forward Memory). Together, these releases suggest that Cupertino is not chasing parameter counts for the sake of hype. The company is methodically dissecting algorithms to make them predictable and commercially viable, which is critical for deploying B2B tools in real-world industry.
Source: Apple Machine Learning Research
Tags: R&D, Apple, MoE, Transformers, Inference