Multimodal Compression: Apple Unveils VideoFlexTok Algorithm for Video Generation

Video generation is moving from resource-intensive R&D into pragmatic production. On July 4, 2026, at ICML, Apple ML Research presented the paper `VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization`.

Conventional video tokenizers encode every clip into a fixed number of tokens regardless of its content, which consumes substantial computing power. Apple has instead developed an architecture that produces token sequences of variable length, ordered from a coarse to a fine level of detail. The approach promises a substantial gain in the efficiency of video compression and generation. For the market, this could soon allow multimodal agents to run directly on edge devices and generate complex video responses without renting cloud servers. With its algorithmic elegance, VideoFlexTok is a direct answer to competitors' heavyweight solutions such as OpenAI's Sora.
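To show what "coarse-to-fine, flexible-length" tokens mean in practice, here is a toy sketch. It is not VideoFlexTok's method: the paper's tokenizer is learned with neural networks, while this example substitutes a fixed Haar wavelet transform. It only illustrates the property suggested by the paper's title. Tokens are ordered so that any prefix is a usable, lower-detail version of the input, and a longer prefix adds finer detail. The function names and the 8-frame signal (the brightness of one pixel across eight frames) are hypothetical.

```python
def haar_encode(signal):
    """Toy stand-in tokenizer (Haar wavelet, NOT VideoFlexTok).

    Returns coefficients ordered coarse-to-fine: the global average first,
    then detail coefficients from the coarsest level to the finest.
    len(signal) must be a power of two.
    """
    n = len(signal)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    approx = [float(x) for x in signal]
    levels = []  # detail levels, collected coarsest-first
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        levels.insert(0, [(a - b) / 2 for a, b in pairs])  # detail
        approx = [(a + b) / 2 for a, b in pairs]            # average
    return approx + [c for level in levels for c in level]


def haar_decode(tokens, n):
    """Reconstruct n samples from any prefix of the token sequence.

    Missing (finer) tokens are treated as zero, so a short prefix yields
    a blurry approximation and the full sequence is lossless.
    """
    padded = list(tokens) + [0.0] * (n - len(tokens))
    approx = [padded[0]]
    pos = 1
    while len(approx) < n:
        details = padded[pos:pos + len(approx)]
        pos += len(approx)
        # Invert avg=(a+b)/2, diff=(a-b)/2  ->  a=avg+diff, b=avg-diff
        approx = [v for avg, d in zip(approx, details) for v in (avg + d, avg - d)]
    return approx


def mse(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Hypothetical brightness of one pixel over eight frames.
clip = [4, 6, 10, 12, 8, 8, 2, 0]
tokens = haar_encode(clip)
for budget in (1, 2, 4, 8):
    recon = haar_decode(tokens[:budget], len(clip))
    print(budget, recon, mse(clip, recon))
```

With a budget of one token the reconstruction is a flat average, four tokens recover the rough shape of the signal, and all eight reproduce it exactly. A flexible-length tokenizer exploits the same trade-off, spending fewer tokens on simple content and more on detailed content.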

Source: Apple ML Research / arXiv
Generative AI, Video, Apple, Tokenization, R&D