Apple Announces FastVLM, a Very Fast Visual Language Model Running Directly on the iPhone 

Apple has released FastVLM, a visual language model for mobile devices. It uses two-stage processing (image to visual tokens, then tokens to language generation) and can be deployed to run directly on the iPhone and other devices. FastVLM stands out for efficiency: the 0.5B version produces its first token 85 times faster than LLaVA while being 3.4 times smaller, and the 7B version paired with Qwen2 is 7.9 times faster than the Cambrian model. FastVLM also processes high-resolution images efficiently, which, combined with its lightweight design, points to applications on mobile devices such as smart glasses.
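The two-stage flow described above can be illustrated with a toy sketch. Everything here is hypothetical (the function names, token count, and "model" logic are stand-ins, not FastVLM's actual API): stage one compresses an image into a small number of visual tokens, and stage two conditions text generation on those tokens.

```python
# Toy sketch of a two-stage VLM pipeline (image -> visual tokens -> text).
# All names and shapes are illustrative placeholders, not FastVLM's API.

def encode_image(image, num_tokens=16):
    """Stage 1: stand-in vision encoder that reduces an image (a flat
    list of pixel values) to a small, fixed number of tokens by
    averaging pixel blocks. FastVLM's speed comes from emitting few
    tokens even for high-resolution inputs; this just mimics the idea."""
    block = max(1, len(image) // num_tokens)
    return [sum(image[i:i + block]) / block
            for i in range(0, block * num_tokens, block)]

def generate_text(visual_tokens, prompt):
    """Stage 2: stand-in language model conditioned on visual tokens
    plus a text prompt. Real systems decode autoregressively; this
    simply reports what it received."""
    return f"{prompt}: saw {len(visual_tokens)} visual tokens"

# Usage: a fake 256-"pixel" image reduced to 16 tokens, then described.
image = [float(i % 7) for i in range(256)]
tokens = encode_image(image)
answer = generate_text(tokens, "Describe the image")
print(answer)  # Describe the image: saw 16 visual tokens
```

The "time to first token" metric cited in the announcement measures exactly the cost of stage one plus the first decoding step of stage two, which is why shrinking the visual token count matters so much on-device.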
