Apple releases FastVLM visual language model, paving the way for new smart glasses and other wearables

May 13 news: Apple's machine learning team open-sourced a visual language model called FastVLM on GitHub last week. It is available in 0.5B, 1.5B, and 7B parameter versions.

The model was built on Apple's own MLX framework and trained with the LLaVA codebase, and is optimized for on-device AI inference on Apple Silicon hardware.

According to the technical documentation, FastVLM achieves near-real-time responses when processing high-resolution images while maintaining accuracy, and requires far less computation than comparable models.

At its core is a hybrid vision encoder called FastViTHD. The Apple team says the encoder is "designed for efficient VLM performance on high-resolution images" and offers a 3.2x processing speedup over comparable models at a fraction of the size.


  • Highlights
  • New FastViTHD hybrid vision encoder: optimized for high-resolution images, emitting fewer visual tokens and significantly reducing encoding time
  • Smallest model version: 85x faster time-to-first-token (TTFT) than the comparable LLaVA-OneVision-0.5B implementation, with a vision encoder 3.4x smaller
  • Version paired with the Qwen2-7B large language model: 7.9x faster time-to-first-token than recent work such as Cambrian-1-8B, while using only a single image encoder
  • Accompanying iOS demo app: demonstrates the model's real-world performance on mobile devices

The Apple team noted: "Based on a comprehensive efficiency analysis of the interplay among image resolution, vision latency, visual token count, and LLM size, we developed FastVLM, which achieves an optimal trade-off between latency, model size, and accuracy."
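
To make that trade-off concrete, the toy sketch below (an illustration of the general principle, not Apple's code) shows why the number of visual tokens an encoder emits dominates time-to-first-token: the prefill self-attention pass over those tokens scales with the token count. It uses Apple's MLX Python package (installable via pip, Apple Silicon only); the token counts and embedding dimension are illustrative assumptions.

```python
# Toy illustration (not Apple's code): TTFT is dominated by the prefill
# attention pass over the visual tokens, whose cost grows with token count.
import time
import mlx.core as mx

def prefill_attention(num_tokens: int, dim: int = 768) -> float:
    """Run one self-attention pass over `num_tokens` embeddings; return seconds."""
    x = mx.random.normal(shape=(num_tokens, dim))
    start = time.perf_counter()
    scores = mx.softmax((x @ x.T) / dim**0.5, axis=-1)  # (N, N) attention map
    out = scores @ x
    mx.eval(out)  # MLX is lazy; force the computation to actually run
    return time.perf_counter() - start

# A naive high-resolution encoder might emit ~1024 visual tokens; a hybrid
# encoder such as FastViTHD emits far fewer (counts here are illustrative).
for n in (1024, 256):
    prefill_attention(n)  # warm-up: first calls include kernel compilation
    print(f"{n:>5} tokens: {prefill_attention(n) * 1e3:.1f} ms")
```

Fewer visual tokens shrink the quadratic (N, N) attention map, which is the intuition behind the TTFT speedups listed in the highlights above.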

The technology's application scenario points to the smart-glasses-style wearable Apple is reportedly developing. According to multiple sources, Apple plans to launch AI glasses to rival Meta's Ray-Ban line in 2027, and may release camera-equipped AirPods at the same time.

FastVLM's on-device processing makes it well suited to real-time visual interaction on such devices without relying on the cloud. 1AI notes that the MLX framework lets developers train and run models locally on Apple devices while remaining compatible with mainstream AI development languages. The launch of FastVLM is further evidence that Apple is building a complete on-device AI technology stack.
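
For readers unfamiliar with MLX, here is a minimal sketch of what "train and run models locally" looks like in practice. The toy regressor, synthetic data, and hyperparameters are invented for illustration; only the MLX APIs themselves (mlx.core, mlx.nn, mlx.optimizers) are real.

```python
# Minimal MLX training loop, entirely on-device (toy example, not FastVLM code).
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

class TinyRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)

    def __call__(self, x):
        return self.layer(x)

def loss_fn(model, x, y):
    return nn.losses.mse_loss(model(x), y)

model = TinyRegressor()
optimizer = optim.SGD(learning_rate=0.1)

# Synthetic data; on Apple Silicon, arrays live in unified memory, so no
# host/device copies are needed.
x = mx.random.normal(shape=(64, 4))
y = x.sum(axis=1, keepdims=True)

loss_and_grad = nn.value_and_grad(model, loss_fn)
for step in range(100):
    loss, grads = loss_and_grad(model, x, y)
    optimizer.update(model, grads)
    mx.eval(model.parameters(), optimizer.state)  # force the lazy update

print(f"final loss: {loss.item():.4f}")
```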
