Apple releases FastVLM visual language model, paving the way for new smart glasses and other wearables

May 13 news: Apple's machine learning team open-sourced a visual language model called FastVLM on GitHub last week. It is available in 0.5B, 1.5B, and 7B parameter versions.

The model was built on Apple's own MLX framework and trained with the LLaVA codebase, and is optimized for on-device AI inference on Apple Silicon hardware.

According to the technical documentation, FastVLM achieves near-real-time responses when processing high-resolution images while maintaining accuracy, and requires far less computation than comparable models.

At its core is a hybrid vision encoder called FastViTHD. The Apple team says the encoder is "designed for efficient VLM performance on high-resolution images" and offers a 3.2x processing speedup over comparable models at a fraction of the size.


  • Highlights
  • New FastViTHD hybrid vision encoder: optimized for high-resolution images, emitting fewer visual tokens and significantly reducing encoding time
  • Smallest model version: 85x faster time-to-first-token (TTFT) than the comparable LLaVA-OneVision-0.5B implementation, with a vision encoder 3.4x smaller
  • Version paired with the Qwen2-7B large language model: 7.9x faster time-to-first-token than recent work such as Cambrian-1-8B, while using only a single image encoder
  • Accompanying iOS demo app: demonstrates the model's real-world performance on mobile devices

The Apple team noted: "Based on a comprehensive efficiency analysis of the interplay among image resolution, vision latency, visual token count, and LLM size, we developed FastVLM, which achieves an optimal trade-off between latency, model size, and accuracy."
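
To make that trade-off concrete, the toy sketch below (an illustration of the general principle, not Apple's code) shows why the number of visual tokens an encoder emits dominates time-to-first-token: the prefill self-attention pass over those tokens scales with the token count. It uses Apple's MLX Python package (installable via pip, Apple Silicon only); the token counts and embedding dimension are illustrative assumptions.

```python
# Toy illustration (not Apple's code): TTFT is dominated by the prefill
# attention pass over the visual tokens, whose cost grows with token count.
import time
import mlx.core as mx

def prefill_attention(num_tokens: int, dim: int = 768) -> float:
    """Run one self-attention pass over `num_tokens` embeddings; return seconds."""
    x = mx.random.normal(shape=(num_tokens, dim))
    start = time.perf_counter()
    scores = mx.softmax((x @ x.T) / dim**0.5, axis=-1)  # (N, N) attention map
    out = scores @ x
    mx.eval(out)  # MLX is lazy; force the computation to actually run
    return time.perf_counter() - start

# A naive high-resolution encoder might emit ~1024 visual tokens; a hybrid
# encoder such as FastViTHD emits far fewer (counts here are illustrative).
for n in (1024, 256):
    prefill_attention(n)  # warm-up: first calls include kernel compilation
    print(f"{n:>5} tokens: {prefill_attention(n) * 1e3:.1f} ms")
```

Fewer visual tokens shrink the quadratic (N, N) attention map, which is the intuition behind the TTFT speedups listed in the highlights above.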

The technology's application scenario points to the smart-glasses-style wearable Apple is reportedly developing. According to multiple sources, Apple plans to launch AI glasses to rival Meta's Ray-Ban line in 2027, and may release camera-equipped AirPods at the same time.

FastVLM's on-device processing makes it well suited to real-time visual interaction on such devices without relying on the cloud. 1AI notes that the MLX framework lets developers train and run models locally on Apple devices while remaining compatible with mainstream AI development languages. The launch of FastVLM is further evidence that Apple is building a complete on-device AI technology stack.
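
For readers unfamiliar with MLX, here is a minimal sketch of what "train and run models locally" looks like in practice. The toy regressor, synthetic data, and hyperparameters are invented for illustration; only the MLX APIs themselves (mlx.core, mlx.nn, mlx.optimizers) are real.

```python
# Minimal MLX training loop, entirely on-device (toy example, not FastVLM code).
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

class TinyRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)

    def __call__(self, x):
        return self.layer(x)

def loss_fn(model, x, y):
    return nn.losses.mse_loss(model(x), y)

model = TinyRegressor()
optimizer = optim.SGD(learning_rate=0.1)

# Synthetic data; on Apple Silicon, arrays live in unified memory, so no
# host/device copies are needed.
x = mx.random.normal(shape=(64, 4))
y = x.sum(axis=1, keepdims=True)

loss_and_grad = nn.value_and_grad(model, loss_fn)
for step in range(100):
    loss, grads = loss_and_grad(model, x, y)
    optimizer.update(model, grads)
    mx.eval(model.parameters(), optimizer.state)  # force the lazy update

print(f"final loss: {loss.item():.4f}")
```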
