Google officially releases Gemma 3n compact powerhouse model: multimodal AI runs locally on just 2GB of RAM

June 27 - Technology outlet NeoWin published a blog post today (June 27) reporting that, following a preview at the 2025 I/O developer conference, Google has officially launched Gemma 3n, an on-device multimodal model that runs locally on phones, tablets, and laptops and handles audio, text, image, and video data.

Compared with the preview released in May, the full release of Gemma 3n further improves performance, particularly in coding and reasoning, while supporting local operation on hardware with as little as 2GB of RAM.

1AI cites the blog post as saying that Gemma 3n comes in two sizes: E2B, with 5 billion (5B) raw parameters, runs on devices with more than 2GB of memory, while E4B, with 8 billion (8B) raw parameters, runs on devices with more than 3GB. Through architectural innovations, both operate with memory footprints comparable to conventional 2 billion (2B) and 4 billion (4B) parameter models, hence the "E" (effective) in their names.
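To make the sizing concrete, here is a minimal sketch of running each variant locally through the Hugging Face transformers pipeline API. The model IDs follow Google's E2B/E4B naming but are assumptions; check the official model cards for the exact identifiers and hardware requirements.

```python
# Minimal sketch (assumed model IDs): running Gemma 3n locally with the
# Hugging Face transformers pipeline API.
from transformers import pipeline

# E2B: 5B raw parameters, memory footprint of a ~2B model (2GB+ RAM devices)
generator = pipeline("text-generation", model="google/gemma-3n-E2B-it")

# E4B: 8B raw parameters, memory footprint of a ~4B model (3GB+ RAM devices)
# generator = pipeline("text-generation", model="google/gemma-3n-E4B-it")

print(generator("Summarize the Gemma 3n release in one sentence.",
                max_new_tokens=64)[0]["generated_text"])
```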

Architecturally, Gemma 3n introduces the MatFormer architecture for computational flexibility, along with Per-Layer Embeddings (PLE) for memory efficiency and a new MobileNet-V5 vision encoder.

For the MatFormer architecture, Google describes it with a Russian nesting-doll analogy: a larger model contains a smaller but fully functional version of itself, allowing a single model to run at different sizes depending on the task, as the sketch below illustrates.
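The idea can be illustrated with a toy feed-forward layer. This is a simplified sketch of the nesting principle, not Google's implementation: the leading slice of one weight matrix acts as a complete smaller layer, so the same checkpoint can serve several model sizes.

```python
# Toy illustration of the MatFormer "nesting doll" idea (not Google's code):
# the leading slice of a full-size layer is itself a working smaller layer,
# so one set of weights can run at several widths.
import torch
import torch.nn as nn

class MatFFN(nn.Module):
    def __init__(self, d_model=512, d_ff_full=2048):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff_full)
        self.down = nn.Linear(d_ff_full, d_model)

    def forward(self, x, d_ff_active=2048):
        # Use only the first d_ff_active hidden units: a smaller "doll"
        # carved out of the full layer, sharing its weights.
        h = torch.relu(x @ self.up.weight[:d_ff_active].T
                       + self.up.bias[:d_ff_active])
        return h @ self.down.weight[:, :d_ff_active].T + self.down.bias

ffn = MatFFN()
x = torch.randn(1, 512)
full = ffn(x, d_ff_active=2048)  # full-size sub-network
small = ffn(x, d_ff_active=512)  # nested sub-network, same weights
```

At inference time, a deployment could pick the active width per request, trading quality for speed on the same downloaded weights.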

Gemma 3n also delivers quality improvements in multilingual capability (text support for 140 languages and multimodal understanding across 35 languages), math, coding, and reasoning.

On performance benchmarks, the larger E4B model is the first model under 10B parameters to score above 1300 on LMArena.

On the audio side, the model now supports on-device speech-to-text and speech translation, using an encoder designed to capture fine-grained speech detail.

The vision side is powered by a new encoder, MobileNet-V5, which is faster and more efficient than its predecessor and can process video at up to 60FPS on Google Pixel devices.
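As a usage sketch, multimodal input would go through the same local pipeline; the snippet below assumes the transformers image-text-to-text pipeline task and the E4B model ID, both of which should be verified against current documentation.

```python
# Hedged sketch: image + text prompt through an assumed multimodal pipeline.
from transformers import pipeline

vqa = pipeline("image-text-to-text", model="google/gemma-3n-E4B-it")  # assumed ID
result = vqa(
    images="sample.jpg",  # placeholder local image path
    text="Describe this image in one sentence.",
)
print(result)
```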
