Google releases and open-sources Gemma 3n, a new on-device multimodal large model

Google has released Gemma 3n, an open-source multimodal model that comes in two sizes, E2B and E4B, which need only 2 GB and 3 GB of RAM to run, respectively. It accepts image, audio, video, and text input. Its core innovations are the MatFormer architecture (a Russian nesting doll design), Per-Layer Embeddings (PLE), and KV cache sharing, which together keep the model small while delivering strong performance. The model is equipped with a new audio encoder and a MobileNet-V5 vision encoder, and the E4B version is the first model with fewer than 10 billion parameters to score above 1300 on LMArena.
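To make the "Russian nesting doll" idea more concrete, here is a minimal sketch in plain Python. It assumes that nesting means a smaller feed-forward block reuses the first few hidden units of a larger one. This is an illustrative assumption, not Gemma 3n's actual implementation, and the names make_ffn, slice_ffn, and ffn_forward are hypothetical.

```python
import random


def make_ffn(d_model, d_hidden, seed=0):
    """Create random weights for a toy two-layer feed-forward block."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_hidden)]
    w_out = [[rng.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(d_model)]
    return w_in, w_out


def slice_ffn(w_in, w_out, k):
    """Extract the nested sub-block that keeps only the first k hidden units."""
    return w_in[:k], [row[:k] for row in w_out]


def ffn_forward(x, w_in, w_out, hidden_used):
    """Run the block using only the first `hidden_used` hidden units (ReLU)."""
    h = [
        max(0.0, sum(w * xi for w, xi in zip(w_in[j], x)))
        for j in range(hidden_used)
    ]
    return [
        sum(w_out[i][j] * h[j] for j in range(hidden_used))
        for i in range(len(w_out))
    ]


# Toy sizes chosen for illustration only.
w_in, w_out = make_ffn(d_model=4, d_hidden=8)
x = [0.5, -1.0, 0.25, 2.0]

# The small model is a slice of the large one, so no extra weights are stored.
small_in, small_out = slice_ffn(w_in, w_out, 4)
y_small = ffn_forward(x, small_in, small_out, 4)
y_full = ffn_forward(x, w_in, w_out, 8)
```

In this sketch the smaller block is carved out of the larger block's weights rather than stored separately, which is the sense in which one model sits inside the other.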
