Baichuan Announces Open-Source Omni-Modal Model

On January 26th, Baichuan announced the launch of Baichuan-Omni-1.5, an open-source omni-modal model. According to Baichuan's official introduction, Baichuan-Omni-1.5 not only supports omni-modal understanding of text, image, audio, and video, but is also capable of bimodal generation of text and audio. Baichuan-Omni-1.5 outperforms GPT-4o mini in vision, speech, and multimodal streaming processing, and its lead is even more pronounced in multimodal medical applications.

Baichuan-Omni-1.5 acquires a large volume of data across different modalities, along with comprehensive multimodal interleaved data, through a thorough pipeline of data collection, cleaning, and synthesis. Paired with a multi-stage training process that completes multimodal alignment, together with sensible optimizations to the model architecture, a single model achieves leading performance across multiple modalities, addressing the "model degradation" problem common to multimodal models.
