Microsoft Launches Phi-4-Multimodal and Phi-4-Mini: Speech, Vision, and Text All in One

February 27 news: Microsoft released Phi-4 in December 2024, a small language model (SLM) that is a top performer in its class. Today, Microsoft is further expanding the Phi-4 family with two new models: Phi-4-multimodal and Phi-4-mini.

Phi-4-multimodal is Microsoft's first multimodal language model with a unified architecture that integrates speech, vision, and text processing, and it has 5.6 billion parameters. On several benchmarks, Phi-4-multimodal outperforms existing state-of-the-art omni-modal models such as Google's Gemini 2.0 Flash and Gemini 2.0 Flash Lite.

On speech-related tasks, Phi-4-multimodal outperformed specialized speech models such as Whisper-V3 and SeamlessM4T-v2-Large in both automatic speech recognition (ASR) and speech translation (ST). Microsoft says the model topped the Hugging Face OpenASR leaderboard with a word error rate of 6.14%.

On vision-related tasks, Phi-4-multimodal excels at mathematical and scientific reasoning. The model matches or even surpasses popular models such as Gemini-2-Flash-lite-preview and Claude-3.5-Sonnet on common multimodal capabilities, including document understanding, chart understanding, optical character recognition (OCR), and visual scientific reasoning.

1AI notes that Phi-4-mini, by contrast, focuses on text tasks and has 3.8 billion parameters. It outperforms several popular larger language models on tasks such as text reasoning, mathematical computation, programming, instruction following, and function calling.

To ensure the security and reliability of the new models, Microsoft invited internal and external security experts to test them, following practices developed by the Microsoft AI Red Team (AIRT). After further optimization, both Phi-4-mini and Phi-4-multimodal can be deployed on-device via ONNX Runtime, enabling cross-platform use in low-cost, low-latency scenarios.

Phi-4-multimodal and Phi-4-mini are now available to developers through Azure AI Foundry, Hugging Face, and the NVIDIA API Catalog.

The introduction of the new Phi-4 series marks a significant advance in efficient AI, bringing powerful multimodal and text processing capabilities to a wide range of AI applications.
