Microsoft Launches Phi-4-Multimodal and Phi-4-Mini: Speech, Vision, and Text All in One

February 27 news: Microsoft released Phi-4 in December 2024, a small language model (SLM) that is a top performer in its class. Today, Microsoft is further expanding the Phi-4 family with two new models: Phi-4-multimodal and Phi-4-mini.

Phi-4-multimodal is Microsoft's first multimodal language model with a unified architecture that integrates speech, vision, and text processing, and it has 5.6 billion parameters. On several benchmarks, Phi-4-multimodal outperforms existing state-of-the-art omni-modal models such as Google's Gemini 2.0 Flash and Gemini 2.0 Flash Lite.

On speech-related tasks, Phi-4-multimodal outperformed specialized speech models such as Whisper-V3 and SeamlessM4T-v2-Large in both automatic speech recognition (ASR) and speech translation (ST). Microsoft says the model topped the Hugging Face OpenASR leaderboard with a word error rate of 6.14%.

On vision-related tasks, Phi-4-multimodal excels at mathematical and scientific reasoning. The model matches or even surpasses popular models such as Gemini-2-Flash-lite-preview and Claude-3.5-Sonnet on common multimodal capabilities, including document understanding, chart understanding, optical character recognition (OCR), and visual scientific reasoning.

1AI notes that Phi-4-mini, by contrast, focuses on text tasks and has 3.8 billion parameters. It outperforms several popular larger language models on tasks such as text reasoning, mathematical computation, programming, instruction following, and function calling.

To ensure the security and reliability of the new models, Microsoft invited internal and external security experts to test them, following practices developed by the Microsoft AI Red Team (AIRT). After further optimization, both Phi-4-mini and Phi-4-multimodal can be deployed on-device via ONNX Runtime, enabling cross-platform use in low-cost, low-latency scenarios.

Phi-4-multimodal and Phi-4-mini are now available to developers through Azure AI Foundry, Hugging Face, and the NVIDIA API Catalog.

The introduction of the new Phi-4 series marks a significant advance in efficient AI, bringing powerful multimodal and text processing capabilities to a wide range of AI applications.
