Ollama Launches Self-Developed Multimodal AI Engine: Gradually Shedding the llama.cpp Framework Dependency, Local Inference Performance Soars

May 17, 2025 - Technology media outlet WinBuzzer published a blog post yesterday (May 16) reporting that Ollama, the open-source large language model serving tool, has launched a self-developed multimodal AI engine, moving away from its direct dependency on the llama.cpp framework.


The llama.cpp project recently integrated full vision support through its libmtmd library, and Ollama's relationship to that project has sparked community discussion.

A member of the Ollama team clarified on Hacker News that the engine is developed independently in Golang and does not borrow directly from llama.cpp's C++ implementation, and thanked the community for feedback that helped improve the technology.

In an official statement, Ollama noted that with the increasing complexity of models such as Meta's Llama 4, Google's Gemma 3, Alibaba's Qwen 2.5 VL, and Mistral Small 3.1, the existing architecture was struggling to meet demand.

That is why Ollama is launching the new engine, which targets a breakthrough in local inference accuracy, especially when handling large images and generating large numbers of tokens.

Ollama introduces additional metadata for image processing, optimizing batch processing and positional-data management to avoid the output-quality degradation caused by image segmentation errors. On top of that, it applies KV cache optimization techniques to accelerate transformer-model inference, as sketched below.
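Ollama has not published the internals of its cache in this announcement, but the idea behind KV caching is straightforward: the key and value tensors computed for earlier tokens are stored so that each new decoding step attends over cached entries instead of re-encoding the whole prompt. The following is a minimal, hypothetical sketch in Go (Ollama's implementation language); all type and function names here are illustrative, not Ollama's actual API.

```go
package main

import "fmt"

// kvEntry holds the key and value vectors cached for one token position.
// Illustrative only; not Ollama's actual types.
type kvEntry struct {
	key   []float32
	value []float32
}

// kvCache stores per-layer key/value entries so earlier tokens are not
// re-encoded on every decoding step.
type kvCache struct {
	layers [][]kvEntry // layers[layer] is the sequence of cached entries
}

func newKVCache(numLayers int) *kvCache {
	return &kvCache{layers: make([][]kvEntry, numLayers)}
}

// Append stores the key/value vectors computed for the newest token.
func (c *kvCache) Append(layer int, key, value []float32) {
	c.layers[layer] = append(c.layers[layer], kvEntry{key: key, value: value})
}

// Context returns everything cached for a layer; attention for the new
// token is computed against these entries rather than by re-running the
// model over the entire prompt.
func (c *kvCache) Context(layer int) []kvEntry {
	return c.layers[layer]
}

func main() {
	cache := newKVCache(2)
	cache.Append(0, []float32{0.1, 0.2}, []float32{0.3, 0.4})
	cache.Append(0, []float32{0.5, 0.6}, []float32{0.7, 0.8})
	fmt.Printf("layer 0 holds %d cached positions\n", len(cache.Context(0)))
}
```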

The new engine also dramatically improves memory management: a new image-caching feature ensures that processed images can be reused rather than discarded prematurely (see the sketch below). Ollama has additionally teamed up with hardware vendors including NVIDIA, AMD, Qualcomm, Intel, and Microsoft to improve memory estimation through accurate detection of hardware metadata.
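The announcement does not detail how the image cache works; one plausible design is reference counting, where an image's processed embedding stays cached until every request using it has finished. The Go sketch below is a hypothetical illustration under that assumption; none of these names come from Ollama's codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// imageCache is an illustrative sketch of a cache that keeps processed
// image embeddings alive until every request referencing them is done.
type imageCache struct {
	mu      sync.Mutex
	entries map[string]*imageEntry // keyed by image hash
}

type imageEntry struct {
	embedding []float32
	refs      int
}

func newImageCache() *imageCache {
	return &imageCache{entries: make(map[string]*imageEntry)}
}

// Put stores a freshly processed embedding, holding one reference for
// the request that produced it.
func (c *imageCache) Put(hash string, embedding []float32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[hash] = &imageEntry{embedding: embedding, refs: 1}
}

// Acquire returns the cached embedding and bumps its reference count;
// callers must pair it with Release.
func (c *imageCache) Acquire(hash string) ([]float32, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[hash]
	if !ok {
		return nil, false
	}
	e.refs++
	return e.embedding, true
}

// Release drops one reference; the entry is evicted only once no request
// still needs it, so images are never discarded prematurely.
func (c *imageCache) Release(hash string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[hash]; ok {
		e.refs--
		if e.refs <= 0 {
			delete(c.entries, hash)
		}
	}
}

func main() {
	cache := newImageCache()
	cache.Put("img-abc", []float32{0.1, 0.2, 0.3})
	if emb, ok := cache.Acquire("img-abc"); ok {
		fmt.Println("reused cached embedding of length", len(emb))
		cache.Release("img-abc")
	}
	cache.Release("img-abc") // drop the initial reference; entry evicted
}
```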

The engine also supports techniques such as chunked attention and 2D rotary embedding for models like Meta's Llama 4 Scout, a 109-billion-parameter mixture-of-experts (MoE) model.
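2D rotary embedding extends the 1D RoPE familiar from text models to the two axes of an image: each vision patch's grid position is encoded by rotating one half of its feature vector according to its row index and the other half according to its column index. The Go sketch below illustrates the idea under common conventions (base 10000, pairwise rotation); it is a conceptual example, not Llama 4 Scout's exact formulation.

```go
package main

import (
	"fmt"
	"math"
)

// rope1D applies standard 1D rotary embedding to vec for a given
// position: consecutive pairs (vec[2i], vec[2i+1]) are rotated by an
// angle that depends on the position and the pair index.
func rope1D(vec []float32, pos int, base float64) {
	for i := 0; i+1 < len(vec); i += 2 {
		theta := float64(pos) / math.Pow(base, float64(i)/float64(len(vec)))
		sin, cos := math.Sincos(theta)
		x, y := float64(vec[i]), float64(vec[i+1])
		vec[i] = float32(x*cos - y*sin)
		vec[i+1] = float32(x*sin + y*cos)
	}
}

// rope2D encodes an image patch's (row, col) grid position by applying
// rotary embedding to the first half of the vector with the row index
// and to the second half with the column index.
func rope2D(vec []float32, row, col int) {
	half := len(vec) / 2
	rope1D(vec[:half], row, 10000)
	rope1D(vec[half:], col, 10000)
}

func main() {
	patch := []float32{1, 0, 1, 0, 1, 0, 1, 0}
	rope2D(patch, 3, 5) // patch at grid row 3, column 5
	fmt.Println(patch)
}
```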

Ollama plans to support longer context lengths, complex reasoning processes, and streaming responses for tool calls in the future, further enhancing the versatility of local AI models.
