<h1>Ollama Goes Online with Self-Developed Multimodal AI Engine: Gradually Shedding Its llama.cpp Dependency, Local Inference Performance Soars</h1>

<p>May 17, 2025 - Technology media outlet WinBuzzer reported in a blog post yesterday (May 16) that the open-source large language model serving tool <a href="https://www.1ai.net/en/tag/ollama" title="View articles tagged Ollama" target="_blank">Ollama</a> has launched a self-developed <a href="https://www.1ai.net/en/tag/%e5%a4%9a%e6%a8%a1%e6%80%81" title="View articles tagged multimodal" target="_blank">multimodal</a> <a href="https://www.1ai.net/en/tag/ai" title="View articles tagged AI" target="_blank">AI</a> engine, moving away from its direct dependency on the llama.cpp framework.</p>

<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-35436" title="949cde18j00sweodc0032d000p000e2p" src="https://www.1ai.net/wp-content/uploads/2025/05/949cde18j00sweodc0032d000p000e2p.jpg" alt="949cde18j00sweodc0032d000p000e2p" width="900" height="506" /></p>

<p>The llama.cpp project recently integrated full vision support through the libmtmd library, and Ollama's relationship with it has sparked community discussion.</p>

<p>A member of the Ollama team clarified on Hacker News that <strong>Ollama's engine is developed independently in Go and does not directly borrow from the C++ implementation in llama.cpp</strong>, and thanked the community for its feedback on improving the technology.</p>

<p>In an official statement, Ollama noted that as models such as Meta's Llama 4, Google's Gemma 3, Alibaba's Qwen 2.5 VL, and Mistral Small 3.1 grow more complex, the existing architecture was struggling to keep up.</p>

<p>That is why Ollama is launching a new engine <strong>aimed at a breakthrough in local inference accuracy</strong>, especially when handling large images and generating large numbers of tokens.</p>

<p>For image processing, Ollama introduces additional metadata to optimize batching and positional-data management, avoiding the output-quality degradation caused by incorrect image segmentation; it also applies KV-cache optimization techniques to accelerate transformer model inference.</p>

<p>The new engine also substantially improves memory management: a new image-caching feature ensures that processed images can be reused rather than discarded prematurely, and Ollama has partnered with hardware vendors such as NVIDIA, AMD, Qualcomm, Intel, and Microsoft to refine memory estimation through accurate detection of hardware metadata.</p>

<p>The engine additionally supports techniques such as chunked attention and 2D rotary embedding for models such as Meta's Llama 4 Scout (a 109-billion-parameter mixture-of-experts, MoE, model).</p>

<p>Going forward, Ollama plans to support longer context lengths, more complex reasoning, and streaming responses to tool calls, further enhancing the versatility of local AI models.</p>
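<p>The multimodal engine described above is exercised through Ollama's local REST API, where images are passed to vision-capable models as base64-encoded strings. Below is a minimal sketch in Go (the language the article says Ollama's engine is written in) that builds such a request body; the model name <code>gemma3</code> and the placeholder image bytes are illustrative assumptions, not part of the article.</p>

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Message mirrors the message shape of Ollama's /api/chat endpoint;
// multimodal models receive images as base64-encoded strings.
type Message struct {
	Role    string   `json:"role"`
	Content string   `json:"content"`
	Images  []string `json:"images,omitempty"`
}

// ChatRequest is the request body for POST /api/chat.
type ChatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
	Stream   bool      `json:"stream"`
}

// buildRequest assembles a single-turn multimodal chat request.
func buildRequest(model, prompt string, image []byte) ([]byte, error) {
	return json.Marshal(ChatRequest{
		Model: model,
		Messages: []Message{{
			Role:    "user",
			Content: prompt,
			Images:  []string{base64.StdEncoding.EncodeToString(image)},
		}},
	})
}

func main() {
	// Placeholder bytes stand in for a real image file here
	// (in practice, e.g. the result of os.ReadFile("photo.jpg")).
	img := []byte{0xFF, 0xD8, 0xFF}
	body, err := buildRequest("gemma3", "Describe this image.", img)
	if err != nil {
		panic(err)
	}
	// POST this body to http://localhost:11434/api/chat on a running
	// Ollama instance to receive the model's description of the image.
	fmt.Println(string(body))
}
```

<p>Setting <code>"stream": false</code> asks Ollama for a single JSON reply instead of a stream of partial responses; the streaming tool-call responses the article mentions as future work build on the streaming side of this same endpoint.</p>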