{"id":27888,"date":"2025-01-28T08:23:15","date_gmt":"2025-01-28T00:23:15","guid":{"rendered":"https:\/\/www.1ai.net\/?p=27888"},"modified":"2025-01-28T08:23:15","modified_gmt":"2025-01-28T00:23:15","slug":"deepseek-%e6%b7%b1%e5%a4%9c%e5%86%8d%e6%94%be%e5%a4%a7%e6%8b%9b%ef%bc%9a7b-%e5%8f%82%e6%95%b0%e4%ba%ba%e4%ba%ba%e5%8f%af%e7%94%a8%e7%9a%84%e8%a7%86%e8%a7%89%e5%a4%9a%e6%a8%a1%e6%80%81%e6%a8%a1","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/27888.html","title":{"rendered":"DeepSeek Drops Another Late-Night Release: Janus-Pro-7B, a 7B-Parameter Visual Multimodal Model Anyone Can Use, Goes Open Source"},"content":{"rendered":"<p>January 28, 2025 - Early this morning, Beijing time, <a href=\"https:\/\/www.1ai.net\/en\/tag\/deepseek\" title=\"[View articles tagged with [DeepSeek]]\" target=\"_blank\" >DeepSeek<\/a> announced that it has <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open-sourced<\/a> a brand-new <strong><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%a7%86%e8%a7%89%e5%a4%9a%e6%a8%a1%e6%80%81%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [visual multimodal model]]\" target=\"_blank\" >visual multimodal model<\/a>, Janus-Pro-7B<\/strong>, which beats Stable Diffusion and OpenAI's DALL-E 3 on the GenEval and DPG-Bench benchmarks.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-27889\" title=\"2eb36e88j00sqrvlv002md000pg00bbp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/01\/2eb36e88j00sqrvlv002md000pg00bbp.jpg\" alt=\"2eb36e88j00sqrvlv002md000pg00bbp\" width=\"916\" height=\"407\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-27890\" title=\"df66a2faj00sqrvma00fpd000pf00fop\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/01\/df66a2faj00sqrvma00fpd000pf00fop.jpg\" alt=\"df66a2faj00sqrvma00fpd000pf00fop\" width=\"915\" height=\"564\" \/><\/p>\n<p><strong>1AI Attachment 
Address:<\/strong><\/p>\n<ul>\n<li>GitHub: <a href=\"https:\/\/github.com\/deepseek-ai\/Janus\">click here<\/a><\/li>\n<li>HuggingFace: <a href=\"https:\/\/huggingface.co\/deepseek-ai\/Janus-Pro-7B\">click here<\/a><\/li>\n<\/ul>\n<p>The official description of the model reads roughly as follows:<\/p>\n<blockquote>\n<ul>\n<li>Janus-Pro is an innovative autoregressive framework for unified multimodal understanding and generation. Unlike previous approaches, Janus-Pro <strong>splits the visual encoding process into multiple independent pathways<\/strong>, addressing several limitations of earlier frameworks while still processing everything with a single unified transformer architecture.<\/li>\n<li>This decoupling not only effectively mitigates potential conflicts between the visual encoder's understanding and generation roles, but also makes the framework more flexible.<\/li>\n<li>Janus outperforms traditional unified models and performs on par with task-specific models. With its simplicity, high flexibility, and efficiency, Janus-Pro is a strong contender for the next generation of unified multimodal models.<\/li>\n<\/ul>\n<\/blockquote>\n<p>In summary:<\/p>\n<blockquote>\n<ul>\n<li>Janus-Pro is a unified Multimodal Large Language Model (MLLM) that achieves more efficient processing by decoupling visual encoding for multimodal understanding and generation. Janus-Pro is built on the DeepSeek-LLM-1.5b-base\/DeepSeek-LLM-7b-base models.<\/li>\n<li>For multimodal understanding, Janus-Pro uses SigLIP-L as its visual encoder and <strong>supports 384 x 384 pixel image inputs<\/strong>. For image generation, Janus-Pro uses a tokenizer with a downsampling rate of 16.<\/li>\n<\/ul>\n<\/blockquote>\n<p>Janus-Pro is an advanced version of the earlier Janus model. 
Specifically, Janus-Pro combines an optimized training strategy, expanded training data, and a larger model scale. With these improvements, Janus-Pro makes significant advances in multimodal understanding and text-to-image instruction-following, while also improving the stability of text-to-image generation.<\/p>\n<p>According to the official description, JanusFlow introduces a minimalist architecture that integrates <strong>autoregressive language modeling with rectified flow<\/strong>, a state-of-the-art generative modeling approach. The team found that rectified flow can be trained directly within the large language model framework without complex architectural adjustments. Extensive experiments show that JanusFlow achieves performance <strong>comparable to or better than specialized models<\/strong> in their respective domains, while significantly outperforming existing unified methods on standard benchmarks. This work represents a step towards more efficient and general visual-language models.<\/p>","protected":false},"excerpt":{"rendered":"<p>Jan. 28, 2025 - DeepSeek announced that it has open-sourced a new visual multimodal model, Janus-Pro-7B, which beats Stable Diffusion and OpenAI's DALL-E 3 on the GenEval and DPG-Bench benchmarks. 1AI attaches the GitHub and HuggingFace links. The official description of the model reads roughly as follows: Janus-Pro is an innovative autoregressive framework for unified multimodal understanding and generation. 
Unlike previous approaches, Janus-Pro addresses the limitations of earlier frameworks by splitting the visual encoding process into multiple independent paths.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[3606,219,5650],"collection":[],"class_list":["post-27888","post","type-post","status-publish","format-standard","hentry","category-news","tag-deepseek","tag-219","tag-5650"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/27888","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=27888"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/27888\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=27888"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=27888"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=27888"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=27888"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}