{"id":4420,"date":"2024-02-26T09:53:40","date_gmt":"2024-02-26T01:53:40","guid":{"rendered":"https:\/\/www.1ai.net\/?p=4420"},"modified":"2024-02-26T09:53:40","modified_gmt":"2024-02-26T01:53:40","slug":"%e5%87%ba%e9%97%a8%e9%97%ae%e9%97%ae%e5%bc%80%e6%94%be%e5%a4%a7%e6%a8%a1%e5%9e%8b%e5%ba%8f%e5%88%97%e7%8c%b4%e5%ad%90%e5%bc%80%e6%ba%90%e6%95%b0%e6%8d%ae%e9%9b%86","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/4420.html","title":{"rendered":"Mobvoi Open-Sources a Training Dataset from Its &quot;Sequence Monkey&quot; Large Model"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%87%ba%e9%97%a8%e9%97%ae%e9%97%ae\" title=\"[Sees articles with labels]\" target=\"_blank\" >Mobvoi<\/a> has announced that it will release part of the training dataset for its hyperscale language model \"<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%ba%8f%e5%88%97%e7%8c%b4%e5%ad%90\" title=\"[Sees articles with [serial monkey] labels]\" target=\"_blank\" >Sequence Monkey<\/a>\" to the public under the name \"Sequence Monkey <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >Open Source<\/a> Dataset 1.0\".<\/p>\n<p>Sequence Monkey, one of Mobvoi's core technologies, has strong general-purpose representation and reasoning capabilities. It has performed well in many areas, including question-answering systems, natural language processing, machine translation, and text summarization, substantially improving productivity and data-processing capability.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4421\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/02\/6384453393872686908352025.jpg\" alt=\"\" width=\"560\" height=\"798\" \/><\/p>\n<p>To advance large language model technology, Mobvoi decided to open-source part of its training data. 
The \"Sequence Monkey Open Source Dataset 1.0\" comprises a general Chinese text corpus, a corpus of classical Chinese poetry paired with modern translations, and a text generation corpus. All three have been carefully curated and organized for high quality and an easy-to-use data format. The company has also adopted a permissive license so that developers and researchers can access the data easily.<\/p>\n<p>With this release, Mobvoi hopes to draw more researchers and teams into large language model research and applications and to jointly advance this cutting-edge technology. The company believes the open-source dataset will foster academic exchange and cooperation and accelerate innovation in related fields.<\/p>\n<p><strong>Project address:<\/strong> https:\/\/github.com\/mobvoi\/seq-monkey-data<\/p>","protected":false},"excerpt":{"rendered":"<p>Mobvoi has announced that it will release part of the training dataset for its hyperscale language model \"Sequence Monkey\" to the public under the name \"Sequence Monkey Open Source Dataset 1.0\". Sequence Monkey, one of Mobvoi's core technologies, has strong general-purpose representation and reasoning capabilities and has performed well in many areas, including question-answering systems, natural language processing, machine translation, and text summarization, substantially improving productivity and data-processing capability. To advance large language model technology, Mobvoi decided to open-source part of its training data. 
The \"Sequence Monkey Open Source Dataset 1.0\" comprises a general Chinese text corpus, a corpus of classical Chinese poetry paired with modern translations, and a text generation corpus. All three have been carefully curated and organized for high quality and an easy-to-use data format.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1360,216,1361,219],"collection":[],"class_list":["post-4420","post","type-post","status-publish","format-standard","hentry","category-news","tag-1360","tag-216","tag-1361","tag-219"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/4420","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=4420"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/4420\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=4420"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=4420"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=4420"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=4420"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}