{"id":26899,"date":"2025-01-15T21:02:07","date_gmt":"2025-01-15T13:02:07","guid":{"rendered":"https:\/\/www.1ai.net\/?p=26899"},"modified":"2025-01-15T21:02:07","modified_gmt":"2025-01-15T13:02:07","slug":"kimi-%e5%a4%9a%e6%a8%a1%e6%80%81%e5%9b%be%e7%89%87%e7%90%86%e8%a7%a3%e6%a8%a1%e5%9e%8b-api-%e5%8f%91%e5%b8%83%ef%bc%8c1m-tokens-%e5%ae%9a%e4%bb%b7-12-%e5%85%83%e8%b5%b7","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/26899.html","title":{"rendered":"Kimi Multimodal Image Understanding Model API Released, 1M tokens priced from \u00a512"},"content":{"rendered":"<p>January 15, 2025 - Dark Side of the Moon today released the\u00a0<a href=\"https:\/\/www.1ai.net\/en\/tag\/kimi\" title=\"[View articles tagged with [Kimi]]\" target=\"_blank\" >Kimi<\/a> <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81\" title=\"[View articles tagged with [multimodal]]\" target=\"_blank\" >multimodal<\/a> image understanding model <a href=\"https:\/\/www.1ai.net\/en\/tag\/api\" title=\"[View articles tagged with [API]]\" target=\"_blank\" >API<\/a>. The new model,\u00a0<strong>moonshot-v1-vision-preview<\/strong> (hereinafter the \"Vision model\"), completes the multimodal capabilities of the moonshot-v1 model family.<\/p>\n<p><strong>Model capabilities<\/strong><\/p>\n<p>Image Recognition<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-26902\" title=\"cb180bdcj00sq4s2m008id000u000nhp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/01\/cb180bdcj00sq4s2m008id000u000nhp.jpg\" alt=\"cb180bdcj00sq4s2m008id000u000nhp\" width=\"1080\" height=\"845\" \/><\/p>\n<p>The Vision model can recognize complex details and nuances in images, whether food or animals, and can distinguish between objects that are similar but not identical.<\/p>\n<p>In the example below, the official demo stitches together 16 similar images of blueberry muffins and Chihuahuas that are hard for the human eye to tell apart; the Vision model recognizes each image and labels its type in order. <strong>Whether it is a blueberry muffin or a Chihuahua, the model can accurately differentiate and identify it<\/strong>.<\/p>\n<p>Text recognition and understanding<\/p>\n<p>In OCR text recognition and image understanding scenarios, the Vision model is more accurate than ordinary document scanning and OCR software. <strong>Even hastily handwritten text, such as receipts and courier bills, can be accurately recognized<\/strong>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-26901\" title=\"022554a2j00sq4s3v005jd000u000iop\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/01\/022554a2j00sq4s3v005jd000u000iop.jpg\" alt=\"022554a2j00sq4s3v005jd000u000iop\" width=\"1080\" height=\"672\" \/><\/p>\n<p>Taking the bar chart \"A student's final exam results\" as an example, the model was asked to extract and analyze the exam scores and to assess the chart's aesthetic style. 
The Vision model accurately identifies the score for each subject named in the bar chart and compares the scores, while also recognizing the chart's style, formatting, and colors.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-26900\" title=\"abca3503j00sq4s420061d000u000l4p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/01\/abca3503j00sq4s420061d000u000l4p.jpg\" alt=\"abca3503j00sq4s420061d000u000l4p\" width=\"1080\" height=\"760\" \/><\/p>\n<p>Model billing<\/p>\n<p data-vmark=\"77da\"><strong>The Vision model is billed by token volume<\/strong>. The price per call depends on the model selected:<\/p>\n<table>\n<tbody>\n<tr class=\"firstRow\">\n<td width=\"295\">Model<\/td>\n<td width=\"90\">Billing Unit<\/td>\n<td width=\"134\">Price<\/td>\n<\/tr>\n<tr>\n<td width=\"247\"><span class=\"link-text-start-with-http\">moonshot-v1-8k-vision-preview<\/span><\/td>\n<td width=\"69\">1M tokens<\/td>\n<td width=\"74\">\u00a512.00<\/td>\n<\/tr>\n<tr>\n<td width=\"305\"><span class=\"link-text-start-with-http\">moonshot-v1-32k-vision-preview<\/span><\/td>\n<td width=\"69\">1M tokens<\/td>\n<td width=\"70\">\u00a524.00<\/td>\n<\/tr>\n<tr>\n<td width=\"305\"><span class=\"link-text-start-with-http\">moonshot-v1-128k-vision-preview<\/span><\/td>\n<td width=\"69\">1M tokens<\/td>\n<td width=\"70\">\u00a560.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"6hgmv-0-0\">\n<div data-offset-key=\"6hgmv-0-0\"><\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"6hgmv-0-0\"><strong>Model constraints<\/strong><\/div>\n<div data-offset-key=\"6hgmv-0-0\"><\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"6hgmv-0-0\">Features supported by the Vision 
model include:<\/div>\n<\/div>\n<ul data-offset-key=\"feujg-0-0\">\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-reset public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"feujg-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"feujg-0-0\"><span data-offset-key=\"feujg-0-0\">Multi-turn conversations<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"spbj-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"spbj-0-0\"><span data-offset-key=\"spbj-0-0\">Streaming output<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"9nj4i-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"9nj4i-0-0\"><span data-offset-key=\"9nj4i-0-0\">Tool calling<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"78hi5-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"78hi5-0-0\"><span data-offset-key=\"78hi5-0-0\">JSON Mode<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"cv6ms-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"cv6ms-0-0\"><span data-offset-key=\"cv6ms-0-0\">Partial Mode<\/span><\/div>\n<\/li>\n<\/ul>\n<div data-block=\"true\" 
data-editor=\"wyh\" data-offset-key=\"an7sq-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"an7sq-0-0\"><span data-offset-key=\"an7sq-0-0\">The following features are currently unsupported or only partially supported:<\/span><\/div>\n<\/div>\n<ul data-offset-key=\"c52a1-0-0\">\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-reset public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"c52a1-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"c52a1-0-0\"><span data-offset-key=\"c52a1-0-0\">Internet search: not supported<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"8pq6q-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"8pq6q-0-0\"><span data-offset-key=\"8pq6q-0-0\">Context Caching: <\/span><span data-offset-key=\"8pq6q-0-1\">creating a Context Cache from image content is not supported; <\/span><span data-offset-key=\"8pq6q-0-2\">the Vision model can, however, be called with an already-created Cache.<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"7kjv2-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"7kjv2-0-0\"><span data-offset-key=\"7kjv2-0-0\">Image URLs: not supported; currently only base64-encoded image content is accepted<\/span><\/div>\n<\/li>\n<\/ul>\n<div data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"asfai-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" 
data-offset-key=\"asfai-0-0\"><span data-offset-key=\"asfai-0-0\">Other Platform Updates<\/span><\/div>\n<\/div>\n<ul data-offset-key=\"2u62d-0-0\">\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-reset public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"2u62d-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"2u62d-0-0\"><span data-offset-key=\"2u62d-0-0\">Support for organization-level project management<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"6gi57-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"6gi57-0-0\"><span data-offset-key=\"6gi57-0-0\">Support for verifying multiple accounts under a single business entity<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"adg4n-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"adg4n-0-0\"><span data-offset-key=\"adg4n-0-0\">Added a File resource management feature for intuitively managing and viewing file resources<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"ao59s-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"ao59s-0-0\"><span data-offset-key=\"ao59s-0-0\">Optimized mouse-hover copying in the resource management list<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 
public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"ekotd-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"ekotd-0-0\"><span data-offset-key=\"ekotd-0-0\">Context Caching is now available to all users.<\/span><\/div>\n<\/li>\n<li class=\"public-DraftStyleDefault-unorderedListItem public-DraftStyleDefault-depth0 public-DraftStyleDefault-listLTR\" data-block=\"true\" data-editor=\"wyh\" data-offset-key=\"3bcve-0-0\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"3bcve-0-0\"><span data-offset-key=\"3bcve-0-0\">Renewing a Cache no longer incurs the creation fee<\/span><\/div>\n<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>January 15, 2025 - Dark Side of the Moon today released the Kimi Multimodal Image Understanding Model API. The new multimodal image understanding model, moonshot-v1-vision-preview (hereinafter the \"Vision model\"), completes the multimodal capabilities of the moonshot-v1 model series. Model capabilities: Image recognition. The Vision model can recognize complex details and nuances in an image, whether food or animals, and can distinguish between objects that are similar but not identical. 
In the example below, the official demo assembles 16 similar images of blueberry muffins and Chihuahuas that are difficult for the human eye to distinguish; the Vision model recognizes them and labels the image types in order.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1033,1814,592],"collection":[],"class_list":["post-26899","post","type-post","status-publish","format-standard","hentry","category-news","tag-api","tag-kimi","tag-592"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/26899","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=26899"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/26899\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=26899"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=26899"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=26899"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=26899"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}