{"id":14491,"date":"2024-07-02T09:13:19","date_gmt":"2024-07-02T01:13:19","guid":{"rendered":"https:\/\/www.1ai.net\/?p=14491"},"modified":"2024-07-02T09:13:19","modified_gmt":"2024-07-02T01:13:19","slug":"%e6%9c%88%e4%b9%8b%e6%9a%97%e9%9d%a2-kimi-%e5%bc%80%e6%94%be%e5%b9%b3%e5%8f%b0%e3%80%8c%e4%b8%8a%e4%b8%8b%e6%96%87%e7%bc%93%e5%ad%98%e3%80%8d%e6%ad%a3%e5%bc%8f%e5%85%ac%e6%b5%8b-%e9%95%bf%e6%96%87","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/14491.html","title":{"rendered":"Dark Side of the Moon Kimi Open Platform &quot;Context Cache&quot; Officially Public Beta Long Text Model Cost Reduction 90%"},"content":{"rendered":"<p>Yesterday,<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e6%9c%88%e4%b9%8b%e6%9a%97%e9%9d%a2\" title=\"[Sees articles with labels]\" target=\"_blank\" >Dark Side of the Moon<\/a>Under<a href=\"https:\/\/www.1ai.net\/en\/tag\/kimi\" title=\"[View articles tagged with [Kimi]]\" target=\"_blank\" >Kimi<\/a> The open platform announced that Context Caching has entered public beta. This technology can reduce the cost of using long text flagship models of up to 90% for developers without changing the API price, and significantly improve the response speed of the model.<\/p>\n<p>Context Caching is an efficient data management technology that allows the system to pre-store large amounts of data or information that may be frequently requested. In this way, when you request the same information again, the system can quickly provide it directly from the cache without recalculating or retrieving it from the original data source, saving time and resources. 
Context Caching is particularly suited to scenarios with frequent requests that repeatedly reference a large initial context, where it can significantly reduce the cost of long-text models and improve efficiency.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-14492\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/6385550493248278035629434.jpg\" alt=\"\" width=\"1000\" height=\"561\" \/><\/p>\n<p>Specifically, &quot;context caching&quot; applies to scenarios with frequent requests and repeated references to a large initial context, bringing two benefits:<\/p>\n<blockquote><p>Cost reduction of up to 90%: In scenarios where many questions must be asked about a fixed document, context caching can save substantial costs. For example, with a hardware product manual of about 90,000 words, pre-sales support staff may need to run many rounds of intensive Q&amp;A in a short period; with context caching enabled, the cost can drop to roughly 10% of the original.<\/p>\n<p>First-token latency reduced by 83%: a request against a 128k model usually takes 30 seconds to return the first token. With context caching, first-token latency can be reduced to 5 seconds on average, a reduction of about 83%.<\/p><\/blockquote>\n<p>The charging model for Context Caching consists of the following parts:<\/p>\n<blockquote><p><strong>Cache creation fee:<\/strong><\/p>\n<p>When the Cache creation API is called and the Cache is successfully created, the actual number of tokens in the Cache is billed. 24 yuan\/M token<\/p>\n<p><strong>Cache storage fee:<\/strong><\/p>\n<p>During the cache lifetime, the cache storage fee is charged per minute. 
10 yuan\/M token\/minute<\/p>\n<p><strong>Cache call fee (incremental tokens):<\/strong><\/p>\n<p>Incremental tokens in a call that hits the Cache are billed at the model's original price<\/p>\n<p><strong>Cache call fee (per call):<\/strong><\/p>\n<p>During the cache lifetime, if a user requests a successfully created Cache through the chat interface and the chat message content matches a live Cache, a call fee is charged per call. 0.02 yuan\/time<\/p><\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Yesterday, the Kimi open platform under Dark Side of the Moon announced the public beta of Context Caching, a technology that, without changing API prices, can cut developers' cost of using the long-text flagship model by up to 90% and significantly improve model response speed. Context Caching is an efficient data management technique that lets the system pre-store large amounts of data or information likely to be requested frequently. When the same information is requested again, the system can serve it directly from the cache instead of recomputing it or retrieving it from the original data source, thus saving time and resources. 
Context Caching is particularly well suited for frequently requested information.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1814,1168],"collection":[],"class_list":["post-14491","post","type-post","status-publish","format-standard","hentry","category-news","tag-kimi","tag-1168"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/14491","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=14491"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/14491\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=14491"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=14491"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=14491"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=14491"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
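The pricing quoted in the article (24 yuan/M token to create, 10 yuan/M token/minute to store, 0.02 yuan per matched call, with incremental tokens billed separately at the model's normal price) can be sketched as a simple cost calculation. This is a minimal illustration of the stated fee schedule only; the token count and session length in the example are made-up assumptions, and incremental-token charges are deliberately excluded.

```python
# Hedged sketch of the Context Caching fee schedule described in the article.
# Assumed prices (from the article): creation 24 yuan/M token (one-time),
# storage 10 yuan/M token/minute, 0.02 yuan per cache-hit call.
# Incremental tokens are billed at the model's original price and are NOT
# included here.

def cache_cost(cached_tokens: int, lifetime_minutes: float, calls: int) -> float:
    """Total caching cost in yuan, excluding incremental-token charges."""
    millions = cached_tokens / 1_000_000
    creation = 24 * millions                    # one-time creation fee
    storage = 10 * millions * lifetime_minutes  # per-minute storage fee
    call_fees = 0.02 * calls                    # per-call hit fee
    return creation + storage + call_fees

# Hypothetical example: a cached manual of ~120k tokens (an assumption),
# kept alive for 10 minutes and hit by 50 Q&A calls.
total = cache_cost(120_000, 10, 50)
print(f"{total:.2f} yuan")  # 2.88 + 12.00 + 1.00 = 15.88 yuan
```

As the example suggests, the per-minute storage fee dominates for long-lived caches, which is consistent with the article's framing: the savings come from dense bursts of requests against the same context within a short window.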