{"id":52587,"date":"2026-04-30T11:40:37","date_gmt":"2026-04-30T03:40:37","guid":{"rendered":"https:\/\/www.1ai.net\/?p=52587"},"modified":"2026-04-30T11:40:37","modified_gmt":"2026-04-30T03:40:37","slug":"ai-%e5%88%86%e8%af%8d%e5%99%a8%e5%ad%98%e5%9c%a8%e3%80%8c%e8%af%ad%e8%a8%80%e6%ad%a7%e8%a7%86%e3%80%8d%ef%bc%9a%e7%94%a8%e5%8d%b0%e5%9c%b0%e8%af%ad%e9%97%ae-claude%ef%bc%8ctoken-%e6%b6%88%e8%80%97","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/52587.html","title":{"rendered":"\"Language Discrimination\" in the AI syllable: Ask Claude, token consumes more than three times more than English"},"content":{"rendered":"<p>On April 30th, yesterday, AI researcher Aran Komatsuzaki published a large-scale mainstream symmetry tool<a href=\"https:\/\/www.1ai.net\/en\/tag\/token\" title=\"[see articles with [token] labels]\" target=\"_blank\" >token<\/a>the results of a cross-examination show that Tokenizer has \"language discrimination\":<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-52588\" title=\"bdb9469j00teafie004gd000u0011wm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2026\/04\/bdb94669j00teafie004gd000u0011wm.jpg\" alt=\"bdb9469j00teafie004gd000u0011wm\" width=\"1080\" height=\"1364\" \/><\/p>\n<p>when using the same model, non-english users actually consume far more tokens than english users, amounting to a quiet \u201cnon-english tax\u201d\u3002<\/p>\n<p>He translated the famous paper by Rich Sutton, The Bitter Lesson, into nine languages and fed tokenizers of six models, using a benchmark of 1 times the number of tokens on the OpenAI semiword tool in the original English language to measure the consumption of languages on different models\u3002<\/p>\n<p>The results show that the same content is being asked in Chinese<a href=\"https:\/\/www.1ai.net\/en\/tag\/claude\" title=\"[View articles tagged with [Claude]]\" target=\"_blank\" >Claude<\/a> token consumes 1.71 times the baseline, while OpenAI only 1.15 times. The situation in Hindi is more pronounced in Claude, where token consumes 3.24 times more than the benchmark and the Arabic language is 2.86\u3002<\/p>\n<p>6 Among the models cross-referenced, Anthropic has the highest \u201cnon-English tax\u201d, followed by Kimi; Gemini and Qwen have the lowest non-English tax. Komatsuzaki put it bluntly: \"I honestly didn't think Claude would be this close, and the gap was so wide. I'm sure corporate clients are very concerned about these issues. I'm sorry<\/p>\n<p>Komatsuzaki noted that the efficiency of the syllables depends on the proportion of languages in model training data: English data is large and English terminology is efficiently compressed; non-English data are fewer and can only be cut more\u3002<\/p>\n<p>For users, the increase in token consumption means that API call costs rise directly, that the waiting time before the model responds is longer and the context window will run out faster. He came to the conclusion that who's big and who's token is much more economical\u3002<\/p>","protected":false},"excerpt":{"rendered":"<p>On April 30th, yesterday, AI researcher Aran Komatsuzaki published the results of a cross-evaluation of the main mainstream paraphrasing tool (tokenizer), which reveals \"linguistic discrimination\" in Tokenizer: When using the same model, non-English speakers actually consume far more tokens than English users, amounting to a quiet \u201cnon-English tax\u201d. 
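The cross-model leg of the comparison is only directly reproducible for vendors that publish their tokenizers. A hedged sketch, assuming the Hugging Face model ID below is the right one for Qwen; Claude's and Gemini's tokenizers are not openly distributed, so their counts would have to come from the vendors' own token-counting endpoints.

```python
# Sketch of the cross-model comparison for an open model, keeping OpenAI's
# English count as the shared 1x baseline. The Hugging Face model ID is an
# assumption; Qwen publishes its tokenizer, while Claude and Gemini do not.
# pip install tiktoken transformers
from transformers import AutoTokenizer
import tiktoken

text_en = "The biggest lesson is that general methods that leverage computation are ultimately the most effective."
text_zh = "最大的教训是，利用计算能力的通用方法最终是最有效的。"

# Shared baseline: OpenAI's token count for the English original.
baseline = len(tiktoken.get_encoding("o200k_base").encode(text_en))

qwen = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # assumed ID
for label, text in [("Qwen/English", text_en), ("Qwen/Chinese", text_zh)]:
    n = len(qwen.encode(text))
    print(f"{label:13s} {n:3d} tokens  ({n / baseline:.2f}x the OpenAI English baseline)")
```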