"Language discrimination" in AI tokenizers: asking Claude can consume more than three times as many tokens as English

Yesterday, April 30, AI researcher Aran Komatsuzaki published the results of a cross-comparison of the tokenizers used by mainstream large models, showing that tokenizers exhibit "language discrimination":

When using the same model, non-English users actually consume far more tokens than English users, which amounts to a hidden "non-English tax".

He translated Rich Sutton's well-known essay The Bitter Lesson into nine languages and fed the translations to the tokenizers of six models. Taking the token count of the original English text on OpenAI's tokenizer as the 1x baseline, he measured how many tokens each language consumes on each model.
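The measurement described above is a ratio of token counts against a single baseline. A minimal sketch of that calculation follows. It is not Komatsuzaki's code: the two tokenizers are toy stand-ins (one token per UTF-8 byte, one token per whitespace-separated word), the sample sentences are hypothetical, and the function name `relative_consumption` is invented for illustration. A real comparison would plug in each vendor's actual tokenizer.

```python
from typing import Callable, Dict

Tokenizer = Callable[[str], int]  # returns the number of tokens for a text


def utf8_byte_tokenizer(text: str) -> int:
    """Toy stand-in: one token per UTF-8 byte (not any vendor's real tokenizer)."""
    return len(text.encode("utf-8"))


def whitespace_tokenizer(text: str) -> int:
    """Toy stand-in: one token per whitespace-separated word."""
    return len(text.split())


def relative_consumption(
    translations: Dict[str, str],
    tokenizers: Dict[str, Tokenizer],
    baseline_lang: str,
    baseline_tokenizer: str,
) -> Dict[str, Dict[str, float]]:
    """Divide every (tokenizer, language) token count by the baseline count."""
    baseline = tokenizers[baseline_tokenizer](translations[baseline_lang])
    if baseline == 0:
        raise ValueError("baseline text produced zero tokens")
    return {
        name: {lang: count(text) / baseline for lang, text in translations.items()}
        for name, count in tokenizers.items()
    }


# Hypothetical sample sentences, not the essay used in the study.
samples = {"en": "hello world", "zh": "你好世界"}
table = relative_consumption(
    samples,
    {"bytes": utf8_byte_tokenizer, "words": whitespace_tokenizer},
    baseline_lang="en",
    baseline_tokenizer="bytes",
)
for name, row in table.items():
    print(name, {lang: round(ratio, 2) for lang, ratio in row.items()})
```

By construction the baseline pair (English text on the baseline tokenizer) comes out as exactly 1.0, and every other cell reads as a multiple of it, which is how the figures in the article are expressed.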

The results show that for the same content in Chinese, Claude consumes 1.71 times the baseline number of tokens, while OpenAI consumes only 1.15 times. The gap is more pronounced for Hindi, where Claude consumes 3.24 times the baseline, and for Arabic, where the figure is 2.86 times.

Among the six models compared, Anthropic has the highest "non-English tax", followed by Kimi; Gemini and Qwen have the lowest. Komatsuzaki said bluntly that he honestly had not expected Claude to do this badly or the gap to be so wide, and that he is sure enterprise customers care a great deal about these issues.

Komatsuzaki noted that a tokenizer's efficiency depends on each language's share of the model's training data: English data is plentiful, so English vocabulary is compressed efficiently; non-English data is scarcer, so that text can only be split into more pieces.

For users, higher token consumption means that API costs rise directly, the wait before the model responds gets longer, and the context window fills up faster. His conclusion: whichever language has more data gets the more economical tokens.
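To make the cost and context-window effects concrete, here is a rough arithmetic sketch using the multipliers reported above. The price per million tokens and the context-window size are hypothetical placeholders, not any vendor's real figures, and the helper names are invented for illustration. The same price is applied to every row so that only the tokenizer effect shows up.

```python
# Multipliers reported in the article (baseline: English on OpenAI's tokenizer = 1.0).
REPORTED_MULTIPLIERS = {
    ("Claude", "Chinese"): 1.71,
    ("OpenAI", "Chinese"): 1.15,
    ("Claude", "Hindi"): 3.24,
    ("Claude", "Arabic"): 2.86,
}

HYPOTHETICAL_PRICE_PER_MILLION = 1.0  # assumed price, in arbitrary currency units
HYPOTHETICAL_WINDOW_TOKENS = 100_000  # assumed context-window size


def cost_for_content(baseline_tokens: int, multiplier: float, price_per_million: float) -> float:
    """Cost of content that would take `baseline_tokens` tokens at the 1x baseline."""
    return baseline_tokens * multiplier * price_per_million / 1_000_000


def effective_context(window_tokens: int, multiplier: float) -> float:
    """How much baseline-equivalent content fits in a window of `window_tokens`."""
    return window_tokens / multiplier


for (model, language), multiplier in REPORTED_MULTIPLIERS.items():
    cost = cost_for_content(1_000_000, multiplier, HYPOTHETICAL_PRICE_PER_MILLION)
    capacity = effective_context(HYPOTHETICAL_WINDOW_TOKENS, multiplier)
    print(f"{model} / {language}: cost x{cost:.2f}, window holds ~{capacity:,.0f} baseline tokens")
```

Under these assumptions the cost of the same content scales linearly with the multiplier, while the amount of content that fits in a fixed window shrinks by the same factor.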
