Alibaba's Tongyi Qianwen (Qwen) team has published a blog post introducing Qwen2.5-Turbo. After several months of optimization, the model's context length has been extended from 128K to 1 million tokens, enough to hold very large bodies of text in a single prompt. It achieves 100% accuracy on the 1M-token passkey retrieval task, and its score on the RULER long-context benchmark surpasses both GPT-4 and GLM4-9B-1M. By integrating a sparse attention mechanism, the team reduced the time to first token when processing a 1M-token context by a factor of 4.3, while keeping the processing cost at ¥0.3 per million tokens, making the model economically competitive. The team acknowledges, however, that performance on long-sequence tasks can still be unstable in real-world scenarios and that inference costs need further optimization; they promise to keep improving these aspects and to explore more powerful long-context models.
Official website: https://qwen2.org/qwen2-5-turbo/
DEMO address: https://huggingface.co/spaces/Qwen/Qwen2.5-Turbo-1M-Demo
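For readers who want to try the model programmatically, below is a minimal sketch of calling Qwen2.5-Turbo through an OpenAI-compatible API. The endpoint URL, model identifier, and environment-variable name are assumptions based on Alibaba Cloud DashScope conventions, not details taken from the announcement itself.

```python
# Minimal sketch: querying Qwen2.5-Turbo via an OpenAI-compatible endpoint.
# The base_url, model name, and API-key variable below are assumptions modeled
# on Alibaba Cloud DashScope conventions, not details from the blog post.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed environment variable
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

# With a 1M-token context window, a long document could be pasted directly
# into the user message instead of being chunked beforehand.
long_document = "..."  # placeholder for the text to analyze

response = client.chat.completions.create(
    model="qwen-turbo-latest",  # assumed identifier for Qwen2.5-Turbo
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the following document:\n\n" + long_document},
    ],
)

print(response.choices[0].message.content)
```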
