Video-XL-2 is an open-source, lightweight model for ultra-long video understanding from the Beijing Academy of Artificial Intelligence (BAAI). It can efficiently process video inputs of up to 10,000 frames on a single GPU. The model has three parts: a visual encoder, a dynamic token synthesis module, and a large language model. It is trained with a four-stage progressive training scheme and introduces a segmented prefilling strategy and a dual-granularity KV decoding mechanism. Video-XL-2 outperforms all lightweight open-source models on mainstream evaluation benchmarks and takes only 12 seconds to encode a 2,048-frame video. It can be applied to scenarios such as film and TV content analysis and abnormal behavior monitoring.
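The general idea behind segmented (chunked) prefilling can be illustrated with a small sketch. This is not Video-XL-2's implementation. It assumes a toy single-head causal attention in which keys and values are the token vectors themselves (no learned projections), and the function names are hypothetical. The sketch shows that feeding a long token sequence chunk by chunk against a growing KV cache gives the same outputs as one full causal pass, while each step only computes attention scores for one chunk's queries.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled softmax attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def full_prefill(tokens: List[Vector]) -> List[Vector]:
    """Reference: causal attention over the whole sequence in one pass."""
    # Toy assumption: keys and values are the raw token vectors.
    return [attend(t, tokens[: i + 1], tokens[: i + 1]) for i, t in enumerate(tokens)]


def chunked_prefill(tokens: List[Vector], chunk_size: int) -> List[Vector]:
    """Hypothetical segmented prefill: process chunks in order, reusing a KV cache."""
    cache_k: List[Vector] = []
    cache_v: List[Vector] = []
    outputs: List[Vector] = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start : start + chunk_size]
        # Add this chunk's keys/values to the cache built from earlier chunks.
        cache_k.extend(chunk)
        cache_v.extend(chunk)
        for j, q in enumerate(chunk):
            # Causal mask: all earlier chunks plus tokens up to j in this chunk.
            visible = start + j + 1
            outputs.append(attend(q, cache_k[:visible], cache_v[:visible]))
    return outputs


if __name__ == "__main__":
    toy_tokens = [[math.sin(i), math.cos(i)] for i in range(10)]
    print(chunked_prefill(toy_tokens, chunk_size=4) == full_prefill(toy_tokens))
```

In a real model the payoff is memory: a full pass over N tokens materializes an N × N score matrix, while a chunked pass only needs chunk_size × (cached length) scores at a time, which is one plausible reason a segmented strategy helps fit very long frame sequences on a single GPU.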