Large Model Evaluation System: Sinan OpenCompass 2.0 Released

OpenCompass 2.0, an open-source evaluation system for large models, has been officially released. It aims to provide one-stop evaluation services for large language models, multimodal models, and other model types. OpenCompass 2.0 quantifies model performance across five dimensions (knowledge, language, comprehension, reasoning, and examination) and offers objective, neutral technical support for innovation in large models. Alongside the release, OpenCompass 2.0 published its public leaderboard of large models for 2023. The results show that GPT-4 Turbo performs best across all evaluations, followed by GLM-4, Alibaba Qwen-Max, and Baidu Wenxin Yiyan 4.0. Large language models still have considerable room for improvement in overall capability, and complex reasoning remains a weak point. In Chinese-language scenarios, Chinese-developed models hold a larger advantage, and Chinese closed-source models come close to the level of GPT-4 Turbo. Open-source models are progressing rapidly, reaching relatively high performance at smaller sizes and showing strong development potential.

Official website:
https://opencompass.org.cn/
CompassHub community:
https://hub.opencompass.org.cn/home
