Today, July 7, AliCloud announced that Tongyi has officially open-sourced its web agent, WebSailor. WebSailor, which has strong reasoning and retrieval capabilities, topped the ranking of open-source web agents on BrowseComp, an agent evaluation benchmark, after its release. 1AI notes that WebSailor's construction method and some of its datasets are currently open-sourced on GitHub.

According to AliCloud, the WebSailor web agent can be applied to retrieval tasks in complex scenarios. Given a vaguely specified question, it can quickly search across different webpages, reason over what it finds, and verify it, ultimately arriving at an answer through rigorous multi-step reasoning and cross-validation over a large volume of information.
Meanwhile, to train this agent, the Tongyi Labs team adopted a set of innovative post-training methods, which substantially improved the performance of open-source models on complex web reasoning tasks. On BrowseComp, a high-difficulty agent evaluation benchmark, WebSailor outperformed models and agents such as DeepSeek R1 and Grok-3, topping the ranking of open-source web agents.
- Open source address:
https://github.com/Alibaba-NLP/WebAgent
To validate WebSailor's effectiveness, Tongyi Labs tested it on multiple benchmark evaluation sets.
BrowseComp is OpenAI's open-source evaluation set for browsing-based retrieval, designed to measure the retrieval performance of large models and agents. The set contains 1,266 difficult questions, making it one of the hardest evaluation sets to date, and in the few months since its release no open-source system in the industry had achieved results close to those of closed-source models.
Results on the English and Chinese versions of the BrowseComp evaluation set show that WebSailor bridges the gap between open-source and closed-source systems. WebSailor-32B and WebSailor-72B not only achieve a breakthrough among open-source models and agents, but also surpass closed-source models such as DeepSeek R1 and Grok-3 (note: this is the official wording, but DeepSeek R1 should be an open-source model), ranking second only to the closed-source OpenAI DeepResearch.
Although WebSailor is trained only on difficult data, it also outperforms other methods on SimpleQA, a dataset focused on common tasks. This shows strong compatibility and effectiveness, and supports the generalization ability of the WebSailor method.
According to AliCloud, WebSailor provides a general workflow that can be adapted to problems in other domains. It emphasizes a combined strategy of "difficult task synthesis + small-scale cold start + efficient RL optimization", which is broadly applicable. In the future, the open-source community can draw on WebSailor's approach to tackle more tasks of a similar "beyond human ability" kind, such as complex reasoning question answering in open domains, academic knowledge discovery, and even cross-modal information integration.