Alibaba open-sources WebAgent, an autonomous search AI agent

May 30: Yesterday Alibaba open-sourced WebAgent, its autonomous search AI agent, on GitHub. WebAgent has end-to-end autonomous information retrieval and multi-step reasoning capabilities, and can perceive, decide, and act in a web environment much as a human would.

For example, when a user wants to know the latest research results in a specific field, WebAgent can search multiple academic databases on its own initiative, select the most relevant literature, and analyze and summarize it in depth according to the user's needs.

According to the introduction, WebAgent not only identifies key information in the literature but also integrates ideas from different papers through multi-step reasoning, ultimately giving users a comprehensive and accurate research report.
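The perceive, decide, act cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions, not WebAgent's implementation: `stub_search`, `scripted_policy`, and `run_agent` are hypothetical names, the "search tool" is a tiny in-memory dictionary, and the policy is a fixed script standing in for a language model.

```python
from typing import Callable, Dict, List, Tuple

# One recorded step: (thought, action, argument, observation)
Step = Tuple[str, str, str, str]


def stub_search(query: str) -> str:
    """Hypothetical search tool backed by a tiny in-memory corpus."""
    corpus = {"WebAgent repository": "github.com/Alibaba-NLP/WebAgent"}
    return corpus.get(query, "no results")


def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the language model: search once, then answer."""
    if not history:
        return ("I should look this up.", "search", question)
    last_observation = history[-1][3]
    return ("I have enough information.", "answer", last_observation)


def run_agent(
    question: str,
    policy: Callable[[str, List[Step]], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[str, List[Step]]:
    """Alternate between deciding (policy) and acting (tools) until an answer."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(question, history)
        if action == "answer":
            return argument, history
        observation = tools[action](argument)  # act, then perceive the result
        history.append((thought, action, argument, observation))
    return "no answer within step budget", history


answer, steps = run_agent(
    "WebAgent repository", scripted_policy, {"search": stub_search}
)
```

In a real agent the policy would be a model call and the tools would hit live search and page-visit services; the step budget is there so a confused policy cannot loop forever.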

Alibaba's WebAgent consists of two parts, WebDancer and WebWalker. The former is an end-to-end agent training framework designed to enhance the multi-step information-seeking capabilities of web-based AI agents; the latter is a benchmark for LLMs in web traversal ("Benchmarking LLMs in Web Traversal").

[Figure: performance on web agents]

WebDancer's framework consists of four stages, running from data construction through training and optimization, which together build up an agent that can complete complex information retrieval tasks autonomously.

Browsing data construction is the starting point of the whole framework. In the real world, high-quality training data is key to an agent's ability to learn and generalize effectively. WebDancer addresses the limitations of traditional datasets with two data synthesis methods.

To ensure that the generated trajectories are both efficient and coherent, WebDancer uses both short and long reasoning. Short reasoning uses a large model to generate concise reasoning paths directly, while long reasoning uses a reasoning model to build up complex reasoning processes step by step.

After data preparation, WebDancer enters the supervised fine-tuning (SFT) phase. The goal of this phase is to initialize the agent with high-quality trajectory data so that it adapts to the format and environment requirements of the information retrieval task.

During SFT, WebDancer marks the thought, action, and observation parts of each trajectory separately and computes a loss to optimize the model's parameters. To improve robustness, it excludes external feedback (the observations returned by the environment) from the loss, so that the model focuses on its own decision-making. This phase gives the agent a strong starting capability, which helps it adapt to complex task environments in the subsequent reinforcement learning phase.
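The loss masking described above can be sketched as follows. This is a minimal illustration, not WebDancer's actual training code: the `Segment` type, the `loss_mask` and `masked_nll` functions, and the per-token log-probabilities are all hypothetical, and a real implementation would operate on tensors produced by a language model. Tokens in observation segments get a mask of 0, so only the thought and action tokens contribute to the loss.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    kind: str                     # "thought", "action", or "observation"
    token_logprobs: List[float]   # model log-probability of each token


def loss_mask(trajectory: List[Segment]) -> List[int]:
    """Return 1 for tokens the agent generated, 0 for environment feedback."""
    mask: List[int] = []
    for seg in trajectory:
        keep = 0 if seg.kind == "observation" else 1
        mask.extend([keep] * len(seg.token_logprobs))
    return mask


def masked_nll(trajectory: List[Segment]) -> float:
    """Average negative log-likelihood over the unmasked tokens only."""
    logprobs = [lp for seg in trajectory for lp in seg.token_logprobs]
    mask = loss_mask(trajectory)
    kept = sum(mask)
    if kept == 0:
        return 0.0  # nothing to train on in this trajectory
    return -sum(lp * m for lp, m in zip(logprobs, mask)) / kept


# Made-up numbers: the observation tokens are poorly predicted (-9.0 each),
# but they are masked out and do not affect the loss.
trajectory = [
    Segment("thought", [-0.5, -0.5]),
    Segment("action", [-1.0]),
    Segment("observation", [-9.0, -9.0, -9.0]),
    Segment("thought", [-2.0]),
]
mask = loss_mask(trajectory)
loss = masked_nll(trajectory)
```

Without the mask, the three observation tokens would dominate the average and push the model toward predicting tool output rather than improving its own reasoning and actions.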

The reinforcement learning (RL) phase is a key part of the WebDancer framework. In this phase, the agent learns to make good decisions in complex tasks by interacting with the environment. WebDancer uses DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), a reinforcement learning algorithm that the article describes as well suited to agent training.

Through its dynamic sampling mechanism, DAPO makes effective use of QA pairs that would otherwise be underutilized, improving data efficiency and the robustness of the policy. During RL, the agent gradually refines its decision-making policy through repeated attempts and feedback, and eventually achieves efficient multi-step reasoning and information retrieval.
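As a rough sketch of the dynamic sampling idea: several rollouts are sampled per question, and each rollout's advantage is its reward normalized within that group. If every rollout for a question receives the same reward (all correct or all wrong), every advantage is zero and the question contributes no learning signal, so such groups are dropped. The function names, the 0/1 rewards, and the use of the population standard deviation below are illustrative assumptions, not WebDancer's code.

```python
from statistics import mean, pstdev
from typing import Dict, List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own group's mean and spread."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_signal(rewards: List[float]) -> bool:
    """A group is informative only if its rollouts disagree on reward."""
    return max(rewards) > min(rewards)


def dynamic_sample(groups: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only informative groups and attach advantages to them."""
    return {q: group_advantages(r) for q, r in groups.items() if has_signal(r)}


# Hypothetical rewards for four rollouts on each of three questions.
rollout_rewards = {
    "q_easy": [1.0, 1.0, 1.0, 1.0],    # always solved: zero advantage
    "q_hard": [0.0, 0.0, 0.0, 0.0],    # never solved: zero advantage
    "q_useful": [1.0, 0.0, 0.0, 1.0],  # mixed outcomes: usable signal
}
batch = dynamic_sample(rollout_rewards)
```

This sketch only filters. In a full training loop, sampling would continue with further questions until the batch is filled with informative groups, which is how the mechanism keeps every slot in the batch contributing to the update.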

Official WebAgent links:

  • GitHub: https://github.com/Alibaba-NLP/WebAgent

  • WebDancer paper: https://arxiv.org/pdf/2505.22648

  • WebWalker paper: https://arxiv.org/pdf/2501.07572
