July 22: ByteDance's Seed team has announced its latest result, "Seed GR-3", a large robot-manipulation model that generalizes to complex operational tasks and supports long sequences of actions.

GR-3 is described as a large-scale vision-language-action (VLA) model. It shows strong generalization to new objects, new environments, and new instructions containing abstract concepts. In addition, GR-3 supports efficient fine-tuning with small amounts of human trajectory data, allowing fast and cost-effective adaptation to new tasks, and it demonstrates robust, reliable performance on long-horizon and dexterous tasks, including those requiring bimanual operation and chassis movement.
The Seed team attributes these capabilities to a combination of training methods: joint training with large-scale vision-language data, efficient fine-tuning on human trajectory data collected with user-authorized VR devices, and imitation learning on robot trajectory data.
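The announcement gives no implementation details, but the training recipe above can be illustrated schematically. The Python sketch below is a hypothetical minimal example of such co-training: one language-modeling loss on vision-language data and one imitation (behavior-cloning) loss on trajectory data. Every class, dimension, loss weight, and tensor shape here is an assumption made for illustration, not GR-3's actual architecture or objective.

```python
# Hypothetical sketch of co-training a vision-language-action model on
# vision-language data plus imitation learning on trajectory data.
# All names and shapes are illustrative assumptions, not GR-3's design.
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    """Stand-in VLA model: encodes an image and a tokenized instruction,
    then predicts either text tokens or a continuous action vector."""
    def __init__(self, vocab=1000, act_dim=14, hid=256):
        super().__init__()
        self.vision = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, hid))
        self.text = nn.Embedding(vocab, hid)
        self.lm_head = nn.Linear(hid, vocab)        # vision-language branch
        self.action_head = nn.Linear(hid, act_dim)  # action branch

    def forward(self, image, tokens):
        h = self.vision(image) + self.text(tokens).mean(dim=1)
        return self.lm_head(h), self.action_head(h)

model = TinyVLA()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

def training_step(vl_batch, traj_batch, w_vl=0.5):
    """One co-training step: language-modeling loss on vision-language data
    plus behavior-cloning (imitation) loss on robot/human trajectory data."""
    img_vl, tok_vl, target_tok = vl_batch
    img_tr, tok_tr, target_act = traj_batch

    logits, _ = model(img_vl, tok_vl)
    loss_vl = nn.functional.cross_entropy(logits, target_tok)

    _, pred_act = model(img_tr, tok_tr)
    loss_bc = nn.functional.mse_loss(pred_act, target_act)

    loss = w_vl * loss_vl + (1 - w_vl) * loss_bc
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Dummy batches standing in for web vision-language data and trajectory data.
vl = (torch.randn(8, 3, 64, 64), torch.randint(0, 1000, (8, 16)), torch.randint(0, 1000, (8,)))
tr = (torch.randn(8, 3, 64, 64), torch.randint(0, 1000, (8, 16)), torch.randn(8, 14))
print(training_step(vl, tr))
```

The single shared backbone with two output heads is only one plausible way to combine the data sources; the announcement does not say how GR-3 mixes its objectives or weights them.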
Notably, GR-3 performed well on a challenging deformable-object manipulation test, a clothes-hanging task in which it had to thread a hanger into a garment and then hang it on a clothesline. It was also able to generalize to clothing types not included in the training data.
Additionally, the Seed team has introduced ByteMini, a dual-arm mobile robot described as combining dexterity and reliability; with GR-3 integrated, it is said to perform a wide range of complex tasks.