Google is introducing a new advancement in robotics: Robotics Transformer 2 (RT-2), a vision-language-action (VLA) model. RT-2 is trained on text and images from the web and can directly output robotic actions, bringing robots closer to being helpful and adaptable in varied environments. Previous approaches to robot learning required training on billions of data points gathered firsthand, which is time-consuming and costly. RT-2, by contrast, transfers concepts from its web-scale language and vision training to robot actions, even for tasks it has not been explicitly trained on. In testing, RT-2 showed improved generalization to novel scenarios compared with previous models, a step toward general-purpose robots that can adapt to new situations.

Source: What is RT-2? Google DeepMind’s vision-language-action model for robotics
