Recently, computer scientists have been working on teaching machines to do a wider range of tasks around the house. In a new paper spearheaded by MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the University of Toronto, researchers demonstrate R20;VirtualHome,” a system that can simulate detailed household tasks and then have artificial R20;agents” execute them, opening up the possibility of one day teaching robots to do such tasks.
The team trained the system using nearly 3,000 programs of various activities, which are further broken down into subtasks for the computer to understand. A simple task like “making coffee,” for example, would also include the step “grabbing a cup.” The researchers demonstrated VirtualHome in a 3-D world inspired by the Sims video game.
The team’s AI agent can execute 1,000 of these interactions in the Sims-style world, with eight different scenes including a living room, kitchen, dining room, bedroom, and home office.
“Describing actions as computer programs has the advantage of providing clear and unambiguous descriptions of all the steps needed to complete a task,” says PhD student Xavier Puig, who was lead author on the paper. “These programs can instruct a robot or a virtual character, and can also be used as a representation for complex tasks with simpler actions.”
The project was co-developed by CSAIL and the University of Toronto alongside researchers from McGill University and the University of Ljubljana. It will be presented at the Computer Vision and Pattern Recognition (CVPR) conference, which takes place this month in Salt Lake City.
How it works
Unlike humans, robots need more explicit instructions to complete easy tasks — they can’t just infer and reason with ease.
For example, one might tell a human to “switch on the TV and watch it from the sofa.” Here, actions like “grab the remote control” and “sit/lie on sofa” have been omitted, since they’re part of the commonsense knowledge that humans have.
To better demonstrate these kinds of tasks to robots, the descriptions for actions needed to be much more detailed. To do so, the team first collected verbal descriptions of household activities, and then translated them into simple code. A program like this might include steps like: walk to the television, switch on the television, walk to the sofa, sit on the sofa, and watch television.
Once the programs were created, the team fed them to the VirtualHome 3-D simulator to be turned into videos. Then, a virtual agent would execute the tasks defined by the programs, whether it was watching television, placing a pot on the stove, or turning a toaster on and off.
The end result is not just a system for training robots to do chores, but also a large database of household tasks described using natural language. Companies like Amazon that are working to develop Alexa-like robotic systems at home could eventually use data like this to train their models to do more complex tasks.
The team’s model successfully demonstrated that, their agents could learn to reconstruct a program, and therefore perform a task, given either a description: “pour milk into glass,” or a video demonstration of the activity.
“This line of work could facilitate true robotic personal assistants in the future,” says Qiao Wang, a research assistant in arts, media, and engineering at Arizona State University. “Instead of each task programmed by the manufacturer, the robot can learn tasks just by listening to or watching the specific person it accompanies. This allows the robot to do tasks in a personalized way, or even some day invoke an emotional connection as a result of this personalized learning process.”
In the future, the team hopes to train the robots using actual videos instead of Sims-style simulation videos, which would enable a robot to learn simply by watching a YouTube video. The team is also working on implementing a reward-learning system in which the agent gets positive feedback when it does tasks correctly.
“You can imagine a setting where robots are assisting with chores at home and can eventually anticipate personalized wants and needs, or impending action,” says Puig. “This could be especially helpful as an assistive technology for the elderly, or those who may have limited mobility.”