Friday, February 21, 2025

Microsoft's New AI Agent Can Control Software And Robots


Headlines:

Ten current news headlines from around the world, grouped by category:

• Robotics and Artificial Intelligence
* "Boston Dynamics' Atlas Robot Successfully Completes First Human-Human Handover" (The Verge)
* "New Robot Exoskeleton Helps Paralyzed Man Walk Again" (CNN)

• Autonomous Vehicles
* "Waymo Self-Driving Cars Pass Safety Test... Ready for Public Roads" (The New York Times)
* "Tesla's Full Self-Driving Beta Update Rolls Out to Eligible Owners" (Electrek)

• Agricultural Robotics
* "FarmBot Raises $2.5 Million to Develop Autonomous Farming Systems" (TechCrunch)
* "Robotic Harvesters Start Picking Crops in Ukraine's Fields" (Reuters)

• Healthcare Robotics
* "Robot-Assisted Surgery Reduces Recovery Time for Patients" (Healthcare IT News)
* "Robotic Exoskeletons Help Patients Walk After Spinal Cord Injury" (Science Daily)

• Industrial Automation
* "ABB's YuMi Robot Wins Prestigious Industry 4.0 Award" (ABB)
* "KUKA's LBR iiwa Robot Offers Increased Flexibility and Precision" (KUKA)

#news

On Wednesday, Microsoft Research introduced Magma, an integrated AI foundation model that combines visual and language processing to control software interfaces and robotic systems. If the results hold up outside of Microsoft's internal testing, it could mark a meaningful step forward for an all-purpose multimodal AI that can operate interactively in both real and digital spaces.

Microsoft claims that Magma is the first AI model that not only processes multimodal data (like text, images, and video) but can also natively act upon it, whether by navigating a user interface or manipulating physical objects. The project is a collaboration between researchers at Microsoft, KAIST, the University of Maryland, the University of Wisconsin-Madison, and the University of Washington.

We've seen other large language model-based robotics projects, such as Google's PaLM-E and RT-2 or Microsoft's ChatGPT for Robotics, that use LLMs as an interface. However, unlike many prior multimodal AI systems that require separate models for perception and control, Magma integrates these abilities into a single foundation model.

Microsoft is positioning Magma as a step toward agentic AI, meaning a system that can autonomously craft plans and perform multi-step tasks on a human's behalf rather than just answering questions about what it sees.

"Given a described goal," Microsoft writes in its research paper, "Magma is able to formulate plans and execute actions to achieve it. By effectively transferring knowledge from freely available visual and language data, Magma bridges verbal, spatial, and temporal intelligence to navigate complex tasks and settings."
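The plan-then-act behavior described in that quote is usually run as a loop: observe the screen or scene, choose the next action toward the goal, apply it, and repeat. The Python sketch below illustrates that loop with a single policy object that maps goal plus observation directly to an action, which is the "one model for perception and control" idea in miniature. It is not Magma's code or API: `Action`, `ToyUI`, `ToyUnifiedPolicy`, and `run_agent` are hypothetical names, and a hand-written rule stands in for the neural network a real model would use.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str         # "click" or "done" (hypothetical action vocabulary)
    target: str = ""  # which UI element to act on, if any


@dataclass
class ToyUI:
    """Hypothetical stand-in for a software interface with three buttons."""
    clicked: list = field(default_factory=list)

    def observe(self) -> dict:
        # Plays the role of a screenshot: visible elements plus current state.
        return {"buttons": ["File", "Save", "Cancel"], "clicked": list(self.clicked)}

    def step(self, action: Action) -> None:
        # Apply the chosen action to the environment.
        if action.kind == "click":
            self.clicked.append(action.target)


class ToyUnifiedPolicy:
    """Hypothetical single policy: (goal, observation) -> action in one call.

    A real vision-language-action model would be a neural network here;
    a hand-written rule stands in so the loop runs without model weights.
    """

    def next_action(self, goal: list, obs: dict) -> Action:
        done_so_far = obs["clicked"]
        if done_so_far == goal:
            return Action("done")          # goal reached
        wanted = goal[len(done_so_far)]    # next step of the plan
        if wanted in obs["buttons"]:
            return Action("click", wanted)
        return Action("done")              # target not visible: give up


def run_agent(policy, env, goal, max_steps=10):
    """Observe, decide, act, repeat until the policy stops or steps run out."""
    trace = []
    for _ in range(max_steps):
        action = policy.next_action(goal, env.observe())
        trace.append(action)
        if action.kind == "done":
            break
        env.step(action)
    return env.clicked == goal, trace
```

Calling `run_agent(ToyUnifiedPolicy(), ToyUI(), ["File", "Save"])` clicks "File", then "Save", then stops and reports success; asking for a button that is not on screen makes the policy stop immediately and report failure. The `max_steps` cap is a common safeguard so an agent cannot loop forever.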

Microsoft is not alone in its pursuit of agentic AI. OpenAI has been experimenting with AI agents through projects such as Operator, which can perform UI tasks in a web browser, and Google has explored multiple agentic projects with Gemini 2.0.

Microsoft Magma researcher Jianwei Yang wrote in a Hacker News comment that the name "Magma" stands for "M(ultimodal) Ag(entic) M(odel) at Microsoft (Rese)A(rch)". He posted the explanation after some commenters noted that "Magma" is already the name of an existing matrix algebra library, which could cause confusion in technical discussions.
