
Foundation Models for Matter

Ariel Agor

Moravec's Paradox says that high-level reasoning is easy for computers while low-level sensorimotor skills are hard. It was easier to build a chess champion than a robot that could fold laundry; a computer could beat grandmasters while struggling to navigate a cluttered room. For decades, the paradox held true and seemed fundamental.

It just broke. The foundation models for robotics released this month have done for physical motion what GPT-3 did for text. Robotics is having its transformer moment, and everything changes.

The Moravec Barrier

Hans Moravec observed in the 1980s that the skills we consider "intelligent"—chess, mathematics, logical reasoning—are actually easy for machines, while the skills we consider automatic—walking, grasping, recognizing faces—are astonishingly hard. A four-year-old can catch a ball; building a robot to do the same took decades of research and still produced clumsy results.

The explanation lies in evolution. Abstract reasoning is a recent addition to our cognitive toolkit—maybe a hundred thousand years old. We're not naturally good at it; we had to invent formal systems and spend years in school learning them. Sensorimotor skills, by contrast, have been refined over billions of years of evolution. The neural circuits for movement and perception are ancient, complex, and deeply optimized. They only look easy because evolution did the hard work for us.

For AI, the situation was reversed. Computers could execute formal rules perfectly from the start—that's what they were designed to do. But they had no evolutionary heritage of embodied experience to draw on. Every insight about physics, every intuition about object behavior, had to be programmed explicitly. The robotics code for "pick up a cup" ran to thousands of lines of exception-handling for edge cases that a human handles unconsciously.
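
To make that concrete, here is a deliberately simplified, hypothetical sketch of the old approach. Every name and threshold below is invented for illustration; the point is the shape of the code, where a programmer enumerates branches for every edge case they managed to anticipate, and nothing covers the ones they didn't.

    # Hypothetical, much-simplified sketch of hand-engineered grasping logic.
    # All names and thresholds are invented for illustration; real systems
    # ran to thousands of lines of branches like these.
    from dataclasses import dataclass

    @dataclass
    class Cup:
        upright: bool
        occluded: bool
        fill_level: float   # 0.0 = empty, 1.0 = full
        material: str       # e.g. "ceramic", "paper"

    def plan_grasp(cup: Cup) -> str:
        if cup.occluded:
            return "move the camera and re-detect"        # can't see the cup
        if not cup.upright:
            return "nudge the cup upright, then retry"    # tipped over
        if cup.material == "paper":
            return "grasp gently with a low force limit"  # crushable
        if cup.fill_level > 0.8:
            return "grasp slowly to avoid spilling"       # nearly full
        # ...and hundreds more branches a human never consciously considers...
        return "standard top-down grasp"

    print(plan_grasp(Cup(upright=True, occluded=False, fill_level=0.9, material="ceramic")))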

Learning Physics from Video

The breakthrough came from an unexpected direction: video. Just as language models learned to predict text by ingesting billions of words, robotic foundation models learned to predict motion by ingesting billions of hours of video. They watched humans and animals and machines move through the world, and from this ocean of examples, they extracted the implicit physics.
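
In caricature, the training objective looks something like the sketch below: given a few frames of context, predict what happens next. This is a toy PyTorch example with random tensors standing in for real video; the architecture and numbers are placeholders, not any lab's actual model, but the task it encodes is the one described above.

    # Toy sketch of next-frame prediction, the video analogue of next-token
    # prediction. Random tensors stand in for real clips; the model is a
    # placeholder, not a production architecture.
    import torch
    import torch.nn as nn

    class NextFramePredictor(nn.Module):
        def __init__(self, context_frames: int = 4):
            super().__init__()
            # Treat the stacked context frames as input channels (grayscale).
            self.net = nn.Sequential(
                nn.Conv2d(context_frames, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(32, 1, kernel_size=3, padding=1),
            )

        def forward(self, context):           # context: (B, T, H, W)
            return self.net(context)          # prediction: (B, 1, H, W)

    model = NextFramePredictor()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for step in range(100):
        clip = torch.rand(8, 5, 64, 64)       # fake clips: 4 context frames + 1 target
        context, target = clip[:, :4], clip[:, 4:]
        prediction = model(context)
        loss = loss_fn(prediction, target)    # "predict what happens next"
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Scale the same objective up to billions of hours of footage, and the regularities the model must capture to predict well are exactly the implicit physics: gravity, friction, object permanence.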

These models understand gravity—not as an equation, but as an intuition about how unsupported objects behave. They understand friction—sensing that a wet floor requires different movement than a dry one. They understand object permanence—knowing that a ball rolling behind a couch still exists and will emerge on the other side. This is the "physics of common sense" that AI researchers struggled for decades to encode symbolically.

The key insight was that physical understanding doesn't need to be programmed; it can be learned. Show a model enough examples of doors opening, and it learns how doors work—not just one specific door, but the concept of doors in general. It can then generalize to doors it's never seen, in configurations it's never encountered.

The Software Invasion

We are watching software invade hardware. A general-purpose robot doesn't need to be programmed for each specific task; it just needs to have "seen" similar tasks in its training data. It generalizes. It improvises. It adapts to situations that no programmer anticipated.

This is a profound shift in the economics of robotics. Under the old paradigm, each task required custom engineering—a manufacturing robot for welding, a different robot for painting, another for assembly. Each application bore the full cost of development. The only economically viable robots were those performing high-volume, precisely defined tasks.

Under the new paradigm, the foundational intelligence is amortized across all applications. Once a robot understands how to manipulate objects in general, teaching it to perform a specific task is incremental—a matter of fine-tuning, not reinvention. The cost curve inverts: instead of expensive bespoke systems, we get cheap adaptable ones.
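
A rough sketch of what "incremental" means in practice, assuming a PyTorch-style setup: the large pretrained backbone is frozen and reused across every application, and only a small task-specific head is trained on new demonstrations. Everything here (layer sizes, the fake data, the seven-dimensional action) is a placeholder for illustration.

    # Sketch of amortized fine-tuning: the expensive foundation model is
    # trained once and frozen; each new task only trains a small head.
    import torch
    import torch.nn as nn

    # Stand-in for the shared foundation model (trained once, reused everywhere).
    pretrained_backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
    for param in pretrained_backbone.parameters():
        param.requires_grad = False

    # Cheap per-task component: map shared features to task-specific actions.
    task_head = nn.Linear(256, 7)             # e.g. 7 joint targets for one arm
    optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-4)

    for step in range(100):
        observations = torch.rand(32, 512)    # stand-in for task demonstrations
        target_actions = torch.rand(32, 7)
        with torch.no_grad():
            features = pretrained_backbone(observations)
        loss = nn.functional.mse_loss(task_head(features), target_actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Only a sliver of the total parameters move for each new task; the bulk of the intelligence is paid for once and shared.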

Digital to Physical

This is the moment the digital spills over into the physical. The intelligence we've been accumulating in server farms—the vast computational knowledge distilled from the internet—is leaking out into bodies. We're not just processing information anymore; we're moving atoms.

Warehouse bots, kitchen assistants, construction drones—they are no longer following scripts written by human programmers. They are improvising jazz in the physical world, adapting in real-time to the unexpected realities of messy physical environments. The gap between simulation and reality—the "sim-to-real" problem that bedeviled robotics—is narrowing because the models have learned from enough real-world video to bridge it.

The implications cascade across industries. Logistics, already transformed by automation, will accelerate further as robots handle an ever-wider range of picking and packing tasks. Manufacturing becomes more flexible as the same robots can be retasked for different products. Agriculture can be automated at a finer grain, with machines that tend individual plants rather than treating fields as uniform. Eldercare—the looming crisis of aging populations—might be addressed by robotic assistants that can cook, clean, and help with daily tasks.

Programmable Matter

At the deepest level, matter is simply slow information. Atoms are just patterns; materials are just configurations; objects are just frozen computation. As we inject fast intelligence into slow matter, the physical world becomes programmable in a new sense.

The "smart home" is no longer about voice-controlled lights or automated thermostats. It's about a house that cleans, repairs, and organizes itself. A kitchen where robots prepare meals. A workshop where machines handle fabrication. An environment that actively maintains itself rather than passively waiting for human intervention.

We are at the beginning of this transition, not the end. Current robotic systems are still clumsy compared to biological organisms. They fail at tasks that seem trivial—handling soft objects, working in unstructured environments, recovering gracefully from errors. But the trajectory is clear. The same exponential improvement we saw in language models is beginning in robotic foundation models. In five years, the robots of today will seem as primitive as the chatbots of 2020.

Moravec's Paradox was never a law of nature—just a description of where AI development happened to be. Now it's becoming a historical curiosity, a reminder of how quickly the impossible can become inevitable.