When we point a camera at the world, we capture photons: light bouncing off objects, through a lens, onto a sensor. The resulting image is a record, a trace of what was actually there. Video is many such traces in sequence, a record of physical reality unfolding in time.
When Sora generates a video, something fundamentally different happens. It doesn't capture; it simulates. It hallucinates reality into existence, pixel by pixel, frame by frame. This is not "video generation" in the sense of editing or manipulating footage. It is "world simulation"—the construction of coherent physical realities that never existed.
Learning Physics
The remarkable thing about Sora isn't the visual fidelity—impressive as that is—but the implicit physics. It understands how light reflects off a puddle, creating rippling highlights. It knows how a dog's legs coordinate in a run, how dust floats in a beam of sunlight, how fabric drapes and folds. These aren't programmed rules; they're learned intuitions, absorbed from watching millions of hours of real video.
This is a form of understanding we didn't expect prediction alone to produce. When GPT learned language, it learned to predict text. When video models learn, they learn to predict frames, and in doing so they learn the physics that makes frames coherent. They become, in some sense, intuitive physicists.
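To make the mechanism concrete, here is a minimal sketch of a next-frame prediction objective. This is a toy for exposition, not Sora's actual training setup (OpenAI describes Sora as a diffusion model over spacetime patches); the tiny model, the random tensors standing in for video, and all the names here are illustrative assumptions.

```python
# Toy next-frame prediction objective. Illustrative only: Sora's actual
# training (per OpenAI) is diffusion over spacetime patches, not this.
import torch
import torch.nn as nn

class TinyFramePredictor(nn.Module):
    """Hypothetical toy model: predicts the next frame from the current one."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        return self.net(frame)

model = TinyFramePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Random tensors stand in for pairs of consecutive video frames.
current = torch.rand(8, 3, 64, 64)   # batch, channels, height, width
nxt = torch.rand(8, 3, 64, 64)

for step in range(100):
    predicted = model(current)
    # Pixel-wise error: on real video, driving this down requires the model
    # to capture the dynamics that connect one frame to the next.
    loss = loss_fn(predicted, nxt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Nothing in this loop mentions physics. On real video, though, the only way to keep the loss low is to model how scenes actually evolve, and that is where the learned intuitions come from.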
A camera gets physical consistency for free: reality is self-consistent, and a recording inherits that consistency. Generative models have no such guarantee. They must enforce consistency through learned constraints, internalizing the rules that reality follows. That they can do this, that visual consistency can emerge from prediction on video data alone, is a deep insight into what learning can achieve.
The Captured and the Calculated
This changes the nature of media. For nearly two centuries, since the invention of photography, media has been synonymous with capture. A photograph was evidence: proof that light struck a plate, a strip of film, or a sensor in a particular pattern. A video was testimony, a sequential record of what happened. "Pics or it didn't happen" assumed that pictures were reliable traces of reality.
That assumption is dissolving. When videos can be generated from scratch—not edited from existing footage, but dreamed into existence—the evidentiary value of media collapses. A video of an event proves nothing; it could be simulated. The line between the recorded and the calculated blurs to invisibility.
This creates obvious problems: deepfakes, misinformation, the erosion of shared truth. But it also creates possibilities. Media is no longer constrained by what happened or what was filmed. Any scene that can be described can be generated. Any story that can be imagined can be visualized. The creative constraint moves from "can we shoot this?" to "can we describe it?"
World Engines
We are building engines that can dream consistent realities. Not just images, but worlds—environments that persist, that follow rules, that can be explored from multiple angles. The current systems generate linear video: a sequence of frames depicting a scene. The next step is interactive video: scenes that respond to input, worlds that can be navigated.
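What might the step from linear to interactive video look like? Here is a hypothetical sketch of the interface, nothing more: a world model that takes an action and returns the next frame. The WorldModel class, its methods, and the stubbed-out generation are all assumptions; no current public system exposes exactly this API.

```python
# Hypothetical interface for interactive video: each user action conditions
# the next generated frame. The WorldModel class, its methods, and the
# stubbed generation are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Frame:
    pixels: bytes  # placeholder for rendered image data

class WorldModel:
    """Sketch of a model that generates frames from world state plus action."""
    def __init__(self, prompt: str):
        self.state = prompt  # latent world state, seeded by a text description

    def step(self, action: str) -> Frame:
        # A real system would run a learned generative model here; we fake it
        # by appending the action to the state string.
        self.state = f"{self.state} | {action}"
        return Frame(pixels=self.state.encode())

world = WorldModel("a rainy street at night")
for action in ["walk forward", "look left", "open the door"]:
    frame = world.step(action)  # the world responds to input, frame by frame
```

The essential change is the loop: instead of producing a fixed sequence of frames up front, the model produces each frame in response to what you do.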
This is the Holodeck being built pixel by pixel. The simulation becomes a space rather than a recording. You don't watch it; you enter it. You can walk around, interact with objects, change the story. The distinction between video game and movie dissolves; both become instances of simulated reality.
Simulated Futures
In the future, a movie won't be a static file you watch. It will be a dynamic simulation you enter. You'll experience the story from inside, as a participant or observer according to your preference. The narrative might branch based on your choices. The details might vary based on your interests.
This changes what storytelling means. The creator no longer crafts a fixed sequence of images; they craft a world with rules and a story that can play out in multiple ways. The viewer becomes a participant. The boundary between author and audience blurs.
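One way to picture this shift: the author specifies a graph of scenes and transitions instead of a single timeline, and each viewing traces one path through it. The sketch below is a deliberately crude illustration of that idea; in a generative system the scenes and branches would themselves be produced by a model rather than written by hand, and all the names here are invented.

```python
# A story as a possibility space: a graph of scenes and transitions rather
# than a fixed sequence. All names illustrative.
story = {
    "harbor": {"text": "A ship waits at the dock.",
               "choices": {"board": "open_sea", "stay": "tavern"}},
    "open_sea": {"text": "The storm is closer than it looked.", "choices": {}},
    "tavern": {"text": "A stranger offers you a map.", "choices": {}},
}

def play(node: str, pick) -> None:
    """Walk the story graph, letting the participant choose each branch."""
    while True:
        scene = story[node]
        print(scene["text"])
        if not scene["choices"]:
            return
        node = scene["choices"][pick(scene["choices"])]

# One participant's path (a trivial policy that always takes the first option).
play("harbor", pick=lambda choices: next(iter(choices)))
```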
We're only at the beginning of this transition. Current systems have limitations—temporal inconsistencies, physical glitches, limited duration. But the trajectory is clear. The world simulation engines are improving rapidly. In a decade, synthetic video may be indistinguishable from—or more compelling than—recorded reality. In two decades, the Holodeck may be on your desktop.
The Technium is building reality itself as a medium. What we record, we can now create. What we observe, we can now simulate. The nature of media is shifting, and with it, our relationship to truth, story, and experience.