Welcome back to The Hidden Layer, my new twice-weekly private email on the business of A.I. I’m Ian Krietzberg.
Thanks for all the great feedback on our inaugural issue, covering the lobbying battle surrounding the moratorium on A.I. regulation that was stripped out of Trump’s Big Beautiful Bill. (If you missed that dispatch, you can catch up here.) Please keep it coming! If you’ve got questions, suggestions, tips, or really anything you want to get off your chest, just reply to this email. You can also message me on Signal at 732-804-1223. (And if you’re not yet subscribed to Puck, click here to remedy that.)
🎧 Come again, Grok?: I’m also on today’s episode of The Powers That Be, Puck’s flagship podcast, discussing the firestorm surrounding Grok, Elon’s rogue A.I. chatbot. Listen here or here. (More on the Grok situation below…)
In today’s issue, my candid conversation with Anastasis Germanidis, the C.T.O. and co-founder of Runway, one of the major A.I. image- and video-generation companies, about how his company is trying to solve the problem of hallucinations with “world simulators.”
Also mentioned in this issue: Elon Musk, Grok 4, Russell Schwartz, Microsoft, BioEmu, AlphaFold, Perplexity, OpenAI, Frank Noé, Gary Marcus, Alexandra Ebert, and many more…
Four Things You Should Know
- Here comes Grok 4…: On Wednesday night, while the xAI team was still cleaning up the damage from an update that caused its Grok chatbot to respond to prompts with antisemitic and sexually violent comments, Elon Musk unveiled the latest version of what he dubbed “the world’s most powerful A.I. assistant.” On the surface, Grok 4 slightly edges out its competitors (o3, Gemini 2.5 Pro, Claude 4 Opus, etcetera) to lead the Artificial Analysis Intelligence Index, a benchmark that aims to grade chatbot capability. (Capitalizing on the news, Musk also announced a new $300 monthly A.I. subscription, SuperGrok Heavy.)
Of course, the trouble here is that the model’s training data remains unknown; without assessing the underlying data, it’s hard to determine whether the model actually has the capability suggested by the Intelligence Index, or whether its training data simply encompasses the questions on the benchmark. It’s the difference between genuine understanding, which is strong and flexible, and the illusion of understanding, which is brittle. In other words, many researchers are skeptical that these benchmarks actually measure what they claim.
What is clear, however, is that benchmark performance does not necessarily relate to real-world efficacy, usability, or potential for harm. For example, the team made no mention of the electricity and carbon cost associated with training or running the model, and didn’t detail any innovations that might curtail hallucinations. Of course, none of that prevented Musk from making grandiose promises about Grok 4: “I would expect Grok to discover new technologies that are actually useful no later than next year, and maybe end of this year,” he said. “It might discover new physics next year… Let that sink in.”
I don’t know what “new physics” means. But if I were a betting man, my money would not be on Elon’s new chatbot to discover it.
- The V.C. wave is still rising: According to Crunchbase data, A.I. companies around the world raked in more than $40 billion in venture capital funding in the second quarter of 2025, nearly half of all global venture funding tracked by the service. It’s yet another massive quarter for the sector: In Q1, A.I. companies brought in $60 billion in funding, and in the last quarter of 2024, the sector brought in $44 billion. We’re several years into this race, and based on these results, there’s no reason to believe the V.C. spigot will be turned off anytime soon.
- The browser wars: Ever since the major A.I. labs hooked their chatbots up to the internet, it’s been clear that they were attempting to chip away at Google’s virtual monopoly on search. On Wednesday, Perplexity launched Comet, a web browser that connects users to its A.I. search engine. (It’s currently available only to users who pay Perplexity $200 per month for its services.) Just a few hours later, Reuters reported that OpenAI is planning to release its own A.I.-fueled web browser within the next few weeks. Both companies are interested in bypassing the traditional search engines to get direct access to users and their data. Google still controls nearly 90 percent of the global search market, and its share of the browser market remains at a healthy 70 percent. Breaking that vise grip is a worthy goal, but it won’t be easy.
- Below the AlphaFold: Last year, Google DeepMind researchers won a Nobel Prize in chemistry for their role in designing AlphaFold, a deep learning model that can predict how proteins fold with meaningful reliability. The breakthrough enabled researchers to harness vast amounts of biological data that could lead to enhanced drug development. But as Russell Schwartz, the head of Carnegie Mellon’s computational biology department, told me, proteins don’t fold just once; they shift “through an ensemble of shapes.”
Now, a new deep learning diffusion model called BioEmu aims to build on AlphaFold’s breakthrough. BioEmu, which was developed by Microsoft’s A.I. for Science division and announced earlier today in a research paper published in Science, can predict the different shapes a protein might take as it shifts through that ensemble.
BioEmu was trained on a series of datasets that combine a version of AlphaFold’s protein database with a massive quantity of molecular dynamics simulation data, which Microsoft ran concurrently on thousands of G.P.U.s for more than a year.
According to Microsoft research partner Frank Noé, the hope is to further enhance our drug discovery capabilities, although he added that the scientific goal is just to better understand molecular biology. “If you understand how the machine works, you can start thinking about how to fix it if something’s broken,” he told me, adding that, while this is a milestone, it’s just one step down the long, winding road to better understanding biology.
“I don’t think this is the final solution to the problem,” said Schwartz, who was not involved in the research. “I don’t think it’s going to be game-changing to the same degree AlphaFold was, but it’s an important step toward getting more to what a protein is really like in its functional form.”
“I’m not afraid of this ‘singularity’ moment, where something is developed that we just can’t rein in. It will be similar to the evolution of car safety, where our first car safety measure was a person walking in front of a car with a red flag informing pedestrians, ‘Hey, please don’t get run over by this new innovation,’ and over the course of time, because we had cars on the streets, we saw airbags would be quite helpful, or seat belts and stuff like that.”
That was Alexandra Ebert, chief A.I. and “data democratization officer” at Mostly AI, pontificating in an interview with me last week about whether society should throw more guardrails around the industry.
A candid conversation with Anastasis Germanidis, the C.T.O. and co-founder of Runway, the major A.I. image- and video-generation company, about the trouble with building “world models,” the next steps in advancing the technology, the highly contentious issue of automated media, and much more.
A classic concept in cognitive science is what researchers call mental models, or how our internal representation of the external world informs how we navigate reality. An illustrative example: Michael Jordan was able to close his eyes and still make a free throw because his “mental model” included the exact size, height, distance, etcetera of the basket, and that doesn’t go away just because you close your eyes. A neural network–based system, lacking a reliable model of the physical world, would likely not be able to make that free throw.
This shortcoming has increasingly become a point of focus for A.I. companies, which are attempting to build their own “world models” (another term whose definition changes depending on whom you talk to) in order to provide machines with a stable, physically accurate internal representation of certain parts of the external world. For the self-driving car company Waymo, that could mean a world model designed to simulate traffic patterns. For chess-playing machines, it could be what author and cognitive scientist Gary Marcus recently referred to as “board” models, which offer rule-based, constantly updating representations of a given game board.
So far, there’s no evidence that large language models possess world models, even though some researchers and engineers believe they might naturally emerge over time. And it is this absence of grounded, rules-based modeling, Marcus recently argued, that explains why L.L.M.s often “hallucinate” in strange and unexpected ways: “What L.L.M.s do is to extract correlations between bits of language (and in some cases images) but they do this without the laborious and difficult work (once known as knowledge engineering) of creating explicit models of who did what to whom when and so forth,” he wrote.
It’s a problem that Runway, the pioneering A.I. image- and video-generation company, is hoping to solve. The company, founded in 2018, has raised $536 million from investors such as Nvidia and SoftBank, according to Crunchbase data, and has secured a partnership with Lionsgate Studios. (Runway’s models were notably leveraged by visual effects artist Evan Halleck when he was working on Everything Everywhere All at Once in 2022.) But taking the software to the next level requires eliminating those often-viral hallucinations that might show, for example, seven fingers on a man’s hand, or a bunch of dogs emerging out of one dog, or a cat sprouting random additional limbs. These generations are the result of massive statistical analyses, not an actual understanding of physics, or human hands, or cats.
To learn how Runway is tackling this problem, I caught up with Anastasis Germanidis, the company’s C.T.O. and co-founder. This conversation has been lightly edited and condensed for clarity.
Ian Krietzberg: Runway is an image- and video-generation company. How do you go from that to building world models?
Anastasis Germanidis: We did a lot of the foundational research in image and video generation, and what we saw as we were building better and better video generation models is that, in order to solve video generation—to generate videos that are physically realistic, high quality, and high fidelity—those models essentially need to learn how to simulate the world. They need to understand physics and how humans interact in space, and understand different aspects about the world and the dynamics of the world. This created our initial interest in building world simulators, as we call them.
Can you say a little more about the application of “world simulators”?
There are, I would say, two broad areas of world simulators that are interesting from a use-case perspective. One involves creating new forms of entertainment, and this is something we’re increasingly going to be doing on top of those models. Imagine gaming experiences where a game is generated as you’re playing it. We imagine most video generation will become real-time and personalized, and it will be generated on the fly as you’re experiencing it.
The other aspect where world simulation is very relevant, I think, is that simulation will essentially power most model training in the coming years. Because if you want to build models and systems that can operate in the real world, like robots that can help you with household tasks, it’s much more effective to train them in simulated environments versus deploy them in the world. And that same [thing] applies in robotics and in building self-driving cars.
Models today are limited in their capacity to consistently and reliably produce physical representations of the real world. Are you exploring other architectures? Do you think the neural networks of today will become more capable?
There are always algorithmic advancements being made, and we’re working on new types of architectures. So traditionally, on image and video generation, diffusion models were the architecture of choice, while in language models, it’s autoregressive models. What we’re increasingly seeing is a convergence between the two—the best of both of those worlds—and we’re working on some new architectures along that direction.
But I would say, more than the architectures themselves, the way you’re training those models will be the thing that changes the most. So effectively, training models to perform tasks in simulated environments over long time horizons will become the way to further improve performance, more than any specific algorithmic improvements.
You’ve said that one of the biggest bottlenecks involves ensuring that these model simulations are physically realistic. How do you overcome that bottleneck?
You overcome the bottleneck by essentially scaling those simulators with enough data and compute: finding a sufficient amount of data that captures the environment that you want to simulate, and then increasing the model and compute scale.
You mentioned self-driving cars as an example of how world models can be applied. But the problem with self-driving cars is that it’s a physical impossibility to capture sufficient data for all possible edge cases. And that gets us to the root question around genuine machine understanding and generalizability: We’re trying to brute-force our way to simulating generalizability, but it’s not really generalizability, because it’s built on such a massive scale rather than on systems that can reliably generalize based on a small dataset. Are you at all exploring methods of world modeling that don’t rely on massive scale?
Are you familiar with “The Bitter Lesson”? The thing that has repeatedly worked in this field has been finding the methods that can leverage more compute and more resources, and increasing the scale that you apply them at. There are going to be improvements in data efficiency (learning from less data), but I don’t expect that to be the biggest driver of progress. We still haven’t trained on a sufficient amount of observations from the world. I think we’re still very early in leveraging all the data and all the observations that are available. So I don’t think we’re anywhere close to actually saturating that available data.
What does this all mean for Runway as a business? Are you eyeing a push beyond simple image and video generation?
Yeah, I think the potential of those models goes beyond content creation. I would say gaming is the most proximate area that we’re exploring. We have a new platform that we call Game World, where essentially anyone can generate a new world from a premise, and by kind of describing the game mechanics, they can then create a game they can share with others. But beyond gaming, there are applications that we’re exploring in robotics and in self-driving and a few other areas. We do think those models are very general, and so we’re going to be announcing some different collaborations and partnerships on many different fronts.
That opens up the highly contentious issue of automated media, which has already been resulting in job loss within these industries. Do you see this as something that will be associated with a reduction in the necessary workforce?
I think that there’s definitely going to be a change in the roles and the kind of jobs that someone working in gaming, for example, would perform. Traditionally, with new technologies, they end up expanding what jobs are possible. We saw that with different generations of creative software as well: there was the narrative that Photoshop would destroy all visual art or visual design. I think the same will happen with gaming.
There’s still going to be a big importance on taste and vision and coming up with the best ideas and designs for game experiences, but the creation of those games will not look the same as it looks today. You might not need to create every single asset and every single object in the game world. It’s more about designing the overall rules of the system. Maybe the absolute number of games will increase, and the kinds of things that you can do as a game designer will change, but I don’t necessarily see it as a reduction in the amount of jobs that are available in this field. It’s more going to be a change.
Artificial general intelligence is practically meaningless as a term, but is Runway interested in becoming an A.G.I. lab?
To me, simulation goes beyond A.G.I., and I also think the goal of getting parity on tasks with humans seems like it’s not the right goal to strive for. I think we should use A.I. techniques to go far beyond and augment human ability. So I wouldn’t call ourselves an A.G.I. lab, and I think even A.G.I. labs will move beyond this goal, because really what you want is to expand what humans can do to accelerate scientific progress, to solve all those different challenges. And that doesn’t necessarily mean that you need to get as good as humans in those challenges. You need to build systems that go beyond what humans can do.
Hope you enjoy this rainy weekend. I’ll see you next week.
Ian
Puck is published by Heat Media LLC. 107 Greenwich St, New York, NY 10006