Welcome back to The Hidden Layer, my new twice-weekly private email on the most pressing developments in the A.I. industry. I’m Ian Krietzberg.
Thanks for all the great feedback on last Thursday’s issue, where I took a close look at a new study on the efficacy (or lack thereof) of A.I. coding tools. If you’ve got any thoughts, questions, or bones to pick, just reply to this email. You can also message me on Signal at 732-804-1223. And if you’re not yet subscribed to Puck, click here to change that.
🎧 On the pods: I joined Julia Ioffe on Puck’s flagship podcast, The Powers That Be, to chat about Anthropic’s class certification and Trump’s pending A.I. action plan. Give it a listen here!
In today’s issue, a deep dive into Uber’s plans to bolster its ride-hailing fleets with self-driving cars. Plus, notes on Anthropic’s major legal battle, the problems with Sam Altman’s new “ChatGPT agent,” the latest chapter in the A.I. math wars, and more.
Mentioned in this issue: OpenAI, Sam Altman, Google DeepMind, Anthropic, William Alsup, Uber, Lucid, Nuro, Missy Cummings, and many more…
Let’s get into it…
Three Things You Should Know
- Sam Altman doesn’t want you to use OpenAI’s new toy: Last week, OpenAI launched a new product called “ChatGPT agent,” ostensibly a sort of white-glove assistant that can help you prepare for everything from international travel to a close friend’s wedding. As usual, we don’t know anything about the size of the A.I. model, the data it was trained on, its associated carbon emissions, etcetera. But, notably, OpenAI C.E.O. Sam Altman was particularly—and uncharacteristically—cautious in his own promotional language.
In a post on X, Altman noted that OpenAI had built more safeguards, warnings, and mitigations into this model than into any of its predecessors. “We don’t know exactly what the impacts are going to be,” he continued, adding that “we think it’s important … that people adopt these tools carefully and slowly.” This was a notable rhetorical departure from the unfettered hype that typically emanates from Silicon Valley. It was underscored by a warning from cybersecurity researcher Rachel Tobac, who suggested that users avoid the model for now: “Let experts work out the integration issues and build in safeguards before you cause a data breach, leak your sensitive photos, post client personal data, or worse,” she wrote on X.
Indeed, OpenAI’s own system card for the model noted an accuracy rate below 95 percent across two common benchmarks intended to evaluate the model’s propensity for hallucination. OpenAI also noted that the model might, for instance, buy the wrong product or leak private data online—two risks the company is trying to mitigate by training the model to ask users for confirmation before doing anything. In internal tests, however, the company noted that ChatGPT correctly confirmed its actions with the user only 91 percent of the time. OpenAI told The Verge that the model’s ability to perform financial transactions has been restricted “for now.” Despite these factors, OpenAI is still releasing the product into the “wild,” according to Altman, so that it can “begin learning from contact with reality.”
- Anthropic’s legal nightmare: Last month, the A.I. startup Anthropic secured a win in a major copyright lawsuit. Judge William Alsup ruled that the company, which is eyeing another potential funding round that could value it at more than $100 billion, was within its legal rights to train its models on copyrighted books, a practice he found to be protected by the fair use doctrine. However, the other aspect of his ruling—that the fair use doctrine did not cover the 7 million pirated books that Anthropic has copied and stored—was potentially much more explosive. Last Thursday, Alsup granted plaintiffs’ request for class-action certification, meaning that the three authors who brought the case can compile a list of other authors whose work was pirated, with all authors potentially “entitled to receive statutory damages.” That list must be filed by September 1.
James Grimmelmann, a professor of digital and information law at Cornell, told me that the two major outstanding questions in the case are how many authors will be in the certified class, and what the damages will be per author. Those will be left to the jury. “By framing the issues as he did, Alsup basically indicated that Anthropic is likely to be liable for the books it downloaded from pirate libraries,” Grimmelmann said. In a post, Santa Clara University law professor Edward Lee noted that the range of potential statutory payments per infringed work could run from $750 to $30,000 in the best-case scenario for Anthropic, and as much as $150,000 per work in the worst.
If the final class includes 1 million infringed works, Anthropic could face between $750 million and $150 billion in statutory damages. “We respectfully disagree with the Court’s decision,” an Anthropic spokesperson told me, adding that the company is “exploring all avenues for review.”
- The A.I. math wars: On Saturday morning, OpenAI researcher Alexander Wei announced that an unreleased, “experimental” OpenAI model had achieved a gold-medal performance in the International Mathematical Olympiad, a competition where high schoolers from around the world attempt to solve six mind-numbingly difficult problems; the model solved five out of the six. Then, on Monday, Google DeepMind said that a similarly “advanced,” unreleased iteration of its Gemini model achieved the same score. (Last year, DeepMind got four questions right, snagging a silver medal.)
Crucially, the I.M.O. did not validate the OpenAI model’s method: It’s unclear whether there was a degree of human involvement, how much compute was used, or whether the results are reproducible. Similarly, DeepMind’s approach has not been independently evaluated, though the I.M.O. did certify its result. As UCLA math professor Terence Tao wrote: “In the absence of a controlled test methodology that was not self-selected by the competing teams, one should be wary of making apples-to-apples comparisons between the performance of various A.I. models and the human contestants.”
For their part, both companies said their approaches were general, not task-specific, and that their results were achieved without the use of additional tools or internet access, which certainly seems impressive, though plenty of questions remain. It’s not clear, for instance, whether the models submitted incorrect solutions to the more difficult sixth problem, or just didn’t attempt it at all. (In an extremely vague post, Wei said that the model “knew” it didn’t have the right answer for that sixth problem.) Ernest Ryu, a professor of applied mathematics at UCLA, believes that this breakthrough could lead to advancements in math research, wherein researchers use L.L.M.s as veritable assistants to help tackle novel problems. But, as mathematician Kevin Buzzard pointed out, whether these systems will get to that point remains a “big open question right now.”
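The damages range in the Anthropic item above is simple per-work multiplication. Here is a minimal sketch using the per-work figures Lee cited ($750 and $30,000 as the ordinary floor and ceiling, $150,000 as the most a court can award per work for willful infringement) and the hypothetical 1-million-work class from the text; the real class size has not been determined.

```python
# Per-work statutory damages figures cited in the Anthropic item (U.S. dollars).
STATUTORY_FLOOR = 750        # lowest ordinary award per infringed work
STATUTORY_CEILING = 30_000   # highest ordinary award per infringed work
WILLFUL_CEILING = 150_000    # highest award per work if infringement is found willful


def damages_range(works: int) -> dict[str, int]:
    """Return total statutory exposure in dollars for a given number of works."""
    return {
        "floor": works * STATUTORY_FLOOR,
        "ordinary_ceiling": works * STATUTORY_CEILING,
        "willful_ceiling": works * WILLFUL_CEILING,
    }


# Hypothetical class size taken from the text; the actual number is unknown.
exposure = damages_range(1_000_000)
for label, total in exposure.items():
    print(f"{label}: ${total / 1e9:,.2f} billion")
```

Because the award is assessed per work, total exposure scales linearly with the size of the final class, which is why that number matters so much.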
And now for the main event…
A new multi-hundred-million-dollar partnership between Uber, Lucid, and Nuro aims to deploy some 20,000 self-driving robotaxis across the U.S. over the next six years. But with a slate of complex technological hurdles and attendant safety challenges, can self-driving vehicles actually replace human drivers at scale in the near term?
Five years ago, after the National Transportation Safety Board found Uber partially responsible for a fatal crash involving a self-driving car and a pedestrian in Arizona, the company temporarily shelved its fantasy of outfitting its fleet of vehicles with self-driving robotaxis. Now, the company is preparing to try again. Last week, Uber announced a partnership with Lucid, an electric vehicle maker, and Nuro, a self-driving tech company, to deploy some 20,000 self-driving vehicles, built explicitly for Uber’s ride-hail platform, in “dozens of markets around the world” over the next six years. A regulatory filing noted that production of these vehicles will begin next year. Meanwhile, Uber announced that the robotaxis will launch in a “major U.S. city next year.”
As part of the partnership, Uber will invest $300 million in Lucid and an undisclosed amount—somewhere in the “multi-hundred-millions”—in Nuro. When I asked Dr. Jesse Kirkpatrick, the co-director of the Mason Autonomy and Robotics Center at George Mason University, about the scale and pace of the rollout, he told me that “ambitious is an understatement. It seems to me likely more of a pipe dream, to be honest.”
Indeed, the trouble with self-driving cars deployed at mass scale is that the technology remains vulnerable to the same problem plaguing all A.I. applications: reliability. In this case, however, the issue isn’t merely algorithmic reliability—it’s the reliability of the hardware that feeds those algorithms, too.
The Nuro Driver, for instance, features a suite of sensors that include multiple types of cameras, in addition to lidar, radar, and audio sensors. The software involves a series of overlapping A.I. models, each designed to enable autonomous mapping, perception, behavior, and controls.
The details of Nuro’s algorithms (their type, size, energy intensity, etcetera) haven’t been disclosed, though the company has said they’ve been trained on a combination of real-world data gathered over the past decade or so, along with closed-course testing and simulations. Like all other self-driving car companies, Nuro employs teams of human teleoperators who can remotely take over a car if—or when—it gets into trouble, though researchers have pointed out that those remote operators are only genuinely useful if the cars operate at slow speeds. (According to Kirkpatrick, these teleoperators are the “dirty secret” of the industry.)
Those layers of technology are designed for redundancy, which is vital when it comes to safe self-driving. But there are still plenty of potential failure points. On the hardware side, unusual lighting or severe weather can disrupt the sensory data coming from the cameras; rain or moisture can disrupt or entirely disable lidar sensors; and radar systems can fail from environmental interference. Then, there are the well-known and well-documented (hallucinatory) flaws in the algorithms themselves.
Many of these failure points were on display during Tesla’s robotaxi launch last month. This brought to mind a conversation I once had with Missy Cummings, director of the aforementioned Mason Autonomy and Robotics Center, who noted that, for self-driving vehicles, hallucinations are “a guaranteed outcome. They just simply can’t not hallucinate.” This doesn’t mean robotaxis are unusable—only that they must be thoughtfully engineered to manage the inherent risks tied to all those failure points.
The dream of robotaxis, however, isn’t that they will merely be as safe as human driving—it’s that they’ll be much safer. But it’s unclear whether that will be achievable, largely due to the potential scale at which they’ll be deployed and the number of vehicles already on the road. In 2023, the National Highway Traffic Safety Administration reported a fatality rate of 1.26 per 100 million vehicle miles traveled; there were a total of 3.19 trillion miles traveled that year, and just over 40,000 people died in car accidents. That was all spread across approximately 284 million cars and trucks that, on average, drove 11,000 miles each that year.
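The NHTSA figures above can be cross-checked with a few lines of arithmetic. This is a rough sketch using the rounded numbers from the text; 40,000 stands in for “just over 40,000” deaths, so the computed rate lands slightly below the reported 1.26.

```python
FATALITIES = 40_000             # "just over 40,000"; rounded down here
TOTAL_VEHICLE_MILES = 3.19e12   # vehicle miles traveled in 2023
VEHICLES = 284_000_000          # approximate cars and trucks on the road
AVG_MILES_PER_VEHICLE = 11_000  # average annual miles per vehicle

# Deaths per 100 million vehicle miles, the unit NHTSA reports.
rate_per_100m_miles = FATALITIES / TOTAL_VEHICLE_MILES * 1e8

# Number of vehicles times average mileage should land near the reported total.
implied_total_miles = VEHICLES * AVG_MILES_PER_VEHICLE

# Average distance driven between traffic deaths.
miles_per_fatality = TOTAL_VEHICLE_MILES / FATALITIES

print(f"{rate_per_100m_miles:.2f} deaths per 100M miles")
print(f"{implied_total_miles / 1e12:.2f} trillion miles implied by per-vehicle averages")
print(f"{miles_per_fatality / 1e6:.0f} million miles per fatality")
```

The last figure, roughly one death per 80 million miles, is a useful yardstick for the autonomous-mileage totals that Waymo and Nuro report.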
Waymo, the autonomous vehicle ride-hailing service already on the streets in Los Angeles, San Francisco, Phoenix, and Austin, recently surpassed a total of 100 million autonomous miles driven, all without any major incidents. And while the company claims to be “safer” than human-driven vehicles, its total fleet consists of just 1,500 or so cars—a mere drop in the very large bucket of all the cars on the road today. In its 10-ish years of operation, Nuro has racked up slightly more than a million autonomous miles without “an at-fault incident.” But again, Americans drive more than 3 trillion miles annually. As Cummings said last year, we have nowhere near enough data to make statistically strong claims about the true safety of robotaxis.
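One way to see why that data gap matters is a back-of-the-envelope confidence calculation. This sketch rests on simplifying assumptions of my own, not necessarily how Cummings frames it: fatal crashes are treated as a Poisson process, a fleet is assumed to have logged zero fatalities, and the question is how many miles it would take to show, with 95 percent confidence, that its fatality rate is merely no worse than the 1.26-per-100-million human baseline. The answer, about 238 million miles, is the familiar “rule of three” result.

```python
import math

HUMAN_RATE = 1.26 / 100_000_000  # deaths per vehicle mile (2023 NHTSA figure)


def fatality_free_miles_needed(baseline_rate: float, confidence: float = 0.95) -> float:
    """Fatality-free miles needed to rule out a true rate >= baseline_rate.

    Under a Poisson model, the chance of seeing zero deaths in n miles at rate r
    is exp(-r * n). Requiring that chance to be at most (1 - confidence) gives
    n >= -ln(1 - confidence) / r, roughly 3 / r at 95% (the "rule of three").
    """
    return -math.log(1.0 - confidence) / baseline_rate


needed = fatality_free_miles_needed(HUMAN_RATE)
print(f"~{needed / 1e6:.0f} million fatality-free miles needed")

# Autonomous mileage totals cited in the text, as a share of that threshold.
# Assumption: zero fatalities in those miles, the most favorable case.
for name, miles in [("Waymo", 100_000_000), ("Nuro", 1_000_000)]:
    print(f"{name}: {miles / needed:.1%} of the threshold")
```

Even granting zero fatalities, Waymo’s 100 million miles would be well short of that threshold, and Nuro’s million miles is well under 1 percent of it. Showing that a fleet is much safer than human drivers, rather than merely no worse, would take proportionally more miles.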
According to Kirkpatrick, Uber’s ambitious rollout plans would level up the risk. At the scale that Uber seems to be targeting, he said, “the exception can become the rule.” In other words, if 20,000 autonomous cars drive just 50 miles a day, that equals a million daily miles; in turn, that rare, one-in-a-million edge case that wasn’t included in the training data could occur on a daily basis. “Autonomous vehicles have to generalize to rare events with near-perfect reliability. At present, no end-to-end A.I. model has demonstrated capability to do that,” he said.
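Kirkpatrick’s scale argument can be made concrete. A minimal sketch, assuming (as his example implies) that a given edge case turns up independently about once per million miles; that per-mile rate is illustrative, not a measured figure.

```python
FLEET_SIZE = 20_000          # vehicles in the planned Uber/Lucid/Nuro fleet
MILES_PER_CAR_PER_DAY = 50   # Kirkpatrick's illustrative daily mileage
EDGE_CASE_RATE = 1e-6        # assumed: one occurrence per million miles

daily_miles = FLEET_SIZE * MILES_PER_CAR_PER_DAY

# Expected number of occurrences across the whole fleet, per day and per year.
expected_per_day = daily_miles * EDGE_CASE_RATE
expected_per_year = expected_per_day * 365

# Chance the edge case shows up at least once on a given day,
# treating each mile as an independent trial.
p_at_least_once_today = 1 - (1 - EDGE_CASE_RATE) ** daily_miles

print(f"{daily_miles:,} fleet miles per day")
print(f"{expected_per_day:.1f} expected per day, {expected_per_year:.0f} per year")
print(f"{p_at_least_once_today:.0%} chance of at least one on a given day")
```

An event that is vanishingly rare for any single car becomes close to routine for the fleet: better-than-even odds on any given day, and on the order of 365 occurrences a year.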
And the bigger a fleet gets, the greater the “aggregation of risk”—meaning, for example, a software bug that gets pushed fleetwide could cause a lot of harm without the proper safeguards. To deal with that risk, he argued, operators should take a page out of the aviation industry’s playbook and implement rigorous, standardized software-quality assessment, real-time monitoring, and rollback systems. “But all of that has to be built intentionally. At present, that’s not what’s going on,” Kirkpatrick said. “Instead, these systems are operating in what is effectively a test environment—a civilian test environment that can become very dangerous.”
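The safeguards Kirkpatrick describes are largely about limiting how far a bad update can spread. Below is a hypothetical sketch of one common pattern, a staged rollout with automatic rollback. The stages, threshold, and monitoring function are invented for illustration and do not describe how Uber, Nuro, or Lucid actually ship software.

```python
from typing import Callable

# Hypothetical rollout stages: share of the fleet running the new software build.
STAGES = (0.01, 0.05, 0.25, 1.0)


def staged_rollout(
    observe_incident_rate: Callable[[float], float],
    max_incident_rate: float,
    stages: tuple[float, ...] = STAGES,
) -> tuple[str, float]:
    """Widen an update stage by stage, rolling it back if monitoring flags trouble.

    Returns the outcome and the largest fleet share that ever ran the build,
    which bounds how much of the fleet a bad update could reach.
    """
    exposed = 0.0
    for share in stages:
        exposed = share
        # Stand-in for real-time monitoring: incidents per mile seen at this stage.
        if observe_incident_rate(share) > max_incident_rate:
            return "rolled back", exposed
    return "fully deployed", exposed


# Illustrative numbers only: a buggy build with 10x the acceptable incident rate.
outcome, peak_share = staged_rollout(lambda share: 1e-5, max_incident_rate=1e-6)
print(outcome, f"after reaching {peak_share:.0%} of the fleet")
```

In this toy example, a bug that a single fleetwide push would have sent to every vehicle is caught while it is running on 1 percent of them.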
At this stage, so much about Uber’s plan remains decidedly unclear. It’s unclear how the partners will split up safety-observability responsibilities, or what that will look like in practice; it’s unknown how many teleoperators there will be, or who will employ them; it’s unknown how much these robotaxis will cost to build and maintain, which geographies Uber is targeting first, or what constraints will be applied to the fledgling fleet.
Finally, it’s unclear whether the rollout will coincide with a reduction in human-driven Ubers, or what the robo-Ubers will cost, or whether this partnership will in some way impact Uber’s current partnership with Waymo. These are questions, according to an Uber spokesperson, that are “best answered as we get closer to the rollout in 2026.” (Nuro did not respond to a request for comment.)
Activist groups, community groups, first responders, and unions have all advocated against the rise of the robotaxis. And rolling them out without the rigorous testing necessary could come back to haunt Silicon Valley if accidents end up stalling out the industry. Dr. Zach Asher, director of the Energy Efficient and Autonomous Vehicles Lab at Western Michigan University, told me that, generally speaking, more competition in this space is a good thing—but that he’s “worried, because aggressive expansion and dangerous activities, such as driving on busy roads, don’t mix.” (Currently, no robotaxi operates on highways in the U.S., although it’s an area that Waymo wants to expand to.) “All it takes is a major public accident to set the whole industry back a few years again,” he said.
That’s all for today. If you’ve taken rides in a robotaxi—or have had some sort of encounter with one in the wild—I’d love to hear all about it…
I’ll see you Thursday.
Ian
Puck is published by Heat Media LLC. 107 Greenwich St, New York, NY 10006