Welcome to The Hidden Layer. I’m Ian Krietzberg, coming off a fun weekend at the Middleburg Film Festival in Virginia, where I moderated a very interesting panel on the clear influence of Yasujiro Ozu’s postwar oeuvre on Michael Bay’s Transformers films. Just kidding… I was there chatting about A.I. and the film industry, a topic filled with endless hypotheses and anxieties.
Also, my partner Julia Ioffe’s book, Motherland, is officially out today, and it’s already a finalist for the National Book Award. It’s great, she’s great, and you should definitely check it out.
In today’s issue, a close look at a recent effort to actually define artificial general intelligence. As multiple researchers have told me over the years, the discourse around A.G.I. has much more to do with the sociopolitical zeitgeist than any sort of science, but the hyperscalers have nonetheless deployed billions over the years chasing its sci-fi-esque promise. Plus, notes on an A.I. “reckoning” for venture capitalists and why ChatGPT seems to be plateauing in Europe.
Also discussed today: Dan Hendrycks, Gary Marcus, Yacine Jernite, John McCarthy, Melanie Mitchell, Suresh Venkatasubramanian, Benjamin Recht, Mira Murati, Joe Kaiser, and many more…
- A “reckoning” for venture capital: One of the few constants in the years since ChatGPT ignited this A.I. bull cycle has been the volume of venture capital investment that has deluged the industry. According to Crunchbase data, A.I. firms raised nearly $61 billion in 2023, a number that was surpassed in the first quarter of 2025 alone. And it’s not just the major developers: Some 33 A.I. startups completed rounds of $100 million or more this year. In July, for example, Thinking Machines Lab raised $2 billion at a $12 billion valuation, despite having no product or revenue; all it had was Mira Murati, an OpenAI co-founder and the company’s former C.T.O. And last week, Business Insider reported that former OpenAI product manager Angela Jiang was in talks to raise $10 million at a $50 million valuation for her own startup, Worktrace AI, which is also backed by Murati.
Some of the deals are “getting nutty,” said Joe Kaiser, the C.E.O. and managing director of venture firm Mercato Partners. “The valuations aren’t grounded in financials. There are a lot of V.C.s that are having this existential fear that if they don’t pick a winner, they won’t exist five years from now.”
Sure, venture capital has always been a game of risk. But many V.C.s are taking on way more risk than usual, which could mean much more unpredictable returns down the road. Increasingly, Kaiser said, L.P.s are finding that they can’t recycle their money into new investments because they’re not getting the distributions that they’d counted on several years prior.
Kaiser is still bullish on A.I., but he’s worried that some V.C.s might get washed out. “I think that the music is going to stop soon,” he told me. “We have so many firms that have come to market, raised funds, are making huge bets on A.I. that don’t have discipline around valuation. There has to be a reckoning moment.”
- ChatGPT “stalls out” in Europe: A new report from Deutsche Bank found that OpenAI has been struggling since May to convince more Europeans to shell out for ChatGPT. Spending on the service, according to the report, has effectively “stalled out” over the past few months. The findings relied on data from third-party financial institutions in Germany, France, Italy, Spain, and the U.K., countries that together account for roughly 15 percent of global ChatGPT usage. (The U.S. accounts for 17 percent.)
Of course, ChatGPT is still growing overall. OpenAI currently boasts around 20 million paid subs, out of a total global user base of about 800 million. And in Europe, spending on ChatGPT subs has already surpassed spending on Disney+ and reached about half the amount spent on Spotify and a quarter of the amount spent on Netflix. “At this year’s overall growth rate,” the report states, ChatGPT “would overtake Spotify in May 2027 and Netflix in February 2028.” Then again, OpenAI has a private market valuation of $500 billion, which is roughly on par with Netflix’s and more than three times Spotify’s.
Hallucination of the Week: Deepfake M.L.K.
Alas, it didn’t take long for Sora users to take advantage of the model’s hyperrealism to generate a variety of clips abusing the voice and likeness of Martin Luther King Jr. At the request of his estate, OpenAI has now prohibited A.I. generations of King while it establishes stronger guardrails. (SAG-AFTRA is also engaged in its own deepfake battle with the service.)
Runner-up: The Republican Senate campaign arm released an ad featuring an A.I.-generated video of Chuck Schumer, smiling maniacally, repeating a quote he gave to Punchbowl. There’s a disclaimer that the image and audio are fake (the quote itself is real), but it’s small and buried in the corner. N.R.S.C. comms director Joanna Rodriguez defended the move, saying: “A.I. is here and not going anywhere. Adapt & win or pearl clutch & lose.”
And now for the main event…
The search for a non-B.S. definition of artificial general intelligence remains an industry goal, and it is the noble pursuit of Dan Hendrycks’s recent paper (or “manifesto,” as some have called it). Does it pass the smell test?
For all the investor excitement, industry posturing, and global arms races around “artificial general intelligence” (or true machine intelligence, a nebulous target that would make modern chatbots seem downright primitive), the concept still lacks an agreed-upon definition. So, while scientists debate terminology, the big A.I. companies have volunteered their own criteria for a theoretical benchmark that is largely inseparable from their marketing and funding efforts. OpenAI, for instance, defines A.G.I. as highly autonomous systems that “outperform humans at most economically valuable work.” Anthropic, meanwhile, is working on what it calls “Powerful A.I.,” which it defines, quite simply, as systems that are substantially better than today’s systems. Others, like Meta, are chasing “superintelligence” without any sort of defined finish line.
It’s unclear when, if ever, or how, if at all, large language models might become “generally intelligent,” or what it would look like if they did. But Dan Hendrycks, an advisor to xAI and Scale AI, and the executive director of the Center for AI Safety, nevertheless set out to establish a universal definition for A.G.I. and benchmark it. In a recent paper—described to me by several researchers, Jerry Maguire–style, as more of a “manifesto”—he conscripted a few dozen prominent players in the field to give their stamp of approval and work toward a shared definition. (Gary Marcus, a cognitive scientist often ridiculed for his criticisms of L.L.M.s, is listed as a co-author on the paper.)
The notion that machine intelligence is just around the bend has existed for decades. In 1956, computer scientist John McCarthy proposed a summer research project on “artificial intelligence” (a marketing term he coined), in which he hypothesized “that every aspect of learning or any other feature of intelligence can, in principle, be so precisely described that a machine can be made to simulate it.” In 1957, the interdisciplinary researcher and Nobel laureate Herbert Simon claimed, without evidence, that “there are now in the world machines that think, that learn, and that create.” A decade later, MIT professor Marvin Minsky, who was involved in McCarthy’s proposal, claimed that “within a generation, … the problem of creating ‘artificial intelligence’ will be substantially solved.”
Obviously, none of these predictions have borne fruit. But ever since OpenAI reignited the A.I. industry, there’s been a renewed effort to realize these hypothetical advanced systems, necessitating some sort of working definition for what they’re all aiming to achieve. In his recent paper, Hendrycks offered the following: “an A.I. that can match or exceed the cognitive versatility and proficiency of a well-educated adult.” (The authors never clarify what kind of system they’re talking about; i.e., an L.L.M., a symbolic system, or something else.) Sounds simple enough, but how do we define and benchmark the “cognitive versatility” of a well-educated adult? And how can we apply that to an A.I. system? (As Yacine Jernite, the head of machine learning and society at Hugging Face, told me, “Defining A.G.I. is always a narcissistic exercise.”)
At the heart of Hendrycks’s paper is the Cattell-Horn-Carroll theory of intelligence, which is one of the more empirically supported and widely (though not universally) accepted theories for understanding human cognition. C.H.C. describes “intelligence” as a hierarchy of cognitive abilities, which Hendrycks breaks down into 10 components, including general knowledge, reading and writing ability, and auditory processing.
Hendrycks doesn’t linger on the question of whether this framework for human cognition is the right one, though he does acknowledge that it’s “not exhaustive.” To establish the basis for their A.I. benchmark (or, as Hendrycks terms it, an A.G.I. “score”), the authors assigned a weight of 10 percent to each category, where a score of 100 percent meets the definition for A.G.I. Under this rubric, GPT-4 received a score of 27 percent (including 8 percent for knowledge, 6 percent for reading and writing, and 0 percent for on-the-spot reasoning). Meanwhile, GPT-5 earned a score of 58 percent via improvements in several categories, including reasoning and reading/writing. To test these models, Hendrycks either posed a prompt or checked whether they had exceeded a certain score on a preexisting benchmark.
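For readers who want to see the arithmetic behind that rubric, here is a minimal sketch. It assumes only what is reported above: 10 equally weighted categories worth up to 10 points each, GPT-4’s overall score of 27, and the three category scores cited for it. The function name `agi_score` and the category keys are hypothetical labels, not names taken from the paper, and the seven categories whose scores aren’t itemized here are left out rather than guessed.

```python
NUM_CATEGORIES = 10
MAX_POINTS = 10  # each category is weighted at 10 percent of the total


def agi_score(points_by_category):
    """Sum per-category points under an equal-weight, 10-category rubric."""
    if len(points_by_category) > NUM_CATEGORIES:
        raise ValueError(f"the rubric has only {NUM_CATEGORIES} categories")
    for name, points in points_by_category.items():
        if not 0 <= points <= MAX_POINTS:
            raise ValueError(f"{name}: points must be between 0 and {MAX_POINTS}")
    return sum(points_by_category.values())


# The three GPT-4 category scores cited above (keys are hypothetical labels).
GPT4_REPORTED = {
    "knowledge": 8,
    "reading_and_writing": 6,
    "on_the_spot_reasoning": 0,
}
GPT4_TOTAL = 27  # overall GPT-4 score cited above

reported_points = agi_score(GPT4_REPORTED)
# Points that must come from the seven categories not itemized above.
unitemized_points = GPT4_TOTAL - reported_points

print(f"Itemized categories: {reported_points} of {GPT4_TOTAL} points")
print(f"Other seven categories combined: {unitemized_points} of a possible 70")
```

In other words, the three itemized categories account for 14 of GPT-4’s 27 points, which leaves 13 points spread across the other seven categories.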
Hendrycks told me that his goal was to “reflect uncertainty and prioritize breadth.” For his part, Marcus acknowledged how arbitrary this scoring might seem. “I see the paper as setting a preliminary line in the sand,” he told me. “It’s not perfect, but it puts the emphasis where it should be: on the flexibility and breadth of cognition, rather than on some economic definition in terms of profits or number of jobs taken.”
Of course, while benchmarks have become a core focus for the industry, they don’t necessarily tell us much about a system. As Dr. Melanie Mitchell, a prominent researcher in the space, has pointed out, without knowing what data a model is trained on, it’s impossible to tell whether it is simply memorizing patterns found in its training data, or genuinely reasoning. (A constant flow of research papers in recent months has pointed to the former, at least when it comes to L.L.M.s.) When I asked Hendrycks how he could create a robust benchmark without accounting for a model’s training data, he said, “I think for this, we actually do much, much better than basically anything out there.”
You Manage What You Measure
I got in touch with a number of researchers to unpack the Hendrycks approach, and none was exactly enamored.
Dr. Nathaniel Daw, a professor of computational and theoretical neuroscience at Princeton, noted that “if we can put a score on something, then we are good at driving that score up. The hard part is figuring out how to put a score on what we actually want computers to do.” He explained that there’s a widespread effort to build tests that actually capture those functions, and that this paper seems to operate in that spirit. But he cautioned not to read the results to mean that “we’re 58 percent of the way to A.G.I. or whatever.”
Dr. Suresh Venkatasubramanian, the director of Brown University’s Center for Technological Responsibility, Reimagination, and Redesign, referred to the paper as little more than a summation of 10 different benchmarks, and described it as scientifically underwhelming at best. His biggest question about the document: Who is it intended to serve? He suggested two groups: the major A.I. developers, who are interested in securing bragging rights; and others (including the Center for AI Safety, which has received millions in funding from Open Philanthropy) working to influence the path of A.I. regulation. “If you have a definition of A.G.I. that you can point to,” he suggested, regulators might start setting regulatory thresholds around an “A.G.I. score” that’s essentially arbitrary. “You could have something that crosses 20 percent that needs regulation; you could have something that crosses 80 percent that doesn’t need regulation,” he said. “It misses the whole point of how we should think about regulating A.I. systems.”
Indeed, when I spoke with Hendrycks, he noted that “the original motivation is primarily because in policy discussions, if the word A.G.I. is brought up, it’s always very unclear what that means, and so it causes the conversation to devolve quite quickly.” Meanwhile, for Venkatasubramanian, attempting to establish a definition for A.G.I. has never felt like a “helpful goal for scientists.” This document, he said, pushes the field in the wrong direction. “I think our ability to have a scientific discussion about the value and contributions of these tools is being corrupted by the huge amounts of money sloshing around,” he added. “This document has to be viewed in that context.”
For others, such as Dr. Benjamin Recht, an award-winning computer scientist and professor at the University of California, Berkeley, the entire effort (this story included) is a waste of time. “Writing an article about Dan Hendrycks’s perpetual train of bullshit, even if you explain why it’s bullshit, only makes him stronger,” he said. “Feel free to print that.”
That’s all for today. I’ll see you Thursday.
Ian
Puck is published by Heat Media LLC. 107 Greenwich St, New York, NY 10006