By Kylie Robison
Just over a hundred visitors had crowded into an office building in the Duboce Triangle neighborhood for a showdown that would pit teams armed with AI coding tools against those made up of only humans (all were asked to ditch their shoes at the door). The hackathon was dubbed “Man vs. Machine,” and its goal was to test whether AI really does help people code faster—and better.
Roughly 37 groups were randomly assigned to be either “human” or “AI-supported.” Later, an organizer told me that several people dropped out after being placed on a human team. A panel of judges would rank projects on four criteria: creativity, real-world usefulness, technical impressiveness, and execution. Only six teams would make it to the demo round. The winning team would earn a $12,500 cash prize and API credits from OpenAI and Anthropic. Second place would get $2,500.
AI coding has become something of a lightning rod in Silicon Valley. While fears of an engineering apocalypse abound, a new study from METR—an AI research nonprofit that cohosted the hackathon—found that AI tools actually slowed experienced open source developers by 19 percent.
The weekend hackathon was meant to take METR’s research a step further. While the study looked at experienced coders working on existing codebases, at this event, some of the participants had very little coding experience and everyone would be proposing new projects.
Many studies on developer productivity use metrics like the number of pull requests or lines of code written, says Joel Becker, a member of the technical staff at METR. But these numbers can be hard to interpret. Writing more code or sending off more pull requests isn’t always better. Similarly, when we look at AI performance, even if a model scores 80 or 90 percent on a given benchmark, it’s not always clear what that means in terms of its practical abilities.
Becker bets the machine will win.
Crunch Time
In a Slack channel for the event, contestants pitched ideas to try to attract potential teammates: an AI tool for pianists to get performance feedback, an app to track what you’re reading, and a platform to help neighbors connect.
One contestant, Arushi Agastwar, is a student at Stanford studying AI ethics. She first started coding in eighth grade but has since taken a break to focus on evaluating AI’s impact on society. Agastwar was randomly selected to be on the human team, and she decided to build a framework that evaluates sycophancy in AI models (like the excessive agreeableness that plagued OpenAI’s GPT-4o).
“I have a feeling that some of the ideas that are going to be coming out from the ‘man’ teams are going to be really profound, and I’m hopeful that the demo aspect is not the only thing that the judges will be impressed by,” Agastwar tells me. Her initial bet was that a man team, i.e., one not using AI, would win. But several hours into the hackathon, she wasn’t so sure that she could complete the task by the 6:30 PM deadline.
Then there’s Eric Chong, a 37-year-old who has a background in dentistry and previously cofounded a startup that simplifies medical billing for dentists. He was placed on the “machine” team.
“I’m gonna be honest and say I’m extremely relieved to be on the machine team,” Chong says.
At the hackathon, Chong was building software that uses voice and face recognition to detect autism. Of course, my first question was: Wouldn’t there be a wealth of issues with this, like biased data leading to false positives?
“Short answer, yes,” Chong says. “I think that there are some false positives that may come out, but I think that with voice and with facial expression, I think we could actually improve the accuracy of early detection.”
The AGI ‘Tacover’
The coworking space, like many AI-related things in San Francisco, has ties to effective altruism.
If you know the movement only through the bombshell fraud headlines: it seeks to maximize the good that participants can do with their time, money, and resources. The day after this event, the space hosted a discussion about how to leverage YouTube “to communicate important ideas like why people should eat less meat.”
On the fourth floor of the building, flyers covered the walls. One, “AI 2027: Will AGI Tacover,” advertised a taco party that had recently passed; another, titled “Pro-Animal Coworking,” offered no further context.
A half hour before the submission deadline, coders munched vegan meatball subs from Ike’s and rushed to finish their projects. One floor down, the judges started to arrive: Brian Fioca and Shyamal Hitesh Anadkat from OpenAI’s Applied AI team, Marius Buleandra from Anthropic’s Applied AI team, and Varin Nair, an engineer from the AI startup Factory (which also cohosted the event).
As the judging kicked off, a member of the METR team, Nate Rush, showed me an Excel table that tracked contestant scores, with AI-powered groups colored green and human projects colored red. Each group moved up and down the list as the judges entered their decisions. “Do you see it?” he asked me. I didn’t—the mishmash of colors showed no clear leader, even half an hour into the judging. That was his point. Much to everyone’s surprise, man versus machine was a close race.
In the end, the finalists were evenly split: three from the “man” side and three from the “machine.” After each demo, the crowd was asked to raise their hands and guess whether the team had used AI.
First up was ViewSense, a tool designed to help visually impaired people navigate their surroundings by transcribing live video feeds into text for a screen reader to read aloud. Given the short build time, it was technically impressive, and 60 percent of the room (by the emcee’s count) believed it had used AI. It didn’t.
Next was a team that built a platform for designing websites with pen and paper, using a camera to track sketches in real time—no AI involved in the coding process. The pianist project advanced to the finals with a system that let users upload piano sessions for AI-generated feedback; it was on the machine side. Another team showcased a tool that generates heat maps of code changes: critical security issues show up in red, while routine edits appear in green. This one did use AI.
My favorite project was, of course, a proofreading tool for writers.
“We love reading books, and we think that the AI era that is coming is trying to fight with all the human writers, trying to take their jobs,” a member of the group explained. “And instead of fighting it with some blockages, we decided to make it easier for writers to write good books.”
They demoed a system that, as you write, automatically tracks characters, traits, and relationships. If you contradict yourself—for example, saying two characters are best friends in one chapter but enemies in another—it flags the inconsistency. This team did not use AI.
The crowd, sitting cross-legged on the floor, did a brief drum roll. “Hands up if you think the overall winner was an AI-allowed team,” instructed Becker. My hand sprang up. In the end, AI took the top spot. By his count, 80 percent of the room had guessed correctly.
The $12,500 cash prize went to the code-review heat map, which used AI. The humans weren’t far behind—second place went to the writing tool.
David vs. Goliath
The second-place champions squeezed together on a bench across from me. Michał Warda, Dawid Kiełbasa, Marta Szczepaniak, and Paweł Sierant were a team of startup founders from Poland, visiting San Francisco for just a few months to experience the AI hype. They nearly dropped out an hour before the deadline, arguing under the pressure of coding without AI, but they got their demo in minutes before submissions closed.
“We don’t usually argue. But today was very tense, the lack of AI tooling. We’ve been programmers for a lot of time,” Warda tells me.
By the end, they were glad they’d stuck it out. “During the coding, yes, we wished we were on the machine team,” Szczepaniak says. Warda jumps in to add that if they weren’t fighting on behalf of humans, they might not have won a prize at all.
Then there were the winners: Konstantin Wohlwend, Aman Manvattira Ganapathy, and Bilal Godil. Wohlwend and Godil run a startup called Stack Auth. This is their third hackathon together. Ganapathy is an engineering intern at AppFolio.
They hadn’t expected to win, and they swore they’d rather have been on the human side for the thrill of playing David against Goliath. Still, they knew the truth: Man plus machine has the edge.
“You always want to believe in the man,” Wohlwend tells me. “But in this kind of format, the machine will almost always win.”
This is an edition of Kylie Robison’s Model Behavior newsletter. Read previous newsletters here.