We need a clearer framework for AI-assisted contributions to open source

Sam Saffron 🕒︎ 2025-11-05

Copyright samsaffron

As both developers and stewards of significant open source projects, we're watching AI coding tools create a new problem for open source maintainers. AI assistants like GitHub Copilot, Cursor, Codex, and Claude can now generate hundreds of lines of code in minutes. This is genuinely useful, but it has an unintended consequence: reviewing machine-generated code is very costly.

The core issue: AI tools have made code generation cheap, but they haven't made code review cheap. Every incomplete PR consumes maintainer attention that could go toward ready-to-merge contributions. At Discourse, we're already seeing this accelerate across our contributor community. In the next year, every engineer maintaining open source projects will face the same challenge.

We need a clearer framework for AI-assisted contributions that acknowledges the reality of limited maintainer time. A binary system works extremely well here. On one side there are prototypes that simply demonstrate an idea. On the other side there are ready-for-review PRs that meet a project's contribution guidelines and are ready for human review.

The lack of proper labeling and rules is destructive to the software ecosystem

The new tooling makes it trivial to create a change set and lob it over the fence. It can introduce a perverse system where project maintainers spend disproportionate effort reviewing lopsided AI-generated code that took seconds for contributors to create and will now take many hours to review. This is frustrating, time consuming and demotivating. On one side there is a contributor who spent a few minutes fiddling with AI prompts; on the other side there is an engineer who needs to spend many hours or even days deciphering alien intelligence. This is not sustainable and is extremely destructive.

The prototype

AI coding agents such as Claude Code, Codex, Cursor CLI and more have unlocked the ability to ship a "new kind" of change set: the prototype. The prototype is a live demo.
It does not meet a project's coding standards. It is not code you vouch for or guarantee is good. It lacks tests, may contain security issues, and would most likely introduce an enormous amount of technical debt if merged as is. That said, it is a living demo that can help make an idea feel more real. It is also enormously fun. Think of it as a delightful movie set.

Prototypes, especially on projects such as Discourse where enabling tooling exists, are incredibly easy to explore using tools like dv. Prototypes are great vehicles for exploring ideas. In fact, you can ship multiple prototypes that demonstrate completely different solutions to a single problem, which helps decide on the best approach. Prototypes, video demos and simple visual mockups are great companions. The prototype has the advantage that you can play with it and properly explore the behavior of a change. The video is faster to consume. Sometimes you may want them all.

If you are vibe coding and prototyping, there are some clear rules you should follow:

- Don't send pull requests (not even drafts); instead lean on branches to share your machine-generated code.
- Share a short video, links to a branch, and/or quotes of particularly interesting code from the prototype in issues or forum posts.
- Show all your cards: explain that you were exploring an idea using AI tooling, so people know the nature of the change you are sharing.

Maybe you will be lucky and an idea you had will get buy-in; maybe someone else will want to invest the time to drive a prototype into a production PR.

When should you prototype?

Prototyping is fun and incredibly accessible. Anyone can do it using local coding agents, or even coding agents in the cloud such as Jules, Codex cloud, Cursor Cloud, Lovable, v0 and many, many more. This heavily lowers the bar for prototyping. Product managers can prototype, CEOs can prototype, designers can prototype, and so on.
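The "branches, not PRs" rule above is easy to follow with plain git. A minimal sketch, run entirely in a throwaway local repository so it is safe to execute; the branch name and commit messages are hypothetical, and the push step is shown only as a comment:

```shell
set -e
# Throwaway demo repository; in practice you would work in your fork.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "initial commit"

# Name the branch so its prototype status is obvious at a glance:
git checkout -q -b prototype/ai-search-ranking
git commit -q --allow-empty -m "Prototype: AI-generated exploration, not for review"

# In a real project you would now push the branch and link it from an issue
# or forum post, instead of opening a PR:
#   git push origin prototype/ai-search-ranking
git branch --show-current
```

A `prototype/` prefix (or similar) is one way to make the labeling self-documenting: anyone who stumbles on the branch knows what they are looking at before reading a single diff.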
However, this new fun opens a series of questions you should explore with your team:

- When is a prototype appropriate?
- How do designers feel about them? Are they distracting? (Are links to the source code too tempting?)
- Do they take away from human creativity?
- How should we label and share prototypes?
- Is a prototype forcing an idea to jump the queue?

When you introduce prototyping into your company you need to negotiate these questions carefully and form internal consensus, otherwise you risk creating large internal attitude divides and resentment.

The value of the prototype

Prototypes, what are they good for? Absolutely something. I find prototypes incredibly helpful in my general development practice:

- Grep on steroids. I love that prototypes often act as a way of searching through our large code base, isolating all the little areas that may need changing to achieve a change.
- I love communicating in paragraphs, but I am also a visual communicator. I love how easily a well-constructed prototype can communicate a design idea I have, despite me not being that good at Figma.
- I love that there is something to play with. It often surfaces many concerns that could have been missed by a spec. The best prototype is tested; during the test you discover many tiny things that are just impossible to guess upfront.
- The crazy code LLMs generate is often interesting to me; it can sometimes challenge some of my thinking.

The prototype: a maintainer's survival guide

Sadly, as the year progresses, I expect many open source projects to receive many prototype-level PRs. Not everyone will have read this blog post or even agree with it. As a maintainer dealing with external contributions:

- Protect yourself and your time. Timebox initial reviews of large change sets; focus on determining whether a PR was "vibe coded" rather than leaving 100 comments on machine-generated code that took minutes to generate.
- Develop an etiquette for dealing with prototypes pretending to be PRs.
- Point people at contribution guidelines and give them a different outlet: "I am closing this, but this is interesting; head over to our forum/issues to discuss."
- Don't feel bad about closing a vibe-coded, unreviewed, prototype PR!

The ready to review PR

A ready to review PR is the traditional PR we submit. We reviewed all the machine-generated code and vouch for all of it. We ran the tests and like the tests, we like the code structure, we read every single line of code carefully, and we made sure the PR meets the project's guidelines. All the crazy code agents generated along the way has been fixed; we are happy to stamp our very own personal brand on the code.

Projects tend to have a large set of rules around code quality, code organisation, testing and more. We may have used AI assistance to generate a ready to review PR; fundamentally, though, this does not matter. We vouch for the code and stand behind it meeting both our brand and the project's guidelines.

The distance from a prototype to a ready to review PR can be deceptively vast. There may be days of engineering in taking a complex prototype and making it production ready. This large distance was described as well by Andrej Karpathy on the Dwarkesh Podcast:

"For some kinds of tasks and jobs and so on, there's a very large demo-to-product gap where the demo is very easy, but the product is very hard. For example, in software engineering, I do think that property does exist. For a lot of vibe coding, it doesn't. But if you're writing actual production-grade code, that property should exist, because any kind of mistake leads to a security vulnerability or something like that."

A Veracode survey found that only 55% of generation tasks resulted in secure code (source). Our models are getting better by the day, and everything really depends on an enormous number of parameters, but the core message, that LLMs can and do generate insecure code, stands.
On alien intelligence

The root cause of the distance between project guidelines and a prototype is AI's alien intelligence. Many engineers I know fall into two camps. One camp finds the new class of LLMs intelligent, groundbreaking and shockingly good. The other camp thinks of all LLM-generated content as "the emperor's new clothes": the code they generate is "naked", fundamentally flawed and poison.

I like to think of the new systems as neither. I like to think about the new class of intelligence as "alien intelligence". It is both shockingly good and shockingly terrible at the exact same time. Framing LLMs as "super-competent interns" or some other type of human analogy is incorrect. These systems are aliens, and the sooner we accept this, the sooner we will be able to navigate the complexity that injecting alien intelligence into our engineering process leads to.

Playing to alien intelligence's strength: the prototype

Over the past few months I have been playing a lot with AI agents. One project I am particularly proud of is dv. It is a container orchestrator for Discourse that makes it easy to use various AI agents with Discourse. I will often run multiple complete and different throwaway Discourse environments on my machines to explore various features. This type of tooling excels at vibe-engineering prototypes. Interestingly, dv was mostly built using AI agents with very little human intervention, and some of the code is a bit off brand; that said, unlike Discourse or many of the other open source gems I maintain, it is a toy project.

Back on topic, dv has been a great factory for prototypes on Discourse. This has been wonderful for me. I have been able to explore many ideas while catching up on my emails and discussions on various Discourse sites.

On banning AI contributions, prototypes and similar

Firstly, you must be respectful of the rules of any project you contribute to; seek them out and read them prior to contributing.
For example: Cloud Hypervisor says no AI-generated code, to avoid licensing risks.

That said, there is a trend among many developers of banning AI. Some go so far as to say "AI not welcome here; find another project." This feels extremely counterproductive and fundamentally unenforceable to me. Much of the code AI generates is indistinguishable from human code anyway. You can usually tell when a prototype is pretending to be a human PR, but a real PR a human makes with AI assistance can be indistinguishable. The new LLM tooling can be used in a tremendous number of ways, from simple code reviews and simple renamings within a file to complete change-set architecture.

Given the enormous mess and diversity here, I think the healthiest approach is to set clear expectations. If I am submitting a PR, it should match my brand and be code I vouch for. As engineers it is our role to properly label our changes. Is our change ready for human review, or is it simply a fun exploration of the problem space?

Why is this important?

Human code review is increasingly becoming a primary bottleneck in software engineering. We need to be respectful of people's time and protect our own engineering brands.

Prototypes are fun; they can teach us a lot about a problem space. But when it comes to sending contributions to a project, treat all code as code you wrote. Put your stamp of ownership and approval on whatever you build, and only then send a PR you vouch for.
