Following on from last week’s letter, where I ‘vibe coded’ a bare metal program in C, this week I have ‘chat coded’ a moderately complicated front end from raw HTML/JS into React, and started a shell in Go that is intended to have the usability of the Fish shell while maintaining POSIX compatibility.
I’m not the world’s best frontend engineer, so I managed to do in a few hours what would previously have taken me months, although there are hidden costs which I’ll get to in a moment. The shell experiment did not go so well.
I wouldn’t describe this as vibe coding, which I hit hard limits with. Instead, I’m going to call this ‘chat coding’, because I’m thinking much harder about what is being written, rather than just going with the vibes. The term chat coding has been floating around for a while, but since Andrej Karpathy coined (or popularized) vibe coding, the distinction between the two has become sharper.
With pure vibe coding I found the AI consistently makes poor strategic decisions, leading to huge, heavily duplicated files and lots of functions full of pointless code. Eventually the AI can’t progress except by creating completely new code and hooking it into what already exists. The resulting code is about as intelligible as the product of a genetic algorithm.
In some cases the code reaches such a level of mess that the coding agent goes wild and just produces yet more nonsense. This is typical of what happens to LLMs when the input is too far outside their training data, which is amusing when it’s the LLM itself that produced what is now being fed back in.
I think there are (expensive) ways that coding agents could mitigate this, but we’re not there yet. In the meantime the user has to provide the controls that prevent it from happening.
At this stage that means reading and understanding the code to some extent, providing the AI with constant feedback, and taking time to think about, or chat with the AI about, the overall structure of the program.
Generally speaking, coding agents will make the laziest, most direct changes to achieve whatever you’ve given them. It’s easy to see why; when they deviate from this, becoming more proactive, it’s often disastrous.
Constantly bullying and directing the AI to get the result you want is not nearly as fun as vibe coding. Neither is it more satisfying than writing code by hand and having complete control.
It’s an unhappy middle ground, but for tasks lacking in novelty and not requiring strict security guarantees, it is so much faster. The major reasons for this, I think, are quite mundane.
I feel like for a lot of programming tasks the limiting factors are typing and reading speed.
That is, how fast you can read the documentation and existing code to absorb the exact details of how it was constructed, followed by how fast you can type out the code that uses whatever combination of features you need.
For an experienced software developer doing routine development, not much deep thought is required. It only takes a second to form an abstract idea of how the code should work. Most of the time is spent hammering out the routine details, which follow standard patterns.
Any gap between thinking of what you want to write and writing it is an opportunity to get distracted. In my case it is an opportunity to get bored and start complicating things to make them more interesting. So even if the percentage of time spent typing is small, it’s still an important factor.
The amount of time spent typing can be reduced by asking an LLM to write the code you want, because even though you’re still typing (unless you’re using speech-to-text), the key combinations are easier and some of the details can be inferred by the LLM.
If you’re able to find the right balance, where you’re not asking the LLM to exceed its capabilities, then you can offload a lot of reading, writing, and editing at a reasonable cost.
However, for some tasks I think it’s unlikely you’ll be able to find that balance. With the shell program, for instance, while I did achieve some nice results quickly, the LLM overstepped and tried to do stuff that wasn’t going to work. It then started piling hacks on top of hacks to try and fix it. In fact it quite often left a TODO-style comment saying, in effect, ‘this is tricky, so I’ll just do this hack for now’.
It seemed like almost every detail of the program confused it, but in particular the moment I started trying to use the AST to highlight the syntax, although to be fair it’s not just the LLM that was getting confused.
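The core idea is simple even if the details defeated us both: parse the command line once, then walk the resulting tree and map each leaf node to a styled span of the input. Here’s a minimal sketch in Go; the node kinds and the hand-built tree are hypothetical stand-ins for a real parser’s output, not code from the project.

```go
package main

import "fmt"

// Hypothetical node kinds for a minimal shell AST; a real shell
// grammar is far richer, but the walking pattern is the same.
type Kind int

const (
	KindPipeline Kind = iota
	KindCommand
	KindCommandName
	KindString
	KindOperator
	KindArg
)

// Node is one parsed element, carrying byte offsets into the input line.
type Node struct {
	Kind       Kind
	Start, End int
	Children   []Node
}

// Span maps a byte range of the input to a highlight style name.
type Span struct {
	Start, End int
	Style      string
}

// styles maps leaf node kinds to style names for the terminal renderer.
var styles = map[Kind]string{
	KindCommandName: "command",
	KindArg:         "argument",
	KindString:      "string",
	KindOperator:    "operator",
}

// highlight walks the tree depth-first and emits one span per leaf,
// so nested nodes (e.g. a string inside a command) get their own style.
func highlight(n Node, out *[]Span) {
	if len(n.Children) == 0 {
		*out = append(*out, Span{n.Start, n.End, styles[n.Kind]})
		return
	}
	for _, c := range n.Children {
		highlight(c, out)
	}
}

func main() {
	// Hand-built tree for `echo "hi" | wc -c`, standing in for parser output.
	tree := Node{Kind: KindPipeline, Start: 0, End: 17, Children: []Node{
		{Kind: KindCommand, Start: 0, End: 9, Children: []Node{
			{Kind: KindCommandName, Start: 0, End: 4},
			{Kind: KindString, Start: 5, End: 9},
		}},
		{Kind: KindOperator, Start: 10, End: 11},
		{Kind: KindCommand, Start: 12, End: 17, Children: []Node{
			{Kind: KindCommandName, Start: 12, End: 14},
			{Kind: KindArg, Start: 15, End: 17},
		}},
	}}

	var spans []Span
	highlight(tree, &spans)
	for _, s := range spans {
		fmt.Printf("%2d-%2d %s\n", s.Start, s.End, s.Style)
	}
}
```

The hard parts, of course, live outside this sketch: re-parsing on every keystroke and coping gracefully with half-typed, syntactically invalid input.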
While I wouldn’t rule out using chat coding for elements of this shell project, clearly some parts need to be written manually with special care. Trying to get an LLM to write them is partly a battle with the LLM, and an expensive battle at that.
I have to say, though, that I’m getting more and more excited by the thought of not having to write code by hand. To be clear, I enjoy being a competent Vim and Emacs user; I deliberately trained myself to touch type using the standard ten-finger method. I won’t be happy to never use these skills again, but I will be happy to have the option of not using them.
I like the idea of being able to walk around while writing code or doing research. Keyboards aren’t exactly a natural phenomenon, after all, and needing to sit at one constrains my work environment.
I’ve not used speech-to-text tools much in the past because they were usually rubbish. These days, though, they’re getting very good, which is in no small part due to LLMs and the underlying advances in machine learning.
Editing text using speech commands is quite difficult if you have to specify the exact changes required. However, if you can casually describe what is needed and there is a layer which translates this into the exact edits required, then it becomes a lot easier.
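To sketch what I mean by that layer (hypothetically, in Go; none of this tooling exists yet as far as I know): once a spoken instruction has been translated into precise character-range edits, applying them is the easy, mechanical part.

```go
package main

import "fmt"

// Edit is one concrete change: replace buffer[Start:End] with Text.
type Edit struct {
	Start, End int
	Text       string
}

// Translator is the hypothetical layer: it turns a casually phrased
// spoken instruction plus the current buffer into exact edits. An
// LLM-backed implementation would sit behind this interface.
type Translator interface {
	Translate(instruction, buffer string) ([]Edit, error)
}

// apply performs non-overlapping edits (sorted by Start) back to front,
// so earlier byte offsets stay valid as the text changes length.
func apply(buffer string, edits []Edit) string {
	for i := len(edits) - 1; i >= 0; i-- {
		e := edits[i]
		buffer = buffer[:e.Start] + e.Text + buffer[e.End:]
	}
	return buffer
}

func main() {
	buffer := "for i := 0; i < 10; i++ {"
	// Edits a translator might produce for the spoken instruction
	// "loop to a hundred instead, and call the variable n".
	edits := []Edit{
		{Start: 4, End: 5, Text: "n"},
		{Start: 12, End: 13, Text: "n"},
		{Start: 16, End: 18, Text: "100"},
		{Start: 20, End: 21, Text: "n"},
	}
	fmt.Println(apply(buffer, edits)) // for n := 0; n < 100; n++ {
}
```

All the difficulty hides behind the Translate call, which is exactly the part that LLMs have suddenly made plausible.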
So today chat coding has a number of downsides that annoy me a lot. However, the upsides make it difficult to ignore, and in the future I see a number of possible evolutionary paths that I am positively excited about, including that it becomes a superior alternative to the keyboard and mouse. If that turns out to be the case for software development, it will have deep implications for how everyone interfaces with computers.