Managing Software Engineers in the AI Era
Two weeks ago I became a manager again. Throughout my career, I have managed several teams. I was even a director of software development a couple of times. But I always stayed close to the code and remained hands-on. This time may be different. I’ll tell you why. But first, let me introduce my team: Devin, Claude and OpenHands…
“Unfortunately, humans have a long history of trying to fix their engineering mistakes with more engineering mistakes!” ~ Steven Magee
Yes, you guessed it. I am managing a team of AI software engineers.
What is an AI software engineer?
Now, AI software engineers are different from AI coding assistants like Copilot, Cursor or Windsurf. The AI coding assistants are integrated into your IDE (or come with their own IDE) and help you write code. The AI software engineers are a different breed. You give them a task and they go through the entire software development life cycle: come up with a plan, write the code, run the tests, iterate and eventually commit and push to source control. You can interact with them and steer them along the way, or help them out of a tight spot, but the idea is that they are by and large autonomous.
This is a classic example of AI agents.
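To make the life cycle above a bit more concrete, here is a minimal sketch of that loop in Python. It is not how Devin, Claude Code or OpenHands are implemented. The function `run_agent` and the callables `plan`, `write_code`, `run_tests` and `commit` are hypothetical stand-ins for LLM calls and tools, and the `max_attempts` cap is my own assumption for keeping an agent from iterating forever.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskResult:
    committed: bool
    attempts: int
    log: List[str] = field(default_factory=list)


def run_agent(
    task: str,
    plan: Callable[[str], List[str]],
    write_code: Callable[[str, str], None],
    run_tests: Callable[[], str],
    commit: Callable[[str], None],
    max_attempts: int = 3,
) -> TaskResult:
    """Plan once, then loop over write/test until tests pass or we give up.

    All callables are hypothetical stand-ins for LLM calls and tools.
    run_tests returns an empty string on success, or the failure output.
    """
    log: List[str] = []
    steps = plan(task)
    log.append(f"plan: {len(steps)} step(s)")
    feedback = ""  # test failures from the previous attempt, if any
    for attempt in range(1, max_attempts + 1):
        for step in steps:
            write_code(step, feedback)
        feedback = run_tests()
        if not feedback:
            commit(task)
            log.append(f"attempt {attempt}: tests passed, committed")
            return TaskResult(True, attempt, log)
        log.append(f"attempt {attempt}: tests failed")
    # Out of attempts: stop and hand control back to the human.
    return TaskResult(False, max_attempts, log)
```

The cap on attempts is the interesting design choice here: without it, an agent that cannot make the tests pass has no reason to stop.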
How good are these so-called AI software engineers?
They are pretty good! They can all understand decent-sized code bases, come up with a plan based on pretty vague requirements, write correct code along with unit and integration tests, and fix their own errors. But they are not perfect. They make mistakes, and the larger or more ambiguous the task, the more likely they are to start spinning in circles.
For example, I asked Devin to find and fix all the warnings in a large Java code base. It did a great job identifying all the warnings (not a big deal, just run gradlew build and check the output). But when it got around to fixing them, it got overwhelmed and started making lots of changes without converging. In the end I shut it down and asked it to just give me a list of the warnings and a plan to fix each one. I then created a ticket for each warning and assigned them to Devin.
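The "one ticket per warning" step can be scripted too. Here is a rough sketch, assuming the build output contains javac-style warning lines of the form `path/File.java:12: warning: [category] message`. The helpers `parse_warnings` and `to_tickets` and the ticket fields are hypothetical, and nothing here talks to a real issue tracker.

```python
import re
from typing import Dict, List

# Assumed javac-style line: "<path>.java:<line>: warning: [<category>] <message>"
WARNING_RE = re.compile(
    r"^(?P<path>.+?\.java):(?P<line>\d+): warning: "
    r"(?:\[(?P<category>[\w-]+)\] )?(?P<message>.+)$"
)


def parse_warnings(build_output: str) -> List[Dict]:
    """Extract one dict per warning line found in the build output."""
    warnings = []
    for raw in build_output.splitlines():
        match = WARNING_RE.match(raw.strip())
        if match:
            item = match.groupdict()
            item["line"] = int(item["line"])
            item["category"] = item["category"] or "uncategorized"
            warnings.append(item)
    return warnings


def to_tickets(warnings: List[Dict]) -> List[Dict[str, str]]:
    """Turn each warning into a small, self-contained ticket."""
    tickets = []
    for w in warnings:
        tickets.append(
            {
                "title": f"Fix [{w['category']}] warning in {w['path']}:{w['line']}",
                "body": f"Compiler message: {w['message']}\n"
                "Fix only this warning and keep all tests passing.",
            }
        )
    return tickets
```

Each ticket is deliberately tiny, which is the whole point: a narrow task gives the agent much less room to wander.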
What’s the Secret Sauce?
The secret sauce is definitely the underlying LLM. I have used Claude 3.7 Sonnet with all the agents. It is pretty impressive what it can do with very little guidance. All the agents demonstrated similar capabilities when using the same underlying model. I’m pretty sure that whatever smarts the agent itself brings to the table are pretty insignificant by comparison. They might have been more prominent in the early days (AKA a few months ago) when the LLMs needed more help. These days you can just throw together a few tools like web search, file and git access, and the LLM does the rest.
That said, the AI software engineers provide a convenient interface to the LLMs, focused on software development.
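For a sense of how thin the "throw together a few tools" layer can be, here is a minimal sketch of a tool registry and dispatcher. The JSON shape `{"tool": ..., "args": {...}}`, the `TOOLS` table and the `dispatch` function are assumptions for illustration, not the protocol of any particular agent or LLM API. The only external command used is `git status --short`.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, Dict


def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def git_status(repo: str = ".") -> str:
    done = subprocess.run(
        ["git", "-C", repo, "status", "--short"],
        capture_output=True,
        text=True,
        check=False,
    )
    return done.stdout or done.stderr


# Hypothetical tool table: the name the model asks for, mapped to a function.
TOOLS: Dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "write_file": write_file,
    "git_status": git_status,
}


def dispatch(tool_call_json: str) -> str:
    """Run one tool call of the assumed form {"tool": name, "args": {...}}."""
    call = json.loads(tool_call_json)
    name = call.get("tool")
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        return tool(**call.get("args", {}))
    except Exception as exc:
        # Report failures back to the model as text instead of crashing.
        return f"error: {exc}"
```

In a real agent the string returned by `dispatch` would go back to the LLM as the result of its tool call, and the model would decide what to do next.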
How to make AI software engineers work for you?
So, the LLMs are great, but in the end they’ll make decisions you might not like. They will guess (AKA hallucinate) if they don’t have all the information and they will fail in interesting ways if the task is too complex.
You need to address all these issues:
- Provide good system prompts and guidance (e.g., always write tests for every change, always make the tests pass before you commit).
- Provide links to documentation and other resources that may be useful.
- Break down large tasks into smaller ones and assign them to the AI software engineer.
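The three points above fit together naturally: every small task gets the same standing rules and the same reference links. Here is a minimal sketch of that idea. The `Assignment` class, its fields and the prompt layout are all hypothetical, so adapt them to whatever your agent actually accepts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assignment:
    task: str
    rules: List[str] = field(default_factory=list)
    doc_links: List[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the standing rules, resources and task as one prompt."""
        sections = []
        if self.rules:
            sections.append("Rules:\n" + "\n".join(f"- {r}" for r in self.rules))
        if self.doc_links:
            sections.append(
                "Useful resources:\n" + "\n".join(f"- {d}" for d in self.doc_links)
            )
        sections.append("Task:\n" + self.task)
        return "\n\n".join(sections)


def split_into_assignments(
    subtasks: List[str], rules: List[str], doc_links: List[str]
) -> List[Assignment]:
    """Attach the same rules and links to every small subtask."""
    return [Assignment(t, list(rules), list(doc_links)) for t in subtasks]
```

For example, calling `split_into_assignments` with a list of small fixes and the rule "Always make the tests pass before you commit" yields one ready-to-send prompt per fix.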
OK. Let’s meet the team.
Who is Devin?
Devin is the flagship product of Cognition. It is one of the earliest and most polished AI software engineers. It has its own web IDE, which includes a code editor, a web browser and a terminal. You can interact with Devin as it works and provide knowledge articles that Devin will take into account. Devin can integrate with Slack, Linear, Jira and of course GitHub. You can also comment on the PR and Devin will see your comments and take action. The main thing I dislike about Devin is that it keeps going to sleep and it takes a while to wake up. Understandable as it runs in the cloud, but it definitely adds friction.
Here is what Devin looks like:
Who is Claude Code?
If Devin is the corporate AI engineer, then Claude Code is the rebel. It is developed by Anthropic. It is a pure CLI tool that runs in your terminal. It changes the code in the current branch directly on your machine. This is a double-edged sword. On one hand, it is convenient because you can look at the changes directly in your IDE, without constantly pulling from a branch like you have to with Devin (unless you work in Devin’s IDE). On the other hand, you can’t work in the same repo at the same time without clashing with Claude Code. Instead of Devin’s knowledge articles system, Claude Code uses a simple CLAUDE.md file that you can place in any directory to guide it.
Here is what Claude Code looks like:
Who is/are OpenHands?
OpenHands (previously OpenDevin) is an open source AI software engineer. It is inspired by Devin and similarly has its own editor, terminal, and browser. Its interface is simpler and doesn’t have the same level of polish as Devin. But it is open source and you can run it on your own machine in a Docker container. OpenHands can do pretty much everything Devin and Claude Code can do (well, it can’t watch for GitHub issues or Slack messages). Still, it’s a pretty impressive open-source alternative.
Here is what OpenHands looks like:
Take-home points
- We are doomed!
- AI software engineers are already useful, and will get better very quickly.
- Until then, invest in good prompts, clear instructions, and make sure to review the code carefully!
Stay weird, my friends