I have been running multiple coding agents at the same time for a while now.
Sometimes I have Claude Code working on one feature, Codex reviewing a plan, Gemini CLI exploring another direction, and another Claude session fixing a smaller issue in parallel. On paper, this sounds like the future of software engineering: one engineer, multiple agents, multiple streams of execution happening at the same time.
And honestly, it is exciting.
For the first few minutes, it feels like I suddenly have a small engineering team around me. One agent can build, one agent can review, one agent can explore an alternative, one agent can check if the plan makes sense. Instead of waiting for one task to finish, I can start several pieces of work and let them run.
But after doing this for long enough, I realized the hard part is not starting more agents.
The hard part is keeping up with them.
The old productivity equation is changing
In the past, engineering capacity mostly scaled with people.
If we wanted to do more work, we hired more engineers. That gave us more hands, more thinking capacity, more reviews, more parallel execution, and more ownership. Of course, it also added communication overhead, meetings, planning complexity, alignment cost, and all the usual coordination problems that come with growing a team.
But the basic model was simple: more people meant more capacity.
With coding agents, one engineer can now start multiple streams of work at the same time. The amount of work we can attempt is no longer limited only by the number of engineers on the team. It also depends on how many agents we can safely direct, review, and coordinate. This is a big shift.
It does not mean headcount no longer matters. That is too simplistic. Great engineers still matter a lot because judgment, system understanding, product context, and ownership still sit with humans.
But the scaling model is changing.
In the future, the question may not only be “how many engineers do we have?” It may also be “how many agent workflows can these engineers safely orchestrate?”
The word “safely” matters because generating more code is not the same as creating more good software.
Why I started running everything at once
Part of my motivation is practical.
I already pay for multiple AI subscriptions: Claude, ChatGPT, Google AI, and a few different tools around them. I do not want to use them in a slow sequence: use Claude first, wait until I hit the limit, then switch to Codex, then switch to Gemini when something else runs out. That feels wasteful.
If I have multiple tools available, I want to use them at the same time. Not because every tool is equally good at every task, but because different tools have different strengths, speeds, limits, and behaviors.
Claude Code may be better for one kind of implementation. Codex may be useful for reviewing a plan or taking another pass at the same problem. Gemini CLI may give me a different perspective. Sometimes the value is not that one model is clearly better, but that they do not fail in exactly the same way.
So I started thinking less about “which tool should I use?” and more about “how do I compose these tools together?”
That is where things get interesting.
One agent can implement a feature, another can review the plan before implementation, another can generate tests, another can inspect the diff and look for edge cases, another can explore a simpler version of the same solution.
I am no longer just doing AI-assisted coding. It starts to look more like orchestration.
The first wall I hit was not code quality
I expected the first major bottleneck to be code quality.
That is still a real problem. AI-generated code can be functional but wrong. It can pass the happy path while missing the edge cases. It can create abstractions that look clean in isolation but do not fit the existing system. It can confidently change things it does not fully understand.
But surprisingly, that was not the first wall I hit.
The first wall was tab switching.
Personally, I use tmux heavily. I arrange agents by session, window, and pane. One session for a feature, one session for another experiment, one pane for a running server, one pane for tests, one pane for a coding agent, another pane for a review agent.
This works quite well when the number of active streams is small.
If an agent needs my attention once an hour, no problem. I can switch to it, read the output, make a decision, and move on.
But agents do not work at human speed.
They generate code fast, they ask for confirmation, they hit errors, they finish tasks, they need permission, they ask whether to continue, they suggest a plan, they wait for feedback. When that happens every few minutes, tab switching becomes more than a small annoyance.
I found myself jumping between sessions, trying to remember what each agent was doing, what I asked it to do, whether I had already reviewed its plan, whether the current output was good, and whether I still trusted the direction.
The visible problem was switching tabs, but the real problem was switching context.
More agents create more open loops
Every agent creates an open loop in your head.
One agent is working on the API layer, another is touching the frontend, another is reviewing tests, another is exploring a refactor, another is checking if your architecture plan makes sense. Each one has its own goal, assumptions, progress, risk, and current state.
The agent may remember its own context, but you still need to remember your context.
- Why did I start this task?
- What constraint did I give it?
- Did it follow the existing pattern?
- Did it make a decision I did not approve?
- Is this output still aligned with the original intent?
- Should I stop it now before it goes too far?
This is the part I think many people underestimate.
Running multiple agents does not mean there is more of you. Your attention does not automatically scale just because the execution layer scales. You may have five agents generating output, but there is still one human brain trying to understand, judge, and integrate that output.
That is the next bottleneck: whether the engineer can safely navigate and orchestrate multiple agents without losing control of the work.
The engineer becomes the orchestrator
I used to think AI productivity was mostly about writing better prompts.
Prompting matters, but it is only the first layer. Once you move from one AI assistant to multiple agents, the skill changes. You are no longer only asking for code. You are managing work.
- You need to decide which task should be delegated.
- You need to split work into clear boundaries.
- You need to decide which agent should review which output.
- You need to avoid two agents making conflicting changes.
- You need to know when to interrupt an agent before it digs a deeper hole.
- You need to create verification loops so you do not become the only test system.
This feels much closer to engineering leadership.
A good tech lead does not randomly assign work to five engineers and hope everything magically merges at the end. They clarify the goal, define boundaries, create checkpoints, review important decisions, and make sure the pieces fit together.
Multi-agent engineering needs the same discipline. The difference is that agents move faster, produce more output, and can create more mess in less time.
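Two of the duties above, splitting work into clear boundaries and keeping agents from making conflicting changes, can be partly checked before any agent starts. The following is a minimal sketch, assuming each agent is assigned a list of files or directories it is allowed to touch. The agent names, paths, and function names are all hypothetical, and a path-overlap check is only one possible way to define a boundary.

```python
from itertools import combinations
from pathlib import PurePosixPath


def paths_overlap(a: str, b: str) -> bool:
    """Return True if the two paths are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def find_conflicts(assignments: dict[str, list[str]]) -> list[tuple[str, str, str, str]]:
    """List every pair of agents whose assigned paths overlap."""
    conflicts = []
    for (agent_a, paths_a), (agent_b, paths_b) in combinations(assignments.items(), 2):
        for path_a in paths_a:
            for path_b in paths_b:
                if paths_overlap(path_a, path_b):
                    conflicts.append((agent_a, path_a, agent_b, path_b))
    return conflicts


# Hypothetical assignments: agent names and paths are made up for illustration.
assignments = {
    "api-agent": ["src/api"],
    "frontend-agent": ["src/web"],
    "refactor-agent": ["src/api/auth.py", "src/shared"],
}

for agent_a, path_a, agent_b, path_b in find_conflicts(assignments):
    print(f"{agent_a} ({path_a}) overlaps with {agent_b} ({path_b})")
```

With these made-up assignments, the only reported overlap is between api-agent and refactor-agent, because src/api/auth.py sits inside src/api. A check like this does not catch semantic conflicts, such as two agents changing the same interface from different files, so it complements review rather than replacing it.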
Experiment 1: hierarchical orchestration
One experiment I have been working on is a hierarchical setup.
The idea is simple: instead of me directly managing every agent, one agent acts as the coordinator. It works like a team lead. It understands the goal, breaks down the work, delegates tasks to other agents, collects progress, and summarizes what needs my attention.
In theory, this reduces the interruption cost.
Instead of five agents asking me five different questions, the coordinator can filter the noise. It can decide which updates are worth surfacing, ask one agent to review another agent’s plan, compare outputs, and tell me where the work is stuck.
This is powerful when it works. It changes the interaction model from “I need to monitor every agent” to “I need to monitor the system”.
But it also introduces a new problem: How much do I trust the coordinator?
If the coordinator misunderstands the goal, the whole group may move in the wrong direction. If it summarizes too aggressively, I may miss an important detail. If it delegates poorly, I may only discover the problem after multiple agents have already produced a lot of code.
So the coordinator agent does not remove the need for human judgment. It moves the judgment point.
Instead of reviewing every small step, I now need to review the coordinator’s decisions, assumptions, and summary quality. That is useful, but it is not free.
The lesson so far: hierarchy helps reduce direct noise, but it needs strong checkpoints.
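One way to picture a "strong checkpoint" is a rule the coordinator cannot override: certain kinds of updates always reach me verbatim, and only routine updates get rolled up. The sketch below assumes agents emit updates tagged with a kind. The kinds, agent names, and the triage function are hypothetical illustrations, not part of any real agent protocol.

```python
from dataclasses import dataclass

# Hypothetical update kinds that must never be summarized away.
ALWAYS_SURFACE = {"needs_decision", "blocked", "scope_change"}


@dataclass
class Update:
    agent: str
    kind: str  # e.g. "progress", "needs_decision", "blocked"
    message: str


def triage(updates: list[Update]) -> tuple[list[Update], dict[str, int]]:
    """Split updates into ones shown verbatim and a per-agent count of the rest."""
    surfaced = [u for u in updates if u.kind in ALWAYS_SURFACE]
    rolled_up: dict[str, int] = {}
    for u in updates:
        if u.kind not in ALWAYS_SURFACE:
            rolled_up[u.agent] = rolled_up.get(u.agent, 0) + 1
    return surfaced, rolled_up


# Made-up updates for illustration.
updates = [
    Update("impl-agent", "progress", "Added request validation"),
    Update("impl-agent", "progress", "Wrote handler tests"),
    Update("review-agent", "needs_decision", "Plan changes the public API; approve?"),
    Update("test-agent", "blocked", "Cannot start the test database"),
]

surfaced, rolled_up = triage(updates)
for u in surfaced:
    print(f"[{u.kind}] {u.agent}: {u.message}")
for agent, count in rolled_up.items():
    print(f"{agent}: {count} routine update(s) not shown")
```

The design choice here is that the filter is a fixed rule rather than the coordinator's own judgment. The coordinator can still compress routine progress, but it cannot decide on its own that a blocked agent or a scope change is not worth my attention.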
Experiment 2: mission control for agents
The second experiment is a mission-control style interface.
The idea is to have one place where I can see all active agents, understand what each one is doing, switch to any of them, and talk to them directly without jumping across many terminal tabs.
This came from my own frustration with tmux.
Tmux is great. I still use it, but tmux was not designed specifically for managing AI agents. It helps organize terminal windows, but it does not understand agent state.
- It does not tell me which agent is waiting for input.
- It does not summarize what changed.
- It does not show which agent is blocked.
- It does not help me understand which task is risky.
- It does not tell me which session needs my attention first.
That is what I want from a mission-control interface, not a fancy UI for the sake of it. I want a lower-friction way to manage attention.
I want to glance at one place and know what is happening.
- Which agents are running?
- Which agents are waiting?
- Which agents have produced a diff?
- Which agents are stuck?
- Which agents need review?
- Which agents can continue without me?
This sounds small, but I think it matters a lot.
When the cost of navigation is high, you avoid checking. When you avoid checking, agents drift. When agents drift, review becomes harder. When review becomes harder, you either slow down or you merge things with less confidence.
A good mission-control layer should reduce the cost of staying on top of the work. You can try it out by running npx ai-devkit@latest agent console.
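To make "agent state" concrete, here is a minimal sketch of the data a mission-control view might hold and how it could answer "which session needs my attention first?". This is not how AI DevKit implements its console. The state names, their priority order, and the sample agents are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Lower value = needs the human sooner. This ordering is an assumption.
    WAITING_FOR_INPUT = 0
    BLOCKED = 1
    NEEDS_REVIEW = 2
    RUNNING = 3


@dataclass
class AgentStatus:
    name: str
    task: str
    state: State
    minutes_in_state: int


def attention_order(agents: list[AgentStatus]) -> list[AgentStatus]:
    """Sort by state priority, then by longest time spent in that state."""
    return sorted(agents, key=lambda a: (a.state.value, -a.minutes_in_state))


# Hypothetical sessions; names and tasks are made up.
agents = [
    AgentStatus("claude-1", "API layer", State.RUNNING, 12),
    AgentStatus("codex-1", "plan review", State.NEEDS_REVIEW, 5),
    AgentStatus("gemini-1", "refactor spike", State.WAITING_FOR_INPUT, 3),
    AgentStatus("claude-2", "flaky test fix", State.WAITING_FOR_INPUT, 9),
]

for a in attention_order(agents):
    print(f"{a.state.name:<18} {a.name:<10} {a.task} ({a.minutes_in_state} min)")
```

In this example claude-2 comes first because it has been waiting for input the longest, and the running agent comes last because it can continue without me. The hard part in practice is detecting the state reliably from each tool, which this sketch takes as given.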
AI DevKit as my experiment foundation
This is also why I have been spending more time experimenting with AI DevKit. Not as “the answer” to everything, but as a foundation for experiments.
Once you start playing with multi-agent workflows, you quickly realize you need reusable building blocks. You need a way to manage agent sessions. You need a way to structure phases. You need a way to keep context organized. You need shared memory. You need a way to make experiments repeatable instead of hacking together a one-off script every weekend.
For example, having a foundation like an agent manager makes it easier to build higher-level workflows on top. I do not want to rebuild the basic plumbing every time I want to test a new orchestration idea.
That is the useful part. The product is not the point here. The research question is the point.
- How should engineers work when they have multiple agents running at the same time?
- How much autonomy should each agent have?
- When should an agent interrupt?
- What should be summarized?
- What should never be summarized?
- Which review tasks can be delegated to another agent?
- Where must the human stay directly involved?
I do not think we have the final answer yet. That is what makes it interesting.
The next tooling layer is not a better code editor
Most AI coding tools today still feel centered around one main interaction: human asks, AI responds.
That works for many workflows, but multi-agent work needs a different shape.
- We need tools that understand agent state, not just terminal output.
- We need attention routing, so the engineer is interrupted only when needed.
- We need review queues, so generated work does not disappear into random sessions.
- We need task boundaries, so agents do not step on each other.
- We need verification loops, so speed does not destroy confidence.
- We need better ways to compare parallel attempts at the same problem.
- We need summaries that preserve decision quality, not just compress text.
This is where I think the next wave of engineering tooling will go.
The future is not only a smarter code editor. It is an operating layer for agentic work. A place where engineers can define intent, spin up agents, observe progress, redirect work, review outputs, and keep ownership without drowning in context switching.
More output is not the same as more progress
There is a trap here.
When you first run multiple agents, it feels productive because many things are happening. It looks like progress. But output is not progress.
Progress means the system is moving closer to the right outcome. That requires judgment.
- If five agents generate five different implementations that do not fit the codebase, you did not become five times more productive. You just created five things to review.
- If one agent builds a feature and another breaks a shared abstraction, you did not save time. You moved the cost to integration.
- If agents produce code faster than your verification loop can handle, you are not accelerating engineering. You are accumulating uncertainty.
This is why I keep coming back to the same idea: AI makes execution cheaper, which pushes the constraint somewhere else.
For software engineering, I think the constraint is moving toward orchestration, verification, and attention.
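The "accumulating uncertainty" point can be shown with simple arithmetic: if agents produce reviewable outputs faster than one person can review them, the unreviewed pile grows every hour. The sketch below is a toy model with made-up rates, not a measurement. The function name and parameters are hypothetical.

```python
def review_backlog(agents: int, outputs_per_agent_per_hour: float,
                   reviews_per_hour: float, hours: int) -> list[float]:
    """Unreviewed outputs at the end of each hour, never dropping below zero."""
    backlog = 0.0
    history = []
    for _ in range(hours):
        produced = agents * outputs_per_agent_per_hour
        backlog = max(0.0, backlog + produced - reviews_per_hour)
        history.append(backlog)
    return history


# Made-up rates, purely for illustration: each agent produces 2 outputs
# per hour and the engineer can properly review 6 per hour.
for n in (2, 5):
    backlog = review_backlog(n, outputs_per_agent_per_hour=2, reviews_per_hour=6, hours=4)
    print(f"{n} agents: {backlog}")
```

Under these assumed rates, two agents stay within review capacity and the backlog stays at zero, while five agents leave 4, 8, 12, and then 16 unreviewed outputs after four hours. Adding agents past the review rate only adds to the pile.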
Where I am now
I am still early in this exploration. I have some working experiments. Some feel promising, some feel awkward.
That is fine. This is exactly the phase I enjoy. The interesting part is figuring out how humans and agents should work together when the number of active agents keeps increasing.
I do not think the best engineers in the AI era will simply be the ones who know how to prompt. They will be the ones who know how to decompose problems, assign work, create feedback loops, verify outputs, and keep ownership while multiple agents execute in parallel.
That is a different skill, and like every real engineering skill, it will take practice.
I also do not believe the future is as simple as “AI replaces engineers”. That framing is lazy.
A more realistic future is that one engineer may coordinate many agents. Those agents may write code, review plans, generate tests, inspect logs, compare designs, or explore alternatives. The engineer still owns the problem, the judgment, and the final call. The shape of the job changes: less time typing every line by hand, more time deciding what should be done, how work should be split, what good looks like, and whether the result is safe enough to ship.
That future is exciting to me, but it will only work if we design the workflow carefully. Otherwise, we will not remove the bottleneck. We will just move it into the engineer’s head.
More agents, same human brain.
That is the problem I want to keep exploring.
If this perspective resonates with you, subscribe to my blog; it’s free. I share what I learn while building real systems with AI in the loop. You can also follow me on X or Threads for more thoughts and ongoing experiments.