This post is a direct continuation of Better Vibe Coding - Part 1, Foundation & Basics. If you haven’t read that yet, start there for the basics.
2. “Vibe Coding” complex tasks
Phew, now that we've got the basics covered, let's move into more advanced territory.
If you've vibe coded in bigger codebases, you'll likely have seen the agent's quality go down the drain the more files it reads and the more complex a task becomes.
When an agent reads in more tokens, it eventually flushes out content from the beginning of the conversation and loses context. We simply can't afford to read files we don't need, and features that span multiple files need different handling.
Everything I mention here is what I use for my projects on a daily basis and consider crucial.
2.1 Split plan phases from implementation phases
Putting it bluntly: purely vibe coding complex features (as in, opening a prompt and telling an agent to "implement this feature") is currently just not doable due to the limitations mentioned previously.
Imagine treating a new joiner on your team like an AI agent. Would you just tell them to "implement this feature" without any guidance? Very likely not (unless you want to scare your new hire away already).
So how can we work around it? By thinking like a manager: Separating planning from implementation phases.
Implementing any feature in a professional context always involves a planning phase in which we create user stories or a design document, so why should working with agents be any different?
Planning phase
- Understand and clarify the ask: What do we need to do, why
- Define the acceptance criteria
- Read a lot of code
- Research
- Come up with a plan
- Output a design document
Implementation phase
- Often a new session to flush the context
- Reads the plan and implements it
For the planning phase, I use a model with a bigger context window, since it has to gobble up a lot of code (gemini-2.5-pro or GPT-4.1). This is where thinking models really shine and should be used: devise a good plan, no matter how long it takes.
For the implementation phase, thinking is usually not required because the heavy lifting has already been done in the planning phase.
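The two-phase split can be sketched as a small driver script. Everything here is illustrative: `run_agent` is a hypothetical wrapper around whatever agent tool you use, and the model names and file path are placeholders, not real APIs.

```python
# Sketch of the plan/implement split. run_agent, the model names, and the
# design-doc path are hypothetical placeholders, not a real API.
from pathlib import Path

PLAN_PROMPT = """You are in PLAN mode. Research the codebase,
ask clarifying questions, and output a design document.
Task: {task}"""

IMPLEMENT_PROMPT = """Implement the following design document.
The research is already done; follow the plan step by step.
{plan}"""

def run_agent(model: str, prompt: str) -> str:
    # Placeholder: invoke your agent/CLI of choice here.
    raise NotImplementedError

def plan_then_implement(task: str, doc_path: Path) -> None:
    # Phase 1: a big-context thinking model does the research and planning.
    plan = run_agent("planning-model", PLAN_PROMPT.format(task=task))
    doc_path.write_text(plan)

    # Phase 2: a fresh session; the implementor sees only the condensed
    # plan, not the huge research context that produced it.
    run_agent("implementation-model", IMPLEMENT_PROMPT.format(plan=plan))
```

The key design point is that only the condensed design doc crosses the boundary between the two phases, so the implementor starts with a clean context window.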
Roo or Cline have PLAN phases built-in, but we can easily emulate that in other tools as well with a prompt like the following (I have this as a custom Cursor mode):
You are currently in PLAN mode.
- PLAN mode is for creating detailed plans and strategies, not for implementing code changes
- Your goal is to gather information, conduct research, and develop a comprehensive plan before any implementation
- Focus on understanding requirements and architecting solutions rather than writing actual code
- Use tools like `read_file` and `search_files` to gather context about the existing codebase
- Ask clarifying questions to ensure you fully understand what's needed
- Create detailed, well-researched plans with clear steps and reasoning
- Use Mermaid diagrams when helpful to visualize architecture, workflows, or processes
- You may include small code snippets to illustrate concepts, but avoid suggesting full implementations
- Engage in back-and-forth discussion to refine the plan until it meets requirements
- When a satisfactory plan is established, suggest switching to ACT mode for implementation
- Remember: in PLAN mode, your focus is exclusively on planning, not doing
PREFIX ANY MESSAGE YOU WRITE WITH "PLAN MODE:"
THIS IS VERY VERY IMPORTANT!!!
If you get an error of edit_file not being available, then THIS IS A REMINDER TO YOU THAT WE ARE IN PLAN MODE!!!! REMEMBER YOUR INSTRUCTIONS
Once the plan is generated, I pass it to a new agent to do the actual implementation.
2.2 Resumption and hand-off with design docs and logs
Smaller edits can be made one-shot by passing the plan to an implementor model and letting it do its thing, but bigger plans may still exceed the context window and drop in quality.
For this reason, I always use two document formats: a clearly defined design doc and a log.
The design doc is the output of the planning phase. Its purpose is similar to the "onboarding document" we talked about previously, but with the objective of onboarding someone onto a specific feature. The same rules apply: include everything that someone with no prior knowledge should know about this feature:
- What needs to be done and why
- Link to documents, tickets
- Technical design
- Clearly defined list with proposed high-level changes
- What files need to be looked at, where to find them
- What changes do we need to do in those files
The purpose of this document is to condense the huge context window of research from the planning phase into something that is actionable and, more importantly, reviewable (by us). This is the final big step before implementation, so we need to make sure the LLM's idea aligns with what we want to do. I iterate on these design docs a couple of times until they're in a shape I'm happy with.
Always define a guide or template for these documents. Don't let the AI do what it wants; we want the documents to be consistent and to contain the information we already know we need for effective implementation.
A good format (in my experience) follows principles similar to the rules and manual documents. This document is not intended for humans but for LLMs, so it needs to quickly bring the agent up to speed and point it towards the files and important bits it needs to know, while providing enough context to do additional research if needed.
# Agent UI Simplification Design Doc
**Date:** 2025-04-24
**Author:** GitHub Copilot
**Linear Issue:** MIC-28
## 1. Overview
Simplify the Agent interface by merging the configuration options (currently under a separate "Settings" tab) into the main "Execute" tab. Rename the combined tab to "Agent". The "Execution History" tab will remain separate.
## 2. Motivation
The current tab structure (Execute, Settings, Execution History) with nested tabs (Configuration, Functions) within Settings is convoluted. Users need to switch tabs frequently to configure and run an agent. Combining configuration and execution into a single view streamlines the workflow.
## 3. Proposed Changes
- **Modify `app/agents/[id]/page.tsx`**:
- Remove the "Settings" `TabsTrigger` and `TabsContent`.
- Rename the "Execute" `TabsTrigger` to "Agent".
- **Modify `components/agents/agent-execution-panel.tsx`**:
- Integrate the UI elements and logic currently present in `components/agents/agent-config-panel.tsx` (Agent Name, Description, Model, System Prompt, Function Linking).
- Arrange the configuration elements logically within the panel, likely above or alongside the execution controls.
- **Deprecate/Remove `components/agents/agent-config-panel.tsx`**: This component will no longer be needed as its functionality is merged into the execution panel.
## 4. Technical Design
1. **Update Tabs in `AgentPage`**: Modify the `Tabs`, `TabsList`, and `TabsContent` structure in `app/agents/[id]/page.tsx` to reflect the two-tab layout ("Agent", "Execution History").
2. **Merge Components**:
- Identify the specific UI sections and state management logic within `AgentConfigPanel` (e.g., form handling for name/description, model selection, system prompt editor, function linking UI).
- Transfer these UI sections and their associated logic (props, state, handlers) into `AgentExecutionPanel`. Ensure props (`agent`, `functions`) are correctly utilized.
- Adjust the layout within `AgentExecutionPanel` to accommodate the new configuration elements. A vertical stack (Config section above Execution section) seems simplest initially.
3. **Refactor/Remove `AgentConfigPanel`**: Once all functionality is migrated, `AgentConfigPanel` can be removed or refactored if any parts are still reusable independently (unlikely).
4. **Testing**: (Future step) Add/update tests for `AgentExecutionPanel` to cover the integrated configuration options.
The counterpart of the design doc is the work log - a document the agent needs to create after it is done. This document references the design doc and explains what the agent actually did.
The purpose of this document is to let us glance at what has been done, but also to potentially feed it into a new agent to get it up to speed on this feature for additional changes.
# Handoff: Agent UI Gamification - 2025-04-24
**Author:** GitHub Copilot
## TL;DR
We're redesigning the Agent configuration screen (`app/agents/[id]/page.tsx`) to be more visual and intuitive, replacing the old text-heavy forms with a "gamified" interface centered around a robot avatar.
**Design Doc:** `docs/designdocs/agent-gamified-ui.md`
## What We Did So Far
1. **Created `AgentVisualConfigurator` Component:** Built a new component (`components/agents/agent-visual-configurator.tsx`) that displays:
- A robot avatar (`BotIcon`).
- The agent's objective/system prompt (currently display-only).
- Visual "Skill Slots" for linked functions.
2. **Integrated into Agent Page:** Replaced the old configuration cards in `components/agents/agent-execution-panel.tsx` with the new `AgentVisualConfigurator`.
3. **Skill Slot Functionality:**
- **Display:** Filled slots show the function name (with `SwordIcon` 🗡️) and workspace.
- **Add:** Clicking an empty slot (`+`) opens the `FunctionSearch` dialog to find and link a new function.
- **Edit:** Clicking a _filled_ slot opens the `FunctionLinkForm` modal, allowing the user to edit the description (when the agent should use this function).
- **Remove:** Clicking the trash icon (`Trash2Icon`) on a filled slot removes the function link.
4. **Data Fetching:** Updated React Query mutation hooks (`lib/hooks/use-agent-functions.ts`) to `await` query invalidation on success, ensuring the UI reflects changes immediately after adding/editing/removing functions without needing a page refresh.
5. **Styling:** Applied basic Tailwind CSS and shadcn UI styling to the new component and slots.
## Current State
- The visual layout is in place on the agent page.
- Users can view the agent's objective.
- Users can add, edit the description of, and remove function links via the skill slots and associated modals (`FunctionSearch`, `FunctionLinkForm`).
- The underlying data updates correctly and the UI refreshes thanks to query invalidation.
## Next Steps / What's Left
1. **Save Objective:** Hook up the objective/system prompt `textarea` in `AgentVisualConfigurator` to actually save changes (likely using `useUpdateAgent`).
2. **Run Agent Integration:** Connect the "Try Your Agent" input and button (`AgentExecutionPanel`) to work with this new view (it might already work, but needs verification).
3. **Execution History:** Implement the display area for execution logs below the configurator.
4. **Testing:** Write tests for `AgentVisualConfigurator` and potentially update tests for `AgentExecutionPanel`.
5. **Refinement (Optional):**
- Add subtle animations (e.g., robot idle, feedback on add/remove).
- Further styling improvements.
6. **Final Build:** Run `bun run build` to ensure everything builds correctly.
7. **Documentation:** Update any relevant user-facing docs if needed.
This should get the next person up to speed! Lmk if you need anything else. 🔥
2.3 Multi-agent workflow with agent-to-agent delegation
Now we’re getting into the fun bits: Multi-agent orchestration!
For more complex tasks, we can (and should) use a multi-agent workflow. This involves a separate commander model that delegates tasks to implementation models.
The commander model creates a task list in a knowledge base such as Linear or the design doc, creates the stories, then delegates to the sub-agents for implementation.
Advantages of this are:
- The commander model splits the work like a PM, defines acceptance criteria for each task, reviews code, updates tasks.
- The implementation models are VERY accurate and specific since they only do one small task before returning.
- No context pollution - the sub-agents are always fresh in context.
- Tasks can be very long and will still be accurate because of this.
- Tasks can be resumed and stopped at any time because we keep track through the task list or sub-issues.
We can use MCP with tools like Jira or Linear for orchestration, so the commander creates the subtasks and tells the agents to pull them from the issue tracker with MCP.
Here's an example of how it works:
- Commander model processes a design doc or epic issue
- Commander model does basic research and splits the design doc or epic into subtasks (if not done yet), then optionally creates those in Linear, GitHub or somewhere else
- Commander model spawns sub-agents for each task in sequence
- Sub-agent spins up with fresh context and clear instructions on what to do and how to complete the task
- Commander model reviews the finished task, marks the issue as completed, and moves on to task #2
The issue tickets or markdown document serve as a boundary between commander and sub-agent to pass information across. The sub-agent knows its instructions, but has the ability to read more information from the parent if required or if something is unclear.
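The commander loop described above can be sketched as follows. `spawn_subagent`, `review_result`, and the `Tracker` class are hypothetical stand-ins for whatever you actually use (Linear or GitHub via MCP, or a markdown checklist); this is a sketch of the control flow, not a real implementation.

```python
# Minimal sketch of the commander loop. spawn_subagent, review_result, and
# Tracker are hypothetical stand-ins for your MCP tooling or checklist.
from dataclasses import dataclass, field

@dataclass
class Tracker:
    # Stands in for the issue tracker or design-doc checklist.
    tasks: list
    done: list = field(default_factory=list)

    def mark_done(self, task: str) -> None:
        self.done.append(task)

def spawn_subagent(task: str) -> str:
    # Placeholder: start a fresh-context agent with only this task's
    # instructions, and return its completion summary.
    return f"completed: {task}"

def review_result(summary: str) -> bool:
    # Placeholder: the commander reviews the summary/diff for issues.
    return summary.startswith("completed:")

def run_commander(tracker: Tracker) -> None:
    # Tasks run in sequence; each sub-agent starts with a clean context,
    # so the checklist (not the conversation) is the source of truth.
    for task in tracker.tasks:
        summary = spawn_subagent(task)
        if review_result(summary):
            tracker.mark_done(task)
```

Because state lives in the tracker rather than in any single context window, the whole run can be stopped and resumed at any point, which is exactly the resumption property listed above.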
This workflow can be achieved in any agent by having two different prompts and feeding them in, but Roo stands out to me for having all of this built in with its multi-mode setup.
I have built plugins that can do this for:
- VS Code: https://github.com/dvcrn/copilot-task-delegate
- Delegating with MCP to q/claude: https://github.com/dvcrn/mcp-server-subagent
For completeness' sake, here is one prompt I frequently use in Roo, which adds a todo checklist to the markdown design doc. I have variations of this prompt that use Linear or GitHub issues as the backend.
Your role is to coordinate complex workflows by delegating tasks to specialized modes. As an orchestrator, you should:
Break this down into subtasks that can be implemented as individual logical chunks. Don't make them too small, but also not too big. Think of them as JIRA subtasks.
If a design doc is available, I want you to add all these subtasks to the design doc with a checklist on the current implementation progress.
For each subtask, use the `new_task` tool to delegate. Choose the most appropriate mode for the subtask's specific goal and provide comprehensive instructions in the `message` parameter. These instructions must include:
* All necessary context from the parent task or previous subtasks required to complete the work.
* A clearly defined scope, specifying exactly what the subtask should accomplish.
* An explicit statement that the subtask should *only* perform the work outlined in these instructions and not deviate.
* If available, a reference to the design doc, files to change, scope
* An instruction for the subtask to signal completion by using the `attempt_completion` tool, providing a concise yet thorough summary of the outcome in the `result` parameter, keeping in mind that this summary will be the source of truth used to keep track of what was completed on this project.
* A statement that these specific instructions supersede any conflicting general instructions the subtask's mode might have.
* Instructions to commit the changed files (only the changed files, NOT `git add .`) after completion, after all `build` and `format` instructions (such as `make build` or `make format` if available)
For the git commit message, start the message with what the commit does: a verb, first letter capitalized, e.g. Update xxx to yyy, Change bbb to better ccc, and so on. When reading the log, we should be able to read it as 'When this commit is applied, it will <followed by the commit message>'
Once the subtask returns, I want you to check the checkbox in the design doc to mark the task as completed.
A subtask is only considered completed when a commit has been made. Make sure you define the acceptance criteria for the subtask.
When the task returns, review the code for completeness and see if you spot any obvious issues.
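The commit-message convention in the prompt above is mechanical enough to lint. Here's a rough check, a sketch of my own rather than part of any tool, that only verifies the capitalized-first-word shape (it can't actually verify the first word is a verb):

```python
import re

def valid_commit_message(msg: str) -> bool:
    # Convention: start with a capitalized verb so the log reads
    # "When this commit is applied, it will <message>".
    # This only checks for a capitalized first word followed by more text.
    return bool(re.match(r"^[A-Z][a-z]+ \S", msg))
```

A check like this could run in a commit-msg hook so the commander's sub-agents get immediate feedback when they stray from the format.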
Another benefit is that, with asynchronous agents on GitHub or Google Jules becoming better, we can now delegate some of our subtasks directly to those for handling.
Summarizing what we learned so far
Agents are extremely powerful if guided correctly, but we’re not at a point yet where they can be left unsupervised.
Always define clear boundaries, be that in simple tasks, behavior or complex projects.
The more complex a task, the more you should break it up into sub-stories that can be worked on independently, to avoid context pollution issues: keep it short and to the point for the best results.
Create new sessions as often as you can, disable MCP servers you don't need, and use built-in /compact commands to further reduce the number of tokens in the context window at any time.
I'm hoping to expand on this series in the future once agents have evolved a bit more. I'm especially excited about Jules and GitHub Copilot Agents, and have started using them extensively for my own projects.