From Solo AI Engineer to Autonomous Dev Team

Overview

A previous post introduced the AI Digital Engineer pattern: a single Claude Code agent guided by Skills and enforced by Hooks to execute a complete engineering workflow. That approach proved effective at delivering production-ready code with guaranteed quality gates.

However, a fundamental limitation emerged: one agent performing all tasks.

This limitation highlighted the need for a multi-agent system, where a team of AI agents, each with a distinct role, could collaborate autonomously to transform GitHub issues into merged pull requests without human intervention.

The Limitation of a Solo AI Engineer

The single-agent approach detailed in the previous post is suitable for interactive development workflows. A human creates an issue, initiates Claude Code execution, monitors progress, and triggers reviews. While the Skills + Hooks architecture ensures quality, human oversight is required for orchestration.

This approach presents several bottlenecks:

  • Manual dispatch: Each task requires manual assignment to the AI agent.
  • Self-review: The same agent responsible for code generation also performs the review, potentially reducing objectivity.
  • Sequential processing: Tasks are processed one at a time, dependent on human checkpoints.
  • No crash recovery: Progress is lost if a session terminates unexpectedly.

The objective is to establish a system where agents emulate a conventional engineering team structure: a tech lead assigns work, a developer implements features, and a reviewer independently verifies quality, all operating autonomously.

Introducing the Autonomous Dev Team

The Autonomous Dev Team is a fully automated pipeline that turns GitHub issues into merged pull requests without human intervention. It is powered by OpenClaw as the orchestration layer and supports multiple AI coding CLIs: Claude Code, Codex CLI, and Kiro CLI.

The architecture emulates a standard engineering team with three distinct roles:

┌──────────────────────────────────────────────────────────────────────┐
│                     Autonomous Dev Team                              │
├──────────────────────┬───────────────────┬───────────────────────────┤
│  Dispatcher          │  Dev Agent        │  Review Agent             │
│  (OpenClaw)          │  (Claude/Codex/   │  (Claude/Codex/           │
│                      │   Kiro)           │   Kiro)                   │
├──────────────────────┼───────────────────┼───────────────────────────┤
│ • Scans issues       │ • Reads issue     │ • Finds linked PR         │
│   every 5 minutes    │   requirements    │ • Checks merge conflicts  │
│ • Dispatches agents  │ • Creates worktree│ • Runs review checklist   │
│ • Manages labels     │ • Implements TDD  │ • Verifies E2E tests      │
│ • Handles crashes    │ • Creates PR      │ • Approve or reject       │
│ • Enforces           │ • Marks checkboxes│ • Auto-merge on pass      │
│   concurrency        │ • Resumes sessions│ • Posts structured        │
│   limits             │                   │   findings                │
└──────────────────────┴───────────────────┴───────────────────────────┘

How It Works: The Label-Based State Machine

The entire workflow is driven by GitHub issue labels. Each label represents a state, and agents transition between states based on their outcomes:

flowchart LR
    A[Issue Created] -->|autonomous label| B[Dispatcher Picks Up]
    B -->|adds in-progress| C[Dev Agent Working]
    C -->|success| D[pending-review]
    C -->|crash| D
    D -->|dispatcher cycle| E[Review Agent Working]
    E -->|PASS| F[approved + merged]
    E -->|FAIL| G[pending-dev]
    G -->|dispatcher cycle| C

There is no central database or message queue. GitHub labels serve as the single source of truth. The dispatcher polls every 5 minutes, reads the labels, and executes predefined actions:

Current Labels              | Action
----------------------------|------------------------------------
autonomous (no state label) | Dispatch Dev Agent (new session)
autonomous + pending-review | Dispatch Review Agent
autonomous + pending-dev    | Dispatch Dev Agent (resume session)
autonomous + approved       | Done (PR merged)
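Because each action is derived purely from the label set, the dispatcher's decision step can be sketched in a few lines of shell. This is an illustrative sketch, not the template's actual code; the real dispatcher reads labels via the GitHub API:

```shell
#!/bin/sh
# Illustrative sketch of the label-to-action mapping. `decide` takes an
# issue's space-separated label set and prints the dispatcher's action.
decide() {
  labels=" $1 "
  case "$labels" in
    *" approved "*)       echo "done" ;;                # PR already merged
    *" pending-review "*) echo "dispatch-review" ;;     # spawn Review Agent
    *" pending-dev "*)    echo "dispatch-dev-resume" ;; # resume Dev Agent
    *" in-progress "*)    echo "wait" ;;                # an agent is working
    *" autonomous "*)     echo "dispatch-dev-new" ;;    # fresh Dev Agent
    *)                    echo "skip" ;;                # not autonomous
  esac
}

decide "autonomous"                 # -> dispatch-dev-new
decide "autonomous pending-review"  # -> dispatch-review
decide "autonomous approved"        # -> done
```

Because the more specific state labels are matched before the bare autonomous case, an issue never triggers two actions in one cycle.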

The Three Agents in Detail

Agent 1: The Dispatcher (OpenClaw)

The dispatcher functions as the team's tech lead. It operates on a cron schedule (every 5 minutes), scans for actionable issues, and spawns the appropriate agent:

# Dispatcher workflow (simplified)
# 1. Check concurrency: don't exceed MAX_CONCURRENT (default: 5)
# 2. Find issues needing work
# 3. Spawn agent via nohup background process
# 4. Detect stale processes and recover

# Example: dispatching a dev agent
dispatch-local.sh dev-new $ISSUE_NUM

# Example: dispatching a review agent
dispatch-local.sh review $ISSUE_NUM

# Example: resuming after review rejection
dispatch-local.sh dev-resume $ISSUE_NUM $SESSION_ID

Key capabilities include:

  • Concurrency control: Tracks active processes via PID files, adhering to the MAX_CONCURRENT limit.
  • Crash recovery: Detects terminated processes and transitions stale in-progress issues to pending-review.
  • Session tracking: Extracts session IDs from comments for resumable development.
  • Self-correction: Manages edge cases such as unintended re-dispatching of already-approved issues.
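The concurrency and crash-detection capabilities hinge on one simple primitive: a PID file per active agent. A minimal sketch, assuming a `$PID_DIR` directory (the directory name and file layout are illustrative):

```shell
#!/bin/sh
# Minimal sketch of PID-file concurrency control. One *.pid file per
# active agent; a file whose process is gone marks a crashed or
# finished agent and is pruned.
PID_DIR="${PID_DIR:-/tmp/autonomous-pids}"
MAX_CONCURRENT="${MAX_CONCURRENT:-5}"
mkdir -p "$PID_DIR"

active_agents() {
  count=0
  for f in "$PID_DIR"/*.pid; do
    [ -e "$f" ] || continue
    if kill -0 "$(cat "$f")" 2>/dev/null; then
      count=$((count + 1))      # process still alive
    else
      rm -f "$f"                # stale PID: candidate for crash recovery
    fi
  done
  echo "$count"
}

if [ "$(active_agents)" -ge "$MAX_CONCURRENT" ]; then
  echo "at capacity; skipping this dispatch cycle"
else
  echo "capacity available"
fi
```

Pruning a stale PID file is also the moment the real dispatcher would move the corresponding issue from in-progress to pending-review.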

Agent 2: The Dev Agent (Developer)

The Dev Agent serves as the implementer. Upon dispatch, it performs the following:

  1. Reads the issue: Parses requirements and acceptance criteria.
  2. Creates an isolated worktree: Utilizes git worktree add to prevent cross-contamination between parallel tasks.
  3. Follows TDD: Writes tests prior to implementation, guided by the autonomous-dev skill.
  4. Marks progress: Checks off requirement checkboxes in the issue as each is completed.
  5. Creates a PR: Includes Closes #<issue-number> to link the issue.
  6. Reports results: Posts a structured session report with a session ID for resumability.
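Steps 2 and 5 can be sketched as shell. The branch and path naming below is illustrative; the template's exact scheme may differ:

```shell
#!/bin/sh
# Illustrative sketch of worktree-per-issue isolation (step 2). Each
# issue gets its own checkout, so parallel Dev Agents never collide.
start_worktree() {
  issue="$1"
  base="${2:-origin/main}"
  branch="feature/issue-${issue}"
  worktree="../worktrees/issue-${issue}"
  mkdir -p "$(dirname "$worktree")"
  git worktree add -b "$branch" "$worktree" "$base"
  echo "$worktree"
}

# Step 5, after implementation and push: "Closes #N" in the PR body
# links the PR to the issue so the Review Agent can find it, e.g.:
#   gh pr create --title "Implement #42" --body "Closes #42"
```

Because each worktree is its own directory with its own branch, a crashed agent leaves the main checkout and all sibling tasks untouched.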

The Dev Agent supports two modes:

  • New: Executes fresh implementation from scratch.
  • Resume: Continues a previous session, processing review feedback and resolving issues without re-implementing completed work.

Agent 3: The Review Agent (Reviewer)

The Review Agent acts as the quality gate. It operates independently from the Dev Agent (utilizing a different model and session) and adheres to a strict checklist:

  1. Find the PR: Locates the pull request linked to the issue via Closes #N references.
  2. Check merge conflicts: If conflicts are present, it performs a rebase onto the main branch and force-pushes.
  3. Run review checklist: Executes 10 items covering process compliance, code quality, testing, and infrastructure safety.
  4. Trigger external reviews: Posts /q review to invoke Amazon Q Developer for static analysis.
  5. Run E2E tests: Uses Chrome DevTools MCP to verify on the preview deployment.
  6. Execute the Findings→Decision Gate: This critical step prevents inconsistent verdicts.

The Findings→Decision Gate is a mandatory self-check: the review agent enumerates all findings, classifies each as BLOCKING or NON-BLOCKING, and only approves if there are zero blocking findings. This mechanism prevents the common failure mode where an agent provides a positive assessment despite listing problems.
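Mechanically, the gate reduces to counting. A sketch, assuming findings are emitted one per line with a BLOCKING: or NON-BLOCKING: prefix (the prefix format is an assumption for illustration, not the template's exact output):

```shell
#!/bin/sh
# Illustrative sketch of the Findings->Decision Gate: approve only
# when zero findings are classified BLOCKING.
decide_verdict() {
  blocking=$(printf '%s\n' "$1" | grep -c '^BLOCKING:')
  if [ "$blocking" -eq 0 ]; then
    echo "PASS"
  else
    echo "FAIL: $blocking blocking finding(s)"
  fi
}

decide_verdict "NON-BLOCKING: prefer const over let"   # -> PASS
decide_verdict "BLOCKING: missing unit tests
NON-BLOCKING: typo in comment"                         # -> FAIL: 1 blocking finding(s)
```

Forcing the verdict to be computed from the enumerated findings, rather than asserted separately, is what removes the "looks good, despite these five problems" failure mode.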

Real-World Workflow: A Complete Example

This section details a real feature implementation, illustrating the collaborative workflow of the three agents. This example originates from a production Next.js application where the autonomous dev team manages feature implementation and bug fixes.

Phase 1: Issue Created and Dispatched

A feature request to add swipe gestures for day navigation on mobile is created with the autonomous label. The dispatcher picks it up within 5 minutes:

The dispatcher (my-claw) picks up the issue and spawns the dev agent (kane-coding-agent). The dev agent reports its session for resumability. The review agent (kane-test-agent) then takes over.

The dispatcher adds the in-progress label, spawns the dev agent, and initiates monitoring. Upon dev agent completion (exit code 0), the dispatcher transitions the issue to pending-review and spawns the review agent.

Phase 2: Review Agent Rejects with Structured Findings

The review agent executes its full checklist and identifies 5 blocking issues:

The review agent posts structured findings: missing design document, missing test cases, no unit tests, pending CI, and unchecked PR checklist. Each finding includes a specific action item.

This scenario demonstrates the value of the multi-agent architecture. A single agent performing self-review may overlook deficiencies in documentation or process adherence. An independent review agent identifies process violations that an implementing agent might bypass due to task-specific incentives.

Phase 3: Dev Agent Self-Corrects

The review rejection triggers the dispatcher to transition the issue back to pending-dev. The dev agent resumes its previous session and addresses all 5 findings:

The dev agent addresses all review findings: creates design document, adds test case document with 10 scenarios, writes 15 unit tests, extracts a pure function for testability, and rebases on latest main.

The dev agent independently identified and addressed required fixes. It processed the review comment, comprehended the requirements, and systematically addressed each point, notably by extracting detectSwipe as a pure function for enhanced testability.

Phase 4: Review Passes

The review agent re-executes its checklist. This time, all items pass:

Review PASSED with 0 blocking findings. The review agent verified design docs, 15 unit tests, all CI checks passing, E2E tests on preview deployment, and all 7 acceptance criteria.

The review agent approves the PR. Since this issue includes the no-auto-close label, it notifies the maintainer for manual merging instead of automatic merging.

A Bug Fix in Under an Hour

The team also handles bug fixes with the same pipeline. Here is a CJK character encoding bug, taken from issue creation to merged PR in under one hour:

A bug report: CJK characters in plan slugs cause the detail page to fail loading. The issue includes screenshots showing the broken behavior.

The dev agent's root cause analysis: slugify() preserved CJK via Unicode regex, causing URL encoding mismatches through CloudFront → API Gateway → Lambda. Fixed with NFKD normalization and 22 unit tests.

Review passed with 0 blocking findings. The review agent verified all 6 acceptance criteria, 22 unit tests, E2E tests with ASCII-only URLs, and Amazon Q's positive review. Issue auto-closed.

The dev agent did not merely address the symptom; it conducted a thorough root cause analysis, identified the URL encoding mismatch within the CloudFront → API Gateway → Lambda chain, and implemented a comprehensive fix with 22 unit tests covering CJK, diacritics, emoji, and mixed scripts.

Pluggable Agent Architecture

The system supports multiple AI coding CLIs through an abstraction layer:

# scripts/lib-agent.sh - Agent CLI abstraction
run_agent() {
  case "$AGENT_CMD" in
    claude)
      claude --session-id "$SESSION_ID" \
             --model "$MODEL" \
             -p "$PROMPT" \
             --allowedTools "$TOOLS"
      ;;
    codex)
      codex -p "$PROMPT" \
            --model "$MODEL" \
            --approval-mode full-auto
      ;;
    kiro)
      kiro -p "$PROMPT" \
           --model "$MODEL" \
           --non-interactive
      ;;
  esac
}

Feature         | Claude Code           | Codex CLI         | Kiro CLI
----------------|-----------------------|-------------------|------------------
Dev Agent       | Full support          | Basic             | Basic
Review Agent    | Full support          | Basic             | Basic
Session Resume  | Native (--session-id) | Falls back to new | Falls back to new
Model Selection | Configurable          | Configurable      | Configurable

Claude Code offers the most comprehensive integration, particularly regarding session resumability, which facilitates the review→fix→re-review cycle without re-implementing completed work.
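The fallback behavior in the table can be wrapped in a small helper. This sketch only builds the command line for illustration (mirroring the flags from the abstraction above) and does not invoke any CLI:

```shell
#!/bin/sh
# Illustrative sketch of the session-resume fallback. Only Claude Code
# resumes natively; the other CLIs start a fresh session and would
# carry prior review feedback in the prompt instead.
build_agent_cmd() {
  agent="$1"; session="$2"
  case "$agent" in
    claude) echo "claude --session-id $session -p \$PROMPT" ;;
    codex)  echo "codex -p \$FEEDBACK_AND_PROMPT --approval-mode full-auto" ;;
    kiro)   echo "kiro -p \$FEEDBACK_AND_PROMPT --non-interactive" ;;
  esac
}

build_agent_cmd claude abc123   # resumes session abc123
build_agent_cmd codex  abc123   # session id ignored: starts fresh
```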

Authentication: GitHub Apps for Audit Clarity

Each agent can operate as a distinct GitHub App bot, providing clear audit trails:

kane-coding-agent[bot]   → Dev Agent actions (commits, PR creation)
kane-test-agent[bot]     → Review Agent actions (reviews, approvals)
my-claw[bot]             → Dispatcher actions (label changes, comments)

This makes it straightforward to trace agent activities. The accompanying screenshots illustrate distinct bot avatars and identities, providing the same visibility as a human team.

The system also supports a token-based mode in which all agents share one identity, which is easier for initial setup.
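The difference between the two modes amounts to where the credential comes from. A hedged sketch (function and variable names are illustrative; in app mode the real flow signs a JWT with the app's private key and exchanges it for a short-lived installation token):

```shell
#!/bin/sh
# Illustrative sketch of the two auth modes. In token mode every agent
# shares one PAT; in app mode each role uses its own GitHub App.
mint_installation_token() {
  # Placeholder: the real implementation signs a JWT with the app's
  # private key and calls POST /app/installations/{id}/access_tokens.
  echo "short-lived-token-for-$1"
}

setup_auth() {
  role="$1"   # dev | review | dispatcher
  case "${GH_AUTH_MODE:-token}" in
    token) export GH_TOKEN="$SHARED_PAT" ;;
    app)   export GH_TOKEN="$(mint_installation_token "$role")" ;;
  esac
}
```

Per-role tokens are what make the bot identities in the screenshots possible: GitHub attributes each action to the app that minted the token.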

Getting Started

1. Use the Template

The autonomous-dev-team repository is a GitHub template. Click the "Use this template" button to create your own copy:

The autonomous-dev-team template repository on GitHub — ready to use with Claude Code, Codex CLI, or Kiro CLI.

2. Configure Your Project

# Copy the config template
cp scripts/autonomous.conf.example scripts/autonomous.conf

# Edit with your settings
cat scripts/autonomous.conf

# Project identification
PROJECT_ID="my-project"
REPO="myorg/my-project"
PROJECT_DIR="/path/to/my-project"

# Agent CLI selection (claude, codex, or kiro)
AGENT_CMD="claude"

# Authentication mode (token or app)
GH_AUTH_MODE="token"

# Concurrency
MAX_CONCURRENT=5

3. Set Up OpenClaw Dispatcher

Install OpenClaw and configure the dispatcher cron:

# Run dispatcher every 5 minutes
*/5 * * * * cd /path/to/autonomous-dev-team && openclaw run

4. Create an Issue with the autonomous Label

Create a GitHub issue and add the autonomous label to start the automated pipeline. Within 5 minutes, the dispatcher will spawn a dev agent, which will implement the feature, create a PR, and hand it off for review.
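With the GitHub CLI, kicking off the pipeline is one command. The wrapper below is illustrative; the title and body are examples, and the autonomous label is the only part the dispatcher actually requires:

```shell
#!/bin/sh
# Illustrative helper: file an issue the dispatcher will pick up on
# its next 5-minute cycle. $1 = title, $2 = body.
create_autonomous_issue() {
  gh issue create \
    --title "$1" \
    --body "$2" \
    --label autonomous
}

# Example usage:
# create_autonomous_issue "Add swipe gestures for day navigation" \
#   "Swipe left = next day; swipe right = previous day."
```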

From Solo to Team: What Changed

Aspect            | Solo AI Engineer           | Autonomous Dev Team
------------------|----------------------------|-------------------------------------
Orchestration     | Human initiates each task  | Dispatcher auto-assigns from issues
Review            | Self-review (same agent)   | Independent review agent
Concurrency       | One task at a time         | Up to 5 parallel tasks
Crash Recovery    | Lost progress              | Auto-retry with session resume
Audit Trail       | Single conversation        | Separate bot identities per role
Human Involvement | Initiate, monitor, approve | Create issue, optionally final merge
Agent CLI         | Claude Code only           | Claude Code, Codex CLI, Kiro CLI

The Skills + Hooks architecture from the previous post continues to power each individual agent. The key innovation lies in the orchestration layer: the dispatcher for work assignment, the label-based state machine for progress tracking, and the separation of implementation from review.

Conclusion

The evolution from a solo AI engineer to an autonomous dev team parallels the growth observed in human engineering organizations. The initial phase involves a single capable developer (the AI Digital Engineer), which then scales to a team with specialized roles and clear handoff protocols.

The Autonomous Dev Team template provides:

  • Zero human intervention: Issues automatically progress to merged PRs.
  • Independent review: Development and review agents operate with separate sessions and models.
  • Crash resilience: A label-based state machine with automatic retry functionality.
  • Multi-CLI support: Compatibility with Claude Code, Codex CLI, and Kiro CLI via a pluggable abstraction.
  • Clear audit trails: GitHub App bots provide distinct per-agent identities.

The code is open source and available as a GitHub template. Try it on a small project, watch the agents collaborate, and scale up from there.

Resources