AI Digital Engineer: End-to-End Delivery with Claude Code

Overview

What if an AI could operate like a senior software engineer - not just writing code, but following the complete engineering process from design through deployment? This post introduces the AI Digital Engineer pattern: a system that transforms Claude Code from an interactive assistant into an autonomous engineer capable of delivering production-ready software.

The Problem with Traditional AI Coding Assistants

Most AI coding tools operate in a simple request-response pattern: you ask for code, they generate it. This approach has fundamental limitations:

  • No process discipline: The AI writes code without tests, reviews, or verification
  • Fragile workflows: Complex multi-step tasks get lost in context windows
  • Unreliable execution: LLM outputs are probabilistic, not guaranteed
  • Expensive scaling: Every verification step requires another LLM call
  • No audit trail: How do you prove the AI followed your engineering standards?

What we need is a system that combines LLM intelligence for orchestration with deterministic guarantees for execution.

Introducing the AI Digital Engineer

An AI Digital Engineer is an autonomous system that operates like a human software engineer - following the complete engineering process:

| Human Engineer | AI Digital Engineer |
| --- | --- |
| Reviews design requirements | Reads design canvas via Pencil MCP |
| Writes tests before code (TDD) | Enforced by PreToolUse hooks |
| Implements features | Claude Code implementation |
| Runs code review | Spawns PR Review agents |
| Responds to review feedback | Addresses Amazon Q/Codex findings |
| Verifies CI passes | Stop hook blocks until green |
| Performs E2E testing | Chrome DevTools MCP integration |
| Cannot merge without approvals | Hook system enforces all gates |

The key difference from traditional AI assistants: the AI Digital Engineer cannot skip steps. Quality gates are enforced by deterministic systems, not LLM memory.

The Hybrid Architecture: Intelligence + Reliability

The AI Digital Engineer architecture separates concerns between what needs intelligence (orchestration) and what needs reliability (execution):

┌────────────────────────────────────────────────────────────────────┐
│                  AI Digital Engineer Architecture                  │
├────────────────────────────────┬───────────────────────────────────┤
│  Intelligent Orchestration     │     Deterministic Execution       │
│     (Claude Code Skills)       │  (Hooks + GitHub Actions + Agents)│
├────────────────────────────────┼───────────────────────────────────┤
│ ✓ Complex workflow navigation  │ ✓ 100% execution guarantee        │
│ ✓ Exception handling & recovery│ ✓ Zero LLM call cost              │
│ ✓ Start/resume from any step   │ ✓ No context window limits        │
│ ✓ Dynamic branching decisions  │ ✓ Millisecond response time       │
│ ✓ Natural language understanding│ ✓ Auditable execution logs       │
├────────────────────────────────┼───────────────────────────────────┤
│ Best for: Reasoning & judgment │ Best for: Quality gates & triggers│
└────────────────────────────────┴───────────────────────────────────┘

Why This Separation Matters

Traditional agent architectures rely on LLMs for everything - including remembering to run tests, checking CI status, and enforcing review requirements. This approach fails because:

| Traditional AI Agent | AI Digital Engineer |
| --- | --- |
| Every step calls LLM | LLM only for orchestration decisions |
| Relies on LLM to remember steps | Hooks enforce execution automatically |
| Context overflow causes skipped steps | Persistent state enables resume |
| Expensive and unpredictable | Predictable cost model |
| Difficult to audit | Complete execution logs |

The AI Digital Engineer uses Skills for the brain (what to do, how to handle exceptions) and Hooks for the muscles (guaranteed execution of quality gates).

The Three Pillars

Pillar 1: Claude Code Skills - The Intelligent Orchestrator

Skills are markdown files that guide Claude through complex workflows. The github-workflow skill defines a 12-step development process:

---
description: GitHub development workflow for end-to-end delivery
---

You are following a structured development workflow. Current phase: $PHASE

## Workflow Steps
1. Design Canvas - Create UI/architecture mockups (Pencil MCP)
2. Branch Creation - Use feat/ or fix/ prefix
3. Test Plan - Document test cases before implementation
4. Implementation - Write code following TDD
5. Unit Tests - Verify all tests pass
6. Code Simplification - Run simplifier agent
7. PR Creation - Commit with standardized template
8. PR Review - Run review toolkit agents
9. CI Verification - Wait for GitHub Actions
10. Bot Review Handling - Address Amazon Q/Codex findings
11. E2E Testing - Verify on preview environment
12. Completion - All gates passed, ready for merge

## Exception Handling
- If CI fails: Analyze logs, fix issues, re-push
- If bot review finds issues: Address each comment thread
- If E2E fails: Debug, fix, mark e2e-tests complete

## Resume Capability
Current state is tracked in .claude/state/
You can resume from any step based on completed states.

Key capability: Skills enable Claude to handle exceptions intelligently. When CI fails, the skill guides Claude to analyze logs and fix issues - something deterministic scripts cannot do.

Pillar 2: Claude Code Hooks - The Enforcement Layer

Hooks are shell scripts that execute at specific points in Claude's workflow. They guarantee that quality gates are enforced:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{
          "type": "command",
          "command": ".claude/hooks/check-design-canvas.sh"
        }]
      },
      {
        "matcher": "Write|Edit",
        "hooks": [{
          "type": "command",
          "command": ".claude/hooks/check-test-plan.sh"
        }]
      }
    ],
    "Stop": [{
      "hooks": [{
        "type": "command",
        "command": ".claude/hooks/verify-completion.sh"
      }]
    }]
  }
}

How hooks enforce the workflow:

| Hook | Trigger | Enforcement |
| --- | --- | --- |
| check-design-canvas.sh | Before git commit | Blocks commit without design doc |
| check-test-plan.sh | Before file write/edit | Blocks code changes without test plan |
| check-unit-tests.sh | Before git commit | Blocks commit with failing tests |
| check-code-simplifier.sh | Before git commit | Blocks commit without simplification review |
| check-pr-review.sh | Before git push | Blocks push without PR review |
| verify-completion.sh | On task stop | Blocks completion without CI + E2E + resolved comments |

Critical insight: These hooks execute in milliseconds with zero LLM cost. They don't ask Claude to remember to check - they physically prevent violations.
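
For illustration, here is a minimal sketch of what one of these gate scripts could look like, assuming the state-manager.sh interface described later in this post (the template's real scripts are more thorough). In Claude Code, a PreToolUse hook that exits with code 2 blocks the tool call and feeds its stderr back to Claude:

#!/bin/bash
# check-test-plan.sh (simplified sketch) - gates Write/Edit tool calls.
# Claude Code passes the tool invocation as JSON on stdin; this sketch
# only gates on workflow state, so the payload is read and discarded.
INPUT=$(cat)

# Allow the edit once the test-plan step has been marked complete.
if .claude/hooks/state-manager.sh check test-plan; then
    exit 0
fi

# Exit code 2 blocks the tool call; stderr becomes the reason Claude sees.
echo "Blocked: document a test plan and mark it complete before editing code." >&2
exit 2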

Pillar 3: GitHub Actions - External Verification

GitHub Actions provide verification that happens outside Claude's context:

name: CI
on: [push, pull_request]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Build
        run: npm run build

  security-review:
    runs-on: ubuntu-latest
    steps:
      - name: Amazon Q Security Review
        uses: aws/amazon-q-developer-action@v1
      - name: CodeQL Analysis
        uses: github/codeql-action/analyze@v3

The verify-completion.sh hook queries GitHub's API to ensure:

  • CI workflow has passed
  • All review comments are resolved
  • E2E tests are marked complete
# From verify-completion.sh
CI_STATUS=$(gh run list --branch "$BRANCH" --limit 1 --json conclusion -q '.[0].conclusion')
if [ "$CI_STATUS" != "success" ]; then
    echo "❌ Cannot complete: CI has not passed"
    exit 1
fi

# Check unresolved review threads
UNRESOLVED=$(gh api graphql -f query='...' --jq '.data.repository.pullRequest.reviewThreads.nodes | map(select(.isResolved == false)) | length')
if [ "$UNRESOLVED" -gt 0 ]; then
    echo "❌ Cannot complete: $UNRESOLVED unresolved review comments"
    exit 1
fi
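
The snippet covers the first two gates; the third can be enforced the same way, for example by checking the e2e-tests state marker (a sketch, assuming the state-manager interface described in the implementation section below):

# Sketch: require the e2e-tests step to be marked complete
if ! .claude/hooks/state-manager.sh check e2e-tests; then
    echo "❌ Cannot complete: E2E tests not marked complete"
    exit 1
fi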

Real-World Example: Multi-tenant User Configuration System

To demonstrate the AI Digital Engineer in action, let's walk through the delivery of a real, complex feature: a multi-tenant user configuration system for an OpenHands deployment platform.

The Feature Requirements

The task was to build a complete user configuration management system allowing each tenant to:

  • Configure custom MCP servers (stdio and HTTP types)
  • Manage third-party integrations (GitHub, Slack) with auto-MCP injection
  • Store encrypted secrets using KMS envelope encryption
  • Merge user configs with global platform configuration

Technical scope:

  • 6,300+ lines of code across 36 files
  • AWS Lambda + API Gateway + KMS + S3 architecture
  • TypeScript CDK infrastructure + Python Lambda handlers
  • Comprehensive unit tests and E2E test cases

How the AI Digital Engineer Delivered It

Phase 1: Design & Test Plan

The workflow began with design documentation and test case definition. The check-design-canvas.sh hook blocked any implementation until architecture was documented. The check-test-plan.sh hook ensured test cases were written before code.

Phase 2: Initial Implementation

Claude implemented the full feature with:

  • UserConfigStack: Lambda + HTTP API Gateway for /api/v1/user-config/* endpoints
  • UserConfigLoader: S3-based config loader integrated with Cognito authentication
  • KMS envelope encryption for user secrets
  • Python Lambda with uv lock file for reproducible dependencies

Phase 3: CI Failures & Recovery

This is where the intelligent orchestration proved essential. The CI pipeline failed with CDK token parsing errors:

Error: The URL constructor cannot parse CDK tokens at synthesis time

The Skill guided Claude to analyze the error and apply the fix - using Fn.split and Fn.select intrinsic functions instead of JavaScript URL parsing. A deterministic script couldn't diagnose this; it required LLM reasoning.
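
This recovery loop rests on ordinary gh CLI calls. Something along these lines lets Claude pull the failing logs for analysis (illustrative, not the template's exact commands):

# Fetch the latest run for the branch and show only the failed steps' logs
RUN_ID=$(gh run list --branch "$BRANCH" --limit 1 --json databaseId -q '.[0].databaseId')
gh run view "$RUN_ID" --log-failed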

Phase 4: Bot Review Integration

Amazon Q Security Review flagged several issues:

  • Plaintext KMS keys not cleared from memory after decryption
  • Missing explicit deny policy on KMS key for sensitive operations
  • Potential path traversal vulnerabilities in user ID handling

The workflow's verify-completion.sh hook blocked task completion until all review threads were resolved. Claude addressed each finding with targeted commits:

# Commit: fix(security): address reviewer bot findings
- Clear plaintext KMS keys from memory after use
- Add explicit deny policy to KMS key for PutKeyPolicy, CreateGrant, ScheduleKeyDeletion
- Add input validation to prevent path traversal attacks (CWE-22)

Phase 5: E2E Testing Discovery

During manual E2E testing on the staging environment, a critical multi-tenancy bug was discovered: User A's secrets were visible to User B. The root cause? OpenHands stored secrets at the S3 bucket root, not in user-scoped paths.

This is exactly the scenario where Skills excel - handling unexpected exceptions. Claude:

  1. Documented the bug in test cases (TC-019, TC-020)
  2. Designed user-scoped storage paths (users/{user_id}/secrets.json)
  3. Implemented S3SecretsStore and S3SettingsStore with proper isolation
  4. Added startup verification to ensure patches were applied correctly

Phase 6: Architecture Refinement

A reviewer suggested replacing API Gateway with ALB Lambda target groups for:

  • Architecture consistency (single entry point)
  • Cost optimization (no API Gateway fees)
  • Lower latency (one less hop)

Claude refactored the entire routing layer, updating Lambda handlers to support ALB event format and modifying CloudFront distribution configuration.

The Delivery Timeline

| Phase | Commits | What Happened |
| --- | --- | --- |
| Initial Implementation | 1 | Full feature with tests |
| CI Fixes | 2 | CDK token parsing, test requirements |
| Security Review | 3 | Memory clearing, KMS policy, input validation |
| E2E Bug Discovery | 2 | Multi-tenancy isolation bug found and fixed |
| Architecture Refactor | 3 | API Gateway → ALB migration |
| Final Polish | 13 | Bedrock compatibility, MCP deduplication, snapshot updates |

Total: 24 commits over 2 days, resulting in production-ready code.

Key Insights

  1. Skills handled the unexpected: CI failures, security vulnerabilities, and multi-tenancy bugs all required reasoning and judgment - not scripted responses.

  2. Hooks guaranteed quality gates: Every commit passed through code simplification. Every push triggered PR review. Task completion was blocked until CI passed and review comments were resolved.

  3. The hybrid architecture worked: LLM costs were controlled (orchestration only), while execution was 100% reliable (hooks enforced every gate).

  4. Iterative refinement was automatic: The workflow naturally drove 24 iterations of improvement, each triggered by external feedback (CI, bot reviews, E2E testing).

Implementing the AI Digital Engineer

Step 1: Clone the Workflow Template

git clone https://github.com/zxkane/claude-code-workflow.git
cp -r claude-code-workflow/.claude your-project/.claude
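
Depending on how the files are copied, the hook scripts may lose their executable bit; if so, restore it before the first run:

chmod +x your-project/.claude/hooks/*.sh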

Step 2: Configure Hooks

The template includes pre-configured hooks in .claude/settings.json:

{
  "permissions": {
    "allow": ["Bash(.claude/hooks/*)", "mcp__*"]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {"type": "command", "command": ".claude/hooks/check-design-canvas.sh"},
          {"type": "command", "command": ".claude/hooks/check-code-simplifier.sh"},
          {"type": "command", "command": ".claude/hooks/check-pr-review.sh"},
          {"type": "command", "command": ".claude/hooks/check-unit-tests.sh"}
        ]
      },
      {
        "matcher": "Write|Edit",
        "hooks": [
          {"type": "command", "command": ".claude/hooks/check-test-plan.sh"}
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {"type": "command", "command": ".claude/hooks/verify-completion.sh", "timeout": 10}
        ]
      }
    ]
  }
}

Step 3: Set Up State Management

The state manager tracks workflow progress persistently:

# Mark a step complete
.claude/hooks/state-manager.sh mark design-canvas

# Check if a step was completed (within a 30-minute window)
.claude/hooks/state-manager.sh check test-plan

# List all completed states
.claude/hooks/state-manager.sh list

# Clear state for re-run
.claude/hooks/state-manager.sh clear-all

State is stored in .claude/state/ as JSON files with metadata:

{
  "action": "design-canvas",
  "timestamp": "2026-01-31T10:30:00Z",
  "commit": "abc123",
  "branch": "feat/user-config",
  "files": ["docs/design/user-config.pen"]
}
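
To make the mechanism concrete, here is a simplified sketch of how mark and check could be implemented, including the 30-minute freshness window (the template's actual script records more metadata, such as the files involved):

#!/bin/bash
# state-manager.sh (simplified sketch) - persists workflow progress as JSON
STATE_DIR=".claude/state"

mark() {
    mkdir -p "$STATE_DIR"
    cat > "$STATE_DIR/$1.json" <<EOF
{
  "action": "$1",
  "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "commit": "$(git rev-parse --short HEAD)",
  "branch": "$(git rev-parse --abbrev-ref HEAD)"
}
EOF
}

check() {
    local file="$STATE_DIR/$1.json"
    [ -f "$file" ] || return 1
    # Consider the step fresh only if marked within the last 30 minutes
    local mtime
    mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file")
    [ $(( $(date +%s) - mtime )) -le 1800 ]
}

"$@"   # dispatch subcommand: state-manager.sh mark|check <step>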

Step 4: Configure GitHub Actions

Add CI workflow that the completion hook will verify:

# .github/workflows/ci.yml
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
      - run: npm run build

  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - name: Amazon Q Code Review
        uses: aws/amazon-q-developer-action@v1
        with:
          command: review

Step 5: Start Development

Simply tell Claude what you want to build:

Design and implement a user authentication system with JWT tokens

The github-workflow skill activates automatically and guides Claude through:

  1. Creating a design canvas
  2. Writing test cases
  3. Implementing the feature
  4. Running reviews and CI
  5. Completing E2E verification

If you need to resume after a break:

Continue working on the authentication feature

Claude reads the state files and resumes from the appropriate step.

Advanced Patterns

Pattern 1: Spawning Sub-Agents for Parallel Work

Hooks can spawn specialized agents for specific tasks:

{
  "PostToolUse": [
    {
      "matcher": "Bash(git push)",
      "hooks": [{
        "type": "command",
        "command": "claude -p 'Run PR review toolkit on current branch' --background"
      }]
    }
  ]
}

This allows:

  • PR review agents to run asynchronously
  • Security scan agents to analyze code in parallel
  • Test coverage agents to report independently

Pattern 2: Conditional Workflow Branches

Skills can define conditional paths based on context:

## Workflow Branches

If this is a bug fix (branch starts with fix/):
- Skip design canvas requirement
- Focus on regression test
- Expedited review process

If this is a security fix:
- Require security team review
- Run additional security scans
- Notify security channel

Pattern 3: External Tool Integration via MCP

The workflow integrates with external tools through MCP:

  • Pencil MCP: Design canvas creation and validation
  • GitHub MCP: PR management and review queries
  • Chrome DevTools MCP: E2E testing automation
  • AWS MCP: Infrastructure documentation queries
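
These servers are registered with Claude Code in the usual way; for example (the server names and packages below are illustrative, so check each tool's documentation for the real ones):

# Illustrative MCP registrations - substitute the actual servers you use
claude mcp add github -- npx -y @modelcontextprotocol/server-github
claude mcp add chrome-devtools -- npx -y chrome-devtools-mcp@latest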

Cost and Performance Analysis

Comparing traditional agent approaches with the AI Digital Engineer:

| Metric | Traditional Agent | AI Digital Engineer |
| --- | --- | --- |
| LLM calls per PR | 50-100+ (every check) | 10-20 (decisions only) |
| Cost per feature | $5-15 | $1-3 |
| Verification reliability | ~80% (LLM may forget) | 100% (hooks enforce) |
| Context overflow risk | High (long workflows) | None (state persisted) |
| Audit trail | Conversation only | Hooks + Git + CI logs |

The hybrid architecture reduces costs by 70-80% while improving reliability from probabilistic to deterministic.

Conclusion

The AI Digital Engineer pattern transforms Claude Code from a coding assistant into an autonomous software engineer. The key insight is separation of concerns:

  • Skills provide intelligent orchestration - handling complex workflows, exceptions, and decisions that require reasoning
  • Hooks provide guaranteed execution - enforcing quality gates without relying on LLM memory or expensive API calls
  • GitHub Actions provide external verification - ensuring standards are met outside the AI's context

This hybrid architecture delivers:

  • Production-ready code that passes bot reviews and security scans
  • Predictable costs by minimizing LLM calls for deterministic operations
  • Complete auditability through persistent state and execution logs
  • Resilient workflows that can resume from any point after interruption

The claude-code-workflow template provides everything you need to implement this pattern. Clone it, configure it for your project, and start delivering software with an AI Digital Engineer.

Have you implemented AI-assisted development workflows? Share your experiences and patterns in the comments below!