An AI Team That Never Sleeps — Lessons from the MegaDB Multi-Machine Harness

This is a follow-up to the previous post introducing the MegaDB project. For what MegaDB is, see that post. This one covers the architecture and operational experience of the multi-machine AI agent harness used during development.

For the past month and a half, I shipped what would have taken me years to build alone: 249,658 lines of Rust code, 14 crates, 365 source files — all in 38 days. The numbers alone seem implausible. But this isn't a story about code generation. It's about how I organized AI agents as a development team, structured their work, and what I learned about the architecture of AI-driven development.

Three Eras of MegaDB

MegaDB is a database engine written in Rust. It handles both OLTP and OLAP, includes a SQL parser, pgwire protocol, gRPC, Kubernetes operators, vector search, graph queries — a substantial system.

Development unfolded in three distinct eras:

  ERA 1: Solo Dev         ERA 2: Agent Integration       ERA 3: Multi-Machine Harness
  (Feb 24-26)             (Feb 27 - Mar 20)              (Mar 21 - Apr 2)
  ◄───── 3 days ─────►    ◄──────── 22 days ────────►     ◄────── 12 days ──────►
  ┌────────────────────┬────────────────────────────────┬──────────────────────────┐
  │ Project Bootstrap  │ Single-Agent Acceleration      │ 3 Autonomous Agents      │
  │ Types, Schemas     │ Storage → SQL → Compute        │ master-db + app-dev      │
  │ 16 PR              │ → Network                      │ + reviewer               │
  │ 8 PR/day           │ 328 PR, 14.9 PR/day            │ 173 PR, 15.7 PR/day      │
  │ 28K LOC            │ +168K LOC                      │ +54K LOC (all reviewed)  │
  └────────────────────┴────────────────────────────────┴──────────────────────────┘

Era 1 (Feb 24-26): Three days of bootstrapping. I scaffolded nine crates solo, set up types and configuration. About 8 PR/day, 28K LOC total. Standard project startup phase.

Era 2 (Feb 27 - Mar 20): Twenty-two days with Claude Code running as a single agent. This is where exponential growth began. Average 14.9 PR/day, 328 PRs merged, 168K LOC added. The storage engine, SQL parser, compute layer, network server — the bulk of core functionality came from this era. No code review. CI wasn't stable. Speed was there, but without quality gates.

Era 3 (Mar 21 - Apr 2): Twelve days running the multi-machine harness. Three AI agents, each with a specific role, operating autonomously. PR/day stayed at 15.7 — matching Era 2's velocity — but now every PR was reviewed and CI-passed before merge.

The breakthrough: we kept the speed while raising the quality. Normally you pick one. The agent team structure broke that tradeoff.

Composing a 24-Hour AI Team

Here's the architecture:

Machine 0 (Advisor): Strategy layer, handled by me. No code written. Priorities set through GitHub Issues. Orchestrates overall direction.

Machine 1 (master-db): Agent owning five core crates: megadb-core, megadb-storage, megadb-compute, megadb-k8s, megadb-crypto. Focuses on low-level work — SIMD optimization, storage engine, Kubernetes operators.

Machine 2 (app-dev): Agent owning nine upper-layer crates. SQL parser, network server, CLI, auth (OIDC/SAML/SCIM), vector/graph search — application-level features.

Machine 3 (reviewer): Read-only agent. No code written. Performs mechanical validation and six-dimensional architecture review on all PRs. The quality gate itself.

┌───────────────────────────────────────────────────────────────────┐
│                   Machine 0: Advisor (Human)                      │
│       Strategy · Harness Monitoring · Priority Setting            │
│       Code Written: None                                          │
└───────────────────────────────┬───────────────────────────────────┘
                                │ Work assigned via GitHub Issues
                                ▼
┌───────────────────────────────┐   ┌───────────────────────────────┐
│  Machine 1: master-db         │   │  Machine 2: app-dev           │
│                               │   │                               │
│  5 Core Crates:               │   │  9 Application Crates:        │
│  · megadb-core                │   │  · megadb-sql                 │
│  · megadb-storage             │   │  · megadb-network             │
│  · megadb-compute             │   │  · megadb-catalog             │
│  · megadb-k8s                 │   │  · megadb-vector / graph      │
│  · megadb-crypto              │   │  · megadb-llm / fts / onnx    │
│                               │   │  · msql-cli                   │
│  Focus: Storage, SIMD, K8s    │   │  Focus: SQL, API, CLI         │
└───────────────┬───────────────┘   └───────────────┬───────────────┘
                │                                   │
                └─────────────┬─────────────────────┘
                              ▼
               PRs opened to release-candidate branch
                              ▼
┌───────────────────────────────────────────────────────────────────┐
│               Machine 3: Reviewer (Quality Gate)                  │
│     Mechanical Checks → 6-Dim Architecture Review → Verdict       │
│     Code Written: None (Read-Only)                                │
└───────────────────────────────────────────────────────────────────┘

The biggest advantage of this structure is time-independence. While I slept, master-db was reducing cognitive complexity in the storage engine, app-dev was implementing SCIM provisioning, reviewer was reviewing PRs one by one. I'd wake up to stacks of merged PRs and review comments. Human working hours disappeared as a constraint.

Data backs this up. During harness operation, the median issue cycle time was 3.4 hours. Issues went from creation to resolution in less than half a day. Peak days saw 39 PRs merged.

The .claude/ Directory: Blueprint of the Agent System

The harness's magic doesn't lie in GitHub automation—it lies in the .claude/ directory at project root. This is infrastructure code that lets agents understand and coordinate with each other.

.claude/
├── workspace-claude.md           ← Team constitution for the entire system
├── agents/                       ← 7 internal sub-agent prompts
│   ├── database-architect.md     ← Architecture decisions
│   ├── rust-systems-developer.md ← Low-level implementation
│   ├── database-optimizer-architect.md ← Performance optimization
│   ├── data-engineering-specialist.md  ← Data pipelines
│   ├── data-specialist.md        ← Data queries/analysis
│   ├── database-architect-developer.md ← Advanced features
│   └── system-performance-engineer.md  ← System performance
├── prompts/                      ← 4 per-machine prompts
│   ├── machine-0-advisor.md
│   ├── machine-1-master-db.md
│   ├── machine-2-app-dev.md
│   └── machine-3-reviewer.md
├── scripts/                      ← Launch and configuration automation
│   ├── launch.sh                 ← Start multi-agent sessions
│   └── check-deps.sh             ← Enforce crate dependencies
└── templates/                    ← PR/Issue templates
    ├── pull_request.md
    └── issue.md

workspace-claude.md reads like a team constitution. It specifies:

Crate ownership map: which agent owns which crate
Dependency graph: precise relationships between 14 crates
Communication protocol: how work gets assigned via GitHub Issue/PR
Branch strategy: explicit branching like feat/master-db/xyz
Signal file schema: JSON format for emergency inter-agent communication

The four machine prompts spell out exactly what each agent does. When master-db hears "improve WAL recovery logic in megadb-storage," it checks workspace-claude.md, sees "this is my crate," and starts immediately. Four safety mechanisms activate simultaneously:

check-deps.sh: validates dependency direction at every build. If app-dev accidentally modifies megadb-storage, the build fails.
Crate ownership enforcement: CI analyzes file diffs and catches ownership violations.
Reviewer's automatic rejection: applies "Crate ownership violation" label and reverts.
GitHub Branch Protection: release-candidate branch requires reviewer approval to merge.

Stacked layers mean agents can't step on each other's code. This is critical for autonomous systems—humans can say "sorry" and negotiate. AI struggles with that. During the entire harness period: zero merge conflicts.

Why One Machine Per Session Matters

The most crucial design decision: each agent gets an independent machine (session).

Why does this matter? Usually with AI coding assistants, one session handles everything. But that approach has fundamental limits.

First: context isolation. When master-db is fixing WAL implementation and app-dev modifies the SQL parser in the same session, context gets tangled. Separate sessions mean each agent sees only their crates. master-db always loads only megadb-storage and megadb-compute—no irrelevant context competition.

Second: parallelism. One session = one task at a time. Separate machines? master-db works on SIMD vectorization while app-dev builds the gRPC server. The workload shift pattern shows this naturally. Early: master-db builds foundations. Later: app-dev integrates on top. No explicit coordination—just the natural dependency flow.

100%┤  ▒▒▒  ▒▒▒  ░▒▒        ▒▒▒  ▒▒▒  ▒▒▒  ▒▒▒  ▒▒▒  ▒▒▒  ▒▒▒
    │  ▒▒▒  ■■■  ■▒▒  ■■■  ■▒▒  ▒▒▒  ▒▒▒  ▒▒▒  ▒▒▒  ░░░  ░░░
    │  ▒▒▒  ■■■  ■▒▒  ■■■  ■■▒  ■■▒  ■■▒  ░░▒  ░░▒  ░░░  ░░░
    │       ■■■       ■■■  ■■■  ■■■  ■■■  ░░░  ░░░  ░░░  ░░░
    │       ■■■              ■■■  ■■■  ■■■  ░░░  ░░░
    │       ■■■              ■■■  ■■■  ■■■  ░░░  ░░░
  0%┤───────────────────────────────────────────────────────────
    Mar21  22   23   25   26   27   28   29   30   31  Apr1

    ■ master-db    ░ app-dev    ▒ reviewer

    ◄── master-db lead ──►◄── transition ──►◄── app-dev lead ──►

This pattern emerged naturally. Low-level crates (storage, compute) finish first; higher crates (SQL, network, CLI) integrate on top. No explicit coordination—just focused agents on their assigned crates.

Third: crate ownership enforcement. Each agent modifies only assigned crates. Eliminates merge conflicts at the root. Critical for autonomous systems where recovery is hard. Harness period: zero merge conflicts.

Fourth: failure isolation. One agent's problem doesn't cascade. The harness ran 2-Machine for a day when needed; other agents kept working unaffected.

By harness end, contributions balanced evenly:

Agent	PRs	Lines Added	Lines Deleted	Focus Area
master-db	57 (33%)	+18,028	-5,838	Storage, compute, K8s, SIMD
app-dev	58 (34%)	+13,277	-4,765	SQL, network, CLI, auth, tests
reviewer	23 (13%)	+11,138	-7,964	Mechanical fixes from review
other	35 (20%)	+17,961	+640	Cross-agent merges, harness setup

Reviewer added 11K LOC despite writing no new features — mechanical fixes found during review (lint violations, formatting).

The Develop-Deploy-Test Cycle

The harness runs differently from standard CI/CD. Code doesn't auto-test on push. Instead: explicit deploy-test cycles.

1. Agent completes work on feat branch
2. PR opened to release-candidate → reviewer auto-assigned
3. After reviewer approval: rebase & merge
4. release-candidate deploys to Kubernetes (CircleCI job)
5. E2E tests run automatically
   - 4 storage engines validated (MEMORY, OLTP, OLAP, TIMESERIES)
   - 100+ SQL functions tested
   - Distributed query tests
   - Stress tests
6. Tests pass → next agent picks up latest code

This cycle moves fast because each agent modifies only their domain. master-db changes storage API; app-dev sees it, adjusts quickly. Clear dependencies. Defined crate ownership.

The main deployment branch is actually release-candidate. main is reserved for validated stable points. All agents target release-candidate. Separates development from stabilization. Keeps "deployable any moment" state viable.

Test Infrastructure: From E2E to Fuzz Testing

MegaDB has comprehensive test coverage.

Unit Tests: 1,876 unit tests, all passing. Not just line coverage—core logic validated per crate.

Integration Tests: 13 integration test files:

http_e2e.rs: HTTP API endpoint testing
money_e2e.rs: Financial calculation accuracy
cross_engine_integration.rs: 4-engine compatibility
wal_recovery.rs: WAL recovery protocol
columnar_integration.rs: Columnar storage queries
row_integration.rs: Row storage queries
builder_integration.rs: K8s operator pipeline
encrypt_cache_integration.rs: Encryption cache consistency
And more

Fuzz Tests: 4 fuzz targets:

fuzz_sql_parser: SQL parser fuzzing (no crashes on arbitrary input)
fuzz_check_constraint_parser: Constraint parser
fuzz_optimizer_pipeline: Query optimization
fuzz_pgwire_queries: PostgreSQL wire protocol

Smoke Tests: 60KB smoke_test_v070.sh script runs post-deployment in K8s. Validates all 4 storage engines, 100+ SQL functions, graph queries, vector search in one comprehensive pass.

Stress Tests: Long-running high-load scenarios validate no memory leaks or deadlocks.

Chaos Tests: K8s pods randomly killed, networks cut, disks filled. Simulates failure modes.

Benchmark & Baseline Tests: Performance tracking and regression detection.

How did such thorough testing emerge? Simple: reviewer mechanically enforced "tests accompany every PR." No tests? Auto-requested changes. This single pattern created comprehensive coverage.

Communication Structure and Resolving Bottlenecks

The harness communicates exclusively through GitHub. Issues assign work. PRs deliver results. Comments provide feedback. All recorded. No separate monitoring dashboard or Slack thread—just Issue/PR history reveals the whole project state.

Orchestration: machine-0-advisor's 120-Second Cycle

Machine 0 doesn't just create Issues. It polls system state every 120 seconds. Detects bottlenecks. Identifies idle agents. Adjusts priorities. Assigns work. Fast feedback loops mean the system self-optimizes.

Example: Mar 31 threatened 39 PRs merging in one day. Reviewer became bottleneck. Advisor detected this and:

Closed/postponed lower-priority issues
Throttled master-db and app-dev pace
Requested automation scripts to assist reviewer

Immediate response became possible.

Reviewer Bottleneck and Automated Validation

On Mar 31, reviewer faced 39 PRs to process as a single agent. The key: code quality didn't drop under load. Instead, automation absorbed the pressure:

check-deps.sh: crate dependency validation auto-runs in CI
SonarQube: 85 static analysis rules applied automatically
Snyk: dependency vulnerabilities scanned automatically
Clippy + fmt: code style enforced in CI

Reviewer focused only on architecture review for PRs that passed all automated checks. Lint and formatting were already done.

Signal File System (Deprecated)

Early design included .claude/agent-messages/ with JSON files for inter-agent signals. Turned out useless. GitHub Issue/PR sufficed. Signal files assumed urgent communication needs. Reality: almost never happened. Most coordination ran through GitHub's automatic workflows.

What Still Needs Improving

Reviewer bottleneck: Peak moments (like Mar 31) overload single reviewer. Solutions exist:

Multiple reviewers: 2-3 reviewer instances, distributed by round-robin or crate ownership
Layered review: CI handles automated validation; agents handle architecture only

No real-time inter-agent communication: When master-db changes storage API, app-dev doesn't learn until next Issue assignment. Tradeoff for merge-conflict prevention via ownership. Long-term: might need event bus mechanisms.

Context window limits: Growing projects make it hard to load entire crate sets. master-db's 5 crates already substantial. Long-term: RAG-based context management might be necessary.

Functional quality validation: SonarQube and Snyk check formal quality. But do queries return correct results? Does concurrency maintain data consistency? Different dimension entirely. Fuzz and property-based tests need strengthening.

What the Numbers Say

┌─────────────────────────────────────────────────────────────┐
│            MegaDB Repository Activity Summary               │
│            38 Days (Feb 24 - Apr 2)                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  590 PRs Merged                537 Issues Created           │
│  ████████████████████████     ████████████████████          │
│  Average 15.5 PR/day          520 Issues Closed (97%)       │
│                                                             │
│  +384,175 Lines Added          -48,684 Lines Deleted        │
│  ████████████████████████     ██████                        │
│  +335,491 Net Growth           38 Active Days               │
│                                                             │
│  Codebase: 1,799 → 249,658 Rust LOC (+13,778%)              │
│  ████████████████████████████████████████████████████       │
│  43 → 365 .rs Files            9 → 14 Crates                │
│                                                             │
└─────────────────────────────────────────────────────────────┘

PR Throughput by Week

 PRs
 170 ┤                      █
     │                      █
 160 ┤                      █  161
     │                      █
     │                      █
 100 ┤                      █           █
     │                      █           █  98
  90 ┤                      █           █
     │                81    █           █
  80 ┤                █     █           █
     │                █     █     74    █
  70 ┤    59          █     █     █     █
     │    █     44    █     █     █     █    
  60 ┤    █     █     █     █     █     █
     │    █     █     █     █     █     █
  50 ┤    █     █     █     █     █     █
     │    █     █     █     █     █     █
  40 ┤    █     █     █     █     █     █
     │    █     █     █     █     █     █
     └────┴─────┴─────┴─────┴─────┴─────┴──
         W08   W09   W10   W11   W12   W13
        2/24   3/3  3/10  3/17  3/24  3/31

Week	Period	PRs
W08	2/24~	59
W09	3/3~	44
W10	3/10~	81
W11	3/17~	161
W12	3/24~	74
W13	3/31~	98

Highest week: W11 (Mar 17-23) with 161 PRs. Coincided with K8s/hardening sprint plus harness launch.

Era Comparison

Metric	Era 1 — Bootstrap (3 days)	Era 2 — Single Agent (22 days)	Era 3 — Multi-Machine (12 days)
PR/day	5.3	14.9	15.7
LOC/day	4,242	7,632	5,316
Issues/day	—	19.9	8.3

Key insight: Era 2 to Era 3, PR/day stayed flat (14.9 → 15.7) but LOC/day dropped (7,632 → 5,316). Why? Harness period focused on hardening. Less new features; more SonarQube fixes, cognitive complexity reduction, test coverage improvement. This rewrites code rather than adding it—lower LOC growth, higher quality.

Quality vs Speed

Gate	Era 1	Era 2	Era 3 (Harness)
PR Code Review	None	None	100%
Crate Ownership	None	None	Enforced
CI Pipeline	None	Broken	5/5 Passing
SonarQube	None	None	85 Rules Applied
Snyk Scanning	None	None	Enabled
Dependency Check	None	None	check-deps.sh
Lint Policy	None	None	unwrap/expect/panic Blocked
Issue Tracking	None	Weak	Complete: Create→Close 3.4hrs

This table says one thing: same speed, all quality gates added. Usually adding quality means 30-50% slower. Multi-agent team absorbed the overhead by separating reviewer to dedicated machine and maximizing automation.

Codebase Growth

 LOC
 250K ┤                                                        ●───● 249,658
      │                                                    ●──┘
 240K ┤                                                ●──┘
      │                                            ●──┘
 230K ┤                                        ●──┘
      │                                    ●──┘
 220K ┤                                ●──┘
      │                            ●──┘
 200K ┤                        ●──┘
      │                    ●──┘  ← 3/20: 195K
 190K ┤                ●──┘
      │            ●──┘
 140K ┤        ●──┘
      │    ●──┘          ← 3/15: 138K
 120K ┤●──┘
      │●                 ← 3/1: 80K
  80K ┤│                 ← 2/28: 57K
      ││
  10K ┤●                 ← 2/26: 11K
   2K ┤●                 ← 2/24: 1.8K
      └─┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴──
       2/24 2/28  3/1  3/7 3/12 3/15 3/20 3/22 3/25 3/28 3/30  4/2
                                          ▲
                                    Harness Started

The curve shows harness softening growth slightly. Not "slowed down"—shifted to more sustainable pace. Early explosion was necessary. But for long-term maintenance, current speed is healthier.

Issue Cycle Time

Metric	Value
Median	3.4 hours
Mean	11.6 hours
Fastest	Under 1 minute
Slowest	69.5 hours
Closure Rate	97% (520/537)

Most issues close within a working session. Request feature; responsible agent completes in 2-3 hours; reviewer approves in 30 minutes. Fast feedback loops prevent problem accumulation.

Closing Thoughts

This project taught me there's a massive difference between "AI as a faster code generator" and "AI organized as a team." Single agent shows overwhelming throughput versus humans. Multi-agent harness maintains throughput while gaining quality, traceability, and failure isolation simultaneously.

More interesting: this structure self-heals automatically. Reviewer becomes bottleneck; advisor auto-throttles agents. One agent slows; others pick up slack. System optimizes itself.

Plenty of road ahead. Real-time inter-agent coordination. E2E test automation. Functional quality validation. Context management. But building a production database engine in 38 days shows clearly where software development heads.

You sleep. Code ships. Reviews run. Issues close. That's what a 24-hour AI team means.

All figures and charts in this post derive from actual MegaDB repository activity data. Full project: 38 days, 590 PRs, 537 issues, 249,658 Rust LOC across 14 crates The release-candidate branch is the primary development line.