
Building Orch: A CLI Orchestrator for Multi-Agent Claude Code

Claude Code is powerful but singular. One session, one context window, one task. The moment you need parallel work, persistent autonomous agents, or inter-agent communication, you’re manually managing terminals. I built orch to fill that gap.

It’s a Go CLI that coordinates multiple Claude Code instances via tmux. You spin up named agents with roles, they communicate through files, schedule their own follow-ups, and you monitor everything from a live terminal dashboard. One binary, no dependencies beyond tmux and claude.

Here’s how I thought about the design.

The core insight: tmux is the runtime

The first decision was the most important: don’t build a process manager. tmux already does persistent terminal sessions better than anything I could write. Each agent is just a named window inside a single tmux session called orch.

orch up builder --role engineer --dir ~/project --spec specs/task.md

This creates a tmux window, starts claude inside it, and injects the agent’s identity and team awareness via --append-system-prompt. No containers, no daemons, no socket servers. The agent runs in a real terminal with full interactive capability. You can attach to it anytime and see exactly what it’s doing.

// The tmux wrapper is dead simple. Just exec.Command calls.
func (c *Client) NewWindow(session, name, dir string) error {
    cmd := exec.Command("tmux", "new-window", "-t", session, "-n", name, "-c", dir)
    out, err := cmd.CombinedOutput()
    if err != nil {
        return fmt.Errorf("creating window %q: %s: %w", name, out, err)
    }
    return nil
}

func (c *Client) SendKeys(session, window, text string) error {
    target := session + ":" + window
    if strings.Contains(text, "\n") {
        return c.sendMultiline(target, text)
    }
    cmd := exec.Command("tmux", "send-keys", "-t", target, text, "Enter")
    // ...
}

The tradeoff is that everything depends on tmux being alive. If tmux dies, your agents die. For my use case (running on my dev machine or a persistent server) that’s perfectly fine. The orch watch command can auto-restart dead agents if you need more resilience.
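The liveness check behind that restart loop only needs to know whether each agent's tmux window still exists. A minimal sketch, assuming illustrative names (orch's actual implementation may differ):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// windowAlive reports whether name appears in the newline-separated
// window list produced by `tmux list-windows -F '#{window_name}'`.
func windowAlive(windowList, name string) bool {
	for _, w := range strings.Split(strings.TrimSpace(windowList), "\n") {
		if w == name {
			return true
		}
	}
	return false
}

// listWindows returns the window names of a tmux session, one per line.
func listWindows(session string) (string, error) {
	out, err := exec.Command("tmux", "list-windows", "-t", session,
		"-F", "#{window_name}").Output()
	return string(out), err
}

func main() {
	out, err := listWindows("orch")
	if err != nil {
		// The whole session is gone; a watcher would restart everything.
		fmt.Println("tmux session missing:", err)
		return
	}
	fmt.Println("builder alive:", windowAlive(out, "builder"))
}
```

If the window is missing but the agent row still says "running", the watcher knows to respawn it.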

SQLite as the message bus

I needed three things: agent registration, message history, and scheduled tasks. All three are simple structured data that benefit from queryability. SQLite was the obvious choice, and modernc.org/sqlite gives me a pure Go driver with no CGO, which means orch compiles to a single binary on any platform.

CREATE TABLE agents (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    role TEXT NOT NULL,
    dir TEXT NOT NULL,
    tmux_session TEXT NOT NULL,
    tmux_window TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'running',
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    last_activity DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    from_source TEXT NOT NULL,
    to_agent TEXT NOT NULL,
    content TEXT NOT NULL,
    delivered INTEGER NOT NULL DEFAULT 0,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    delivered_at DATETIME
);

CREATE TABLE schedule (
    id INTEGER PRIMARY KEY,
    agent_name TEXT NOT NULL,
    run_at DATETIME NOT NULL,
    note TEXT NOT NULL,
    executed INTEGER NOT NULL DEFAULT 0,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Messages are recorded before delivery and marked delivered after the tmux send-keys succeeds. This gives you a complete audit trail. After a run, orch logs builder shows every message that agent received, who sent it, and when. Useful for debugging why an agent did (or didn’t do) something.

The delivery flow in the messenger:

func (m *Messenger) Send(from, agentName, content string) error {
    agent, err := db.GetAgent(m.DB, agentName)
    if err != nil {
        return fmt.Errorf("looking up agent: %w", err)
    }
    if agent.Status != "running" {
        return fmt.Errorf("agent %q is not running (status: %s)", agentName, agent.Status)
    }

    // Record first, deliver second, mark delivered third.
    msgID, err := db.InsertMessage(m.DB, from, agentName, content)
    if err != nil {
        return fmt.Errorf("recording message: %w", err)
    }
    if err := m.Tmux.SendKeys(agent.TmuxSession, agent.TmuxWindow, content); err != nil {
        return fmt.Errorf("delivering via tmux: %w", err)
    }
    // Best-effort bookkeeping; the message has already been delivered.
    _ = db.MarkMessageDelivered(m.DB, msgID)
    _ = db.TouchAgent(m.DB, agentName)
    return nil
}

File-based inter-agent communication

This was the trickiest design decision. Agents need to talk to each other, but they don’t have CLI access to orch. They’re Claude Code sessions. They can read files, write files, and run commands. So I leaned into that.

An agent sends a message by creating a file:

# Agent "builder" wants to message "reviewer"
echo "Please review my changes" > .orch-send-reviewer

A background scheduler polls each agent’s working directory every 10 seconds. When it finds .orch-send-* files, it reads the content, delivers it via the messenger (SQLite record + tmux delivery), and deletes the file. Same pattern for .orch-schedule, where agents schedule their own follow-up messages:

# Agent schedules a self-reminder in 10 minutes
echo "10 Check if tests are passing" > .orch-schedule

The scheduler parses the minutes, inserts a row into the schedule table with the future timestamp, and deletes the file. When the time comes, the scheduler delivers the note as a regular message.

func (s *Scheduler) processScheduleFile(agent db.Agent) {
    path := filepath.Join(agent.Dir, ".orch-schedule")
    content, err := os.ReadFile(path)
    if err != nil {
        return // File doesn't exist, normal.
    }

    parts := strings.SplitN(strings.TrimSpace(string(content)), " ", 2)
    if len(parts) != 2 {
        return // Malformed; leave the file so the problem is visible.
    }
    minutes, err := strconv.Atoi(parts[0])
    if err != nil || minutes <= 0 {
        return // First token must be a positive minute count.
    }
    note := parts[1]
    runAt := time.Now().Add(time.Duration(minutes) * time.Minute)

    db.InsertSchedule(s.DB, agent.Name, runAt, note)
    os.Remove(path)
}

This is deliberately low-tech. No sockets, no IPC, no watching stdin. Files are the universal interface that every tool understands. Claude Code can create them with a simple Write tool call, and the polling overhead is negligible.

The multiline paste problem

The first real bug was a fun one. When the scheduler delivered a multi-line message (like review feedback) to an agent via tmux send-keys, the text arrived garbled. tmux sends each character individually, and newlines in the middle of the text triggered premature submissions in Claude Code’s input handler.

The fix: detect newlines in the message and switch to a different delivery path. Write the text to a temp file, use tmux load-buffer to load it into tmux’s paste buffer, tmux paste-buffer to paste it as a single atomic operation, wait 500ms for Claude Code to process the paste, then send Enter to submit.

func (c *Client) sendMultiline(target, text string) error {
    // Write to temp file for atomic paste.
    tmpFile, err := os.CreateTemp("", "orch-msg-*.txt")
    if err != nil {
        return fmt.Errorf("creating temp file: %w", err)
    }
    defer os.Remove(tmpFile.Name())
    if _, err := tmpFile.WriteString(text); err != nil {
        tmpFile.Close()
        return err
    }
    tmpFile.Close()

    // Load into tmux buffer and paste atomically.
    if err := exec.Command("tmux", "load-buffer", tmpFile.Name()).Run(); err != nil {
        return fmt.Errorf("loading buffer: %w", err)
    }
    if err := exec.Command("tmux", "paste-buffer", "-t", target).Run(); err != nil {
        return fmt.Errorf("pasting buffer: %w", err)
    }

    // Brief pause to let Claude Code process the paste.
    time.Sleep(500 * time.Millisecond)

    // Send Enter to submit.
    return exec.Command("tmux", "send-keys", "-t", target, "Enter").Run()
}

Single-line messages still use the simpler send-keys path; the newline check in SendKeys, shown earlier, routes between the two.

The trust dialog workaround

Claude Code has a “do you trust this folder?” prompt that blocks non-interactive startup. --dangerously-skip-permissions doesn’t bypass it (despite the name). The -p flag does, but makes the session non-interactive.

The solution: orch up reads ~/.claude.json, adds the agent’s working directory to the projects map with hasTrustDialogAccepted: true, and writes it back. This is the same file Claude Code itself writes to when you click “Yes, I trust this folder.” On macOS, this requires resolving symlinks first (/tmp to /private/tmp), because Claude resolves the real path and won’t find a match otherwise.

func trustDirectory(dir string) error {
    // Resolve symlinks first (/tmp -> /private/tmp on macOS); Claude Code
    // stores the resolved path, so an unresolved key won't match.
    if resolved, err := filepath.EvalSymlinks(dir); err == nil {
        dir = resolved
    }

    home, err := os.UserHomeDir()
    if err != nil {
        return err
    }
    claudeJSONPath := filepath.Join(home, ".claude.json")

    data := make(map[string]any)
    if raw, err := os.ReadFile(claudeJSONPath); err == nil {
        json.Unmarshal(raw, &data)
    }

    projects, ok := data["projects"].(map[string]any)
    if !ok {
        projects = make(map[string]any)
        data["projects"] = projects
    }

    proj, ok := projects[dir].(map[string]any)
    if !ok {
        proj = make(map[string]any)
        projects[dir] = proj
    }
    proj["hasTrustDialogAccepted"] = true

    out, err := json.MarshalIndent(data, "", "  ")
    if err != nil {
        return err
    }
    return os.WriteFile(claudeJSONPath, out, 0o644)
}

Not elegant, but it works. I’d love a proper --trust flag in Claude Code someday.

System prompt injection over CLAUDE.md

My first approach was generating a CLAUDE.md file in each agent’s working directory with its identity and teammate list. This broke immediately when two agents shared the same directory. The second agent’s CLAUDE.md clobbered the first.

The fix was obvious in retrospect: use --append-system-prompt to inject the agent identity directly into the session. No files to manage, no cleanup on teardown, no conflicts.

The system prompt is built from a Go template at agent startup:

const systemPromptTemplate = `You are an autonomous agent managed by orch.
Your name is "{{.Name}}" and your role is "{{.Role}}".
{{if .Teammates}}
## Team
Other agents currently running:
{{range .Teammates}}- "{{.Name}}" ({{.Role}})
{{end}}
## Inter-agent Communication
To send a message to another agent, create a file named
.orch-send-<agent-name> in your working directory with the
message content. The orchestrator will pick it up and deliver it.
{{end}}
To schedule a follow-up task for yourself, create a file named
.orch-schedule with the format:
<minutes> <note describing what to do>

Stay focused on your assigned role.`

The claude command gets assembled with the prompt shell-escaped and passed as a flag:

func (m *Manager) buildClaudeCmd(opts UpOpts, systemPrompt string) string {
    var parts []string
    parts = append(parts, "claude")
    if opts.SkipPermissions {
        parts = append(parts, "--dangerously-skip-permissions")
    }
    escaped := shellEscape(systemPrompt)
    parts = append(parts, "--append-system-prompt", escaped)
    return strings.Join(parts, " ")
}
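shellEscape isn't shown above. A common implementation wraps the string in single quotes and rewrites each embedded single quote as '\'' (close the quote, emit an escaped quote, reopen). This is a sketch of that standard technique, not necessarily orch's exact code:

```go
package main

import (
	"fmt"
	"strings"
)

// shellEscape makes s safe to embed in a POSIX shell command line by
// single-quoting it. Inside single quotes nothing is special except the
// single quote itself, which becomes '\'' (close, escaped quote, reopen).
func shellEscape(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	prompt := `You are "builder", the team's engineer.`
	fmt.Println(shellEscape(prompt))
}
```

Since the assembled string is ultimately typed into a tmux pane running a shell, this quoting is what keeps a multi-word system prompt from being split into separate arguments.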

The PM/engineer/reviewer pattern

The most effective multi-agent pattern I’ve found is a three-role team: a PM that coordinates and schedules check-ins, an engineer that implements the spec, and a reviewer that critiques each commit.

The PM role is critical. Without it, the engineer finishes and nobody notices. The reviewer waits forever. With a PM scheduling check-ins, the system is self-correcting. The PM detects completion, triggers review, and relays feedback.

The key insight: agents with narrow roles outperform agents with broad roles. A PM that also writes code will get distracted. An engineer that also reviews will skip the review. Separation of concerns applies to AI agents just as much as it does to code.

The scheduler: boring and reliable

I avoided anything clever with the scheduler. It’s a polling loop with two tickers and a git commit watcher:

func (s *Scheduler) Run(ctx context.Context, scheduleInterval, fileInterval time.Duration) {
    scheduleTicker := time.NewTicker(scheduleInterval) // 30s
    defer scheduleTicker.Stop()
    fileTicker := time.NewTicker(fileInterval) // 10s
    defer fileTicker.Stop()
    idleTicks := 0

    s.RunOnce() // Run immediately on start.

    for {
        select {
        case <-ctx.Done():
            return
        case <-scheduleTicker.C:
            s.processDueSchedules()
        case <-fileTicker.C:
            s.processAgentFiles()
            s.processGitCommits()

            // Auto-exit when no running agents remain.
            agents, _ := db.ListAgents(s.DB, "running")
            if len(agents) == 0 {
                idleTicks++
                if idleTicks >= 3 {
                    return
                }
            } else {
                idleTicks = 0
            }
        }
    }
}

It auto-starts as a background daemon when you run orch up and auto-exits after about 30 seconds with no running agents. Logs go to ~/.orch/scheduler.log.

I considered file watchers (fsnotify), WebSocket connections, and Unix domain sockets. Polling won because it’s debuggable, restartable, and has zero edge cases around file system event ordering. The 10-second latency is unnoticeable in practice.

One thing I learned the hard way: background daemons need robust cleanup. My first implementation used a PID file, but the scheduler would survive orch reset and keep running, spewing errors into the terminal because the tmux session was gone. The fix was belt-and-suspenders: PID file for the happy path, pkill as a fallback, and redirecting all scheduler output to a log file so nothing ever leaks to the user’s terminal.

func stopScheduler() {
    // Try PID file first.
    if data, err := os.ReadFile(pidFile); err == nil {
        if pid, err := strconv.Atoi(strings.TrimSpace(string(data))); err == nil {
            if proc, err := os.FindProcess(pid); err == nil {
                proc.Signal(syscall.SIGTERM)
            }
        }
    }
    os.Remove(pidFile)

    // Also pkill any stragglers.
    exec.Command("pkill", "-f", "orch scheduler").Run()
}

Idle detection

Early on, I kept having to attach to agents just to figure out if they were working or done. The dashboard showed “running” for everything, which was useless.

The fix: capture the last few lines of each agent’s tmux pane and look for Claude Code’s prompt character. If it’s there, the agent is idle. If not, it’s actively generating.

func isIdle(paneOutput string) bool {
    lines := strings.Split(strings.TrimRight(paneOutput, "\n"), "\n")
    // Scan the last five lines from the bottom up.
    for i := len(lines) - 1; i >= 0 && i >= len(lines)-5; i-- {
        line := strings.TrimSpace(lines[i])
        if line == "" {
            continue
        }
        // First non-empty line decides: the prompt marker means idle,
        // anything else means the agent is still working.
        return strings.Contains(line, "❯")
    }
    return false
}

The dashboard now shows “running” in green when an agent is working, and “idle” in yellow when it’s sitting at the prompt. Simple heuristic, but it saved me more manual checking than almost any other feature.
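The capture side can be sketched with tmux capture-pane -p, which prints the pane contents to stdout. Function names here are illustrative, not orch's actual ones:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// capturePane grabs the last `lines` lines of a tmux pane as text.
// -p prints to stdout; -S with a negative value starts that many
// lines back from the bottom of the history.
func capturePane(target string, lines int) (string, error) {
	out, err := exec.Command("tmux", "capture-pane", "-p", "-t", target,
		"-S", fmt.Sprintf("-%d", lines)).Output()
	return string(out), err
}

// lastNonEmpty returns the last non-blank line, which is what an
// isIdle-style check inspects for the prompt character.
func lastNonEmpty(paneOutput string) string {
	lines := strings.Split(strings.TrimRight(paneOutput, "\n"), "\n")
	for i := len(lines) - 1; i >= 0; i-- {
		if l := strings.TrimSpace(lines[i]); l != "" {
			return l
		}
	}
	return ""
}

func main() {
	out, err := capturePane("orch:builder", 20)
	if err != nil {
		fmt.Println("capture failed (no tmux session?):", err)
		return
	}
	fmt.Println("last line:", lastNonEmpty(out))
}
```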

The dashboard

The dashboard is the main way I monitor what’s happening. It’s a bubbletea TUI that refreshes every 3 seconds, showing all agents with their status and a live preview of the selected agent’s terminal output.

The interesting design choice was using tea.ExecProcess for the attach feature. When you press Enter on an agent, the dashboard suspends itself, runs tmux attach-session as a child process, and resumes when you detach (Ctrl-B d). This means you can drop into any agent’s full interactive terminal and come back to the dashboard without losing state.

func (m model) attachToAgent(a agent.AgentStatus) tea.Cmd {
    _ = m.tmux.SelectWindow(a.Agent.TmuxSession, a.Agent.TmuxWindow)
    c := exec.Command("tmux", "attach-session", "-t", a.Agent.TmuxSession)
    return tea.ExecProcess(c, func(err error) tea.Msg {
        return attachDoneMsg{}
    })
}

The preview pane captures the last 20 lines of the selected agent’s tmux output via tmux capture-pane. This is where idle detection becomes visible. You can see at a glance whether an agent is mid-generation (scrolling output) or sitting at the prompt (yellow “idle” status), and the preview shows you what it last did without having to attach.

The dashboard also runs the scheduler in the background as a goroutine, so scheduled messages and inter-agent file communication work automatically while you’re watching. Everything you need in one terminal.

Git commit watcher

The PM agent’s fixed-interval check-in had an annoying gap. The engineer would finish and commit, but the PM wouldn’t notice until its next scheduled wake-up (8-10 minutes later). The reviewer would sit idle the entire time.

The scheduler now tracks the last known commit hash per directory. Every 10 seconds, it runs git rev-parse HEAD and compares. When it detects a new commit, it immediately notifies all PM-role agents in that directory:

func (s *Scheduler) processGitCommits() {
    agents, _ := db.ListAgents(s.DB, "running")

    // Group by directory, find PMs and builders.
    dirs := make(map[string]*dirInfo)
    for _, a := range agents {
        // ... group agents by dir, track PMs vs builders
    }

    for dir, di := range dirs {
        if len(di.pms) == 0 || len(di.builders) == 0 {
            continue
        }

        cmd := exec.Command("git", "-C", dir, "rev-parse", "HEAD")
        out, err := cmd.Output()
        if err != nil {
            continue // Not a git repo, or no commits yet.
        }
        hash := strings.TrimSpace(string(out))

        prev, seen := s.lastCommits[dir]
        s.lastCommits[dir] = hash
        if !seen || hash == prev {
            continue
        }

        // Get commit message for context.
        cmd = exec.Command("git", "-C", dir, "log", "--oneline", "-1")
        msgOut, _ := cmd.Output()

        // Notify all PMs.
        for _, pm := range di.pms {
            s.Messenger.Send("git-watcher", pm.Name,
                fmt.Sprintf("New commit detected: %s", strings.TrimSpace(string(msgOut))))
        }
    }
}

This turned a “finish and wait 8 minutes” gap into a “finish and get feedback in 10 seconds” loop.

Spec generation: the hardest part automated

After running orch on a few projects, a pattern became clear: writing the specs was the bottleneck. The orchestration worked great once the specs existed, but creating three detailed, stack-aware spec files for each task took 30-60 minutes of manual work. Vague specs produced vague results. Detailed specs produced great results. So I automated the detailed part.

orch specgen is a subcommand that analyzes a target codebase and generates all three role specs in one shot:

orch specgen --dir ~/workspace/myproject --task "Add user authentication with JWT"
# Output: myproject/specs/engineer.md, pm.md, reviewer.md

The design splits the work into two phases. The first phase is deterministic — no LLM involved. It scans the project to build a structured analysis: tech stack detection from config files (go.mod, package.json, Cargo.toml), project structure via directory walking, git state, and existing documentation. This is fast, reproducible, and free.

func Analyze(dir string) (*Analysis, error) {
    absDir, err := filepath.Abs(dir)
    if err != nil {
        return nil, err
    }
    a := &Analysis{Dir: absDir}
    a.Stack = DetectStack(absDir)      // go.mod → Go, package.json → Node, etc.
    a.Structure = MapStructure(absDir)  // key files, test files, directory tree
    a.Git = CollectGitInfo(absDir)      // branch, recent commits, uncommitted changes
    a.Documentation = collectDocs(absDir) // README.md, CLAUDE.md
    return a, nil
}

Stack detection is a priority chain of file existence checks. A go.mod means Go with go test ./... and go vet ./.... A package.json gets parsed for scripts and framework dependencies — it knows the difference between an Astro project and a React one, and picks the right build/test commands accordingly. Framework configs (astro.config.mjs, next.config.ts, tailwind.config.*) add further context.

The second phase feeds the analysis and the user’s task description to Claude via claude -p (print mode), which runs non-interactively and returns the result to stdout. One call per role, each with a role-specific system prompt that encodes orch’s conventions:

func callClaude(ctx context.Context, systemPrompt, userPrompt string) (string, error) {
    cmd := exec.CommandContext(ctx, "claude", "-p",
        "--system-prompt", systemPrompt,
        "--output-format", "text",
    )
    cmd.Stdin = strings.NewReader(userPrompt)
    output, err := cmd.Output()
    return strings.TrimSpace(string(output)), err
}

Using claude -p instead of the Anthropic API directly was a deliberate choice. Users already have Claude Code installed (it’s a prerequisite for orch), so there’s no API key to manage, no SDK dependency, and model selection is inherited from the user’s existing config. The generation code is just exec.Command — the simplest possible integration.

The system prompts are where the opinions live. The engineer prompt demands commit-after-each-phase discipline, exact file paths, and “do not move to the next step until tests pass.” The PM prompt requires specific check-in intervals with .orch-schedule, concrete gate criteria (“run go test ./... and verify 0 failures”), and the cardinal rule: “Do NOT write code.” The reviewer prompt enforces the MUST FIX / SHOULD FIX / NIT format with file and line citations.

Early versions had a problem where the LLM would generate generic specs that could apply to any project. The fix was feeding it the full analysis — actual file paths, actual dependencies, actual test patterns from the codebase. When it knows the project has internal/db/queries_test.go using table-driven tests, it generates specs that follow the same pattern. When it sees go.mod with chi and sqlite, the engineer spec includes the right imports and the reviewer spec checks for SQL injection and unclosed rows.

You can also run just the analysis to verify what specgen sees before generating:

orch specgen --dir ~/workspace/myproject --analyze

This prints the detected stack, project structure, git state, and documentation — useful for debugging when the generated specs don’t reference the right files.

The stack

The whole thing is about 3,000 lines of Go. It does one thing (coordinate Claude Code instances) and tries to do it well.

github.com/jeffdhooton/orch