
Forking gstack to Close the Plan-to-Execution Gap

In the last post, I wrote about building orch — a Go CLI that coordinates multiple Claude Code instances via tmux. It works well once you have specs. But getting from “idea” to “specs” to “running agents” still required manual steps: write a plan, review it, manually translate it into role-specific spec files, then run orch up for each agent.

Meanwhile, gstack — Garry Tan’s open-source skill framework for Claude Code — had already solved the planning side. Skills like /plan-ceo-review and /plan-eng-review produce structured, reviewed plans with real artifacts. The problem was that these plans had nowhere to go. They’d sit in a markdown file until someone manually executed them.

So I forked gstack to make it orch-aware. The goal: a reviewed plan flows directly into parallel agent execution with zero manual translation.

What gstack already does well

gstack is a collection of 21+ skills that turn Claude Code into a structured workflow. The ones relevant here are the planning skills:

- /plan-ceo-review — a structured product review that stress-tests the plan
- /plan-eng-review — a structured architecture review that locks in the engineering approach

These skills produce artifacts — review logs, test plans, architectural decisions — stored in ~/.gstack/projects/. They’re opinionated and thorough. After running both on a task, you have a plan that’s been stress-tested from both the product and engineering angles.

But then what? You copy-paste sections into spec files, manually adapt them for each agent role, and run the orch commands yourself. That handoff was the bottleneck.

Three changes to close the gap

The fork touches surprisingly little code. Nine files changed. The modifications fall into three categories.

1. The /orch skill

The biggest addition is a new skill that bridges gstack’s planning output into orch’s execution model. When you run /orch, it:

  1. Checks that orch is installed
  2. Looks for existing gstack artifacts — review logs, test plans, CEO review output
  3. Decides what to do based on context

If agents are already running, it offers to attach, send messages, or tear down. If a reviewed plan exists, it generates specs and launches agents. If unreviewed plans exist, it nudges you to run /plan-eng-review first. If nothing exists, it asks what you want to build.
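That decision order can be sketched as a small shell function. The function and its predicates are hypothetical stand-ins for the skill's real artifact checks, not orch's actual code:

```shell
# Hypothetical sketch of the /orch skill's dispatch order.
# The three flags stand in for the real artifact checks.

orch_dispatch() {
    agents_running=$1      # "yes" if orch agents are already up
    reviewed_plan=$2       # "yes" if a reviewed plan artifact exists
    unreviewed_plan=$3     # "yes" if a plan exists but was never reviewed

    if [ "$agents_running" = "yes" ]; then
        echo "offer: attach / send message / tear down"
    elif [ "$reviewed_plan" = "yes" ]; then
        echo "generate specs and launch agents"
    elif [ "$unreviewed_plan" = "yes" ]; then
        echo "nudge: run /plan-eng-review first"
    else
        echo "ask what to build"
    fi
}

orch_dispatch no yes no   # prints: generate specs and launch agents
```

The ordering matters: running agents take priority over everything else, so re-running /orch mid-session never accidentally relaunches a fresh fleet.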

The interesting part is the spec generation step. It calls orch specgen under the hood, which analyzes the codebase (tech stack, project structure, existing tests) and feeds that analysis along with the plan to Claude in print mode. Out come three role-specific specs — engineer, PM, reviewer — that reference actual file paths, actual test patterns, and actual dependencies from your project.
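The shape of that step is roughly "concatenate the analysis and the plan into one prompt per role." This sketch is illustrative only; the helper name, file names, and prompt wording are assumptions, not orch specgen's internals:

```shell
# Hypothetical sketch of spec generation: one prompt per role,
# built from the same plan and the same codebase analysis.

build_spec_prompt() {
    role=$1        # engineer | pm | reviewer
    plan=$2        # reviewed plan file
    analysis=$3    # codebase analysis (tech stack, structure, test patterns)

    printf 'Role: %s\n\nWrite a spec for this role based on the plan below.\n\n' "$role"
    printf '## Codebase analysis\n'
    cat "$analysis"
    printf '\n\n## Plan\n'
    cat "$plan"
}

# Each role's prompt would then go to Claude in print mode, e.g.:
#   build_spec_prompt engineer plan.md analysis.md | claude -p > specs/engineer.md
```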

# The flow looks like this:
/plan-ceo-review  →  plan.md (reviewed, stress-tested)
/plan-eng-review  →  plan.md (architecture locked in)
/orch             →  specs generated → agents launched

Before the fork, steps 1-2 happened in gstack and step 3 happened manually. Now it’s one continuous flow.

2. Execution handoffs in the review skills

The subtler change: both /plan-ceo-review and /plan-eng-review now have an “Execution Handoff” section that runs after the review completes. If orch is installed and the task is substantial enough — roughly 8+ files or 2+ implementation phases — the skill offers to spin up orch agents right there.

This matters because it catches the moment of highest intent. You just finished reviewing a plan. You’re mentally committed. The handoff says: “This looks like a multi-agent task. Want me to generate specs and launch agents?” One confirmation and you’re running.

For smaller tasks, it stays silent. Not everything needs three parallel agents. A two-file bug fix is faster with a single Claude session.
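The threshold check itself is trivial. A sketch, using the 8-file / 2-phase numbers from the handoff above (how the skill actually counts files and phases is not shown here):

```shell
# Hypothetical sketch of the "is this task big enough for agents?" check.
# Thresholds mirror the handoff heuristic: roughly 8+ files or 2+ phases.

should_offer_orch() {
    files=$1    # number of files the plan expects to touch
    phases=$2   # number of implementation phases in the plan

    if [ "$files" -ge 8 ] || [ "$phases" -ge 2 ]; then
        echo "offer"
    else
        echo "stay silent"
    fi
}

should_offer_orch 2 1    # two-file bug fix: prints "stay silent"
should_offer_orch 12 3   # large feature: prints "offer"
```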

3. Keeping the fork alive

The boring but necessary piece: a GitHub Actions workflow that syncs with upstream every 6 hours. Garry and contributors are actively developing gstack, and I don’t want to maintain a divergent fork.

The workflow fetches garrytan/gstack main and attempts a clean merge. If it succeeds, it pushes directly. If there are conflicts — which almost always hit the orch-specific sections — it uses the Claude API to resolve them automatically.
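The skeleton of that workflow looks roughly like this. It is a sketch: the step names, remote setup, and conflict-detection plumbing in the actual fork may differ.

```yaml
# Sketch of the upstream-sync workflow's scheduled merge; details illustrative.
on:
  schedule:
    - cron: "0 */6 * * *"   # every 6 hours

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0    # full history, needed for a real merge
      - name: Attempt clean merge of upstream
        run: |
          git remote add upstream https://github.com/garrytan/gstack.git
          git fetch upstream main
          # A clean merge is pushed directly; a conflicted merge falls
          # through to the AI-assisted resolution step.
          git merge --no-edit upstream/main && git push || echo "conflicted"
```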

# The conflict resolution prompt tells Claude exactly what to preserve
- name: AI-assisted conflict resolution
  run: |
    claude -p --system-prompt "You are resolving git merge conflicts
    in a fork of gstack. The fork adds orch integration sections
    to plan-ceo-review and plan-eng-review skills, and changes
    update URLs to point to jeffdhooton/gstack. Preserve all
    fork-specific changes while accepting upstream improvements."

If Claude resolves everything cleanly, it opens a PR for review. If it can’t, it opens a PR with conflict markers for manual resolution. In practice, the conflicts are predictable — they’re always in the same few sections — so Claude handles them reliably.

This is a pattern I’d use for any opinionated fork: automate the merge, let AI handle the predictable conflicts, escalate the weird ones. The fork stays current with upstream while preserving local modifications.

Why this matters more than it looks

The individual changes are small. But the compound effect is significant.

Before the fork, the workflow was:

  1. Think about what to build (unstructured)
  2. Maybe write a plan (optional, often skipped)
  3. Write 3 spec files manually (30-60 minutes, tedious)
  4. Run orch up commands (copy-paste from memory)
  5. Monitor and iterate

After the fork:

  1. /plan-ceo-review — structured product review
  2. /plan-eng-review — structured architecture review
  3. /orch — specs generated and agents launched

Steps 1-3 are guided conversations. You’re making decisions, not doing busywork. The translation from “reviewed plan” to “running agents” is automated.

The quality improvement is downstream of the process improvement. When spec generation was manual, I’d cut corners. I’d write vague PM specs because I was tired of writing specs. The engineer would get a detailed spec and the PM would get “check in periodically and coordinate.” Now all three specs get the same level of detail because it’s generated from the same plan with the same codebase analysis.
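That uniformity is structural rather than a matter of discipline: each role's spec is built from the identical inputs. A minimal sketch (helper and file names are hypothetical; the real flow pipes each prompt through Claude rather than writing it straight to disk):

```shell
# Hypothetical sketch: every role's spec comes from the same plan and
# the same codebase analysis, so detail level is uniform by construction.

generate_specs() {
    plan=$1
    analysis=$2
    outdir=$3
    mkdir -p "$outdir"
    for role in engineer pm reviewer; do
        {
            printf 'Role: %s\n' "$role"
            printf 'Analysis: '; cat "$analysis"; printf '\n'
            printf 'Plan: '; cat "$plan"; printf '\n'
        } > "$outdir/$role.md"
    done
}
```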

The pattern: planning tools and execution tools are different concerns

gstack is a planning tool. It’s great at structured thinking, review, and decision-making. Orch is an execution tool. It’s great at running parallel agents with defined roles and communication channels.

Trying to make one tool do both would have been worse. gstack’s skill architecture is designed for single-session interactive workflows — you talk to Claude, it guides you through a structured process. Orch is designed for headless parallel execution — agents run autonomously with minimal human interaction.

The fork is just the glue. It detects when planning is done and execution should begin, generates the translation layer (specs), and hands off. Each tool stays focused on what it does well.

This is the same separation of concerns that makes the PM/engineer/reviewer agent pattern work. The PM doesn’t write code. The reviewer doesn’t merge. The planning tool doesn’t execute. The execution tool doesn’t plan. The boundaries are where the leverage is.

What’s next

The auto-sync workflow is the piece I’m most interested in generalizing. The “opinionated fork that stays current with upstream via AI-assisted merge resolution” pattern applies to a lot of situations beyond gstack. Any time you want to maintain local modifications to an actively developed upstream — internal forks of open-source tools, customized framework templates, adapted starter kits — the same approach works.

For orch and gstack specifically, the next step is tighter feedback loops. Right now, /orch launches agents and you monitor via orch dash. I’d like the gstack /qa and /design-review skills to be orch-aware too — automatically spinning up a QA agent after the engineer commits, or launching a design review agent when frontend work is detected. The planning-to-execution bridge works; now it’s about closing the execution-to-validation loop.

github.com/jeffdhooton/gstack · github.com/jeffdhooton/orch