Building expo-agent-bridge: Giving AI Agents Eyes and Hands in React Native
AI agents can write mobile code but can't see the screen, check navigation state, or observe network traffic. I built an MCP server and Expo dev plugin that fixes that, including discovering that iOS Simulator has no tap API and working around it with AppleScript CGEvents.
Every AI pipeline I build for mobile eventually hits the same wall. The agent writes code, the code might be correct, but the agent has no way to verify. It can’t see the simulator screen. It can’t check whether the navigation went to the right route. It can’t inspect whether the auth store updated after login. It can’t see the 401 that came back from the API. So you become the eyes. You’re checking the simulator, describing what you see, and the agent adjusts based on your description.
This works fine for web development. Agents can open a browser, inspect the DOM, take screenshots, observe network traffic. The feedback loop is tight. Mobile development doesn’t have that. A running React Native app on iOS Simulator is a graphical window that the agent has no interface to. It’s a write-and-pray loop.
I built expo-agent-bridge to close that gap. It’s an MCP server and Expo dev plugin that gives agents runtime visibility into React Native apps. Screenshots, tap automation, navigation state, store inspection, network logs, console output, route trees, all exposed as MCP tools that any compatible agent can call.
Expo offers their own paid MCP that covers screenshots, taps, and log collection. It’s fine. But it doesn’t have deeper state inspection, and I wanted something I owned, could extend, and could open-source. So I built one.
The architecture in one diagram
```
Agent ←── MCP (stdio) ──→ Server ←── simctl ──→ iOS Simulator
                                 ←──── WS ────→ Dev Plugin (in app)
```
Two packages in a monorepo. The server (@expo-agent-bridge/server) runs on the developer’s machine as an MCP server over stdio transport. It talks to the simulator via xcrun simctl for screenshots and device management, and runs a WebSocket server for talking to the dev plugin. The plugin (@expo-agent-bridge/plugin) is a headless React component that you drop into your Expo app’s root layout. It connects to the server as a WebSocket client, patches fetch and console.*, hooks into the navigation container, and answers state queries.
Some tools work without the plugin: screenshot, tap, list_devices, open_devtools, expo_install. These just talk to simctl or the local environment. The deeper tools (get_nav_state, get_store_state, get_network_log, get_logs, get_routes, find_view) need the plugin running inside the app.
The split is deliberate. An agent should be able to take a screenshot and tap buttons even before you’ve installed the plugin. The plugin adds the runtime introspection layer once you’re ready for it.
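Wiring the plugin in is one line of JSX in the root layout. A minimal sketch, assuming an Expo Router app and a named `AgentBridge` export (the `__DEV__` guard mirrors the store-registration example later in the post):

```tsx
// app/_layout.tsx: illustrative wiring, not the package's documented setup
import { Stack } from "expo-router";
import { AgentBridge } from "@expo-agent-bridge/plugin";

export default function RootLayout() {
  return (
    <>
      {/* Headless: renders no UI, just connects out to the MCP server */}
      {__DEV__ && <AgentBridge />}
      <Stack />
    </>
  );
}
```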
simctl has no tap command
The first surprise was that `xcrun simctl` has no native tap command. Unlike Android’s `adb shell input tap x y`, there’s no simctl equivalent. You can take screenshots, boot devices, install apps, open URLs, but you can’t simulate a touch.
I spent a while confirming this wasn’t a documentation gap. It’s not. Apple simply never added it. The Simulator’s touch input goes through the window system’s event handling, and simctl doesn’t have access to it.
The workaround is AppleScript with CGEvents:
- Use AppleScript’s System Events to find the Simulator process, get its frontmost window, and walk the window’s UI element tree to find the device screen. The device screen is the largest `AXGroup` child of the window, which is the accessibility element that corresponds to the rendered device display.
- Read the AXGroup’s position and size. These are screen coordinates in macOS space, pixel values relative to the display.
- Read the device’s logical screen dimensions via `xcrun simctl io <udid> enumerate`. This gives you the LCD’s pixel size and UI scale factor, from which you derive the logical dimensions (e.g., 393×852 for iPhone 16).
- Calculate the mapping: `scaleX = axGroupWidth / deviceLogicalWidth`, `screenX = axGroupX + tapX * scaleX`. Same for Y.
- Activate the Simulator window (so it receives input), then send CGEvent mouse-down and mouse-up at the calculated screen coordinates.
This handles any Simulator zoom level automatically. Whether the window is at 100%, 75%, or fit-to-screen, the AXGroup’s size reflects the actual screen area and the coordinate math stays correct.
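The mapping step itself is two scale factors and two offsets. A minimal sketch in TypeScript (names are illustrative; `axBounds` comes from the AXGroup lookup, `logical` from `simctl io enumerate`):

```typescript
interface Bounds { x: number; y: number; width: number; height: number }

// Convert a tap in logical device points to macOS screen coordinates.
function mapTapToScreen(
  tapX: number,
  tapY: number,
  axBounds: Bounds,                            // AXGroup position/size on the display
  logical: { width: number; height: number },  // e.g. 393×852 for iPhone 16
): { screenX: number; screenY: number } {
  const scaleX = axBounds.width / logical.width;
  const scaleY = axBounds.height / logical.height;
  return {
    screenX: axBounds.x + tapX * scaleX,
    screenY: axBounds.y + tapY * scaleY,
  };
}
```

The bounds lookup that feeds `axBounds` walks the accessibility tree via JXA: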
```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

interface ScreenBounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

async function getSimulatorScreenBounds(): Promise<ScreenBounds> {
  // JXA: find the largest AXGroup in the Simulator's front window, which is
  // the accessibility element for the rendered device display.
  const script = `
    const se = Application("System Events");
    const sim = se.processes.byName("Simulator");
    const win = sim.windows[0];
    const children = win.uiElements();
    let screenGroup = null;
    let maxArea = 0;
    for (let i = 0; i < children.length; i++) {
      try {
        if (children[i].role() === "AXGroup") {
          const sz = children[i].size();
          const area = sz[0] * sz[1];
          if (area > maxArea) {
            maxArea = area;
            screenGroup = children[i];
          }
        }
      } catch (e) {}
    }
    // Fail loudly instead of returning garbage when the lookup finds nothing
    if (!screenGroup) throw new Error("No AXGroup found; check accessibility permissions");
    const pos = screenGroup.position();
    const sz = screenGroup.size();
    // osascript prints the value of the last expression to stdout
    JSON.stringify({ x: pos[0], y: pos[1], width: sz[0], height: sz[1] });
  `;
  const { stdout } = await execFileAsync("osascript", [
    "-l", "JavaScript", "-e", script,
  ]);
  return JSON.parse(stdout.trim());
}
```
The click itself uses JXA to bridge into CoreGraphics:
```typescript
async function clickAtScreenCoordinates(screenX: number, screenY: number) {
  // JXA with the ObjC bridge: synthesize a left-click at absolute screen
  // coordinates, with a 50ms hold so the Simulator registers the tap.
  const script = `
    ObjC.import('CoreGraphics');
    const point = $.CGPointMake(${screenX}, ${screenY});
    const mouseDown = $.CGEventCreateMouseEvent(
      null, $.kCGEventLeftMouseDown, point, $.kCGMouseButtonLeft
    );
    const mouseUp = $.CGEventCreateMouseEvent(
      null, $.kCGEventLeftMouseUp, point, $.kCGMouseButtonLeft
    );
    $.CGEventPost($.kCGHIDEventTap, mouseDown);
    delay(0.05);
    $.CGEventPost($.kCGHIDEventTap, mouseUp);
  `;
  await execFileAsync("osascript", ["-l", "JavaScript", "-e", script]);
}
```
Your Mac will prompt for accessibility permissions the first time. Without them, the AXGroup lookup fails silently and nothing happens.
I’m documenting this in detail because every agent that wants to tap an iOS Simulator screen has to solve the same problem, and I couldn’t find it written down anywhere. If you’re building something similar and getting silent tap failures, check accessibility permissions first.
WebSocket direction: reversed from spec
The original spec said the plugin would run a WebSocket server inside the React Native app, and the MCP server would connect to it as a client. The reasoning was simple: the plugin knows when it’s ready, so the plugin should listen and the server should connect.
That design requires a WebSocket server running inside React Native. React Native’s built-in WebSocket is a client implementation only; running a server requires a native module, something like react-native-tcp-socket or a custom Turbo Module. That’s a native dependency in what should be a zero-native-dependency dev plugin: it means pod installs, autolinking, and possibly a dev client rebuild, for a debugging tool that should be one npm install and one JSX line.
I reversed the direction. The MCP server (Node.js, running on the host machine) runs the WebSocket server using the ws package. The plugin (inside the React Native app) connects as a WebSocket client using RN’s built-in WebSocket. Same protocol, same request/response pattern, zero native dependencies in the plugin.
The tradeoff is that the plugin has to know the server’s address. In practice this is always ws://localhost:19275 because both are on the same machine. The plugin attempts to connect on mount and reconnects every 3 seconds if the server isn’t running yet. This means you can start the app before the MCP server and it’ll connect whenever the server comes up. Order doesn’t matter.
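The plugin side needs nothing beyond React Native’s built-in WebSocket. A sketch of the connection loop as described above (function and handler names are illustrative; the port and 3-second retry are from the text):

```typescript
const SERVER_URL = "ws://localhost:19275";
const RECONNECT_MS = 3000;

// Connect to the MCP server; returns a cleanup function for unmount.
function connectToBridge(onRequest: (msg: unknown) => void): () => void {
  let ws: WebSocket | null = null;
  let timer: ReturnType<typeof setTimeout> | null = null;
  let stopped = false;

  const open = () => {
    ws = new WebSocket(SERVER_URL);
    ws.onopen = () => ws?.send(JSON.stringify({ type: "plugin_hello" }));
    ws.onmessage = (event) => onRequest(JSON.parse(event.data));
    // Server not up yet (or gone): retry, so start order doesn't matter.
    ws.onclose = () => {
      if (!stopped) timer = setTimeout(open, RECONNECT_MS);
    };
  };
  open();

  return () => {
    stopped = true;
    if (timer) clearTimeout(timer);
    ws?.close();
  };
}
```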
MCP SDK v2: import paths and missing dependencies
The MCP server uses @modelcontextprotocol/server v2 (2.0.0-alpha.2), which has a different API surface from v1. Two gotchas worth documenting:
Import paths changed. In v1, StdioServerTransport was at @modelcontextprotocol/server/stdio. In v2, it’s exported from the main entry point alongside McpServer:
```typescript
import { McpServer, StdioServerTransport } from "@modelcontextprotocol/server";
```
If you’re following older tutorials that import from subpaths, they’ll fail with ERR_PACKAGE_PATH_NOT_EXPORTED. The v2 alpha consolidates everything into the main export.
Missing transitive dependency. The SDK internally imports @cfworker/json-schema for JSON Schema validation but doesn’t declare it in its dependencies. On a fresh npm install, the server crashes with ERR_MODULE_NOT_FOUND: Cannot find package '@cfworker/json-schema'. The fix is to install it explicitly:
```bash
npm install @cfworker/json-schema
```
This is an alpha SDK bug. It’ll presumably get fixed before stable release, but if you’re building against v2 alpha today and getting a module-not-found error for a package you never heard of, this is why.
Metro and symlinked packages
When developing the plugin locally, I symlinked it into the consuming Expo app via npm link. Metro, React Native’s bundler, doesn’t follow symlinks out of the project root by default. The fix is a watchFolders entry in metro.config.js:
```js
// metro.config.js
const path = require("path");
const { getDefaultConfig } = require("expo/metro-config");

const config = getDefaultConfig(__dirname);

// Watch the symlinked plugin source outside the project root
const pluginRoot = path.resolve(__dirname, "../expo-agent-bridge/packages/plugin");
config.watchFolders = [pluginRoot];

module.exports = config;
```
The other Metro gotcha: the plugin’s TypeScript uses extensionless imports (./AgentBridge, not ./AgentBridge.js). Node ESM with moduleResolution: "Node16" requires .js extensions. Metro with moduleResolution: "bundler" does not. Since the plugin is consumed by Metro (not Node directly), extensionless imports are correct. If you’re writing a package that needs to work with both Metro and Node, this is one of the few places where the right answer depends entirely on who’s consuming you.
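Concretely, the difference is one tsconfig field. An excerpt of the kind of config that makes extensionless imports typecheck for a Metro-consumed package (not the plugin’s actual tsconfig):

```json
{
  "compilerOptions": {
    "module": "esnext",
    "moduleResolution": "bundler"
  }
}
```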
The plugin: one component, five hooks
The plugin is a single headless React component. On mount it does five things:
- Patches `fetch`. Wraps `globalThis.fetch` to capture every request and response: method, URL, status, timing, headers, and body previews (capped at 4KB). Keeps a rolling buffer of the last 200 requests. Basically a Network tab for the agent (sketched below).
- Captures `console.*`. Patches `console.log`, `.warn`, `.error`, `.info`, and `.debug` to copy output into a rolling buffer of 500 entries. The original functions still fire, so the agent’s log capture doesn’t suppress the developer’s own logging.
- Connects via WebSocket. Opens a connection to the server at `ws://localhost:{port}` and sends a `plugin_hello` handshake. Incoming messages are requests from the MCP server; outgoing messages are responses. Auto-reconnects every 3 seconds on disconnect.
- Reads navigation state. Uses `useNavigationState` from React Navigation to keep a ref to the current navigation tree. When the server asks for `get_nav_state`, the handler walks the state tree recursively to find the focused route, its params, and the full stack.
- Probes for Expo Router. On mount, dynamically imports `expo-router` and looks for `_getRouteManifest`. If found, the `get_routes` tool can return the full route tree. If not (maybe the app uses bare React Navigation), the tool returns a helpful error instead of crashing.
All five tear down cleanly on unmount: fetch gets unpatched, console functions get restored, the WebSocket closes, the reconnect timer clears.
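For illustration, here is roughly what the fetch patch and its teardown look like, assuming the buffer size and fields described above (header capture, body previews, and error handling omitted):

```typescript
type NetworkEntry = { method: string; url: string; status: number; ms: number };
const networkLog: NetworkEntry[] = [];

// Returns an unpatch function, called from the component's unmount cleanup.
function patchFetch(): () => void {
  const original = globalThis.fetch;
  globalThis.fetch = async (input: RequestInfo | URL, init?: RequestInit) => {
    const started = Date.now();
    const res = await original(input, init);
    const url =
      typeof input === "string" ? input :
      input instanceof URL ? input.href : input.url;
    const method =
      init?.method ?? (input instanceof Request ? input.method : "GET");
    networkLog.push({ method, url, status: res.status, ms: Date.now() - started });
    if (networkLog.length > 200) networkLog.shift(); // rolling buffer of 200
    return res;
  };
  return () => { globalThis.fetch = original; }; // unpatch on unmount
}
```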
Store registration: explicit is better than magic
The plugin exposes a registerStore function that the developer calls for each store they want agents to see:
```typescript
import { registerStore } from "@expo-agent-bridge/plugin";
import { useAuthStore } from "./useAuthStore";

if (__DEV__) {
  registerStore("auth", useAuthStore);
}
```
I considered auto-discovering stores, scanning for Zustand hooks, walking React context providers. The problem is reliability. Zustand stores are just hooks, and there’s no runtime marker that distinguishes useAuthStore from useCustomHook. Redux stores could be found via __REDUX_DEVTOOLS_EXTENSION__, but that requires the devtools extension to be installed. And auto-discovery means the agent sees everything, including stores with sensitive data the developer might not want exposed.
Explicit registration is one line per store. The developer chooses what’s visible. The function handles Zustand (detects getState() on the hook), Redux (detects getState() on the object), and plain objects as a fallback. The get_store_state tool accepts an optional dot-notation path parameter for reaching into deeply nested state without dumping the entire tree.
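A sketch of the registration and lookup side, under the detection rules just described (the registry and error shape are illustrative):

```typescript
const registry = new Map<string, unknown>();

export function registerStore(name: string, store: unknown): void {
  registry.set(name, store);
}

// Handler behind the get_store_state tool: snapshot a store, optionally
// drilling in with a dot-notation path like "user.profile.email".
function getStoreState(name: string, path?: string): unknown {
  const store = registry.get(name) as { getState?: () => unknown } | undefined;
  if (!store) return { error: `No store registered as "${name}"` };
  // Zustand hooks and Redux stores both expose getState(); plain objects
  // are treated as the snapshot itself.
  const state = typeof store.getState === "function" ? store.getState() : store;
  if (!path) return state;
  return path
    .split(".")
    .reduce<any>((acc, key) => (acc == null ? acc : acc[key]), state);
}
```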
Getting Claude to actually use the MCP tools
After building the server, registering it in Claude Code’s MCP settings, and verifying all 11 tools were available (I could see them in tools/list), I asked Claude to take a screenshot of the running app. It shelled out to xcrun simctl io booted screenshot instead of calling the screenshot MCP tool. I asked it to list available simulators. It ran xcrun simctl list devices. Same problem.
The tools were registered and available. Claude just preferred its own approach. It knows how to use simctl, so it does. MCP tools compete with the model’s training data about how to accomplish tasks, and xcrun simctl is well-documented enough that the model reaches for it by default.
The fix was behavioral, not technical. I added explicit instructions to the project’s CLAUDE.md:
```markdown
## iOS Simulator (expo-agent-bridge MCP)

For ALL iOS Simulator interactions, use the `expo-agent-bridge` MCP tools —
do NOT shell out to `xcrun simctl` directly:

- `screenshot` — capture the simulator screen (returns inline image)
- `tap` — tap at coordinates or by testID
- `list_devices` — list available simulators
...
```
After that, Claude used the MCP tools consistently. If you’re building MCP servers for capabilities that overlap with things the model can already do via shell commands, you’ll probably hit the same thing. The model knows simctl. It’ll use simctl. You have to tell it not to.
What’s missing
The v1 is useful but has real gaps:
find_view is a stub. It returns layout bounds for a given testID but doesn’t traverse the React fiber tree for deep view hierarchy inspection. Full fiber walking requires either the React DevTools protocol or direct fiber tree access, both of which are nontrivial in a production React Native runtime. For now, agents can find a view and get its coordinates (useful for tapping by testID), but they can’t walk the component tree the way React DevTools can.
testID-based screenshot cropping isn’t implemented. The tool takes a testID parameter and the plan is: query plugin for view bounds, full screenshot, crop. The cropping step needs an image processing library like sharp, which I haven’t added yet. Full-screen screenshots work fine.
iOS only. The plugin code is platform-agnostic. Everything it does (WebSocket, fetch patching, console patching, navigation state) works on Android. The server is iOS-only because it depends on xcrun simctl and AppleScript. Adding Android means writing an adb adapter, which is a well-defined project but not one I’ve started.
No gesture support. The tap tool does single taps only. Swipe, long press, and pinch would need additional CGEvent sequences. The coordinate mapping infrastructure is already there, so adding gestures is mechanical.
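As an example of how mechanical it would be, here is a hypothetical (untested) swipe built from the same primitives as the tap code, reusing `execFileAsync`:

```typescript
// Hypothetical swipe: press, interpolated drag events, release.
async function swipeScreenCoordinates(
  fromX: number, fromY: number, toX: number, toY: number,
) {
  const steps = 10;
  const script = `
    ObjC.import('CoreGraphics');
    const down = $.CGEventCreateMouseEvent(
      null, $.kCGEventLeftMouseDown, $.CGPointMake(${fromX}, ${fromY}), $.kCGMouseButtonLeft
    );
    $.CGEventPost($.kCGHIDEventTap, down);
    for (let i = 1; i <= ${steps}; i++) {
      const x = ${fromX} + (${toX} - ${fromX}) * i / ${steps};
      const y = ${fromY} + (${toY} - ${fromY}) * i / ${steps};
      const drag = $.CGEventCreateMouseEvent(
        null, $.kCGEventLeftMouseDragged, $.CGPointMake(x, y), $.kCGMouseButtonLeft
      );
      $.CGEventPost($.kCGHIDEventTap, drag);
      delay(0.016); // roughly one frame between drag events
    }
    const up = $.CGEventCreateMouseEvent(
      null, $.kCGEventLeftMouseUp, $.CGPointMake(${toX}, ${toY}), $.kCGMouseButtonLeft
    );
    $.CGEventPost($.kCGHIDEventTap, up);
  `;
  await execFileAsync("osascript", ["-l", "JavaScript", "-e", script]);
}
```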
The tool list
Eleven tools in the final build:
| Tool | Channel | What it does |
|---|---|---|
| `screenshot` | simctl | Capture the simulator screen as a PNG |
| `tap` | AppleScript | Tap at coordinates via CGEvent mouse clicks |
| `list_devices` | simctl | List simulators with UDID, name, runtime, state |
| `open_devtools` | local | Launch React Native DevTools in the browser |
| `expo_install` | local | Install packages via `npx expo install` |
| `get_nav_state` | plugin | Current route, params, and navigation stack |
| `get_store_state` | plugin | Inspect registered Zustand/Redux stores |
| `get_network_log` | plugin | Recent fetch requests with headers, bodies, timing |
| `get_logs` | plugin | JavaScript console output |
| `get_routes` | plugin | Full Expo Router route tree |
| `find_view` | plugin | Find a view by testID and get its layout |
The first five work without the plugin. The last six require `<AgentBridge />` running in the app.
Feedback loop
Before expo-agent-bridge, the cycle for mobile development with an AI agent was:
- Agent writes code
- Developer checks the simulator manually
- Developer describes what they see back to the agent
- Agent adjusts
After:
- Agent writes code
- Agent takes a screenshot to verify the UI
- Agent checks `get_nav_state` to confirm the right route loaded
- Agent checks `get_network_log` to see if the API call succeeded
- Agent adjusts
Steps 2-4 happen without the developer doing anything. The agent can verify its own work.
Frequently Asked Questions
What is expo-agent-bridge?
expo-agent-bridge is an open-source MCP server and Expo dev plugin that gives AI coding agents (Claude Code, Cursor, Codex) runtime visibility into React Native/Expo apps running on iOS Simulator. It exposes 11 tools: screenshots, tap automation, device management, navigation state, store inspection, network logs, console output, route trees, view lookup, DevTools launching, and package installation.
Why can't AI agents already see iOS Simulator?
AI coding agents operate through terminal-based interfaces. They can read and write code, but they have no built-in way to interact with a graphical iOS Simulator window. They can't capture what's on screen, tap UI elements, or inspect the runtime state of a React Native app. The app is a black box. Agents write code and hope it works.
Does iOS Simulator have a tap command?
No. Unlike Android's adb, which has `input tap`, `xcrun simctl` has no native tap command. expo-agent-bridge works around this by using AppleScript to find the Simulator window's screen element via the macOS accessibility hierarchy, calculating the coordinate mapping between logical device points and screen pixels, and sending CGEvent mouse clicks.
How is this different from Expo's official MCP?
Expo offers a paid MCP ($99/year) that covers screenshots, taps, log collection, and a few other tools. expo-agent-bridge is free, open-source, and adds deeper runtime state inspection: navigation state, Zustand/Redux store contents, network request/response logging, and route tree inspection. The plugin-dependent tools give agents visibility into what the app is actually doing, not just what it looks like.
Does this work with Android?
Not yet. The server currently uses xcrun simctl and AppleScript, which are macOS/iOS only. The plugin-side code is platform-agnostic. Adding Android support means writing an adb adapter in the server package.