Build an AI QA Agent for Expo Apps with EAS Workflows in minutes today


Michał Pierzchała


Guest Author

Build a lightweight mobile QA agent with EAS Workflows and agent-device. Automate UI verification for iOS and Android PRs without a giant test framework.


This is a guest post from Michał Pierzchała, Principal Engineer at Callstack's R&D Incubator and creator of agent-device and React Native Testing Library.

AI agents are great at producing a lot of code, quickly. More code means more PRs targeting our codebase, and even more demand for a quality assurance process that works and scales together with the agents generating that code.

While AI agents contributing to backend codebases will often do well with integration tests, things are not so bright on the frontend side. When it comes to generating code that produces UI, such as mobile iOS and Android apps, the latest models will often do amazingly well without even checking the results (thank you, React, for declarative UI that makes this easier). But many times they'll just miss the mark. Imagine a hardcore React Native developer who can only read code, without access to a mobile device. How certain can you be that they'll nail the job just by looking at the code? Hint: not very. Verification is key.

This is the gap we want to close for our Expo apps. And in this article, I’ll show you how to do it using existing tools at no extra cost. Let’s dive in!

With EAS Workflows, you can already reuse builds, run custom jobs on Android and iOS, and comment on GitHub pull requests. That turns out to be enough to build a lightweight QA agent today, without introducing a big custom platform.

I put together a minimal template here: callstackincubator/eas-agent-device.

The setup is intentionally small:

  • Expo app with CNG
  • EAS Workflows for orchestration
  • a tiny Node.js QA agent using AI SDK
  • agent-device for Android and iOS automation
  • one GitHub comment with Android and iOS QA results

The important part is not the “AI” label. It’s that you can start with a working baseline in minutes, then expand it as your team needs more coverage.

The goal

For every pull request, we want to:

  1. reuse an existing mobile release build with the latest JS when possible
  2. boot an emulator or simulator
  3. install and launch the app
  4. let an agent inspect the UI and take screenshots
  5. post a short QA summary under the PR

That’s it. Not a giant test framework. Not a replacement for all E2E tests. Just a practical QA loop around mobile UI changes.

Why EAS Workflows fits this really well

The main reason is that EAS Workflows already understands mobile-specific CI.

What you get easily:

  • fingerprint to detect native changes
  • get-build to find reusable builds
  • repack to avoid rebuilding native code when only JS changed
  • linux and macos runners with virtualization that run Android and iOS devices
  • github-comment to send the result back to the PR

So instead of forcing mobile automation into a generic CI system, you can keep the whole pipeline where the mobile pieces already exist. That makes the setup much easier to reason about.

One note: the Android job needs a linux-medium-nested-virtualization image to be able to open Android Emulators, which we'll install from scratch, and the iOS job needs a macos-medium image (or larger) for iOS Simulators, which are already available. I must say I was positively surprised by the flexibility of the machines and by how far I could script them.

Start from this workflow shape

The core workflow is simple and fits on a single screen:

```yaml
jobs:
  fingerprint:
    type: fingerprint
  android_get_build:
    type: get-build
    params:
      platform: android
      profile: qa-release
  android_repack:
    type: repack
  android_build:
    type: build
  qa_android:
    runs_on: linux-medium-nested-virtualization
    steps:
      - uses: eas/checkout
      - uses: eas/install_node_modules
      - uses: eas/download_build
      - id: provision_android_emulator
        run: bash ./scripts/agent-qa/provision-android-emulator.sh
      - id: run_agent_qa
        run: bash ./scripts/agent-qa/run-and-export.sh "${{ steps.download_build.outputs.artifact_path }}"
        env:
          AGENT_DEVICE_SESSION: qa-android
          AGENT_DEVICE_PLATFORM: android
  qa_comment:
    type: github-comment
```

For iOS, it’s the same idea, just on a macOS worker with a simulator build. The full working version, with Android and iOS running in parallel, is in the repo: .eas/workflows/agent-qa-mobile.yml

The key design choice: split bootstrap from QA

This is the one thing I’d strongly recommend.

At first glance, you might want the agent to do everything: install the app, open it, navigate, inspect, report. In practice, that makes the system more fragile than it needs to be. Agents can use our tools incorrectly, choose not to read the instructions we asked them to follow, or hallucinate flags for the CLIs they're using (been there).

So the key to making our AI agent work for us reliably, not just occasionally, is to keep the workflow as deterministic as possible. With agent-device we can script this part easily, so it always provides the correct bootstrap parameters when installing and opening the app on a device:

```bash
#!/usr/bin/env bash
# Phase 1: deterministic bootstrap
agent-device install "${APP_ID}" "${APP_PATH}"
agent-device open "${APP_ID}" --relaunch
```

agent-device knows which platform to run thanks to the AGENT_DEVICE_PLATFORM environment variable set in our job.
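If you'd rather drive phase 1 from Node.js, here's a minimal sketch; `buildBootstrapCommands` and `runBootstrap` are hypothetical helpers, and the agent-device invocations simply mirror the bash script above:

```typescript
import { execFileSync } from "node:child_process";

// Build the exact agent-device invocations for phase 1. Keeping this a pure
// function makes the bootstrap easy to unit-test without a device attached.
function buildBootstrapCommands(appId: string, appPath: string): string[][] {
  return [
    ["agent-device", "install", appId, appPath],
    ["agent-device", "open", appId, "--relaunch"],
  ];
}

// Run the bootstrap sequentially; AGENT_DEVICE_PLATFORM tells agent-device
// which platform to target, exactly as in the workflow's env block.
function runBootstrap(appId: string, appPath: string, platform: "android" | "ios"): void {
  for (const [cmd, ...args] of buildBootstrapCommands(appId, appPath)) {
    // Fail fast: if install or open errors out, the job should stop
    // before the agent phase ever starts.
    execFileSync(cmd, args, {
      stdio: "inherit",
      env: { ...process.env, AGENT_DEVICE_PLATFORM: platform },
    });
  }
}
```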

The other part of the QA process, which is harder to script, can stay agent-driven. The agent will infer acceptance criteria from the PR, inspect the UI in a token-efficient way through the accessibility tree, navigate a little, take screenshots, and summarize what happened:

```bash
# Phase 2: variable agent-driven flow
npm run agent-qa
```

From my own experience building various agents over the past months, that split makes the workflow much more reliable: the agent never has to guess artifact paths or install commands.

The agent can stay very small

Once the app is already running, the agent only needs a few tools:

  • read PR context
  • load the agent-device skill
  • run UI actions like snapshot, press, screenshot through agent-device
  • write a final report

A simplified version looks like this:

```typescript
import { ToolLoopAgent } from 'ai';

const agent = new ToolLoopAgent({
  model: 'openai/gpt-5.4-mini',
  instructions: `
You are a mobile QA agent running inside EAS Workflows.
Treat the app as a black box.
Infer acceptance criteria from the PR.
The app is already installed and launched.
Use agent-device to inspect the UI, navigate, take screenshots, and write a report.
If the result is visually plausible but not fully confirmed from structured UI output, use "unsure".
You must call write_report exactly once.
  `,
  tools: {
    get_pr_context,
    load_skill,
    read_skill_file,
    agent_device,
    write_report,
  },
});
```

That's enough to get us started fast and iterate toward greatness. I'm using Vercel's AI SDK as it provides a good balance of control vs. batteries-included tooling. On top of that, I can access all the models I like through their AI Gateway service (with no markup, at least for now), or if you don't want another subscription, you can use your good old OPENAI_API_KEY with the openai provider instead. The same goes for other providers, not only OpenAI.
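As a hedged illustration of that flexibility, the model spec can be derived from whichever credential is configured. The env var names and the fallback logic here are my own, not part of the template:

```typescript
// Illustrative only: with a gateway key, AI SDK model strings take the
// "provider/model" form; with a direct provider key you'd construct the
// model from that provider's package instead.
function pickModelSpec(env: Record<string, string | undefined>): string {
  if (env.AI_GATEWAY_API_KEY) {
    return "openai/gpt-5.4-mini"; // routed through the AI Gateway
  }
  if (env.OPENAI_API_KEY) {
    return "gpt-5.4-mini"; // passed to the openai provider directly
  }
  throw new Error("Set AI_GATEWAY_API_KEY or OPENAI_API_KEY");
}
```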

The full agent is here: scripts/agent-qa/index.ts. I tried to keep it brief.

Report only what’s necessary

Keep the output small and useful. In our template, each platform produces a verification status, one of: passed, failed, blocked, unsure. Then comes a short section with a summary, checks performed, issues found, screenshots, and a full JSON report in a collapsible block (mostly for debugging failed QA verification).

That last status, unsure, is important. Mobile UI is not always easy to verify from structured automation output alone. Sometimes accessibility trees help a lot. Sometimes the screenshot is the strongest evidence. If the agent cannot prove the result cleanly, it should say so and attach the image.

That is much better than pretending it knows.
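As a sketch of what that report contract can look like in TypeScript (the type and helper names here are mine, not the template's; only the four statuses come from the article):

```typescript
// The four statuses from the template; everything else is illustrative.
type QaStatus = "passed" | "failed" | "blocked" | "unsure";

interface QaReport {
  status: QaStatus;
  summary: string;
  screenshots: string[]; // URLs or artifact paths
  // true only when structured UI output (e.g. the accessibility tree)
  // actually confirmed the acceptance criteria
  confirmedFromStructuredOutput: boolean;
}

// If a "passed" verdict isn't backed by structured UI evidence, downgrade it
// to "unsure" so the screenshot becomes the reviewer's source of truth.
function finalizeStatus(report: QaReport): QaStatus {
  if (report.status === "passed" && !report.confirmedFromStructuredOutput) {
    return "unsure";
  }
  return report.status;
}
```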

Final step: a comment under the Pull Request

Like with human verification, a short comment under our work is often all we need to get a good overview of what the agent was able to verify (or not), together with the visual confirmation we so often need to truly assess that the change is not breaking our app. A single PR comment works really well:

```markdown
## Agent QA

| Platform | Status    |
| -------- | --------- |
| Android  | ✅ passed |
| iOS      | 🤔 unsure |

### Android

Short summary...
Screenshots

### iOS

Short summary...
Screenshots
```

That puts the result exactly where reviewers already are, with necessary information and visual feedback at hand.
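Rendering that comment is plain string building. A minimal sketch, assuming a per-platform result shape; the emoji mapping follows the example above, and the function name is hypothetical:

```typescript
type QaStatus = "passed" | "failed" | "blocked" | "unsure";

// Emoji per status; passed/unsure match the sample comment, the other two
// are my own choices.
const STATUS_EMOJI: Record<QaStatus, string> = {
  passed: "✅",
  failed: "❌",
  blocked: "🚧",
  unsure: "🤔",
};

// Build the single PR comment: one summary table, then one section per platform.
function renderComment(
  results: { platform: string; status: QaStatus; summary: string }[]
): string {
  const rows = results
    .map((r) => `| ${r.platform} | ${STATUS_EMOJI[r.status]} ${r.status} |`)
    .join("\n");
  const sections = results
    .map((r) => `### ${r.platform}\n${r.summary}`)
    .join("\n\n");
  return `## Agent QA\n| Platform | Status |\n| -------- | --------- |\n${rows}\n\n${sections}\n`;
}
```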

A comment from our QA Agent, “seeing” our app!

💡 There's no official API to upload screenshots to GitHub comments, so in our example we use Vercel Blob as a third-party cloud to store images. Replace it with whatever works for your use case, e.g. AWS S3.

What I’d recommend if you want to try this today

  1. Connect your GitHub project to EAS Workflows from the Expo dashboard.
  2. Start with one platform first, e.g. Android.
  3. Use CNG and keep native build reuse enabled.
  4. Keep your QA build profiles separate from production.
  5. Make bootstrap deterministic.
  6. Keep the agent black-box only.
  7. Post one PR comment, not ten different artifacts.
  8. Add screenshots early. They help a lot.

Then, once that works, extend it:

  • add iOS
  • upload screenshots to Blob storage
  • add better selectors
  • turn successful exploratory checks into more deterministic flows
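For that last point, one hedged way to start is to record the agent's successful actions and replay them later as a plain script. The recorded-action shape and function below are hypothetical, and the command strings are illustrative only — check agent-device's actual CLI surface before relying on them (press and screenshot are the UI actions mentioned earlier in this article):

```typescript
// Hypothetical shape for actions captured during a successful exploratory run.
type RecordedAction =
  | { type: "press"; target: string }
  | { type: "screenshot"; name: string };

// Turn a recorded run into a deterministic replay script: one command per
// line, starting from a clean app relaunch.
function toReplayScript(appId: string, actions: RecordedAction[]): string {
  const lines = [
    `agent-device open ${appId} --relaunch`,
    ...actions.map((a) =>
      a.type === "press"
        ? `agent-device press ${JSON.stringify(a.target)}`
        : `agent-device screenshot ${JSON.stringify(a.name)}`
    ),
  ];
  return lines.join("\n");
}
```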

The main takeaway

You do not need a huge AI testing platform to get useful mobile QA automation.

Expo together with EAS Workflows already gives you most of the infrastructure:

  • build reuse: for fast iteration, with repack for an up-to-date JS bundle
  • mobile CI workers: with scripting and virtualization, so simulators and emulators are available
  • workflow orchestration: to put this all together
  • GitHub integration: to reduce cognitive load and keep verification inside Pull Requests

From there, a custom QA agent can stay surprisingly small. And thanks to the TypeScript-based AI SDK, you can bend it to all your needs (it's pretty flexible).

That’s why I like this setup: it’s something you can build today, in minutes, starting from a simple template, and grow only when you actually need more.

Start here: callstackincubator/eas-agent-device
