Replay & E2E Testing
Agents use refs for exploration and authoring. Replay scripts are deterministic runs that can be used for E2E testing.
Core model
Two-pass workflow:
- Agent pass: discover and interact with refs (snapshot -> click @e.. / fill @e..).
- Deterministic pass: run the recorded .ad script with replay.
Record a replay script
Enable recording during a session with --save-script:
By default, on close, a replay script is written to:
You can also provide a custom output file path:
- The --save-script value is treated as a file path.
- Parent directories are created automatically when they do not exist.
- For ambiguous bare values, use --save-script=workflow.ad or a path-like value such as ./workflow.ad.
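The "parent directories are created automatically" rule corresponds to a recursive mkdir before the write. The sketch below is a minimal illustration, not agent-device's implementation: `writeReplayScript` is a hypothetical helper, and the single `context platform=ios` line is only placeholder content.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: write recorded replay lines to a --save-script path.
function writeReplayScript(outputPath: string, lines: string[]): string {
  const resolved = path.resolve(outputPath);
  // Create missing parent directories; recursive mode is a no-op if they exist.
  fs.mkdirSync(path.dirname(resolved), { recursive: true });
  fs.writeFileSync(resolved, lines.join("\n") + "\n", "utf8");
  return resolved;
}

// Example: a nested path whose parent folders may not exist yet.
const target = path.join(os.tmpdir(), "agent-device-demo", "nested", "workflow.ad");
const written = writeReplayScript(target, ["context platform=ios"]);
console.log(fs.readFileSync(written, "utf8"));
```

Running it twice is safe: the recursive mkdir tolerates existing directories and the write overwrites the previous script.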
Run replay
- Replay reads .ad scripts.
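Replay has to split each .ad line into a command and its arguments before it can run it, and the troubleshooting notes state that unclosed quotes are rejected. The tokenizer below is a hypothetical sketch of that quoting rule only: whitespace separates arguments and double quotes group text containing spaces. The real grammar may differ (for example in escape handling), and `fill @e3 "hello world"` is an illustrative line.

```typescript
// Hypothetical tokenizer for a single .ad line (quoting rule only).
function tokenizeAdLine(line: string, file: string, lineNo: number): string[] {
  const tokens: string[] = [];
  let current = "";
  let inQuotes = false;
  let hasToken = false; // lets "" count as an empty argument

  for (const ch of line) {
    if (ch === '"') {
      inQuotes = !inQuotes; // toggle quoting; the quote itself is dropped
      hasToken = true;
    } else if (!inQuotes && /\s/.test(ch)) {
      if (hasToken) {
        tokens.push(current); // unquoted whitespace ends the current argument
        current = "";
        hasToken = false;
      }
    } else {
      current += ch; // quoted whitespace stays inside the argument
      hasToken = true;
    }
  }
  if (inQuotes) {
    // Unclosed quotes are rejected with a file:line reference.
    throw new Error(`${file}:${lineNo}: unclosed quote`);
  }
  if (hasToken) tokens.push(current);
  return tokens;
}

console.log(tokenizeAdLine('fill @e3 "hello world"', "flow.ad", 1));
```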
Run Maestro compatibility flows
Agent Device can run a supported subset of Maestro YAML through the replay runtime:
Maestro compatibility translates supported YAML commands into Agent Device replay actions. It is intended for common mobile flows, not full Maestro parity. Unsupported Maestro syntax fails loudly, reporting the command or field name and, when available, the line number. If a missing command matters for your flows, use the compatibility tracker to check current support and share demand:
- Supported and unsupported capabilities: https://github.com/callstackincubator/agent-device/issues/558
- New focused compatibility request: https://github.com/callstackincubator/agent-device/issues/new
Currently supported areas include:
- App launch, with Apple-platform launch arguments and Android/iOS simulator clearState.
- runFlow (file and inline) with when.platform, when.visible, when.notVisible, and limited when.true boolean/platform expressions.
- onFlowStart and onFlowComplete hooks.
- Deterministic repeat.times.
- tapOn, including optional, index, childOf, label, and absolute/percentage point taps; doubleTapOn and longPressOn.
- inputText, focused-field eraseText, and pasteText.
- openLink.
- Visibility assertions and extendedWaitUntil.
- scroll and scrollUntilVisible; absolute/percentage swipe and swipe.label.
- Screenshots, keyboard dismiss, basic pressKey, back, animation waits, and stopApp.
- Ordered, trusted runScript file/env scripts with http.post, json, and output variables.

runScript is supported only as an ordered Maestro compatibility step for trusted file/env scripts. It can make network requests, and it is neither a native .ad command nor a security sandbox. Script execution uses Node vm only for compatibility isolation, not for security: the script timeout bounds synchronous execution, while http.post requests are bounded by the helper process timeout. Output keys cannot contain . because exported variables are addressed as output.<key>.
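The runScript constraints above (Node vm, a timeout that only bounds synchronous code, dot-free output keys) can be illustrated with Node's real `vm.createContext` and `vm.runInContext` APIs. This is a minimal sketch under stated assumptions, not agent-device's runtime: `runTrustedScript` is hypothetical, env values are assumed to be exposed as globals, the timeout is in milliseconds, and the http.post/json helpers are not modelled.

```typescript
import * as vm from "node:vm";

// Hypothetical runner for a trusted runScript body.
// Assumptions: env values become globals; results are written to `output`.
function runTrustedScript(
  code: string,
  env: Record<string, string>,
  timeoutMs = 1000,
): Record<string, unknown> {
  const output: Record<string, unknown> = {};
  const context = vm.createContext({ ...env, output });
  // The timeout interrupts synchronous execution only. vm is not a security
  // boundary, and pending async work (e.g. network calls) is not bounded here.
  vm.runInContext(code, context, { timeout: timeoutMs });

  for (const key of Object.keys(output)) {
    // Exported values are addressed as output.<key>, so a dotted key such as
    // "a.b" would be indistinguishable from a nested path.
    if (key.includes(".")) {
      throw new Error(`output key "${key}" must not contain "."`);
    }
  }
  return output;
}

console.log(runTrustedScript("output.token = 'abc-' + USER_ID", { USER_ID: "42" }));
```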
Maestro env values use the same replay precedence as .ad files: flow env is the default, shell AD_VAR_* values override it, and CLI -e KEY=VALUE wins over both.
Unsupported Maestro features such as repeat.while, full expression predicates beyond boolean literals and maestro.platform comparisons, evalScript, device utility commands, Android app launch arguments, and Android app state reset are tracked separately because they require neutral Agent Device runtime or device capabilities before they can be mapped safely.
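The boundary between the limited when.true support and unsupported "full expression predicates" can be sketched as a tiny evaluator that accepts boolean literals and `maestro.platform` comparisons and rejects everything else loudly. `evalWhenTrue` is hypothetical, and the exact operator and quoting forms it accepts are assumptions, not a description of agent-device's parser.

```typescript
// Hypothetical evaluator for the limited when.true subset.
// Accepted (assumed): true, false, maestro.platform ==/===/!=/!== 'name'.
function evalWhenTrue(expr: string, platform: string, line?: number): boolean {
  // Strip an optional ${...} wrapper, as Maestro expressions are written that way.
  const body = expr.trim().replace(/^\$\{([\s\S]*)\}$/, "$1").trim();
  if (body === "true") return true;
  if (body === "false") return false;

  const m = /^maestro\.platform\s*(===|==|!==|!=)\s*(['"])(\w+)\2$/.exec(body);
  if (m) {
    const equal = m[3] === platform;
    return m[1].startsWith("!") ? !equal : equal;
  }
  // Anything else (&&, ||, other variables, function calls) fails loudly.
  const where = line === undefined ? "" : ` (line ${line})`;
  throw new Error(`unsupported when.true expression${where}: ${body}`);
}

console.log(evalWhenTrue("${maestro.platform == 'ios'}", "ios"));
```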
Run a lightweight .ad suite
- test discovers .ad files from files, directories, or globs and runs them serially.
- context platform=... inside each .ad file is the target source of truth for suite execution.
- --platform is a filter for suite discovery; files without platform metadata are skipped when a filter is present.
- context timeout=... and context retries=... can be declared per script; CLI flags override metadata. Retries are capped at 3, and duplicate keys in the context header fail fast instead of silently overriding each other.
- By default, suite artifacts are written under .agent-device/test-artifacts/<run-id>/.... Each attempt writes replay.ad, result.txt, and replay-timing.ndjson. Failed attempts also keep copied logs and artifact files when the replay produced them.
- replay-timing.ndjson records attempt, cleanup, and per-step start/stop events with durations. Upload it from CI even for passing runs when comparing local and CI performance.
- Timeouts are cooperative: the runner marks the attempt failed at the timeout boundary, then gives the underlying replay a short grace period to stop before session cleanup.
- The default text reporter streams one-line pass, fail, or skip progress on stderr as each suite entry finishes or retries. Each line includes the current/total suite position and elapsed seconds, such as pass 3/6 ... duration=12.34s. The final summary then prints failed tests and flaky tests that passed on retry; use --verbose to print every final result.
- When --fail-fast and retries are both set, the current test still consumes its retries before the suite stops.
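The suite rules above (duplicate context keys fail fast, CLI values override metadata, retries capped at 3, --platform skipping files without platform metadata, and fail-fast waiting for retries) can be sketched as follows. Every name here is hypothetical; the timeout unit is not specified in this section, input validation is omitted, and the runner is synchronous only for brevity.

```typescript
type ContextMeta = { platform?: string; timeout?: number; retries?: number };
type Outcome = "pass" | "fail";
type Result = { name: string; outcome: Outcome; attempts: number };

const MAX_RETRIES = 3;

// Read `context key=value` lines; duplicate keys fail fast.
function parseContext(lines: string[], file: string): ContextMeta {
  const meta: ContextMeta = {};
  const seen = new Set<string>();
  lines.forEach((line, i) => {
    const [head, ...pairs] = line.trim().split(/\s+/);
    if (head !== "context") return; // only context header lines matter here
    for (const pair of pairs) {
      const eq = pair.indexOf("=");
      if (eq <= 0) throw new Error(`${file}:${i + 1}: expected key=value, got "${pair}"`);
      const key = pair.slice(0, eq);
      const value = pair.slice(eq + 1);
      if (seen.has(key)) throw new Error(`${file}:${i + 1}: duplicate context key "${key}"`);
      seen.add(key);
      if (key === "platform") meta.platform = value;
      else if (key === "timeout") meta.timeout = Number(value); // unit unspecified
      else if (key === "retries") meta.retries = Number(value);
    }
  });
  return meta;
}

// CLI value wins over metadata; either way retries are capped.
function effectiveRetries(meta: ContextMeta, cliRetries?: number): number {
  return Math.min(cliRetries ?? meta.retries ?? 0, MAX_RETRIES);
}

// With a --platform filter, files lacking platform metadata are skipped.
function matchesFilter(meta: ContextMeta, filter?: string): boolean {
  return filter === undefined || meta.platform === filter;
}

// Serial run: each test gets 1 + retries attempts; fail-fast stops the suite
// only after the failing test has used up its retries.
function runSuite(
  tests: { name: string; retries: number }[],
  attempt: (name: string, n: number) => Outcome,
  failFast: boolean,
): Result[] {
  const results: Result[] = [];
  for (const t of tests) {
    let outcome: Outcome = "fail";
    let attempts = 0;
    while (attempts <= t.retries) {
      attempts++;
      outcome = attempt(t.name, attempts);
      if (outcome === "pass") break;
    }
    results.push({ name: t.name, outcome, attempts });
    if (outcome === "fail" && failFast) break;
  }
  return results;
}
```

In this model a result with `outcome === "pass"` and `attempts > 1` is what the reporter would list as passed on retry.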
Parametrise .ad scripts
Substitute ${VAR} tokens in .ad scripts using values from the CLI, shell env, script-local env directives, or built-ins.
Precedence
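The Maestro section states that .ad files use this replay precedence: script env directives are the defaults, shell AD_VAR_* values override them, and CLI -e KEY=VALUE wins over both. A layered merge captures that order. `resolveVars` and its parameters are hypothetical names, and built-ins are left out because they live in a separate reserved namespace.

```typescript
// Hypothetical layered merge, lowest to highest precedence:
//   1. script-local env directives
//   2. shell AD_VAR_* values (imported without the prefix)
//   3. CLI -e KEY=VALUE pairs
function resolveVars(
  scriptEnv: Record<string, string>,
  shellEnv: Record<string, string | undefined>,
  cliPairs: string[],
): Record<string, string> {
  const vars: Record<string, string> = { ...scriptEnv };

  const prefix = "AD_VAR_";
  for (const [name, value] of Object.entries(shellEnv)) {
    // Only AD_VAR_* shell variables are imported; everything else is ignored.
    if (name.startsWith(prefix) && value !== undefined) {
      vars[name.slice(prefix.length)] = value;
    }
  }

  for (const pair of cliPairs) {
    const eq = pair.indexOf("="); // split at the first "=" so values may contain "="
    if (eq <= 0) throw new Error(`-e expects KEY=VALUE, got "${pair}"`);
    vars[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return vars;
}
```

On the client, `shellEnv` would typically be `process.env`, which matches the note that shell values are collected on the CLI side at request time.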
Built-ins
Built-ins are provided by the replay/test runtime and use the reserved AD_* namespace.
- AD_PLATFORM - matches context platform=... or the selected platform when available
- AD_SESSION - active session name
- AD_FILENAME - path of the running .ad file
- AD_DEVICE - device identifier (when --device is set)
- AD_ARTIFACTS - attempt artifacts directory (when running under test)
User-defined keys starting with AD_ are rejected in env, -e, and shell imports such as AD_VAR_AD_FOO, so built-ins cannot be overridden.
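Rejecting AD_* keys from every user source is what makes the built-ins impossible to override: once user values are known to be outside the namespace, built-ins can be added last without conflict. A minimal sketch, with `withBuiltins` and the `Builtins` type as hypothetical names:

```typescript
// Hypothetical shape of the documented built-ins; optional ones may be absent.
type Builtins = {
  AD_PLATFORM?: string;
  AD_SESSION?: string;
  AD_FILENAME?: string;
  AD_DEVICE?: string; // only when --device is set
  AD_ARTIFACTS?: string; // only when running under test
};

function withBuiltins(userVars: Record<string, string>, builtins: Builtins): Record<string, string> {
  for (const key of Object.keys(userVars)) {
    // Covers env directives, -e, and shell imports (AD_VAR_AD_FOO -> AD_FOO).
    if (key.startsWith("AD_")) {
      throw new Error(`"${key}" uses the reserved AD_* namespace`);
    }
  }
  const merged: Record<string, string> = { ...userVars };
  for (const [key, value] of Object.entries(builtins)) {
    // Unavailable built-ins (e.g. AD_DEVICE without --device) stay unset.
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```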
Substitution happens inside parsed string values. It does not create extra arguments, so quote selectors or text values that contain spaces:
Fallback and escape
${VAR:-default} yields default when VAR is unset.
\${APP} emits a literal ${APP} with no substitution.
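Taken together, the substitution rules (replacement inside already-parsed string values, `${VAR:-default}` only when VAR is unset, the `\${...}` escape, no nested fallback, and loud `file:line` errors for unresolved names) fit in one regex-driven replace. `substitute` is a hypothetical name. Because it runs on a single parsed value, a value that contains spaces still stays one argument.

```typescript
// Matches either an escaped token \${...} or ${NAME} with optional :-default.
const TOKEN = /\\\$\{([^}]*)\}|\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g;

function substitute(value: string, vars: Record<string, string>, file: string, line: number): string {
  return value.replace(
    TOKEN,
    (_m: string, escaped: string | undefined, name: string | undefined, fallback: string | undefined) => {
      // \${APP} -> literal ${APP}, no lookup.
      if (escaped !== undefined) return "${" + escaped + "}";
      const key = name as string;
      // ${A:-${B}} is rejected rather than half-expanded.
      if (fallback !== undefined && fallback.includes("${")) {
        throw new Error(file + ":" + line + ": nested fallback is not supported");
      }
      if (Object.prototype.hasOwnProperty.call(vars, key)) return vars[key];
      // The default applies only when the variable is unset.
      if (fallback !== undefined) return fallback;
      throw new Error(file + ":" + line + ": unresolved ${" + key + "}");
    },
  );
}
```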
Recipes
Run one flow against two app variants in CI:
Tune timings locally without editing the script:
Extract a reusable selector. Before:
After:
Quote ${VAR} inside selector expressions so the whole expression is treated as a single argument.
Notes
- replay -u does not yet preserve env directives or ${VAR} tokens. Workaround: temporarily inline the literal values, run -u, then re-parametrise.
- Shell env (AD_VAR_*) is collected on the CLI/client side at request time, so the same values are seen whether the daemon runs locally or remotely.
- No nested fallback: ${A:-${B}} is not supported.
- An unresolved ${VAR} fails with a file:line reference, so typos are loud.
Update stale selectors in replay scripts
When a replay step fails, update mode (replay -u) can:
- Take a fresh snapshot.
- Resolve a stable replacement target.
- Retry the step.
- Rewrite the failing line in the same .ad file.
Current update targets:
- click
- fill
- get
- is
- wait
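The four update steps, restricted to the listed commands and to a unique replacement target, can be sketched as a pure function over the script lines. `updateStep`, `UpdateDeps`, and its `snapshot`/`resolveTarget`/`execute` callbacks are hypothetical stand-ins for device operations, and the target strings in any example are placeholders rather than real selector syntax.

```typescript
type Step = { command: string; target: string; rest: string[] };

const UPDATABLE = new Set(["click", "fill", "get", "is", "wait"]);

// Hypothetical device-side operations injected by the caller.
interface UpdateDeps {
  snapshot(): string[]; // candidate targets from a fresh snapshot
  resolveTarget(stale: string, candidates: string[]): string[]; // matching candidates
  execute(step: Step): boolean; // true if the retried step succeeds
}

// Naive serialization: quotes args containing whitespace, no escaping.
const quote = (s: string): string => (/\s/.test(s) ? `"${s}"` : s);

function updateStep(lines: string[], index: number, step: Step, deps: UpdateDeps): string[] {
  if (!UPDATABLE.has(step.command)) {
    throw new Error(`line ${index + 1}: "${step.command}" is not an update target`);
  }
  const candidates = deps.snapshot(); // 1. fresh snapshot
  const matches = deps.resolveTarget(step.target, candidates); // 2. resolve
  if (matches.length !== 1) {
    // Without a unique target, re-recording the flow is the safer fix.
    throw new Error(`line ${index + 1}: could not resolve a unique target`);
  }
  const updated: Step = { ...step, target: matches[0] };
  if (!deps.execute(updated)) { // 3. retry
    throw new Error(`line ${index + 1}: retry with updated target failed`);
  }
  const next = lines.slice(); // 4. rewrite only the failing line
  next[index] = [updated.command, updated.target, ...updated.rest].map(quote).join(" ");
  return next;
}
```

Returning a new array instead of mutating the input mirrors the advice to review rewritten lines before committing them.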
replay -u before/after examples
Example 1: stale selector rewritten in place
Example 2: stale ref-based action upgraded to selector form
Use replay -u locally during maintenance, review the rewritten .ad lines, then commit the updated script.
Troubleshooting
- Replay fails after UI/layout changes:
  - Run replay -u locally and review the rewritten lines.
- Updating cannot resolve a unique target:
  - Re-record that flow (--save-script) from a fresh exploratory pass.
- Replay file parse error:
  - Validate quoting in .ad lines (unclosed quotes are rejected).
- Maestro compatibility flow fails on unsupported syntax:
  - Look up the command or field named in the error in https://github.com/callstackincubator/agent-device/issues/558. If it is important to your suite, comment there or open a focused issue with a small flow snippet.
