building in public · early access soon

See the webthe way humans do.

A perception layer that gives your agent human-level web understanding. DOM, accessibility, vision, and time — fused into a single scene graph. Ships as an MCP server.

no spam · unsubscribe any time

§01 · the problem

Agents can read HTML.
They can’t see the page.

Modern web apps hide meaning in CSS, layout, animation, and canvas rendering. An agent with only DOM access is blind to what a human instantly understands — which button is primary, what state a form is in, whether the spinner means loading or frozen.

This is why agents that work great in demos fall apart on real websites.

captured on uipe.dev
h1 > span: "the way humans do."
A naive extractor grabs the inner span and loses the outer clause. The agent thinks the headline is four words. It isn’t.

Without UIPE

dom only
structural.json
section.relative
header
nav4 links, no context
div.grid
div.flex.flex-col
divempty wrapper
h1
span"See the web"
span"the way humans do."
p"A perception layer that gives your…"
form
inputtype=email, placeholder=you@…
button"Join the waitlist"
div.hidden.lg:blockcanvas — opaque

With UIPE

dom + a11y + vision + time
scene_graph.json
sceneviewport=1440x900
regionrole=bannertop navigation — 4 anchor links
regionrole=hero
headingrole=heading[1]"See the web the way humans do."primary, 72px, salience=0.98
text"A perception layer that gives your agent human-level web understanding…"
formrole=formwaitlist capture — 1 input, 1 submit
textboxemail · requiredrole=textbox
buttonrole=button"Join the waitlist"primary CTA · violet glow · salience=0.92
regionrole=decoration3D scene graph — animated, non-interactive

these trees are the actual output shape. not a mockup.

§02 · the solution

A unified scene graph
your agent can actually reason about.

§02·01

Human-level understanding.

Not just HTML. The page the way a human sees it — visual hierarchy, state, motion, meaning. Every node carries role, affordance, salience, and confirmed text.

scene_graph.json
headingrole=heading[1]salience=0.98
buttonrole=buttonprimary · violet glow
formrole=form1 input · 1 submit
textboxrole=textboxemail · required
regionrole=decorationnon-interactive
§02·02

MCP-native.

One line in your agent's config. Works with Claude Code, Cursor, Zed — anything that speaks MCP. No adapters, no wrappers, no bespoke plumbing.

mcp.config.json
"mcpServers"{
"uipe"{
"url""https://mcpaasta.uipe.dev/mcp"
"transport""sse"
}
}
// 12 tools exposed · one config line
§02·03

Temporal awareness.

Knows what just happened. Did the click work? Did the page load? Did that spinner stop? Frame-by-frame diffs turn guesswork into ground truth.

frame_delta.log
t+0msclick(button.primary)
t+127msurl_changed/login → /dashboard
t+841msloadedrole=main appeared
t+2.1ssettledno layout shift

exposed as 12 mcp tools · drop-in for claude code, cursor, zed.

§03 · how it works

Four signals,
fused into one scene graph.

UIPE captures what a browser renders and what a user experiences — then merges them. The result is a single representation your agent can reason about in one pass.

DOM
html > body > main > header > h1 · 847 nodes
Accessibility tree
role=heading level=1 · "See the web…"
Vision
primary_cta@(x:234, y:512) conf=0.94
Time
frame_delta_847ms · state_changed
scene graph
12 mcp tools · one representation
dom
DOM

Structural markup — tags, attributes, text nodes, containment.

a11y
Accessibility tree

Semantic roles and states derived from ARIA and heuristics.

vision
Vision

Pixel truth — a screenshot parsed by a local vision model.

time
Time

Frame-to-frame deltas that reveal what just changed.

no single signal is enough. the fusion is the product.

§04 · pricing

Pay what a coffee costs.
Get a month of agent evaluations.

Transparent per-session pricing. No seats, no minimums, no retainer. Spin up an agent, use what you need, stop when you're done.

tier · free
Free
$0GitHub OAuth

Hack, explore, prototype. Same engine, quota-metered.

  • 10 Scan sessions / day
  • 2 Deep sessions / day
  • GitHub account, 30 days old
  • Community support
tier · scan
Scan
$0.03per session

Structural + a11y + fast vision. The everyday workhorse.

  • DOM + a11y + OmniParser
  • Typical agent eval loop
  • Sub-second for most pages
  • Pay-as-you-go via x402
tier · deep
Deep
$0.12per session

Scan plus frontier vision and temporal analysis.

  • Everything in Scan
  • Frontier vision pass
  • Frame-by-frame temporal diff
  • Pay-as-you-go via x402

paid tiers settled via x402 · usdc on base

Free tier requires a GitHub account that’s at least 30 days old. Soft limits; we’ll contact you before anything hard stops.

§05 · integrate

Drop it into your agent.
One config block, no SDK.

UIPE ships as a remote MCP server. If your agent speaks the Model Context Protocol, it speaks UIPE.

built for agent-native editors
  • Claude Code
  • Cursor
  • Zed
  • Windsurf
early access

no spam · unsubscribe any time

.mcp.json · claude · cursor · zed
{
  "mcpServers": {
    "uipe": {
      "url": "https://mcpaasta.uipe.dev/mcp",
      "transport": "sse"
    }
  }
}

coming soon · join the waitlist for early access