glassbox — driving a real iPhone with a Luckfox PicoKVM

Posted: 2026-05-22 4:08
by yoyicue

I want to share an open-source project that uses the Luckfox PicoKVM as the core of an iOS automation rig: glassbox.

What it is

glassbox is an iOS-first computer-use runtime: it looks at a phone's screen and drives it in an observe → decide → act → verify loop, the way a person would. The twist is that it's completely out-of-band — nothing is installed on the iPhone. The PicoKVM is the eyes and the hands:

  • Eyes: the iPhone's HDMI output is captured by the PicoKVM and served as an H.264 stream (GET /video/stream). glassbox decodes those frames and runs OCR / optional VLM perception on them.

  • Hands: glassbox sends actions over the PicoKVM's JSON-RPC API (POST /api/rpc); the PicoKVM presents itself to the iPhone as a USB HID mouse + keyboard and injects the pointer/key events.
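The observe → decide → act → verify loop can be sketched roughly as follows. This is a minimal illustration, not glassbox's actual code: `run_loop`, `StepResult`, and the four callables are hypothetical names, and the "screen" is reduced to a plain string standing in for OCR output.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    action: Optional[str]  # what was attempted, or None when nothing was left to do
    verified: bool         # whether the follow-up observation confirmed the action


def run_loop(
    observe: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    verify: Callable[[str, str, str], bool],
    max_steps: int = 10,
) -> List[StepResult]:
    """Run observe -> decide -> act -> verify until decide() returns None."""
    history: List[StepResult] = []
    for _ in range(max_steps):
        before = observe()       # e.g. OCR text of the current frame
        action = decide(before)  # None means the goal is already met
        if action is None:
            history.append(StepResult(None, True))
            break
        act(action)              # e.g. send a pointer or key event
        after = observe()        # look again rather than trusting the action
        history.append(StepResult(action, verify(before, action, after)))
    return history
```

The point of the second `observe()` call is that an out-of-band controller gets no acknowledgement from the phone, so the only evidence that an action landed is the next frame.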

The hardware chain is short: iPhone → USB-C Digital AV Multiport Adapter → PicoKVM → Mac. A single adapter carries HDMI out (video), a USB-A host port (HID in), and USB-C power, and the Mac only ever talks to the PicoKVM over plain HTTP.
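For a feel of what "plain HTTP" means on the Mac side, here is a hedged sketch of posting one action to `/api/rpc`. Only the endpoint path comes from the description above. The JSON-RPC 2.0 envelope, the `absMouseReport` method name, its `x`/`y`/`buttons` parameters, and the absence of authentication are all assumptions for illustration, so check them against your firmware before relying on them.

```python
import json
import urllib.request
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def rpc_payload(method: str, params: dict) -> dict:
    """Build a JSON-RPC request body (the 2.0 envelope is an assumption)."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def post_rpc(base_url: str, method: str, params: dict, timeout: float = 2.0) -> dict:
    """POST one request to <base_url>/api/rpc and return the decoded reply."""
    body = json.dumps(rpc_payload(method, params)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/rpc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical call: method name and parameter names are NOT confirmed.
# post_rpc("http://picokvm.local", "absMouseReport", {"x": 16384, "y": 16384, "buttons": 0})
```

Keeping payload construction separate from the network call makes it easy to log or unit-test the exact bytes sent without a rig attached.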

Why this approach (and why the PicoKVM is perfect for it)

Most iOS automation needs WebDriverAgent/Appium/XCUITest, a jailbreak, a provisioning profile, or app instrumentation, all of which modify the device and can be detected. glassbox aims to change the phone as little as possible:

  • No app or test runner installed, no jailbreak, no provisioning profile, no code injection.

  • The only on-device setup is built-in iOS accessibility (AssistiveTouch + Full Keyboard Access).

  • The whole controller runs on the Mac.

The payoff is high-fidelity observation and control with no test-harness artifacts on the device and no installed software for anti-tamper or jailbreak detection to flag. The PicoKVM is a cheap, compact box that provides exactly the two things this needs: HDMI capture and USB HID.

A few PicoKVM-specific findings

  • We drive an absolute HID pointer (logical max 32767) and map between its coordinates and decoded frame pixels with a calibrated linear fit.

  • The RPC accepts wheelReport, but iOS didn't consume it in our bring-up, so we scroll with pointer drags instead.

  • iOS must have AssistiveTouch / external pointer enabled, since the PicoKVM is a HID pointer, not a touch digitizer.
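The calibrated linear fit and the drag-based scrolling mentioned above could look something like this. It is a sketch under assumptions, not the code in the repo: it fits each axis independently as `hid = scale * pixel + offset` by ordinary least squares, and `fit_axis`, `pixel_to_hid`, and `drag_path` are hypothetical names. The calibration samples would come from moving the pointer to known HID values and locating the cursor in the decoded frame.

```python
from typing import List, Sequence, Tuple

HID_MAX = 32767  # logical maximum of the absolute pointer


def fit_axis(pixels: Sequence[float], hid: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of hid = scale * pixel + offset for one axis."""
    n = len(pixels)
    if n < 2 or n != len(hid):
        raise ValueError("need at least two matching calibration samples")
    mean_p = sum(pixels) / n
    mean_h = sum(hid) / n
    var_p = sum((p - mean_p) ** 2 for p in pixels)
    if var_p == 0:
        raise ValueError("calibration pixels must not all be identical")
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(pixels, hid))
    scale = cov / var_p
    return scale, mean_h - scale * mean_p


def pixel_to_hid(pixel: float, fit: Tuple[float, float]) -> int:
    """Convert a frame pixel coordinate to a clamped HID coordinate."""
    scale, offset = fit
    return max(0, min(HID_MAX, round(scale * pixel + offset)))


def drag_path(
    start: Tuple[int, int], end: Tuple[int, int], steps: int = 8
) -> List[Tuple[int, int]]:
    """Evenly spaced points from start to end, inclusive, for a scroll drag."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(steps + 1)
    ]
```

A scroll would then be: press the button at the first point, send a pointer move for each intermediate point, and release at the last one. Sending several small moves instead of one jump is a design guess here, on the theory that iOS is more likely to treat a gradual movement as a drag.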

Status

Open source (MIT). The end-to-end PicoKVM path is calibrated on an iPhone 17 Pro Max (the geometry table covers the iPhone 15/16/17 families). There's a one-command onboarding run that drives the iOS Settings app read-only to validate a fresh rig.

👉 Repo: https://github.com/yoyicue/glassbox

I'd love feedback from this community — especially on PicoKVM firmware quirks, video-latency tuning, and HID timing. Happy to answer questions about the wiring or the RPC path.
