glassbox — driving a real iPhone with a Luckfox PicoKVM
I want to share an open-source project that uses the Luckfox PicoKVM as the core of an iOS automation rig: glassbox.
What it is
glassbox is an iOS-first computer-use runtime: it looks at a phone's screen and drives it in an observe → decide → act → verify loop, the way a person would. The twist is that it's completely out-of-band — nothing is installed on the iPhone. The PicoKVM is the eyes and the hands:
Eyes: the iPhone's HDMI output is captured by the PicoKVM and served as an H.264 stream (GET /video/stream). glassbox decodes those frames and runs OCR / optional VLM perception on them.
Hands: glassbox sends actions over the PicoKVM's JSON-RPC API (POST /api/rpc); the PicoKVM presents itself to the iPhone as a USB HID mouse + keyboard and injects the pointer/key events.
The whole chain is short: iPhone → USB-C Digital AV Multiport Adapter → PicoKVM → Mac. A single adapter provides HDMI out (video), a USB-A port (HID in), and USB-C power, and the Mac only ever talks to the PicoKVM over plain HTTP.
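To make the "hands" side concrete, here is a minimal sketch of how a pointer action might be packaged for POST /api/rpc. It assumes the endpoint speaks JSON-RPC 2.0 and that there is an absolute-pointer method named absMouseReport taking x, y and buttons. Those names and the host address are illustrative assumptions, not confirmed PicoKVM API, so check them against your firmware. Nothing is sent over the network here; the code only builds the request.

```python
import json
import urllib.request
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def build_rpc_request(host, method, params):
    """Build (but do not send) a JSON-RPC 2.0 POST aimed at /api/rpc."""
    payload = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    return urllib.request.Request(
        url=f"http://{host}/api/rpc",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pointer_move(host, x, y, buttons=0):
    """Absolute pointer move in HID units (0..32767)."""
    # ASSUMPTION: "absMouseReport" and its parameter names are hypothetical here.
    if not (0 <= x <= 32767 and 0 <= y <= 32767):
        raise ValueError("absolute HID coordinates must be in 0..32767")
    return build_rpc_request(host, "absMouseReport", {"x": x, "y": y, "buttons": buttons})


req = pointer_move("192.168.1.50", 16384, 16384)  # hypothetical PicoKVM address
body = json.loads(req.data)
```

Sending it would be a matter of passing req to urllib.request.urlopen with a timeout. Keeping the build step separate makes it easy to log or replay actions without touching the device.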
Why this approach (and why the PicoKVM is perfect for it)
Most iOS automation needs WebDriverAgent/Appium/XCUITest, a jailbreak, a provisioning profile, or app instrumentation, all of which modify the device and can be detected. glassbox aims to change the phone as little as possible:
No app or test runner installed, no jailbreak, no provisioning profile, no code injection.
The only on-device setup is built-in iOS accessibility (AssistiveTouch + Full Keyboard Access).
The whole controller runs on the Mac.
The payoff is high-fidelity observation and control with no test-harness artifacts on the device and nothing for anti-tamper or jailbreak detection to find. The PicoKVM is a cheap, compact box that provides exactly the two things this needs: HDMI capture and USB HID.
A couple of PicoKVM-specific findings
We drive an absolute HID pointer (logical max 32767) and map decoded-frame pixel coordinates to HID coordinates with a calibrated linear fit.
The RPC accepts wheelReport, but iOS didn't consume it in our bring-up, so we scroll with pointer drags instead.
iOS must have AssistiveTouch / external pointer enabled, since the PicoKVM is a HID pointer, not a touch digitizer.
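The first two findings can be sketched together: a per-axis least-squares fit from frame pixels to HID units, and a scroll expressed as a press, drag, release sequence. This is a rough illustration, not glassbox's actual code. The calibration samples are made-up numbers, the function names are hypothetical, and treating button bit 1 as the primary button is an assumption based on the usual HID mouse convention.

```python
def fit_axis(pixels, hid_values):
    """Least-squares fit of hid = a * pixel + b for one axis."""
    n = len(pixels)
    if n < 2 or n != len(hid_values):
        raise ValueError("need at least two matching calibration samples")
    mean_p = sum(pixels) / n
    mean_h = sum(hid_values) / n
    var_p = sum((p - mean_p) ** 2 for p in pixels)
    if var_p == 0:
        raise ValueError("calibration pixels must not all be identical")
    a = sum((p - mean_p) * (h - mean_h) for p, h in zip(pixels, hid_values)) / var_p
    return a, mean_h - a * mean_p


def to_hid(pixel, fit):
    """Apply a fit and clamp to the absolute HID range 0..32767."""
    a, b = fit
    return max(0, min(32767, round(a * pixel + b)))


def drag_events(start, end, fit_x, fit_y, steps=10):
    """Return (x, y, buttons) tuples: hover, press and drag, then release."""
    (x0, y0), (x1, y1) = start, end
    points = [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(steps + 1)
    ]
    hid = [(to_hid(x, fit_x), to_hid(y, fit_y)) for x, y in points]
    events = [(*hid[0], 0)]                    # move into place, no button
    events += [(x, y, 1) for x, y in hid]      # button held along the path
    events.append((*hid[-1], 0))               # release at the end point
    return events


# Made-up calibration samples: pixel positions and the HID values that hit them.
fit_x = fit_axis([0, 1000], [0, 20000])
fit_y = fit_axis([0, 1000], [0, 20000])

# Drag upward in the frame to scroll the content further down the page.
events = drag_events((100, 800), (100, 200), fit_x, fit_y, steps=3)
```

In a real rig each tuple would become one pointer report, with a short delay between reports. The right delay is something to tune against the device.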
Status
Open source (MIT). The end-to-end PicoKVM path is calibrated on an iPhone 17 Pro Max (the geometry table covers the iPhone 15/16/17 families). There's a one-command onboarding run that drives the iOS Settings app read-only to validate a fresh rig.
Repo: https://github.com/yoyicue/glassbox
I'd love feedback from this community — especially on PicoKVM firmware quirks, video-latency tuning, and HID timing. Happy to answer questions about the wiring or the RPC path.