refactor: enforce device labels, unify screenshot, remove deprecated commands, session-id-less design

- Device labels: lowercase, no whitespace, only a-z 0-9 - _ (enforced at config time)
- Session IDs removed: device label is the sole identifier
- Routes changed: /sessions/:id → /devices/:label
- Removed commands: click, type, find-window, wait-for-window, label, old version, server-version
- Renamed: status → version (compares relay/remote.py/client commits)
- Unified screenshot: takes 'screen' or a window label as argument
- Windows listed with human-readable labels (same format as device labels)
- Single instance enforcement via PID lock file
- Removed input.rs (click/type functionality)
- All docs and code in English
- Protocol: Hello.label is now required (String, not Option<String>)
- Client auto-migrates invalid labels on startup
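
The label rules in this commit message (lowercase, no whitespace, only a-z 0-9 - _) can be illustrated with a small Python sketch. This is not the client's actual implementation, which is not shown here. The names `LABEL_RE`, `is_valid_label`, and `migrate_label` are hypothetical, and the migration strategy (lowercase, whitespace runs become `_`, all other characters dropped) is an assumption about what "auto-migrates invalid labels" might look like.

```python
import re

# Assumed pattern derived from the stated rules: one or more of a-z, 0-9, '-', '_'.
LABEL_RE = re.compile(r"[a-z0-9_-]+")


def is_valid_label(label: str) -> bool:
    """Return True if the whole string consists only of allowed characters."""
    return LABEL_RE.fullmatch(label) is not None


def migrate_label(label: str) -> str:
    """Hypothetical migration of an old free-form label to the new format."""
    lowered = label.strip().lower()
    # Collapse each run of whitespace into a single underscore.
    underscored = re.sub(r"\s+", "_", lowered)
    # Drop every character outside the allowed set.
    return re.sub(r"[^a-z0-9_-]", "", underscored)


if __name__ == "__main__":
    for raw in ["moritz_pc", "Moritz PC", "work-desktop"]:
        print(raw, is_valid_label(raw), migrate_label(raw))
```

Under these assumptions, an old-style label such as `Moritz PC` is rejected by `is_valid_label` and maps to `moritz_pc` via `migrate_label`. A real migration would also need to handle the case where nothing valid remains after cleaning.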
2026-03-06 01:55:28 +01:00
commit 0b4a6de8ae
parent 5fd01a423d
14 changed files with 736 additions and 1180 deletions
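
For callers affected by the route change (`/sessions/:id` → `/devices/:label`), the following Python sketch shows how a request against the new `exec` route could be built. It only constructs the request and does not send it. The base URL, API key, header name, and JSON body mirror the curl example in the README diff. The helper name `build_exec_request` is hypothetical and is not part of `remote.py`.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:3000"  # taken from the README example; adjust as needed
API_KEY = "your-secret-key"         # placeholder value, as in the README example


def build_exec_request(label: str, command: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /devices/<label>/exec."""
    # Labels are restricted to URL-safe characters, but quote defensively anyway.
    url = f"{BASE_URL}/devices/{quote(label, safe='')}/exec"
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
    )


req = build_exec_request("moritz_pc", "whoami")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running relay server. The shape of the response is not specified in this commit, so it is left out here.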
--- a/README.md
+++ b/README.md
@@ -33,7 +33,8 @@ helios-remote/
 ├── crates/
 │   ├── common/     # Shared protocol types, WebSocket message definitions
 │   ├── server/     # Relay server (REST API + WebSocket hub)
-│   └── client/     # Windows client — Phase 2 (stub only)
+│   └── client/     # Windows client
+├── remote.py       # CLI wrapper for the REST API
 ├── Cargo.toml      # Workspace root
 └── README.md
 ```
@@ -46,44 +47,56 @@ AI Agent
   ▼
 helios-server  ──WebSocket──  helios-client (Windows)
   │                               │
-POST /sessions/:id/screenshot      │  Captures screen → base64 PNG
-POST /sessions/:id/exec            │  Runs command in persistent shell
-POST /sessions/:id/click           │  Simulates mouse click
-POST /sessions/:id/type            │  Types text
+POST /devices/:label/screenshot    │  Captures screen → base64 PNG
+POST /devices/:label/exec         │  Runs command in persistent shell
 ```

-1. The **Windows client** connects to the relay server via WebSocket and sends a `Hello` message.
-2. The **AI agent** calls the REST API to issue commands.
-3. The relay server forwards commands to the correct client session and streams back responses.
+1. The **Windows client** connects to the relay server via WebSocket and sends a `Hello` message containing its device label.
+2. The **AI agent** calls the REST API using the device label to issue commands.
+3. The relay server forwards commands to the correct client and streams back responses.
+
+### Device Labels
+
+Device labels are the **sole identifier** for connected clients. A label must:
+- Be **lowercase** only
+- Contain **no whitespace**
+- Use only the characters `a-z`, `0-9`, `-`, `_`
+
+Labels are set during first-time client setup. Examples: `moritz_pc`, `work-desktop`, `gaming-rig`
+
+### Single Instance
+
+Only one helios-remote client can run per device. The client uses a PID-based lock file to enforce this.

 ## Server

 ### REST API

-All endpoints require the `X-Api-Key` header.
+All endpoints (except `/version` and `/ws`) require the `X-Api-Key` header.

 | Method | Path | Description |
 |---|---|---|
-| `GET` | `/sessions` | List all connected clients |
-| `POST` | `/sessions/:id/screenshot` | Request a screenshot (returns base64 PNG) |
-| `POST` | `/sessions/:id/exec` | Execute a shell command |
-| `POST` | `/sessions/:id/click` | Simulate a mouse click |
-| `POST` | `/sessions/:id/type` | Type text |
-| `POST` | `/sessions/:id/label` | Rename a session |
-| `GET` | `/sessions/:id/windows` | List all windows |
-| `POST` | `/sessions/:id/windows/minimize-all` | Minimize all windows |
-| `POST` | `/sessions/:id/windows/:window_id/focus` | Focus a window |
-| `POST` | `/sessions/:id/windows/:window_id/maximize` | Maximize and focus a window |
-| `POST` | `/sessions/:id/run` | Launch a program (fire-and-forget) |
-| `GET` | `/sessions/:id/clipboard` | Get clipboard contents |
-| `POST` | `/sessions/:id/clipboard` | Set clipboard contents |
-| `GET` | `/sessions/:id/version` | Get client version |
-| `POST` | `/sessions/:id/upload` | Upload a file to the client |
-| `GET` | `/sessions/:id/download?path=...` | Download a file from the client |
+| `GET` | `/devices` | List all connected devices |
+| `POST` | `/devices/:label/screenshot` | Full-screen screenshot (base64 PNG) |
+| `POST` | `/devices/:label/exec` | Execute a shell command |
+| `GET` | `/devices/:label/windows` | List visible windows (with labels) |
+| `POST` | `/devices/:label/windows/minimize-all` | Minimize all windows |
+| `POST` | `/devices/:label/windows/:window_id/screenshot` | Screenshot a specific window |
+| `POST` | `/devices/:label/windows/:window_id/focus` | Focus a window |
+| `POST` | `/devices/:label/windows/:window_id/maximize` | Maximize and focus a window |
+| `POST` | `/devices/:label/prompt` | Show a MessageBox (blocks until OK) |
+| `POST` | `/devices/:label/run` | Launch a program (fire-and-forget) |
+| `GET` | `/devices/:label/clipboard` | Get clipboard contents |
+| `POST` | `/devices/:label/clipboard` | Set clipboard contents |
+| `GET` | `/devices/:label/version` | Get client version/commit |
+| `POST` | `/devices/:label/upload` | Upload a file to the client |
+| `GET` | `/devices/:label/download?path=...` | Download a file from the client |
+| `GET` | `/devices/:label/logs` | Fetch client log tail |
+| `GET` | `/version` | Server version/commit (no auth) |

 ### WebSocket

-Clients connect to `ws://host:3000/ws`. No auth required at the transport layer — the server trusts all WS connections as client agents.
+Clients connect to `ws://host:3000/ws`. The first message must be a `Hello` with the device label.

 ### Running the Server

@@ -91,8 +104,6 @@ Clients connect to `ws://host:3000/ws`. No auth required at the transport layer
 HELIOS_API_KEY=your-secret-key HELIOS_BIND=0.0.0.0:3000 cargo run -p helios-server
 ```

-Environment variables:
-
 | Variable | Default | Description |
 |---|---|---|
 | `HELIOS_API_KEY` | `dev-secret` | API key for REST endpoints |
@@ -102,74 +113,67 @@ Environment variables:
 ### Example API Usage

 ```bash
-# List sessions
-curl -H "X-Api-Key: your-secret-key" http://localhost:3000/sessions
+# List devices
+curl -H "X-Api-Key: your-secret-key" http://localhost:3000/devices

-# Take a screenshot
+# Take a full-screen screenshot
 curl -s -X POST -H "X-Api-Key: your-secret-key" \
-  http://localhost:3000/sessions/<session-id>/screenshot
+  http://localhost:3000/devices/moritz_pc/screenshot

 # Run a command
 curl -s -X POST -H "X-Api-Key: your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"command": "whoami"}' \
-  http://localhost:3000/sessions/<session-id>/exec
-
-# Click at coordinates
-curl -s -X POST -H "X-Api-Key: your-secret-key" \
-  -H "Content-Type: application/json" \
-  -d '{"x": 100, "y": 200, "button": "left"}' \
-  http://localhost:3000/sessions/<session-id>/click
+  http://localhost:3000/devices/moritz_pc/exec
 ```

 ## remote.py CLI

-The `skills/helios-remote/remote.py` script provides a simple CLI wrapper around the REST API.
-
-### Label Routing
-
-All commands accept either a UUID or a label name as `session_id`. If the value is not a UUID, the script resolves it by looking up the label across all connected sessions:
-
-```bash
-python remote.py screenshot "Moritz PC"   # resolves label → UUID automatically
-python remote.py exec "Moritz PC" whoami
-```
+The `remote.py` script provides a CLI wrapper around the REST API.

 ### Commands

 ```bash
-python remote.py sessions                          # list sessions
-python remote.py screenshot <session>              # capture screenshot → /tmp/helios-remote-screenshot.png
-python remote.py exec <session> <command...>       # run shell command (PowerShell, no wrapper needed)
-python remote.py exec <session> --timeout 600 <command...>  # with custom timeout (seconds, default: 30)
-python remote.py prompt <session> "Please click Save, then OK"  # ask user to do something manually
-python remote.py click <session> <x> <y>           # mouse click
-python remote.py type <session> <text>             # keyboard input
-python remote.py windows <session>                 # list windows
-python remote.py find-window <session> <title>     # filter windows by title substring
-python remote.py minimize-all <session>            # minimize all windows
-python remote.py focus <session> <window_id>       # focus window
-python remote.py maximize <session> <window_id>    # maximize and focus window
-python remote.py run <session> <program> [args...] # launch program (fire-and-forget)
-python remote.py clipboard-get <session>           # get clipboard text
-python remote.py clipboard-set <session> <text>    # set clipboard text
-python remote.py upload <session> <local> <remote> # upload file
-python remote.py download <session> <remote> <local> # download file
-python remote.py screenshot-window <session> <window_id_or_title>  # screenshot a specific window
-python remote.py screenshot-window <session> <title> --output /tmp/out.png  # custom output path
-python remote.py wait-for-window <session> <title>            # poll until window appears
-python remote.py wait-for-window <session> <title> --timeout 60  # custom timeout (default: 30s)
-python remote.py label <session> <new_name>        # assign a human-readable name to session
-python remote.py status <session>                  # compare relay / remote.py / client commit
-python remote.py logs <session>                    # fetch last 100 lines of client log
-python remote.py logs <session> --lines 200        # custom line count
-python remote.py version <session>                 # client version
-python remote.py server-version                    # server version (no auth required)
+python remote.py devices                                    # list connected devices
+python remote.py screenshot <device> screen                 # full-screen screenshot → /tmp/helios-remote-screenshot.png
+python remote.py screenshot <device> google_chrome          # screenshot a specific window by label
+python remote.py exec <device> <command...>                 # run shell command (PowerShell)
+python remote.py exec <device> --timeout 600 <command...>   # with custom timeout (seconds)
+python remote.py windows <device>                           # list visible windows (with labels)
+python remote.py focus <device> <window_label>              # focus a window
+python remote.py maximize <device> <window_label>           # maximize and focus a window
+python remote.py minimize-all <device>                      # minimize all windows
+python remote.py prompt <device> "Please click Save"        # ask user to do something manually
+python remote.py prompt <device> "message" --title "Title"  # with custom dialog title
+python remote.py run <device> <program> [args...]           # launch program (fire-and-forget)
+python remote.py clipboard-get <device>                     # get clipboard text
+python remote.py clipboard-set <device> <text>              # set clipboard text
+python remote.py upload <device> <local> <remote>           # upload file
+python remote.py download <device> <remote> <local>         # download file
+python remote.py version <device>                           # compare relay/remote.py/client commits
+python remote.py logs <device>                              # fetch last 100 lines of client log
+python remote.py logs <device> --lines 200                  # custom line count
 ```

-## Client (Phase 2)
+### Window Labels

-See [`crates/client/README.md`](crates/client/README.md) for the planned Windows client implementation.
+Windows are identified by human-readable labels (same format as device labels: lowercase, no whitespace). Use `windows` to list them:
+
+```bash
+$ python remote.py windows moritz_pc
+Label                           Title
+----------------------------------------------------------------------
+google_chrome                   Google Chrome
+discord                         Discord
+visual_studio_code              Visual Studio Code
+```
+
+Then use the label in `screenshot`, `focus`, or `maximize`:
+
+```bash
+python remote.py screenshot moritz_pc google_chrome
+python remote.py focus moritz_pc discord
+```

 ## Development