From ba3b365f4e860b498b3e09561c3b7ac255dfd2f2 Mon Sep 17 00:00:00 2001
From: Helios
Date: Fri, 6 Mar 2026 02:55:51 +0100
Subject: [PATCH] docs: simplify README, remove REST API examples and dev section, polish SKILL.md

---
 README.md | 177 +++++++++++------------------------------------------
 SKILL.md  |  62 ++++++++++---------
 2 files changed, 71 insertions(+), 168 deletions(-)

diff --git a/README.md b/README.md
index c4a0cab..fb09253 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ helios-remote logo

-**AI-first remote control tool** — a relay server + Windows client written in Rust. Lets an AI agent (or any HTTP client) take full control of a remote Windows machine via a lightweight WebSocket relay.
+**AI-first remote control tool** — a relay server + Windows client written in Rust. Lets an AI agent take full control of a remote Windows machine via a lightweight WebSocket relay.
 
 ## Quick Connect
 
@@ -26,79 +26,50 @@ irm https://raw.githubusercontent.com/agent-helios/helios-remote/master/scripts/
 
 ---
 
-## Architecture
-
-```
-helios-remote/
-├── crates/
-│   ├── common/   # Shared protocol types, WebSocket message definitions
-│   ├── server/   # Relay server (REST API + WebSocket hub)
-│   └── client/   # Windows client
-├── remote.py     # CLI wrapper for the REST API
-├── Cargo.toml    # Workspace root
-└── README.md
-```
-
-### How It Works
+## How It Works
 
 ```
 AI Agent
-    │  REST API (X-Api-Key)
-    ▼
-helios-server  ──WebSocket──  helios-client (Windows)
-    │                              │
-POST /devices/:label/screenshot    │  Captures screen → base64 PNG
-POST /devices/:label/exec          │  Runs command in persistent shell
+    │
+    ▼  remote.py CLI
+helios-server  ──WebSocket──  helios-client (Windows)
 ```
 
-1. The **Windows client** connects to the relay server via WebSocket and sends a `Hello` with its device label.
-2. The **AI agent** calls the REST API using the device label to issue commands.
-3. The relay server forwards commands to the correct client and streams back responses.
+1. The **Windows client** connects to the relay server via WebSocket and registers with its device label.
+2. The **AI agent** uses `remote.py` to issue commands — screenshots, shell commands, window management, file transfers.
+3. The relay server forwards everything to the correct client and streams back responses.
 
-### Device Labels
+Device labels are the sole identifier. Only one client instance can run per device.
 
-Device labels are the **sole identifier** for connected clients. Labels must be:
-- **Lowercase** only
-- **No whitespace**
-- Only `a-z`, `0-9`, `-`, `_` as characters
+---
 
-Labels are set during first-time client setup. Examples: `moritz_pc`, `work-desktop`, `gaming-rig`
+## remote.py CLI
 
-### Single Instance
+```bash
+python remote.py devices # list connected devices
+python remote.py screenshot screen # full-screen screenshot → /tmp/helios-remote-screenshot.png
+python remote.py screenshot # screenshot a specific window
+python remote.py exec # run shell command (PowerShell)
+python remote.py exec --timeout 600 # with custom timeout (seconds)
+python remote.py windows # list visible windows
+python remote.py focus # focus a window
+python remote.py maximize # maximize and focus a window
+python remote.py minimize-all # minimize all windows
+python remote.py prompt "Please click Save" # show MessageBox, blocks until user confirms
+python remote.py prompt "message" --title "Title" # with custom dialog title
+python remote.py run [args...] # launch program (fire-and-forget)
+python remote.py clipboard-get # get clipboard text
+python remote.py clipboard-set # set clipboard text
+python remote.py upload # upload file to device
+python remote.py download # download file from device
+python remote.py version # compare relay/remote.py/client commits
+python remote.py logs # fetch last 100 lines of client log
+python remote.py logs --lines 200 # custom line count
+```
 
-Only one helios-remote client can run per device. The client uses a PID-based lock file to enforce this.
+---
 
-## Server
-
-### REST API
-
-All endpoints (except `/version` and `/ws`) require the `X-Api-Key` header.
-
-| Method | Path | Description |
-|---|---|---|
-| `GET` | `/devices` | List all connected devices |
-| `POST` | `/devices/:label/screenshot` | Full screen screenshot (base64 PNG) |
-| `POST` | `/devices/:label/exec` | Execute a shell command |
-| `GET` | `/devices/:label/windows` | List visible windows (with labels) |
-| `POST` | `/devices/:label/windows/minimize-all` | Minimize all windows |
-| `POST` | `/devices/:label/windows/:window_id/screenshot` | Screenshot a specific window |
-| `POST` | `/devices/:label/windows/:window_id/focus` | Focus a window |
-| `POST` | `/devices/:label/windows/:window_id/maximize` | Maximize and focus a window |
-| `POST` | `/devices/:label/prompt` | Show a MessageBox (blocks until OK) |
-| `POST` | `/devices/:label/run` | Launch a program (fire-and-forget) |
-| `GET` | `/devices/:label/clipboard` | Get clipboard contents |
-| `POST` | `/devices/:label/clipboard` | Set clipboard contents |
-| `GET` | `/devices/:label/version` | Get client version/commit |
-| `POST` | `/devices/:label/upload` | Upload a file to the client |
-| `GET` | `/devices/:label/download?path=...` | Download a file from the client |
-| `GET` | `/devices/:label/logs` | Fetch client log tail |
-| `GET` | `/version` | Server version/commit (no auth) |
-
-### WebSocket
-
-Clients connect to `ws://host:3000/ws`. The first message must be a `Hello` with the device label.
-
-### Running the Server
+## Server Setup
 
 ```bash
 HELIOS_API_KEY=your-secret-key HELIOS_BIND=0.0.0.0:3000 cargo run -p helios-server
@@ -106,87 +77,11 @@ HELIOS_API_KEY=your-secret-key HELIOS_BIND=0.0.0.0:3000 cargo run -p helios-serv
 
 | Variable | Default | Description |
 |---|---|---|
-| `HELIOS_API_KEY` | `dev-secret` | API key for REST endpoints |
+| `HELIOS_API_KEY` | `dev-secret` | API key |
 | `HELIOS_BIND` | `0.0.0.0:3000` | Listen address |
 | `RUST_LOG` | `helios_server=debug` | Log level |
 
-### Example API Usage
-
-```bash
-# List devices
-curl -H "X-Api-Key: your-secret-key" http://localhost:3000/devices
-
-# Take a full-screen screenshot
-curl -s -X POST -H "X-Api-Key: your-secret-key" \
-  http://localhost:3000/devices/moritz_pc/screenshot
-
-# Run a command
-curl -s -X POST -H "X-Api-Key: your-secret-key" \
-  -H "Content-Type: application/json" \
-  -d '{"command": "whoami"}' \
-  http://localhost:3000/devices/moritz_pc/exec
-```
-
-## remote.py CLI
-
-The `remote.py` script provides a CLI wrapper around the REST API.
-
-### Commands
-
-```bash
-python remote.py devices # list connected devices
-python remote.py screenshot screen # full-screen screenshot → /tmp/helios-remote-screenshot.png
-python remote.py screenshot google_chrome # screenshot a specific window by label
-python remote.py exec # run shell command (PowerShell)
-python remote.py exec --timeout 600 # with custom timeout (seconds)
-python remote.py windows # list visible windows (with labels)
-python remote.py focus # focus a window
-python remote.py maximize # maximize and focus a window
-python remote.py minimize-all # minimize all windows
-python remote.py prompt "Please click Save" # ask user to do something manually
-python remote.py prompt "message" --title "Title" # with custom dialog title
-python remote.py run [args...] # launch program (fire-and-forget)
-python remote.py clipboard-get # get clipboard text
-python remote.py clipboard-set # set clipboard text
-python remote.py upload # upload file
-python remote.py download # download file
-python remote.py version # compare relay/remote.py/client commits
-python remote.py logs # fetch last 100 lines of client log
-python remote.py logs --lines 200 # custom line count
-```
-
-### Window Labels
-
-Windows are identified by human-readable labels (same format as device labels: lowercase, no whitespace). Use `windows` to list them:
-
-```bash
-$ python remote.py windows moritz_pc
-Label                Title
-----------------------------------------------------------------------
-google_chrome        Google Chrome
-discord              Discord
-visual_studio_code   Visual Studio Code
-```
-
-Then use the label in `screenshot`, `focus`, or `maximize`:
-
-```bash
-python remote.py screenshot moritz_pc google_chrome
-python remote.py focus moritz_pc discord
-```
-
-## Development
-
-```bash
-# Build everything
-cargo build
-
-# Run tests
-cargo test
-
-# Run server in dev mode
-RUST_LOG=debug cargo run -p helios-server
-```
+---
 
 ## License
 
diff --git a/SKILL.md b/SKILL.md
index d6cc969..ce0a5e7 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -30,63 +30,71 @@ When Moritz asks to do something on a connected PC:
 ```bash
 SKILL_DIR=/home/moritz/.openclaw/workspace/skills/helios-remote
 
-# Devices
+# List connected devices
 python $SKILL_DIR/remote.py devices
 
 # Screenshot → /tmp/helios-remote-screenshot.png
 # ALWAYS prefer window screenshots (saves bandwidth)!
-python $SKILL_DIR/remote.py screenshot moritz_pc google_chrome # window by label
-python $SKILL_DIR/remote.py screenshot moritz_pc screen # full screen only when no window known
+python $SKILL_DIR/remote.py screenshot moritz-pc chrome # window by label
+python $SKILL_DIR/remote.py screenshot moritz-pc screen # full screen only when no window known
+
+# List visible windows (use labels for screenshot/focus/maximize)
+python $SKILL_DIR/remote.py windows moritz-pc
+
+# Window labels come from the process name (e.g. chrome, discord, pycharm64)
+# Duplicates get a number suffix: chrome, chrome2, chrome3
+# Use `windows` to discover labels before targeting a specific window
+
+# Focus / maximize a window
+python $SKILL_DIR/remote.py focus moritz-pc discord
+python $SKILL_DIR/remote.py maximize moritz-pc chrome
+
+# Minimize all windows
+python $SKILL_DIR/remote.py minimize-all moritz-pc
 
 # Shell command (PowerShell, no wrapper needed)
-python $SKILL_DIR/remote.py exec moritz_pc "Get-Process"
-python $SKILL_DIR/remote.py exec moritz_pc "hostname"
+python $SKILL_DIR/remote.py exec moritz-pc "Get-Process"
+python $SKILL_DIR/remote.py exec moritz-pc "hostname"
 
 # With longer timeout for downloads etc. (default: 30s)
-python $SKILL_DIR/remote.py exec moritz_pc --timeout 600 "Invoke-WebRequest -Uri https://... -OutFile C:\file.zip"
-
-# Windows (visible only, shown with human-readable labels)
-python $SKILL_DIR/remote.py windows moritz_pc
-python $SKILL_DIR/remote.py focus moritz_pc discord
-python $SKILL_DIR/remote.py maximize moritz_pc google_chrome
-python $SKILL_DIR/remote.py minimize-all moritz_pc
+python $SKILL_DIR/remote.py exec moritz-pc --timeout 600 "Invoke-WebRequest -Uri https://... -OutFile C:\file.zip"
 
 # Launch program (fire-and-forget)
-python $SKILL_DIR/remote.py run moritz_pc notepad.exe
+python $SKILL_DIR/remote.py run moritz-pc notepad.exe
 
 # Ask user to do something (shows MessageBox, blocks until OK)
-python $SKILL_DIR/remote.py prompt moritz_pc "Please click Save, then OK"
-python $SKILL_DIR/remote.py prompt moritz_pc "UAC dialog coming - please confirm" --title "Action required"
+python $SKILL_DIR/remote.py prompt moritz-pc "Please click Save, then OK"
+python $SKILL_DIR/remote.py prompt moritz-pc "UAC dialog coming - please confirm" --title "Action required"
 
 # Clipboard
-python $SKILL_DIR/remote.py clipboard-get moritz_pc
-python $SKILL_DIR/remote.py clipboard-set moritz_pc "Text for clipboard"
+python $SKILL_DIR/remote.py clipboard-get moritz-pc
+python $SKILL_DIR/remote.py clipboard-set moritz-pc "Text for clipboard"
 
 # File transfer
-python $SKILL_DIR/remote.py upload moritz_pc /tmp/local.txt "C:\Users\Moritz\Desktop\remote.txt"
-python $SKILL_DIR/remote.py download moritz_pc "C:\Users\Moritz\file.txt" /tmp/downloaded.txt
+python $SKILL_DIR/remote.py upload moritz-pc /tmp/local.txt "C:\Users\Moritz\Desktop\remote.txt"
+python $SKILL_DIR/remote.py download moritz-pc "C:\Users\Moritz\file.txt" /tmp/downloaded.txt
 
 # Version: compare relay + remote.py + client commits (are they in sync?)
-python $SKILL_DIR/remote.py version moritz_pc
+python $SKILL_DIR/remote.py version moritz-pc
 
 # Client log (last 100 lines, --lines for more)
-python $SKILL_DIR/remote.py logs moritz_pc
-python $SKILL_DIR/remote.py logs moritz_pc --lines 200
+python $SKILL_DIR/remote.py logs moritz-pc
+python $SKILL_DIR/remote.py logs moritz-pc --lines 200
 ```
 
 ## Typical Workflow: UI Task
 
-1. `screenshot screen` → look at the screen
-2. `windows ` → find the window label
-3. `focus ` → bring it to front
+1. `windows ` → find the window label
+2. `screenshot ` → look at it
+3. `focus ` → bring it to front if needed
 4. `exec` → perform the action
 5. `screenshot ` → verify result
 
-## ⚠️ Prompt Rule (important!)
+## ⚠️ Prompt Rule
 
 **Never interact with UI blindly.** When you need the user to click something:
 ```bash
-python $SKILL_DIR/remote.py prompt moritz_pc "Please click [Save], then press OK"
+python $SKILL_DIR/remote.py prompt moritz-pc "Please click [Save], then press OK"
 ```
 
 This blocks until the user confirms. Use it whenever manual interaction is needed.
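Reviewer note: the five-step "Typical Workflow: UI Task" in the SKILL.md hunk could be scripted roughly as follows. This is a minimal sketch, assuming the argument order shown in the SKILL.md examples (subcommand, then device label, then window label or command). The helper name `build_ui_task_commands` and the `REMOTE_PY` constant are hypothetical and not part of the patch. The function only builds argument lists and does not contact a relay.

```python
import shlex
from typing import List

# Assumption: path to the CLI script; the patch refers to it as remote.py.
REMOTE_PY = "remote.py"


def build_ui_task_commands(device: str, window: str, action: str) -> List[List[str]]:
    """Return one argv list per step of the UI workflow, in order.

    Hypothetical helper: it mirrors the order of the workflow list
    (windows, screenshot, focus, exec, screenshot) and nothing more.
    """
    base = ["python", REMOTE_PY]
    return [
        base + ["windows", device],             # 1. find the window label
        base + ["screenshot", device, window],  # 2. look at the window
        base + ["focus", device, window],       # 3. bring it to front if needed
        base + ["exec", device, action],        # 4. perform the action
        base + ["screenshot", device, window],  # 5. verify the result
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to execute.
    for argv in build_ui_task_commands("moritz-pc", "chrome", "Get-Process"):
        print(shlex.join(argv))
```

Passing each list to `subprocess.run` would execute the steps in sequence. Building lists rather than one shell string avoids quoting problems with PowerShell commands that contain spaces.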