docs: simplify README, remove REST API examples and dev section, polish SKILL.md

2026-03-06 02:55:51 +01:00
commit ba3b365f4e
parent 3c7f970d4f
2 changed files with 71 additions and 168 deletions
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
  <img src="assets/logo.png" width="150" alt="helios-remote logo" />
 </p>

-**AI-first remote control tool** — a relay server + Windows client written in Rust. Lets an AI agent (or any HTTP client) take full control of a remote Windows machine via a lightweight WebSocket relay.
+**AI-first remote control tool** — a relay server + Windows client written in Rust. Lets an AI agent take full control of a remote Windows machine via a lightweight WebSocket relay.

 ## Quick Connect

@@ -26,79 +26,50 @@ irm https://raw.githubusercontent.com/agent-helios/helios-remote/master/scripts/

 ---

-## Architecture
-
-```
-helios-remote/
-├── crates/
-│   ├── common/     # Shared protocol types, WebSocket message definitions
-│   ├── server/     # Relay server (REST API + WebSocket hub)
-│   └── client/     # Windows client
-├── remote.py       # CLI wrapper for the REST API
-├── Cargo.toml      # Workspace root
-└── README.md
-```
-
-### How It Works
+## How It Works

 ```
 AI Agent
-   │  REST API (X-Api-Key)
-   ▼
-helios-server  ──WebSocket──  helios-client (Windows)
-   │                               │
-POST /devices/:label/screenshot    │  Captures screen → base64 PNG
-POST /devices/:label/exec         │  Runs command in persistent shell
+   │
+   ▼  remote.py CLI
+helios-server ──WebSocket── helios-client (Windows)
 ```

-1. The **Windows client** connects to the relay server via WebSocket and sends a `Hello` with its device label.
-2. The **AI agent** calls the REST API using the device label to issue commands.
-3. The relay server forwards commands to the correct client and streams back responses.
+1. The **Windows client** connects to the relay server via WebSocket and registers with its device label.
+2. The **AI agent** uses `remote.py` to issue commands — screenshots, shell commands, window management, file transfers.
+3. The relay server forwards everything to the correct client and streams back responses.

-### Device Labels
+Device labels are the sole identifier. Only one client instance can run per device.

-Device labels are the **sole identifier** for connected clients. Labels must be:
- **Lowercase** only
- **No whitespace**
- Only `a-z`, `0-9`, `-`, `_` as characters
+---

-Labels are set during first-time client setup. Examples: `moritz_pc`, `work-desktop`, `gaming-rig`
+## remote.py CLI

-### Single Instance
+```bash
+python remote.py devices                                    # list connected devices
+python remote.py screenshot <device> screen                 # full-screen screenshot → /tmp/helios-remote-screenshot.png
+python remote.py screenshot <device> <window_label>         # screenshot a specific window
+python remote.py exec <device> <command...>                 # run shell command (PowerShell)
+python remote.py exec <device> --timeout 600 <command...>   # with custom timeout (seconds)
+python remote.py windows <device>                           # list visible windows
+python remote.py focus <device> <window_label>              # focus a window
+python remote.py maximize <device> <window_label>           # maximize and focus a window
+python remote.py minimize-all <device>                      # minimize all windows
+python remote.py prompt <device> "Please click Save"        # show MessageBox, blocks until user confirms
+python remote.py prompt <device> "message" --title "Title"  # with custom dialog title
+python remote.py run <device> <program> [args...]           # launch program (fire-and-forget)
+python remote.py clipboard-get <device>                     # get clipboard text
+python remote.py clipboard-set <device> <text>              # set clipboard text
+python remote.py upload <device> <local> <remote>           # upload file to device
+python remote.py download <device> <remote> <local>         # download file from device
+python remote.py version <device>                           # compare relay/remote.py/client commits
+python remote.py logs <device>                              # fetch last 100 lines of client log
+python remote.py logs <device> --lines 200                  # custom line count
+```

-Only one helios-remote client can run per device. The client uses a PID-based lock file to enforce this.
+---

-## Server
-
-### REST API
-
-All endpoints (except `/version` and `/ws`) require the `X-Api-Key` header.
-
-| Method | Path | Description |
-|---|---|---|
-| `GET` | `/devices` | List all connected devices |
-| `POST` | `/devices/:label/screenshot` | Full screen screenshot (base64 PNG) |
-| `POST` | `/devices/:label/exec` | Execute a shell command |
-| `GET` | `/devices/:label/windows` | List visible windows (with labels) |
-| `POST` | `/devices/:label/windows/minimize-all` | Minimize all windows |
-| `POST` | `/devices/:label/windows/:window_id/screenshot` | Screenshot a specific window |
-| `POST` | `/devices/:label/windows/:window_id/focus` | Focus a window |
-| `POST` | `/devices/:label/windows/:window_id/maximize` | Maximize and focus a window |
-| `POST` | `/devices/:label/prompt` | Show a MessageBox (blocks until OK) |
-| `POST` | `/devices/:label/run` | Launch a program (fire-and-forget) |
-| `GET` | `/devices/:label/clipboard` | Get clipboard contents |
-| `POST` | `/devices/:label/clipboard` | Set clipboard contents |
-| `GET` | `/devices/:label/version` | Get client version/commit |
-| `POST` | `/devices/:label/upload` | Upload a file to the client |
-| `GET` | `/devices/:label/download?path=...` | Download a file from the client |
-| `GET` | `/devices/:label/logs` | Fetch client log tail |
-| `GET` | `/version` | Server version/commit (no auth) |
-
-### WebSocket
-
-Clients connect to `ws://host:3000/ws`. The first message must be a `Hello` with the device label.
-
-### Running the Server
+## Server Setup

 ```bash
 HELIOS_API_KEY=your-secret-key HELIOS_BIND=0.0.0.0:3000 cargo run -p helios-server
@@ -106,87 +77,11 @@ HELIOS_API_KEY=your-secret-key HELIOS_BIND=0.0.0.0:3000 cargo run -p helios-serv

 | Variable | Default | Description |
 |---|---|---|
-| `HELIOS_API_KEY` | `dev-secret` | API key for REST endpoints |
+| `HELIOS_API_KEY` | `dev-secret` | API key |
 | `HELIOS_BIND` | `0.0.0.0:3000` | Listen address |
 | `RUST_LOG` | `helios_server=debug` | Log level |

-### Example API Usage
-
-```bash
-# List devices
-curl -H "X-Api-Key: your-secret-key" http://localhost:3000/devices
-
-# Take a full-screen screenshot
-curl -s -X POST -H "X-Api-Key: your-secret-key" \
-  http://localhost:3000/devices/moritz_pc/screenshot
-
-# Run a command
-curl -s -X POST -H "X-Api-Key: your-secret-key" \
-  -H "Content-Type: application/json" \
-  -d '{"command": "whoami"}' \
-  http://localhost:3000/devices/moritz_pc/exec
-```
-
-## remote.py CLI
-
-The `remote.py` script provides a CLI wrapper around the REST API.
-
-### Commands
-
-```bash
-python remote.py devices                                    # list connected devices
-python remote.py screenshot <device> screen                 # full-screen screenshot → /tmp/helios-remote-screenshot.png
-python remote.py screenshot <device> google_chrome          # screenshot a specific window by label
-python remote.py exec <device> <command...>                 # run shell command (PowerShell)
-python remote.py exec <device> --timeout 600 <command...>   # with custom timeout (seconds)
-python remote.py windows <device>                           # list visible windows (with labels)
-python remote.py focus <device> <window_label>              # focus a window
-python remote.py maximize <device> <window_label>           # maximize and focus a window
-python remote.py minimize-all <device>                      # minimize all windows
-python remote.py prompt <device> "Please click Save"        # ask user to do something manually
-python remote.py prompt <device> "message" --title "Title"  # with custom dialog title
-python remote.py run <device> <program> [args...]           # launch program (fire-and-forget)
-python remote.py clipboard-get <device>                     # get clipboard text
-python remote.py clipboard-set <device> <text>              # set clipboard text
-python remote.py upload <device> <local> <remote>           # upload file
-python remote.py download <device> <remote> <local>         # download file
-python remote.py version <device>                           # compare relay/remote.py/client commits
-python remote.py logs <device>                              # fetch last 100 lines of client log
-python remote.py logs <device> --lines 200                  # custom line count
-```
-
-### Window Labels
-
-Windows are identified by human-readable labels (same format as device labels: lowercase, no whitespace). Use `windows` to list them:
-
-```bash
-$ python remote.py windows moritz_pc
-Label                           Title
----------------------------------------------------------------------
-google_chrome                   Google Chrome
-discord                         Discord
-visual_studio_code              Visual Studio Code
-```
-
-Then use the label in `screenshot`, `focus`, or `maximize`:
-
-```bash
-python remote.py screenshot moritz_pc google_chrome
-python remote.py focus moritz_pc discord
-```
-
-## Development
-
-```bash
-# Build everything
-cargo build
-
-# Run tests
-cargo test
-
-# Run server in dev mode
-RUST_LOG=debug cargo run -p helios-server
-```
+---

 ## License

--- a/SKILL.md
+++ b/SKILL.md
@@ -30,63 +30,71 @@ When Moritz asks to do something on a connected PC:
 ```bash
 SKILL_DIR=/home/moritz/.openclaw/workspace/skills/helios-remote

-# Devices
+# List connected devices
 python $SKILL_DIR/remote.py devices

 # Screenshot → /tmp/helios-remote-screenshot.png
 # ALWAYS prefer window screenshots (saves bandwidth)!
-python $SKILL_DIR/remote.py screenshot moritz_pc google_chrome   # window by label
-python $SKILL_DIR/remote.py screenshot moritz_pc screen          # full screen only when no window known
+python $SKILL_DIR/remote.py screenshot moritz-pc chrome          # window by label
+python $SKILL_DIR/remote.py screenshot moritz-pc screen          # full screen only when no window known
+
+# List visible windows (use labels for screenshot/focus/maximize)
+python $SKILL_DIR/remote.py windows moritz-pc
+
+# Window labels come from the process name (e.g. chrome, discord, pycharm64)
+# Duplicates get a number suffix: chrome, chrome2, chrome3
+# Use `windows` to discover labels before targeting a specific window
+
+# Focus / maximize a window
+python $SKILL_DIR/remote.py focus moritz-pc discord
+python $SKILL_DIR/remote.py maximize moritz-pc chrome
+
+# Minimize all windows
+python $SKILL_DIR/remote.py minimize-all moritz-pc

 # Shell command (PowerShell, no wrapper needed)
-python $SKILL_DIR/remote.py exec moritz_pc "Get-Process"
-python $SKILL_DIR/remote.py exec moritz_pc "hostname"
+python $SKILL_DIR/remote.py exec moritz-pc "Get-Process"
+python $SKILL_DIR/remote.py exec moritz-pc "hostname"
 # With longer timeout for downloads etc. (default: 30s)
-python $SKILL_DIR/remote.py exec moritz_pc --timeout 600 "Invoke-WebRequest -Uri https://... -OutFile C:\file.zip"
-
-# Windows (visible only, shown with human-readable labels)
-python $SKILL_DIR/remote.py windows moritz_pc
-python $SKILL_DIR/remote.py focus moritz_pc discord
-python $SKILL_DIR/remote.py maximize moritz_pc google_chrome
-python $SKILL_DIR/remote.py minimize-all moritz_pc
+python $SKILL_DIR/remote.py exec moritz-pc --timeout 600 "Invoke-WebRequest -Uri https://... -OutFile C:\file.zip"

 # Launch program (fire-and-forget)
-python $SKILL_DIR/remote.py run moritz_pc notepad.exe
+python $SKILL_DIR/remote.py run moritz-pc notepad.exe

 # Ask user to do something (shows MessageBox, blocks until OK)
-python $SKILL_DIR/remote.py prompt moritz_pc "Please click Save, then OK"
-python $SKILL_DIR/remote.py prompt moritz_pc "UAC dialog coming - please confirm" --title "Action required"
+python $SKILL_DIR/remote.py prompt moritz-pc "Please click Save, then OK"
+python $SKILL_DIR/remote.py prompt moritz-pc "UAC dialog coming - please confirm" --title "Action required"

 # Clipboard
-python $SKILL_DIR/remote.py clipboard-get moritz_pc
-python $SKILL_DIR/remote.py clipboard-set moritz_pc "Text for clipboard"
+python $SKILL_DIR/remote.py clipboard-get moritz-pc
+python $SKILL_DIR/remote.py clipboard-set moritz-pc "Text for clipboard"

 # File transfer
-python $SKILL_DIR/remote.py upload moritz_pc /tmp/local.txt "C:\Users\Moritz\Desktop\remote.txt"
-python $SKILL_DIR/remote.py download moritz_pc "C:\Users\Moritz\file.txt" /tmp/downloaded.txt
+python $SKILL_DIR/remote.py upload moritz-pc /tmp/local.txt "C:\Users\Moritz\Desktop\remote.txt"
+python $SKILL_DIR/remote.py download moritz-pc "C:\Users\Moritz\file.txt" /tmp/downloaded.txt

 # Version: compare relay + remote.py + client commits (are they in sync?)
-python $SKILL_DIR/remote.py version moritz_pc
+python $SKILL_DIR/remote.py version moritz-pc

 # Client log (last 100 lines, --lines for more)
-python $SKILL_DIR/remote.py logs moritz_pc
-python $SKILL_DIR/remote.py logs moritz_pc --lines 200
+python $SKILL_DIR/remote.py logs moritz-pc
+python $SKILL_DIR/remote.py logs moritz-pc --lines 200
 ```

 ## Typical Workflow: UI Task

-1. `screenshot <device> screen` → look at the screen
-2. `windows <device>` → find the window label
-3. `focus <device> <window_label>` → bring it to front
+1. `windows <device>` → find the window label
+2. `screenshot <device> <window_label>` → look at it
+3. `focus <device> <window_label>` → bring it to front if needed
 4. `exec` → perform the action
 5. `screenshot <device> <window_label>` → verify result

-## ⚠️ Prompt Rule (important!)
+## ⚠️ Prompt Rule

 **Never interact with UI blindly.** When you need the user to click something:

 ```bash
-python $SKILL_DIR/remote.py prompt moritz_pc "Please click [Save], then press OK"
+python $SKILL_DIR/remote.py prompt moritz-pc "Please click [Save], then press OK"
 ```

 This blocks until the user confirms. Use it whenever manual interaction is needed.
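
Note on the label rules touched by this commit: SKILL.md now states that window labels come from the process name and that duplicates get a number suffix (`chrome`, `chrome2`, `chrome3`), while the removed README text restricted device labels to lowercase `a-z`, `0-9`, `-`, `_` with no whitespace. The sketch below illustrates both rules under stated assumptions. The function names `assign_window_labels` and `is_valid_device_label` are hypothetical and not taken from the helios-remote source, and lowercasing the process name is an assumption rather than documented behaviour.

```python
import re

# Pattern for the device-label rule from the removed README text:
# only a-z, 0-9, '-' and '_', at least one character, no whitespace.
DEVICE_LABEL_RE = re.compile(r"[a-z0-9_-]+")


def is_valid_device_label(label: str) -> bool:
    """Return True if the label uses only the characters the README allowed."""
    return DEVICE_LABEL_RE.fullmatch(label) is not None


def assign_window_labels(process_names: list[str]) -> list[str]:
    """Give each window a label based on its process name.

    The first window of a process keeps the bare name; later ones get a
    numeric suffix starting at 2 (chrome, chrome2, chrome3).
    Lowercasing is an assumption made for this sketch.
    """
    seen: dict[str, int] = {}
    labels: list[str] = []
    for name in process_names:
        base = name.lower()
        seen[base] = seen.get(base, 0) + 1
        count = seen[base]
        labels.append(base if count == 1 else f"{base}{count}")
    return labels


if __name__ == "__main__":
    print(assign_window_labels(["chrome", "discord", "chrome", "chrome"]))
    print(is_valid_device_label("moritz-pc"), is_valid_device_label("Moritz PC"))
```

Both `moritz_pc` (old examples) and `moritz-pc` (new examples) pass the device-label check, so the rename in SKILL.md stays within the documented character set. The sketch does not handle a process whose own name already ends in a digit colliding with a generated suffix; how the real client resolves that case is not stated in this diff.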