docs: simplify README, remove REST API examples and dev section, polish SKILL.md

Helios 2026-03-06 02:55:51 +01:00
parent 3c7f970d4f
commit ba3b365f4e
GPG key ID: C8259547CD8309B5
2 changed files with 71 additions and 168 deletions

177
README.md

@@ -4,7 +4,7 @@
<img src="assets/logo.png" width="150" alt="helios-remote logo" /> <img src="assets/logo.png" width="150" alt="helios-remote logo" />
</p> </p>
**AI-first remote control tool** — a relay server + Windows client written in Rust. Lets an AI agent (or any HTTP client) take full control of a remote Windows machine via a lightweight WebSocket relay. **AI-first remote control tool** — a relay server + Windows client written in Rust. Lets an AI agent take full control of a remote Windows machine via a lightweight WebSocket relay.
## Quick Connect ## Quick Connect
@@ -26,79 +26,50 @@ irm https://raw.githubusercontent.com/agent-helios/helios-remote/master/scripts/
--- ---
## Architecture ## How It Works
```
helios-remote/
├── crates/
│ ├── common/ # Shared protocol types, WebSocket message definitions
│ ├── server/ # Relay server (REST API + WebSocket hub)
│ └── client/ # Windows client
├── remote.py # CLI wrapper for the REST API
├── Cargo.toml # Workspace root
└── README.md
```
### How It Works
``` ```
AI Agent AI Agent
│ REST API (X-Api-Key)
▼ remote.py CLI
helios-server ──WebSocket── helios-client (Windows) helios-server ──WebSocket── helios-client (Windows)
│ │
POST /devices/:label/screenshot │ Captures screen → base64 PNG
POST /devices/:label/exec │ Runs command in persistent shell
``` ```
1. The **Windows client** connects to the relay server via WebSocket and sends a `Hello` with its device label. 1. The **Windows client** connects to the relay server via WebSocket and registers with its device label.
2. The **AI agent** calls the REST API using the device label to issue commands. 2. The **AI agent** uses `remote.py` to issue commands — screenshots, shell commands, window management, file transfers.
3. The relay server forwards commands to the correct client and streams back responses. 3. The relay server forwards everything to the correct client and streams back responses.
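The three steps above can be pictured with a toy in-memory model. This is only a sketch of the routing idea, not the project's implementation: the real relay is written in Rust and talks WebSocket, and `ToyRelay`, `fake_windows_client`, and the command dictionary shape are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical handler type: takes a command dict, returns a response dict.
Handler = Callable[[dict], dict]


class ToyRelay:
    """In-memory stand-in for the relay: routes commands by device label."""

    def __init__(self) -> None:
        self._clients: Dict[str, Handler] = {}

    def register(self, label: str, handler: Handler) -> None:
        # Step 1: a client registers under its device label.
        self._clients[label] = handler

    def devices(self) -> List[str]:
        # Labels of all currently registered clients, sorted for stable output.
        return sorted(self._clients)

    def forward(self, label: str, command: dict) -> dict:
        # Steps 2 and 3: look up the client by label, pass the command on,
        # and hand the client's reply back to the caller.
        if label not in self._clients:
            raise KeyError(f"no connected device with label {label!r}")
        return self._clients[label](command)


def fake_windows_client(command: dict) -> dict:
    # Pretend client (assumption): reports which command type it handled.
    return {"ok": True, "handled": command["type"]}


relay = ToyRelay()
relay.register("work-desktop", fake_windows_client)
reply = relay.forward("work-desktop", {"type": "exec", "command": "hostname"})
```

The only point of the model is that the device label is the lookup key: a command addressed to an unknown label has nowhere to go.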
### Device Labels Device labels are the sole identifier. Only one client instance can run per device.
Device labels are the **sole identifier** for connected clients. Labels must be: ---
- **Lowercase** only
- **No whitespace**
- Only `a-z`, `0-9`, `-`, `_` as characters
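The label rules listed in the removed text can be checked with a short validator. This is a minimal sketch, assuming the three rules are the complete specification; `is_valid_label` is a hypothetical helper, not a function from the repository, and rejecting the empty string is an added assumption.

```python
import re

# Lowercase letters, digits, hyphen, underscore; at least one character.
# (Requiring a non-empty label is an assumption, not stated in the README.)
LABEL_RE = re.compile(r"[a-z0-9_-]+")


def is_valid_label(label: str) -> bool:
    """Return True if the whole label uses only the allowed characters."""
    return LABEL_RE.fullmatch(label) is not None


# The README's own example labels pass; uppercase and whitespace do not.
for candidate in ["moritz_pc", "work-desktop", "gaming-rig", "Moritz PC"]:
    print(candidate, is_valid_label(candidate))
```

Using `fullmatch` rather than `match` matters here: it rejects labels that merely start with valid characters.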
Labels are set during first-time client setup. Examples: `moritz_pc`, `work-desktop`, `gaming-rig` ## remote.py CLI
### Single Instance ```bash
python remote.py devices # list connected devices
python remote.py screenshot <device> screen # full-screen screenshot → /tmp/helios-remote-screenshot.png
python remote.py screenshot <device> <window_label> # screenshot a specific window
python remote.py exec <device> <command...> # run shell command (PowerShell)
python remote.py exec <device> --timeout 600 <command...> # with custom timeout (seconds)
python remote.py windows <device> # list visible windows
python remote.py focus <device> <window_label> # focus a window
python remote.py maximize <device> <window_label> # maximize and focus a window
python remote.py minimize-all <device> # minimize all windows
python remote.py prompt <device> "Please click Save" # show MessageBox, blocks until user confirms
python remote.py prompt <device> "message" --title "Title" # with custom dialog title
python remote.py run <device> <program> [args...] # launch program (fire-and-forget)
python remote.py clipboard-get <device> # get clipboard text
python remote.py clipboard-set <device> <text> # set clipboard text
python remote.py upload <device> <local> <remote> # upload file to device
python remote.py download <device> <remote> <local> # download file from device
python remote.py version <device> # compare relay/remote.py/client commits
python remote.py logs <device> # fetch last 100 lines of client log
python remote.py logs <device> --lines 200 # custom line count
```
Only one helios-remote client can run per device. The client uses a PID-based lock file to enforce this. ---
## Server ## Server Setup
### REST API
All endpoints (except `/version` and `/ws`) require the `X-Api-Key` header.
| Method | Path | Description |
|---|---|---|
| `GET` | `/devices` | List all connected devices |
| `POST` | `/devices/:label/screenshot` | Full screen screenshot (base64 PNG) |
| `POST` | `/devices/:label/exec` | Execute a shell command |
| `GET` | `/devices/:label/windows` | List visible windows (with labels) |
| `POST` | `/devices/:label/windows/minimize-all` | Minimize all windows |
| `POST` | `/devices/:label/windows/:window_id/screenshot` | Screenshot a specific window |
| `POST` | `/devices/:label/windows/:window_id/focus` | Focus a window |
| `POST` | `/devices/:label/windows/:window_id/maximize` | Maximize and focus a window |
| `POST` | `/devices/:label/prompt` | Show a MessageBox (blocks until OK) |
| `POST` | `/devices/:label/run` | Launch a program (fire-and-forget) |
| `GET` | `/devices/:label/clipboard` | Get clipboard contents |
| `POST` | `/devices/:label/clipboard` | Set clipboard contents |
| `GET` | `/devices/:label/version` | Get client version/commit |
| `POST` | `/devices/:label/upload` | Upload a file to the client |
| `GET` | `/devices/:label/download?path=...` | Download a file from the client |
| `GET` | `/devices/:label/logs` | Fetch client log tail |
| `GET` | `/version` | Server version/commit (no auth) |
### WebSocket
Clients connect to `ws://host:3000/ws`. The first message must be a `Hello` with the device label.
### Running the Server
```bash ```bash
HELIOS_API_KEY=your-secret-key HELIOS_BIND=0.0.0.0:3000 cargo run -p helios-server HELIOS_API_KEY=your-secret-key HELIOS_BIND=0.0.0.0:3000 cargo run -p helios-server
@@ -106,87 +77,11 @@ HELIOS_API_KEY=your-secret-key HELIOS_BIND=0.0.0.0:3000 cargo run -p helios-serv
| Variable | Default | Description | | Variable | Default | Description |
|---|---|---| |---|---|---|
| `HELIOS_API_KEY` | `dev-secret` | API key for REST endpoints | | `HELIOS_API_KEY` | `dev-secret` | API key |
| `HELIOS_BIND` | `0.0.0.0:3000` | Listen address | | `HELIOS_BIND` | `0.0.0.0:3000` | Listen address |
| `RUST_LOG` | `helios_server=debug` | Log level | | `RUST_LOG` | `helios_server=debug` | Log level |
### Example API Usage ---
```bash
# List devices
curl -H "X-Api-Key: your-secret-key" http://localhost:3000/devices
# Take a full-screen screenshot
curl -s -X POST -H "X-Api-Key: your-secret-key" \
http://localhost:3000/devices/moritz_pc/screenshot
# Run a command
curl -s -X POST -H "X-Api-Key: your-secret-key" \
-H "Content-Type: application/json" \
-d '{"command": "whoami"}' \
http://localhost:3000/devices/moritz_pc/exec
```
## remote.py CLI
The `remote.py` script provides a CLI wrapper around the REST API.
### Commands
```bash
python remote.py devices # list connected devices
python remote.py screenshot <device> screen # full-screen screenshot → /tmp/helios-remote-screenshot.png
python remote.py screenshot <device> google_chrome # screenshot a specific window by label
python remote.py exec <device> <command...> # run shell command (PowerShell)
python remote.py exec <device> --timeout 600 <command...> # with custom timeout (seconds)
python remote.py windows <device> # list visible windows (with labels)
python remote.py focus <device> <window_label> # focus a window
python remote.py maximize <device> <window_label> # maximize and focus a window
python remote.py minimize-all <device> # minimize all windows
python remote.py prompt <device> "Please click Save" # ask user to do something manually
python remote.py prompt <device> "message" --title "Title" # with custom dialog title
python remote.py run <device> <program> [args...] # launch program (fire-and-forget)
python remote.py clipboard-get <device> # get clipboard text
python remote.py clipboard-set <device> <text> # set clipboard text
python remote.py upload <device> <local> <remote> # upload file
python remote.py download <device> <remote> <local> # download file
python remote.py version <device> # compare relay/remote.py/client commits
python remote.py logs <device> # fetch last 100 lines of client log
python remote.py logs <device> --lines 200 # custom line count
```
### Window Labels
Windows are identified by human-readable labels (same format as device labels: lowercase, no whitespace). Use `windows` to list them:
```bash
$ python remote.py windows moritz_pc
Label Title
----------------------------------------------------------------------
google_chrome Google Chrome
discord Discord
visual_studio_code Visual Studio Code
```
Then use the label in `screenshot`, `focus`, or `maximize`:
```bash
python remote.py screenshot moritz_pc google_chrome
python remote.py focus moritz_pc discord
```
## Development
```bash
# Build everything
cargo build
# Run tests
cargo test
# Run server in dev mode
RUST_LOG=debug cargo run -p helios-server
```
## License ## License

SKILL.md

@@ -30,63 +30,71 @@ When Moritz asks to do something on a connected PC:
```bash ```bash
SKILL_DIR=/home/moritz/.openclaw/workspace/skills/helios-remote SKILL_DIR=/home/moritz/.openclaw/workspace/skills/helios-remote
# Devices # List connected devices
python $SKILL_DIR/remote.py devices python $SKILL_DIR/remote.py devices
# Screenshot → /tmp/helios-remote-screenshot.png # Screenshot → /tmp/helios-remote-screenshot.png
# ALWAYS prefer window screenshots (saves bandwidth)! # ALWAYS prefer window screenshots (saves bandwidth)!
python $SKILL_DIR/remote.py screenshot moritz_pc google_chrome # window by label python $SKILL_DIR/remote.py screenshot moritz-pc chrome # window by label
python $SKILL_DIR/remote.py screenshot moritz_pc screen # full screen only when no window known python $SKILL_DIR/remote.py screenshot moritz-pc screen # full screen only when no window known
# List visible windows (use labels for screenshot/focus/maximize)
python $SKILL_DIR/remote.py windows moritz-pc
# Window labels come from the process name (e.g. chrome, discord, pycharm64)
# Duplicates get a number suffix: chrome, chrome2, chrome3
# Use `windows` to discover labels before targeting a specific window
# Focus / maximize a window
python $SKILL_DIR/remote.py focus moritz-pc discord
python $SKILL_DIR/remote.py maximize moritz-pc chrome
# Minimize all windows
python $SKILL_DIR/remote.py minimize-all moritz-pc
# Shell command (PowerShell, no wrapper needed) # Shell command (PowerShell, no wrapper needed)
python $SKILL_DIR/remote.py exec moritz_pc "Get-Process" python $SKILL_DIR/remote.py exec moritz-pc "Get-Process"
python $SKILL_DIR/remote.py exec moritz_pc "hostname" python $SKILL_DIR/remote.py exec moritz-pc "hostname"
# With longer timeout for downloads etc. (default: 30s) # With longer timeout for downloads etc. (default: 30s)
python $SKILL_DIR/remote.py exec moritz_pc --timeout 600 "Invoke-WebRequest -Uri https://... -OutFile C:\file.zip" python $SKILL_DIR/remote.py exec moritz-pc --timeout 600 "Invoke-WebRequest -Uri https://... -OutFile C:\file.zip"
# Windows (visible only, shown with human-readable labels)
python $SKILL_DIR/remote.py windows moritz_pc
python $SKILL_DIR/remote.py focus moritz_pc discord
python $SKILL_DIR/remote.py maximize moritz_pc google_chrome
python $SKILL_DIR/remote.py minimize-all moritz_pc
# Launch program (fire-and-forget) # Launch program (fire-and-forget)
python $SKILL_DIR/remote.py run moritz_pc notepad.exe python $SKILL_DIR/remote.py run moritz-pc notepad.exe
# Ask user to do something (shows MessageBox, blocks until OK) # Ask user to do something (shows MessageBox, blocks until OK)
python $SKILL_DIR/remote.py prompt moritz_pc "Please click Save, then OK" python $SKILL_DIR/remote.py prompt moritz-pc "Please click Save, then OK"
python $SKILL_DIR/remote.py prompt moritz_pc "UAC dialog coming - please confirm" --title "Action required" python $SKILL_DIR/remote.py prompt moritz-pc "UAC dialog coming - please confirm" --title "Action required"
# Clipboard # Clipboard
python $SKILL_DIR/remote.py clipboard-get moritz_pc python $SKILL_DIR/remote.py clipboard-get moritz-pc
python $SKILL_DIR/remote.py clipboard-set moritz_pc "Text for clipboard" python $SKILL_DIR/remote.py clipboard-set moritz-pc "Text for clipboard"
# File transfer # File transfer
python $SKILL_DIR/remote.py upload moritz_pc /tmp/local.txt "C:\Users\Moritz\Desktop\remote.txt" python $SKILL_DIR/remote.py upload moritz-pc /tmp/local.txt "C:\Users\Moritz\Desktop\remote.txt"
python $SKILL_DIR/remote.py download moritz_pc "C:\Users\Moritz\file.txt" /tmp/downloaded.txt python $SKILL_DIR/remote.py download moritz-pc "C:\Users\Moritz\file.txt" /tmp/downloaded.txt
# Version: compare relay + remote.py + client commits (are they in sync?) # Version: compare relay + remote.py + client commits (are they in sync?)
python $SKILL_DIR/remote.py version moritz_pc python $SKILL_DIR/remote.py version moritz-pc
# Client log (last 100 lines, --lines for more) # Client log (last 100 lines, --lines for more)
python $SKILL_DIR/remote.py logs moritz_pc python $SKILL_DIR/remote.py logs moritz-pc
python $SKILL_DIR/remote.py logs moritz_pc --lines 200 python $SKILL_DIR/remote.py logs moritz-pc --lines 200
``` ```
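The suffix scheme described in the comments (`chrome`, `chrome2`, `chrome3`) can be sketched as follows. This illustrates the naming rule only; `assign_window_labels` is a hypothetical helper, and the client's actual ordering and normalization of process names are not specified here.

```python
from collections import Counter
from typing import Iterable, List


def assign_window_labels(process_names: Iterable[str]) -> List[str]:
    """Label each window by process name; repeats get a numeric suffix."""
    seen: Counter = Counter()
    labels = []
    for name in process_names:
        seen[name] += 1
        # First occurrence keeps the bare name; later ones become name2, name3, ...
        labels.append(name if seen[name] == 1 else f"{name}{seen[name]}")
    return labels


labels = assign_window_labels(["chrome", "discord", "chrome", "chrome"])
print(labels)  # ['chrome', 'discord', 'chrome2', 'chrome3']
```

Because a suffix depends on which windows are open, a label such as `chrome2` is not stable across sessions, which is why running `windows` before targeting a window is recommended. The sketch does not handle a process whose real name already ends in a digit.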
## Typical Workflow: UI Task ## Typical Workflow: UI Task
1. `screenshot <device> screen` → look at the screen 1. `windows <device>` → find the window label
2. `windows <device>` → find the window label 2. `screenshot <device> <window_label>` → look at it
3. `focus <device> <window_label>` → bring it to front 3. `focus <device> <window_label>` → bring it to front if needed
4. `exec` → perform the action 4. `exec` → perform the action
5. `screenshot <device> <window_label>` → verify result 5. `screenshot <device> <window_label>` → verify result
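The revised workflow maps onto a fixed sequence of `remote.py` invocations. The sketch below only builds and prints the command lines; it does not run them. `ui_task_commands` is a hypothetical helper, and in practice the window label would be read from the output of step 1 rather than known in advance.

```python
import shlex
from typing import List


def ui_task_commands(device: str, window_label: str, action: str,
                     remote_py: str = "remote.py") -> List[List[str]]:
    """Return the argv lists for the five workflow steps, in order."""
    base = ["python", remote_py]
    return [
        base + ["windows", device],                   # 1. find the window label
        base + ["screenshot", device, window_label],  # 2. look at the window
        base + ["focus", device, window_label],       # 3. bring it to front if needed
        base + ["exec", device, action],              # 4. perform the action
        base + ["screenshot", device, window_label],  # 5. verify the result
    ]


commands = ui_task_commands("moritz-pc", "chrome", "Get-Process")
for argv in commands:
    print(shlex.join(argv))
```

Keeping each step as an argv list avoids shell-quoting problems if the commands are later passed to `subprocess.run`.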
## ⚠️ Prompt Rule (important!) ## ⚠️ Prompt Rule
**Never interact with UI blindly.** When you need the user to click something: **Never interact with UI blindly.** When you need the user to click something:
```bash ```bash
python $SKILL_DIR/remote.py prompt moritz_pc "Please click [Save], then press OK" python $SKILL_DIR/remote.py prompt moritz-pc "Please click [Save], then press OK"
``` ```
This blocks until the user confirms. Use it whenever manual interaction is needed. This blocks until the user confirms. Use it whenever manual interaction is needed.