# helios-remote

<p align="center">
<img src="assets/logo.png" width="150" alt="helios-remote logo" />
</p>

**AI-first remote control tool**: a relay server and Windows client written in Rust. It lets an AI agent (or any HTTP client) take full control of a remote Windows machine via a lightweight WebSocket relay.
## Quick Connect
Run this in PowerShell **(as Admin)**:
```powershell
irm https://raw.githubusercontent.com/agent-helios/helios-remote/master/scripts/install.ps1 | iex
```
> **Windows Defender notice:** Defender may flag the executable as unknown.
> Temporarily disable real-time protection before running:
> ```powershell
> Set-MpPreference -DisableRealtimeMonitoring $true
> ```
> Re-enable it after the client has started:
> ```powershell
> Set-MpPreference -DisableRealtimeMonitoring $false
> ```
---
## Architecture
```
helios-remote/
├── crates/
│   ├── common/    # Shared protocol types, WebSocket message definitions
│   ├── server/    # Relay server (REST API + WebSocket hub)
│   └── client/    # Windows client, Phase 2 (stub only)
├── Cargo.toml     # Workspace root
└── README.md
```
### How It Works
```
AI Agent
   │  REST API (X-Api-Key)
helios-server ──WebSocket── helios-client (Windows)
   │                             │
POST /sessions/:id/screenshot    │  Captures screen → base64 PNG
POST /sessions/:id/exec          │  Runs command in persistent shell
POST /sessions/:id/click         │  Simulates mouse click
POST /sessions/:id/type          │  Types text
```
1. The **Windows client** connects to the relay server via WebSocket and sends a `Hello` message.
2. The **AI agent** calls the REST API to issue commands.
3. The relay server forwards commands to the correct client session and streams back responses.
## Server
### REST API
All endpoints require the `X-Api-Key` header.
| Method | Path | Description |
|---|---|---|
| `GET` | `/sessions` | List all connected clients |
| `POST` | `/sessions/:id/screenshot` | Request a screenshot (returns base64 PNG) |
| `POST` | `/sessions/:id/exec` | Execute a shell command |
| `POST` | `/sessions/:id/click` | Simulate a mouse click |
| `POST` | `/sessions/:id/type` | Type text |
| `POST` | `/sessions/:id/label` | Rename a session |
| `GET` | `/sessions/:id/windows` | List all windows |
| `POST` | `/sessions/:id/windows/minimize-all` | Minimize all windows |
| `POST` | `/sessions/:id/windows/:window_id/focus` | Focus a window |
| `POST` | `/sessions/:id/windows/:window_id/maximize` | Maximize and focus a window |
| `POST` | `/sessions/:id/run` | Launch a program (fire-and-forget) |
| `GET` | `/sessions/:id/clipboard` | Get clipboard contents |
| `POST` | `/sessions/:id/clipboard` | Set clipboard contents |
| `GET` | `/sessions/:id/version` | Get client version |
| `POST` | `/sessions/:id/upload` | Upload a file to the client |
| `GET` | `/sessions/:id/download?path=...` | Download a file from the client |
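
As a rough illustration of the endpoints above, the following sketch builds (but does not send) authenticated requests using only Python's standard library. The helper name `build_request`, the base URL, and the key value are assumptions made for this example, and the JSON body for `/exec` follows the curl example under "Example API Usage" rather than a documented schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumption: server reachable locally on the default port
API_KEY = "your-secret-key"         # assumption: must match HELIOS_API_KEY on the server


def build_request(method, path, body=None):
    """Build a urllib Request carrying the X-Api-Key header (hypothetical helper)."""
    data = None
    headers = {"X-Api-Key": API_KEY}
    if body is not None:
        # Commands with parameters are sent as a JSON body.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


list_req = build_request("GET", "/sessions")
exec_req = build_request("POST", "/sessions/<session-id>/exec", {"command": "whoami"})

# To send a request against a running server:
# with urllib.request.urlopen(list_req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL, method, and headers before anything reaches the remote machine.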
### WebSocket
Clients connect to `ws://host:3000/ws`. No authentication is required at the transport layer; the server trusts every WS connection as a client agent.
### Running the Server
```bash
HELIOS_API_KEY=your-secret-key HELIOS_BIND=0.0.0.0:3000 cargo run -p helios-server
```
Environment variables:
| Variable | Default | Description |
|---|---|---|
| `HELIOS_API_KEY` | `dev-secret` | API key for REST endpoints |
| `HELIOS_BIND` | `0.0.0.0:3000` | Listen address |
| `RUST_LOG` | `helios_server=debug` | Log level |
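
The defaults in this table can be checked before launching the server. The sketch below is a hypothetical pre-flight helper written in Python, not part of the Rust server; the function names are invented for illustration, and it assumes an unset variable falls back to the documented default.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "HELIOS_API_KEY": "dev-secret",
    "HELIOS_BIND": "0.0.0.0:3000",
    "RUST_LOG": "helios_server=debug",
}


def effective_config(env=None):
    """Return each setting's value: the environment value if set, else the default."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


def uses_default_key(config):
    """True when the API key was left at its documented default."""
    return config["HELIOS_API_KEY"] == DEFAULTS["HELIOS_API_KEY"]


# Example with an explicit mapping instead of the real environment.
config = effective_config({"HELIOS_BIND": "127.0.0.1:3000"})
print(config)
print("using default API key:", uses_default_key(config))
```

A check like `uses_default_key` is useful when `HELIOS_BIND` exposes the server beyond localhost, since `dev-secret` is published in this README.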
### Example API Usage
```bash
# List sessions
curl -H "X-Api-Key: your-secret-key" http://localhost:3000/sessions

# Take a screenshot
curl -s -X POST -H "X-Api-Key: your-secret-key" \
  http://localhost:3000/sessions/<session-id>/screenshot

# Run a command
curl -s -X POST -H "X-Api-Key: your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"command": "whoami"}' \
  http://localhost:3000/sessions/<session-id>/exec

# Click at coordinates
curl -s -X POST -H "X-Api-Key: your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"x": 100, "y": 200, "button": "left"}' \
  http://localhost:3000/sessions/<session-id>/click
```
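
The screenshot endpoint returns a base64-encoded PNG. How that string is wrapped in the response body is not specified here, so the sketch below starts from the base64 text itself and leaves extraction to the caller. The function name and output filename are assumptions for illustration.

```python
import base64
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def save_png_from_base64(b64_text, out_path):
    """Decode base64 text, check the PNG signature, and write the bytes to out_path."""
    raw = base64.b64decode(b64_text, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with a PNG signature")
    Path(out_path).write_bytes(raw)
    return len(raw)


# Stand-in payload: only the signature bytes, encoded the way a screenshot would be.
sample_b64 = base64.b64encode(PNG_SIGNATURE).decode("ascii")
out_file = Path(tempfile.gettempdir()) / "helios-remote-example.png"
written = save_png_from_base64(sample_b64, out_file)
print(f"wrote {written} bytes to {out_file}")
```

Checking the signature before writing catches the case where an error message, rather than image data, was decoded.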
## remote.py CLI
The `skills/helios-remote/remote.py` script provides a simple CLI wrapper around the REST API.
### Label Routing
All commands accept either a UUID or a label name as `session_id`. If the value is not a UUID, the script resolves it by looking up the label across all connected sessions:
```bash
python remote.py screenshot "Moritz PC" # resolves label → UUID automatically
python remote.py exec "Moritz PC" whoami
```
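
The resolution rule described above can be sketched as follows. This is not the code in `remote.py`; it is a minimal illustration that assumes each session is a dict with `id` and `label` keys and that an ambiguous label should be rejected.

```python
import uuid


def is_uuid(value):
    """Return True if value parses as a UUID."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def resolve_session(value, sessions):
    """Return a session UUID for value, which may be a UUID or a label.

    sessions is a list of dicts with "id" and "label" keys (assumed shape).
    """
    if is_uuid(value):
        return value
    matches = [s["id"] for s in sessions if s.get("label") == value]
    if not matches:
        raise LookupError(f"no connected session labelled {value!r}")
    if len(matches) > 1:
        raise LookupError(f"label {value!r} is ambiguous: {len(matches)} sessions match")
    return matches[0]


# Made-up session list for illustration only.
sessions = [
    {"id": "11111111-2222-3333-4444-555555555555", "label": "Moritz PC"},
    {"id": "66666666-7777-8888-9999-000000000000", "label": "Build box"},
]
print(resolve_session("Moritz PC", sessions))
```

Note that `uuid.UUID` also accepts 32 hex digits without hyphens, so under this sketch a label of that form would be treated as a UUID rather than looked up.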
### Commands
```bash
python remote.py sessions # list sessions
python remote.py screenshot <session> # capture screenshot → /tmp/helios-remote-screenshot.png
python remote.py exec <session> <command...> # run shell command
python remote.py click <session> <x> <y> # mouse click
python remote.py type <session> <text> # keyboard input
python remote.py windows <session> # list windows
python remote.py find-window <session> <title> # filter windows by title substring
python remote.py minimize-all <session> # minimize all windows
python remote.py focus <session> <window_id> # focus window
python remote.py maximize <session> <window_id> # maximize and focus window
python remote.py run <session> <program> [args...] # launch program (fire-and-forget)
python remote.py clipboard-get <session> # get clipboard text
python remote.py clipboard-set <session> <text> # set clipboard text
python remote.py upload <session> <local> <remote> # upload file
python remote.py download <session> <remote> <local> # download file
python remote.py version <session> # client version
python remote.py server-version # server version
```
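
The `find-window` command filters the window list by title substring. A minimal sketch of that filter is shown below; the `id` and `title` keys and the case-insensitive comparison are assumptions, not confirmed behaviour of `remote.py`.

```python
def find_windows(windows, needle):
    """Return windows whose title contains needle (case-insensitive by assumption)."""
    needle = needle.casefold()
    return [w for w in windows if needle in w.get("title", "").casefold()]


# Made-up window list; the "id" and "title" keys are assumed.
windows = [
    {"id": 1, "title": "Untitled - Notepad"},
    {"id": 2, "title": "Windows PowerShell"},
    {"id": 3, "title": "notepad++ - README.md"},
]
for w in find_windows(windows, "notepad"):
    print(w["id"], w["title"])
```

The matching `id` values are what the `focus` and `maximize` commands take as `<window_id>`.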
## Client (Phase 2)
See [`crates/client/README.md`](crates/client/README.md) for the planned Windows client implementation.
## Development
```bash
# Build everything
cargo build
# Run tests
cargo test
# Run server in dev mode
RUST_LOG=debug cargo run -p helios-server
```
## License
MIT