... # HTTP headers scoped to URL's origin
+agent-browser --executable-path # Custom browser executable
+agent-browser --extension ... # Load browser extension (repeatable)
+agent-browser --ignore-https-errors # Ignore SSL certificate errors
+agent-browser --help # Show help (-h)
+agent-browser --version # Show version (-V)
+agent-browser --help # Show detailed help for a command
+```
+
+## Debugging
+
+```bash
+agent-browser --headed open example.com # Show browser window
+agent-browser --cdp 9222 snapshot # Connect via CDP port
+agent-browser connect 9222 # Alternative: connect command
+agent-browser console # View console messages
+agent-browser console --clear # Clear console
+agent-browser errors # View page errors
+agent-browser errors --clear # Clear errors
+agent-browser highlight @e1 # Highlight element
+agent-browser inspect # Open Chrome DevTools for this session
+agent-browser trace start # Start recording trace
+agent-browser trace stop trace.zip # Stop and save trace
+agent-browser profiler start # Start Chrome DevTools profiling
+agent-browser profiler stop trace.json # Stop and save profile
+```
+
+## Environment Variables
+
+```bash
+AGENT_BROWSER_SESSION="mysession" # Default session name
+AGENT_BROWSER_EXECUTABLE_PATH="/path/chrome" # Custom browser path
+AGENT_BROWSER_EXTENSIONS="/ext1,/ext2" # Comma-separated extension paths
+AGENT_BROWSER_PROVIDER="browserbase" # Cloud browser provider
+AGENT_BROWSER_STREAM_PORT="9223" # WebSocket streaming port
+AGENT_BROWSER_HOME="/path/to/agent-browser" # Custom install location
+```
diff --git a/workspace/skills/agent-browser/references/profiling.md b/workspace/skills/agent-browser/references/profiling.md
new file mode 100644
index 00000000..bd47eaa0
--- /dev/null
+++ b/workspace/skills/agent-browser/references/profiling.md
@@ -0,0 +1,120 @@
+# Profiling
+
+Capture Chrome DevTools performance profiles during browser automation for performance analysis.
+
+**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.
+
+## Contents
+
+- [Basic Profiling](#basic-profiling)
+- [Profiler Commands](#profiler-commands)
+- [Categories](#categories)
+- [Use Cases](#use-cases)
+- [Output Format](#output-format)
+- [Viewing Profiles](#viewing-profiles)
+- [Limitations](#limitations)
+
+## Basic Profiling
+
+```bash
+# Start profiling
+agent-browser profiler start
+
+# Perform actions
+agent-browser navigate https://example.com
+agent-browser click "#button"
+agent-browser wait 1000
+
+# Stop and save
+agent-browser profiler stop ./trace.json
+```
+
+## Profiler Commands
+
+```bash
+# Start profiling with default categories
+agent-browser profiler start
+
+# Start with custom trace categories
+agent-browser profiler start --categories "devtools.timeline,v8.execute,blink.user_timing"
+
+# Stop profiling and save to file
+agent-browser profiler stop ./trace.json
+```
+
+## Categories
+
+The `--categories` flag accepts a comma-separated list of Chrome trace categories. Default categories include:
+
+- `devtools.timeline` -- standard DevTools performance traces
+- `v8.execute` -- time spent running JavaScript
+- `blink` -- renderer events
+- `blink.user_timing` -- `performance.mark()` / `performance.measure()` calls
+- `latencyInfo` -- input-to-latency tracking
+- `renderer.scheduler` -- task scheduling and execution
+- `toplevel` -- broad-spectrum basic events
+
+Several `disabled-by-default-*` categories are also included for detailed timeline, call stack, and V8 CPU profiling data.
+
+## Use Cases
+
+### Diagnosing Slow Page Loads
+
+```bash
+agent-browser profiler start
+agent-browser navigate https://app.example.com
+agent-browser wait --load networkidle
+agent-browser profiler stop ./page-load-profile.json
+```
+
+### Profiling User Interactions
+
+```bash
+agent-browser navigate https://app.example.com
+agent-browser profiler start
+agent-browser click "#submit"
+agent-browser wait 2000
+agent-browser profiler stop ./interaction-profile.json
+```
+
+### CI Performance Regression Checks
+
+```bash
+#!/bin/bash
+agent-browser profiler start
+agent-browser navigate https://app.example.com
+agent-browser wait --load networkidle
+agent-browser profiler stop "./profiles/build-${BUILD_ID}.json"
+```
+
+## Output Format
+
+The output is a JSON file in Chrome Trace Event format:
+
+```json
+{
+ "traceEvents": [
+ { "cat": "devtools.timeline", "name": "RunTask", "ph": "X", "ts": 12345, "dur": 100, ... },
+ ...
+ ],
+ "metadata": {
+ "clock-domain": "LINUX_CLOCK_MONOTONIC"
+ }
+}
+```
+
+The `metadata.clock-domain` field is set based on the host platform (Linux or macOS). On Windows it is omitted.
+
+## Viewing Profiles
+
+Load the output JSON file in any of these tools:
+
+- **Chrome DevTools**: Performance panel > Load profile (Ctrl+Shift+I > Performance)
+- **Perfetto UI**: https://ui.perfetto.dev/ -- drag and drop the JSON file
+- **Trace Viewer**: `chrome://tracing` in any Chromium browser
+
+## Limitations
+
+- Only works with Chromium-based browsers (Chrome, Edge). Not supported on Firefox or WebKit.
+- Trace data accumulates in memory while profiling is active (capped at 5 million events). Stop profiling promptly after the area of interest.
+- Data collection on stop has a 30-second timeout. If the browser is unresponsive, the stop command may fail.
diff --git a/workspace/skills/agent-browser/references/proxy-support.md b/workspace/skills/agent-browser/references/proxy-support.md
new file mode 100644
index 00000000..e86a8fe3
--- /dev/null
+++ b/workspace/skills/agent-browser/references/proxy-support.md
@@ -0,0 +1,194 @@
+# Proxy Support
+
+Proxy configuration for geo-testing, rate limiting avoidance, and corporate environments.
+
+**Related**: [commands.md](commands.md) for global options, [SKILL.md](../SKILL.md) for quick start.
+
+## Contents
+
+- [Basic Proxy Configuration](#basic-proxy-configuration)
+- [Authenticated Proxy](#authenticated-proxy)
+- [SOCKS Proxy](#socks-proxy)
+- [Proxy Bypass](#proxy-bypass)
+- [Common Use Cases](#common-use-cases)
+- [Verifying Proxy Connection](#verifying-proxy-connection)
+- [Troubleshooting](#troubleshooting)
+- [Best Practices](#best-practices)
+
+## Basic Proxy Configuration
+
+Use the `--proxy` flag or set proxy via environment variable:
+
+```bash
+# Via CLI flag
+agent-browser --proxy "http://proxy.example.com:8080" open https://example.com
+
+# Via environment variable
+export HTTP_PROXY="http://proxy.example.com:8080"
+agent-browser open https://example.com
+
+# HTTPS proxy
+export HTTPS_PROXY="https://proxy.example.com:8080"
+agent-browser open https://example.com
+
+# Both
+export HTTP_PROXY="http://proxy.example.com:8080"
+export HTTPS_PROXY="http://proxy.example.com:8080"
+agent-browser open https://example.com
+```
+
+## Authenticated Proxy
+
+For proxies requiring authentication:
+
+```bash
+# Include credentials in URL
+export HTTP_PROXY="http://username:password@proxy.example.com:8080"
+agent-browser open https://example.com
+```
+
+## SOCKS Proxy
+
+```bash
+# SOCKS5 proxy
+export ALL_PROXY="socks5://proxy.example.com:1080"
+agent-browser open https://example.com
+
+# SOCKS5 with auth
+export ALL_PROXY="socks5://user:pass@proxy.example.com:1080"
+agent-browser open https://example.com
+```
+
+## Proxy Bypass
+
+Skip proxy for specific domains using `--proxy-bypass` or `NO_PROXY`:
+
+```bash
+# Via CLI flag
+agent-browser --proxy "http://proxy.example.com:8080" --proxy-bypass "localhost,*.internal.com" open https://example.com
+
+# Via environment variable
+export NO_PROXY="localhost,127.0.0.1,.internal.company.com"
+agent-browser open https://internal.company.com # Direct connection
+agent-browser open https://external.com # Via proxy
+```
+
+## Common Use Cases
+
+### Geo-Location Testing
+
+```bash
+#!/bin/bash
+# Test site from different regions using geo-located proxies
+
+PROXIES=(
+ "http://us-proxy.example.com:8080"
+ "http://eu-proxy.example.com:8080"
+ "http://asia-proxy.example.com:8080"
+)
+
+for proxy in "${PROXIES[@]}"; do
+ export HTTP_PROXY="$proxy"
+ export HTTPS_PROXY="$proxy"
+
+ region=$(echo "$proxy" | grep -oP '^\w+-\w+')
+ echo "Testing from: $region"
+
+ agent-browser --session "$region" open https://example.com
+ agent-browser --session "$region" screenshot "./screenshots/$region.png"
+ agent-browser --session "$region" close
+done
+```
+
+### Rotating Proxies for Scraping
+
+```bash
+#!/bin/bash
+# Rotate through proxy list to avoid rate limiting
+
+PROXY_LIST=(
+ "http://proxy1.example.com:8080"
+ "http://proxy2.example.com:8080"
+ "http://proxy3.example.com:8080"
+)
+
+URLS=(
+ "https://site.com/page1"
+ "https://site.com/page2"
+ "https://site.com/page3"
+)
+
+for i in "${!URLS[@]}"; do
+ proxy_index=$((i % ${#PROXY_LIST[@]}))
+ export HTTP_PROXY="${PROXY_LIST[$proxy_index]}"
+ export HTTPS_PROXY="${PROXY_LIST[$proxy_index]}"
+
+ agent-browser open "${URLS[$i]}"
+ agent-browser get text body > "output-$i.txt"
+ agent-browser close
+
+ sleep 1 # Polite delay
+done
+```
+
+### Corporate Network Access
+
+```bash
+#!/bin/bash
+# Access internal sites via corporate proxy
+
+export HTTP_PROXY="http://corpproxy.company.com:8080"
+export HTTPS_PROXY="http://corpproxy.company.com:8080"
+export NO_PROXY="localhost,127.0.0.1,.company.com"
+
+# External sites go through proxy
+agent-browser open https://external-vendor.com
+
+# Internal sites bypass proxy
+agent-browser open https://intranet.company.com
+```
+
+## Verifying Proxy Connection
+
+```bash
+# Check your apparent IP
+agent-browser open https://httpbin.org/ip
+agent-browser get text body
+# Should show proxy's IP, not your real IP
+```
+
+## Troubleshooting
+
+### Proxy Connection Failed
+
+```bash
+# Test proxy connectivity first
+curl -x http://proxy.example.com:8080 https://httpbin.org/ip
+
+# Check if proxy requires auth
+export HTTP_PROXY="http://user:pass@proxy.example.com:8080"
+```
+
+### SSL/TLS Errors Through Proxy
+
+Some proxies perform SSL inspection. If you encounter certificate errors:
+
+```bash
+# For testing only - not recommended for production
+agent-browser open https://example.com --ignore-https-errors
+```
+
+### Slow Performance
+
+```bash
+# Use proxy only when necessary
+export NO_PROXY="*.cdn.com,*.static.com" # Direct CDN access
+```
+
+## Best Practices
+
+1. **Use environment variables** - Don't hardcode proxy credentials
+2. **Set NO_PROXY appropriately** - Avoid routing local traffic through proxy
+3. **Test proxy before automation** - Verify connectivity with simple requests
+4. **Handle proxy failures gracefully** - Implement retry logic for unstable proxies
+5. **Rotate proxies for large scraping jobs** - Distribute load and avoid bans
diff --git a/workspace/skills/agent-browser/references/session-management.md b/workspace/skills/agent-browser/references/session-management.md
new file mode 100644
index 00000000..bb5312db
--- /dev/null
+++ b/workspace/skills/agent-browser/references/session-management.md
@@ -0,0 +1,193 @@
+# Session Management
+
+Multiple isolated browser sessions with state persistence and concurrent browsing.
+
+**Related**: [authentication.md](authentication.md) for login patterns, [SKILL.md](../SKILL.md) for quick start.
+
+## Contents
+
+- [Named Sessions](#named-sessions)
+- [Session Isolation Properties](#session-isolation-properties)
+- [Session State Persistence](#session-state-persistence)
+- [Common Patterns](#common-patterns)
+- [Default Session](#default-session)
+- [Session Cleanup](#session-cleanup)
+- [Best Practices](#best-practices)
+
+## Named Sessions
+
+Use `--session` flag to isolate browser contexts:
+
+```bash
+# Session 1: Authentication flow
+agent-browser --session auth open https://app.example.com/login
+
+# Session 2: Public browsing (separate cookies, storage)
+agent-browser --session public open https://example.com
+
+# Commands are isolated by session
+agent-browser --session auth fill @e1 "user@example.com"
+agent-browser --session public get text body
+```
+
+## Session Isolation Properties
+
+Each session has independent:
+- Cookies
+- LocalStorage / SessionStorage
+- IndexedDB
+- Cache
+- Browsing history
+- Open tabs
+
+## Session State Persistence
+
+### Save Session State
+
+```bash
+# Save cookies, storage, and auth state
+agent-browser state save /path/to/auth-state.json
+```
+
+### Load Session State
+
+```bash
+# Restore saved state
+agent-browser state load /path/to/auth-state.json
+
+# Continue with authenticated session
+agent-browser open https://app.example.com/dashboard
+```
+
+### State File Contents
+
+```json
+{
+ "cookies": [...],
+ "localStorage": {...},
+ "sessionStorage": {...},
+ "origins": [...]
+}
+```
+
+## Common Patterns
+
+### Authenticated Session Reuse
+
+```bash
+#!/bin/bash
+# Save login state once, reuse many times
+
+STATE_FILE="/tmp/auth-state.json"
+
+# Check if we have saved state
+if [[ -f "$STATE_FILE" ]]; then
+ agent-browser state load "$STATE_FILE"
+ agent-browser open https://app.example.com/dashboard
+else
+ # Perform login
+ agent-browser open https://app.example.com/login
+ agent-browser snapshot -i
+ agent-browser fill @e1 "$USERNAME"
+ agent-browser fill @e2 "$PASSWORD"
+ agent-browser click @e3
+ agent-browser wait --load networkidle
+
+ # Save for future use
+ agent-browser state save "$STATE_FILE"
+fi
+```
+
+### Concurrent Scraping
+
+```bash
+#!/bin/bash
+# Scrape multiple sites concurrently
+
+# Start all sessions
+agent-browser --session site1 open https://site1.com &
+agent-browser --session site2 open https://site2.com &
+agent-browser --session site3 open https://site3.com &
+wait
+
+# Extract from each
+agent-browser --session site1 get text body > site1.txt
+agent-browser --session site2 get text body > site2.txt
+agent-browser --session site3 get text body > site3.txt
+
+# Cleanup
+agent-browser --session site1 close
+agent-browser --session site2 close
+agent-browser --session site3 close
+```
+
+### A/B Testing Sessions
+
+```bash
+# Test different user experiences
+agent-browser --session variant-a open "https://app.com?variant=a"
+agent-browser --session variant-b open "https://app.com?variant=b"
+
+# Compare
+agent-browser --session variant-a screenshot /tmp/variant-a.png
+agent-browser --session variant-b screenshot /tmp/variant-b.png
+```
+
+## Default Session
+
+When `--session` is omitted, commands use the default session:
+
+```bash
+# These use the same default session
+agent-browser open https://example.com
+agent-browser snapshot -i
+agent-browser close # Closes default session
+```
+
+## Session Cleanup
+
+```bash
+# Close specific session
+agent-browser --session auth close
+
+# List active sessions
+agent-browser session list
+```
+
+## Best Practices
+
+### 1. Name Sessions Semantically
+
+```bash
+# GOOD: Clear purpose
+agent-browser --session github-auth open https://github.com
+agent-browser --session docs-scrape open https://docs.example.com
+
+# AVOID: Generic names
+agent-browser --session s1 open https://github.com
+```
+
+### 2. Always Clean Up
+
+```bash
+# Close sessions when done
+agent-browser --session auth close
+agent-browser --session scrape close
+```
+
+### 3. Handle State Files Securely
+
+```bash
+# Don't commit state files (contain auth tokens!)
+echo "*.auth-state.json" >> .gitignore
+
+# Delete after use
+rm /tmp/auth-state.json
+```
+
+### 4. Timeout Long Sessions
+
+```bash
+# Set timeout for automated scripts
+timeout 60 agent-browser --session long-task get text body
+```
diff --git a/workspace/skills/agent-browser/references/snapshot-refs.md b/workspace/skills/agent-browser/references/snapshot-refs.md
new file mode 100644
index 00000000..3cc0feaf
--- /dev/null
+++ b/workspace/skills/agent-browser/references/snapshot-refs.md
@@ -0,0 +1,219 @@
+# Snapshot and Refs
+
+Compact element references that reduce context usage dramatically for AI agents.
+
+**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.
+
+## Contents
+
+- [How Refs Work](#how-refs-work)
+- [Snapshot Command](#the-snapshot-command)
+- [Using Refs](#using-refs)
+- [Ref Lifecycle](#ref-lifecycle)
+- [Best Practices](#best-practices)
+- [Ref Notation Details](#ref-notation-details)
+- [Troubleshooting](#troubleshooting)
+
+## How Refs Work
+
+Traditional approach:
+```
+Full DOM/HTML → AI parses → CSS selector → Action (~3000-5000 tokens)
+```
+
+agent-browser approach:
+```
+Compact snapshot → @refs assigned → Direct interaction (~200-400 tokens)
+```
+
+## The Snapshot Command
+
+```bash
+# Basic snapshot (shows page structure)
+agent-browser snapshot
+
+# Interactive snapshot (-i flag) - RECOMMENDED
+agent-browser snapshot -i
+```
+
+### Snapshot Output Format
+
+```
+Page: Example Site - Home
+URL: https://example.com
+
+@e1 [header]
+ @e2 [nav]
+ @e3 [a] "Home"
+ @e4 [a] "Products"
+ @e5 [a] "About"
+ @e6 [button] "Sign In"
+
+@e7 [main]
+ @e8 [h1] "Welcome"
+ @e9 [form]
+ @e10 [input type="email"] placeholder="Email"
+ @e11 [input type="password"] placeholder="Password"
+ @e12 [button type="submit"] "Log In"
+
+@e13 [footer]
+ @e14 [a] "Privacy Policy"
+```
+
+## Using Refs
+
+Once you have refs, interact directly:
+
+```bash
+# Click the "Sign In" button
+agent-browser click @e6
+
+# Fill email input
+agent-browser fill @e10 "user@example.com"
+
+# Fill password
+agent-browser fill @e11 "password123"
+
+# Submit the form
+agent-browser click @e12
+```
+
+## Ref Lifecycle
+
+**IMPORTANT**: Refs are invalidated when the page changes!
+
+```bash
+# Get initial snapshot
+agent-browser snapshot -i
+# @e1 [button] "Next"
+
+# Click triggers page change
+agent-browser click @e1
+
+# MUST re-snapshot to get new refs!
+agent-browser snapshot -i
+# @e1 [h1] "Page 2" ← Different element now!
+```
+
+## Best Practices
+
+### 1. Always Snapshot Before Interacting
+
+```bash
+# CORRECT
+agent-browser open https://example.com
+agent-browser snapshot -i # Get refs first
+agent-browser click @e1 # Use ref
+
+# WRONG
+agent-browser open https://example.com
+agent-browser click @e1 # Ref doesn't exist yet!
+```
+
+### 2. Re-Snapshot After Navigation
+
+```bash
+agent-browser click @e5 # Navigates to new page
+agent-browser snapshot -i # Get new refs
+agent-browser click @e1 # Use new refs
+```
+
+### 3. Re-Snapshot After Dynamic Changes
+
+```bash
+agent-browser click @e1 # Opens dropdown
+agent-browser snapshot -i # See dropdown items
+agent-browser click @e7 # Select item
+```
+
+### 4. Snapshot Specific Regions
+
+For complex pages, snapshot specific areas:
+
+```bash
+# Snapshot just the form
+agent-browser snapshot @e9
+```
+
+## Ref Notation Details
+
+```
+@e1 [tag type="value"] "text content" placeholder="hint"
+│ │ │ │ │
+│ │ │ │ └─ Additional attributes
+│ │ │ └─ Visible text
+│ │ └─ Key attributes shown
+│ └─ HTML tag name
+└─ Unique ref ID
+```
+
+### Common Patterns
+
+```
+@e1 [button] "Submit" # Button with text
+@e2 [input type="email"] # Email input
+@e3 [input type="password"] # Password input
+@e4 [a href="/page"] "Link Text" # Anchor link
+@e5 [select] # Dropdown
+@e6 [textarea] placeholder="Message" # Text area
+@e7 [div class="modal"] # Container (when relevant)
+@e8 [img alt="Logo"] # Image
+@e9 [checkbox] checked # Checked checkbox
+@e10 [radio] selected # Selected radio
+```
+
+## Iframes
+
+Snapshots automatically detect and inline iframe content. When the main-frame snapshot runs, each `Iframe` node is resolved and its child accessibility tree is included directly beneath it in the output. Refs assigned to elements inside iframes carry frame context, so interactions like `click`, `fill`, and `type` work without manually switching frames.
+
+```bash
+agent-browser snapshot -i
+# @e1 [heading] "Checkout"
+# @e2 [Iframe] "payment-frame"
+# @e3 [input] "Card number"
+# @e4 [input] "Expiry"
+# @e5 [button] "Pay"
+# @e6 [button] "Cancel"
+
+# Interact with iframe elements directly using their refs
+agent-browser fill @e3 "4111111111111111"
+agent-browser fill @e4 "12/28"
+agent-browser click @e5
+```
+
+**Key details:**
+- Only one level of iframe nesting is expanded (iframes within iframes are not recursed)
+- Cross-origin iframes that block accessibility tree access are silently skipped
+- Empty iframes or iframes with no interactive content are omitted from the output
+- To scope a snapshot to a single iframe, use `frame @ref` then `snapshot -i`
+
+## Troubleshooting
+
+### "Ref not found" Error
+
+```bash
+# Ref may have changed - re-snapshot
+agent-browser snapshot -i
+```
+
+### Element Not Visible in Snapshot
+
+```bash
+# Scroll down to reveal element
+agent-browser scroll down 1000
+agent-browser snapshot -i
+
+# Or wait for dynamic content
+agent-browser wait 1000
+agent-browser snapshot -i
+```
+
+### Too Many Elements
+
+```bash
+# Snapshot specific container
+agent-browser snapshot @e5
+
+# Or use get text for content-only extraction
+agent-browser get text @e5
+```
diff --git a/workspace/skills/agent-browser/references/video-recording.md b/workspace/skills/agent-browser/references/video-recording.md
new file mode 100644
index 00000000..e6a9fb4e
--- /dev/null
+++ b/workspace/skills/agent-browser/references/video-recording.md
@@ -0,0 +1,173 @@
+# Video Recording
+
+Capture browser automation as video for debugging, documentation, or verification.
+
+**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.
+
+## Contents
+
+- [Basic Recording](#basic-recording)
+- [Recording Commands](#recording-commands)
+- [Use Cases](#use-cases)
+- [Best Practices](#best-practices)
+- [Output Format](#output-format)
+- [Limitations](#limitations)
+
+## Basic Recording
+
+```bash
+# Start recording
+agent-browser record start ./demo.webm
+
+# Perform actions
+agent-browser open https://example.com
+agent-browser snapshot -i
+agent-browser click @e1
+agent-browser fill @e2 "test input"
+
+# Stop and save
+agent-browser record stop
+```
+
+## Recording Commands
+
+```bash
+# Start recording to file
+agent-browser record start ./output.webm
+
+# Stop current recording
+agent-browser record stop
+
+# Restart with new file (stops current + starts new)
+agent-browser record restart ./take2.webm
+```
+
+## Use Cases
+
+### Debugging Failed Automation
+
+```bash
+#!/bin/bash
+# Record automation for debugging
+
+agent-browser record start ./debug-$(date +%Y%m%d-%H%M%S).webm
+
+# Run your automation
+agent-browser open https://app.example.com
+agent-browser snapshot -i
+agent-browser click @e1 || {
+ echo "Click failed - check recording"
+ agent-browser record stop
+ exit 1
+}
+
+agent-browser record stop
+```
+
+### Documentation Generation
+
+```bash
+#!/bin/bash
+# Record workflow for documentation
+
+agent-browser record start ./docs/how-to-login.webm
+
+agent-browser open https://app.example.com/login
+agent-browser wait 1000 # Pause for visibility
+
+agent-browser snapshot -i
+agent-browser fill @e1 "demo@example.com"
+agent-browser wait 500
+
+agent-browser fill @e2 "password"
+agent-browser wait 500
+
+agent-browser click @e3
+agent-browser wait --load networkidle
+agent-browser wait 1000 # Show result
+
+agent-browser record stop
+```
+
+### CI/CD Test Evidence
+
+```bash
+#!/bin/bash
+# Record E2E test runs for CI artifacts
+
+TEST_NAME="${1:-e2e-test}"
+RECORDING_DIR="./test-recordings"
+mkdir -p "$RECORDING_DIR"
+
+agent-browser record start "$RECORDING_DIR/$TEST_NAME-$(date +%s).webm"
+
+# Run test
+if run_e2e_test; then
+ echo "Test passed"
+else
+ echo "Test failed - recording saved"
+fi
+
+agent-browser record stop
+```
+
+## Best Practices
+
+### 1. Add Pauses for Clarity
+
+```bash
+# Slow down for human viewing
+agent-browser click @e1
+agent-browser wait 500 # Let viewer see result
+```
+
+### 2. Use Descriptive Filenames
+
+```bash
+# Include context in filename
+agent-browser record start ./recordings/login-flow-2024-01-15.webm
+agent-browser record start ./recordings/checkout-test-run-42.webm
+```
+
+### 3. Handle Recording in Error Cases
+
+```bash
+#!/bin/bash
+set -e
+
+cleanup() {
+ agent-browser record stop 2>/dev/null || true
+ agent-browser close 2>/dev/null || true
+}
+trap cleanup EXIT
+
+agent-browser record start ./automation.webm
+# ... automation steps ...
+```
+
+### 4. Combine with Screenshots
+
+```bash
+# Record video AND capture key frames
+agent-browser record start ./flow.webm
+
+agent-browser open https://example.com
+agent-browser screenshot ./screenshots/step1-homepage.png
+
+agent-browser click @e1
+agent-browser screenshot ./screenshots/step2-after-click.png
+
+agent-browser record stop
+```
+
+## Output Format
+
+- Default format: WebM (VP8/VP9 codec)
+- Compatible with all modern browsers and video players
+- Compressed but high quality
+
+## Limitations
+
+- Recording adds slight overhead to automation
+- Large recordings can consume significant disk space
+- Some headless environments may have codec limitations
diff --git a/workspace/skills/agent-browser/templates/authenticated-session.sh b/workspace/skills/agent-browser/templates/authenticated-session.sh
new file mode 100644
index 00000000..b66c9289
--- /dev/null
+++ b/workspace/skills/agent-browser/templates/authenticated-session.sh
@@ -0,0 +1,105 @@
+#!/bin/bash
+# Template: Authenticated Session Workflow
+# Purpose: Login once, save state, reuse for subsequent runs
+# Usage: ./authenticated-session.sh [state-file]
+#
+# RECOMMENDED: Use the auth vault instead of this template:
+# echo "" | agent-browser auth save myapp --url --username --password-stdin
+# agent-browser auth login myapp
+# The auth vault stores credentials securely and the LLM never sees passwords.
+#
+# Environment variables:
+# APP_USERNAME - Login username/email
+# APP_PASSWORD - Login password
+#
+# Two modes:
+# 1. Discovery mode (default): Shows form structure so you can identify refs
+# 2. Login mode: Performs actual login after you update the refs
+#
+# Setup steps:
+# 1. Run once to see form structure (discovery mode)
+# 2. Update refs in LOGIN FLOW section below
+# 3. Set APP_USERNAME and APP_PASSWORD
+# 4. Delete the DISCOVERY section
+
+set -euo pipefail
+
+LOGIN_URL="${1:?Usage: $0 [state-file]}"
+STATE_FILE="${2:-./auth-state.json}"
+
+echo "Authentication workflow: $LOGIN_URL"
+
+# ================================================================
+# SAVED STATE: Skip login if valid saved state exists
+# ================================================================
+if [[ -f "$STATE_FILE" ]]; then
+ echo "Loading saved state from $STATE_FILE..."
+ if agent-browser --state "$STATE_FILE" open "$LOGIN_URL" 2>/dev/null; then
+ agent-browser wait --load networkidle
+
+ CURRENT_URL=$(agent-browser get url)
+ if [[ "$CURRENT_URL" != *"login"* ]] && [[ "$CURRENT_URL" != *"signin"* ]]; then
+ echo "Session restored successfully"
+ agent-browser snapshot -i
+ exit 0
+ fi
+ echo "Session expired, performing fresh login..."
+ agent-browser close 2>/dev/null || true
+ else
+ echo "Failed to load state, re-authenticating..."
+ fi
+ rm -f "$STATE_FILE"
+fi
+
+# ================================================================
+# DISCOVERY MODE: Shows form structure (delete after setup)
+# ================================================================
+echo "Opening login page..."
+agent-browser open "$LOGIN_URL"
+agent-browser wait --load networkidle
+
+echo ""
+echo "Login form structure:"
+echo "---"
+agent-browser snapshot -i
+echo "---"
+echo ""
+echo "Next steps:"
+echo " 1. Note the refs: username=@e?, password=@e?, submit=@e?"
+echo " 2. Update the LOGIN FLOW section below with your refs"
+echo " 3. Set: export APP_USERNAME='...' APP_PASSWORD='...'"
+echo " 4. Delete this DISCOVERY MODE section"
+echo ""
+agent-browser close
+exit 0
+
+# ================================================================
+# LOGIN FLOW: Uncomment and customize after discovery
+# ================================================================
+# : "${APP_USERNAME:?Set APP_USERNAME environment variable}"
+# : "${APP_PASSWORD:?Set APP_PASSWORD environment variable}"
+#
+# agent-browser open "$LOGIN_URL"
+# agent-browser wait --load networkidle
+# agent-browser snapshot -i
+#
+# # Fill credentials (update refs to match your form)
+# agent-browser fill @e1 "$APP_USERNAME"
+# agent-browser fill @e2 "$APP_PASSWORD"
+# agent-browser click @e3
+# agent-browser wait --load networkidle
+#
+# # Verify login succeeded
+# FINAL_URL=$(agent-browser get url)
+# if [[ "$FINAL_URL" == *"login"* ]] || [[ "$FINAL_URL" == *"signin"* ]]; then
+# echo "Login failed - still on login page"
+# agent-browser screenshot /tmp/login-failed.png
+# agent-browser close
+# exit 1
+# fi
+#
+# # Save state for future runs
+# echo "Saving state to $STATE_FILE"
+# agent-browser state save "$STATE_FILE"
+# echo "Login successful"
+# agent-browser snapshot -i
diff --git a/workspace/skills/agent-browser/templates/capture-workflow.sh b/workspace/skills/agent-browser/templates/capture-workflow.sh
new file mode 100644
index 00000000..3bc93ad0
--- /dev/null
+++ b/workspace/skills/agent-browser/templates/capture-workflow.sh
@@ -0,0 +1,69 @@
+#!/bin/bash
+# Template: Content Capture Workflow
+# Purpose: Extract content from web pages (text, screenshots, PDF)
+# Usage: ./capture-workflow.sh [output-dir]
+#
+# Outputs:
+# - page-full.png: Full page screenshot
+# - page-structure.txt: Page element structure with refs
+# - page-text.txt: All text content
+# - page.pdf: PDF version
+#
+# Optional: Load auth state for protected pages
+
+set -euo pipefail
+
+TARGET_URL="${1:?Usage: $0 [output-dir]}"
+OUTPUT_DIR="${2:-.}"
+
+echo "Capturing: $TARGET_URL"
+mkdir -p "$OUTPUT_DIR"
+
+# Optional: Load authentication state
+# if [[ -f "./auth-state.json" ]]; then
+# echo "Loading authentication state..."
+# agent-browser state load "./auth-state.json"
+# fi
+
+# Navigate to target
+agent-browser open "$TARGET_URL"
+agent-browser wait --load networkidle
+
+# Get metadata
+TITLE=$(agent-browser get title)
+URL=$(agent-browser get url)
+echo "Title: $TITLE"
+echo "URL: $URL"
+
+# Capture full page screenshot
+agent-browser screenshot --full "$OUTPUT_DIR/page-full.png"
+echo "Saved: $OUTPUT_DIR/page-full.png"
+
+# Get page structure with refs
+agent-browser snapshot -i > "$OUTPUT_DIR/page-structure.txt"
+echo "Saved: $OUTPUT_DIR/page-structure.txt"
+
+# Extract all text content
+agent-browser get text body > "$OUTPUT_DIR/page-text.txt"
+echo "Saved: $OUTPUT_DIR/page-text.txt"
+
+# Save as PDF
+agent-browser pdf "$OUTPUT_DIR/page.pdf"
+echo "Saved: $OUTPUT_DIR/page.pdf"
+
+# Optional: Extract specific elements using refs from structure
+# agent-browser get text @e5 > "$OUTPUT_DIR/main-content.txt"
+
+# Optional: Handle infinite scroll pages
+# for i in {1..5}; do
+# agent-browser scroll down 1000
+# agent-browser wait 1000
+# done
+# agent-browser screenshot --full "$OUTPUT_DIR/page-scrolled.png"
+
+# Cleanup
+agent-browser close
+
+echo ""
+echo "Capture complete:"
+ls -la "$OUTPUT_DIR"
diff --git a/workspace/skills/agent-browser/templates/form-automation.sh b/workspace/skills/agent-browser/templates/form-automation.sh
new file mode 100644
index 00000000..6784fcd3
--- /dev/null
+++ b/workspace/skills/agent-browser/templates/form-automation.sh
@@ -0,0 +1,62 @@
+#!/bin/bash
+# Template: Form Automation Workflow
+# Purpose: Fill and submit web forms with validation
+# Usage: ./form-automation.sh
+#
+# This template demonstrates the snapshot-interact-verify pattern:
+# 1. Navigate to form
+# 2. Snapshot to get element refs
+# 3. Fill fields using refs
+# 4. Submit and verify result
+#
+# Customize: Update the refs (@e1, @e2, etc.) based on your form's snapshot output
+
+set -euo pipefail
+
+FORM_URL="${1:?Usage: $0 }"
+
+echo "Form automation: $FORM_URL"
+
+# Step 1: Navigate to form
+agent-browser open "$FORM_URL"
+agent-browser wait --load networkidle
+
+# Step 2: Snapshot to discover form elements
+echo ""
+echo "Form structure:"
+agent-browser snapshot -i
+
+# Step 3: Fill form fields (customize these refs based on snapshot output)
+#
+# Common field types:
+# agent-browser fill @e1 "John Doe" # Text input
+# agent-browser fill @e2 "user@example.com" # Email input
+# agent-browser fill @e3 "SecureP@ss123" # Password input
+# agent-browser select @e4 "Option Value" # Dropdown
+# agent-browser check @e5 # Checkbox
+# agent-browser click @e6 # Radio button
+# agent-browser fill @e7 "Multi-line text" # Textarea
+# agent-browser upload @e8 /path/to/file.pdf # File upload
+#
+# Uncomment and modify:
+# agent-browser fill @e1 "Test User"
+# agent-browser fill @e2 "test@example.com"
+# agent-browser click @e3 # Submit button
+
+# Step 4: Wait for submission
+# agent-browser wait --load networkidle
+# agent-browser wait --url "**/success" # Or wait for redirect
+
+# Step 5: Verify result
+echo ""
+echo "Result:"
+agent-browser get url
+agent-browser snapshot -i
+
+# Optional: Capture evidence
+agent-browser screenshot /tmp/form-result.png
+echo "Screenshot saved: /tmp/form-result.png"
+
+# Cleanup
+agent-browser close
+echo "Done"
diff --git a/workspace/skills/clawhub/SKILL.md b/workspace/skills/clawhub/SKILL.md
new file mode 100644
index 00000000..b28a1de0
--- /dev/null
+++ b/workspace/skills/clawhub/SKILL.md
@@ -0,0 +1,53 @@
+---
+name: clawhub
+description: Search and install agent skills from ClawHub, the public skill registry.
+homepage: https://clawhub.ai
+metadata: {"internclaw":{"emoji":"🦞"}}
+---
+
+# ClawHub
+
+Public skill registry for AI agents. Search by natural language (vector search).
+
+## When to use
+
+Use this skill when the user asks any of:
+- "find a skill for …"
+- "search for skills"
+- "install a skill"
+- "what skills are available?"
+- "update my skills"
+
+## Search
+
+```bash
+npx --yes clawhub@latest search "web scraping" --limit 5
+```
+
+## Install
+
+```bash
+npx --yes clawhub@latest install --workdir ~/.internclaw/workspace
+```
+
+Replace `` with the skill name from search results. This places the skill into `~/.internclaw/workspace/skills/`, where internclaw loads workspace skills from. Always include `--workdir`.
+
+## Update
+
+```bash
+npx --yes clawhub@latest update --all --workdir ~/.internclaw/workspace
+```
+
+## List installed
+
+```bash
+npx --yes clawhub@latest list --workdir ~/.internclaw/workspace
+```
+
+## Notes
+
+- Requires Node.js (`npx` comes with it).
+- No API key needed for search and install.
+- Login (`npx --yes clawhub@latest login`) is only required for publishing.
+- `--workdir ~/.internclaw/workspace` is critical — without it, skills install to the current directory instead of the internclaw workspace.
+- After install, remind the user to start a new session to load the skill.
diff --git a/workspace/skills/cron/SKILL.md b/workspace/skills/cron/SKILL.md
new file mode 100644
index 00000000..48eaa805
--- /dev/null
+++ b/workspace/skills/cron/SKILL.md
@@ -0,0 +1,57 @@
+---
+name: cron
+description: Schedule reminders and recurring tasks.
+---
+
+# Cron
+
+Use the `cron` tool to schedule reminders or recurring tasks.
+
+## Three Modes
+
+1. **Reminder** - message is sent directly to user
+2. **Task** - message is a task description, agent executes and sends result
+3. **One-time** - runs once at a specific time, then auto-deletes
+
+## Examples
+
+Fixed reminder:
+```
+cron(action="add", message="Time to take a break!", every_seconds=1200)
+```
+
+Dynamic task (agent executes each time):
+```
+cron(action="add", message="Check HKUDS/internclaw GitHub stars and report", every_seconds=600)
+```
+
+One-time scheduled task (compute ISO datetime from current time):
+```
+cron(action="add", message="Remind me about the meeting", at="")
+```
+
+Timezone-aware cron:
+```
+cron(action="add", message="Morning standup", cron_expr="0 9 * * 1-5", tz="America/Vancouver")
+```
+
+List/remove:
+```
+cron(action="list")
+cron(action="remove", job_id="abc123")
+```
+
+## Time Expressions
+
+| User says | Parameters |
+|-----------|------------|
+| every 20 minutes | every_seconds: 1200 |
+| every hour | every_seconds: 3600 |
+| every day at 8am | cron_expr: "0 8 * * *" |
+| weekdays at 5pm | cron_expr: "0 17 * * 1-5" |
+| 9am Vancouver time daily | cron_expr: "0 9 * * *", tz: "America/Vancouver" |
+| at a specific time | at: ISO datetime string (compute from current time) |
+
+## Timezone
+
+Use `tz` with `cron_expr` to schedule in a specific IANA timezone. Without `tz`, the server's local timezone is used.
diff --git a/workspace/skills/github/SKILL.md b/workspace/skills/github/SKILL.md
new file mode 100644
index 00000000..6fa00ecb
--- /dev/null
+++ b/workspace/skills/github/SKILL.md
@@ -0,0 +1,48 @@
+---
+name: github
+description: "Interact with GitHub using the `gh` CLI. Use `gh issue`, `gh pr`, `gh run`, and `gh api` for issues, PRs, CI runs, and advanced queries."
+metadata: {"internclaw":{"emoji":"🐙","requires":{"bins":["gh"]},"install":[{"id":"brew","kind":"brew","formula":"gh","bins":["gh"],"label":"Install GitHub CLI (brew)"},{"id":"apt","kind":"apt","package":"gh","bins":["gh"],"label":"Install GitHub CLI (apt)"}]}}
+---
+
+# GitHub Skill
+
+Use the `gh` CLI to interact with GitHub. Always specify `--repo owner/repo` when not in a git directory, or use URLs directly.
+
+## Pull Requests
+
+Check CI status on a PR:
+```bash
+gh pr checks 55 --repo owner/repo
+```
+
+List recent workflow runs:
+```bash
+gh run list --repo owner/repo --limit 10
+```
+
+View a run and see which steps failed:
+```bash
+gh run view --repo owner/repo
+```
+
+View logs for failed steps only:
+```bash
+gh run view --repo owner/repo --log-failed
+```
+
+## API for Advanced Queries
+
+The `gh api` command is useful for accessing data not available through other subcommands.
+
+Get PR with specific fields:
+```bash
+gh api repos/owner/repo/pulls/55 --jq '.title, .state, .user.login'
+```
+
+## JSON Output
+
+Most commands support `--json` for structured output. You can use `--jq` to filter:
+
+```bash
+gh issue list --repo owner/repo --json number,title --jq '.[] | "\(.number): \(.title)"'
+```
diff --git a/workspace/skills/memory/SKILL.md b/workspace/skills/memory/SKILL.md
new file mode 100644
index 00000000..529a02d5
--- /dev/null
+++ b/workspace/skills/memory/SKILL.md
@@ -0,0 +1,31 @@
+---
+name: memory
+description: Two-layer memory system with grep-based recall.
+always: true
+---
+
+# Memory
+
+## Structure
+
+- `memory/MEMORY.md` — Long-term facts (preferences, project context, relationships). Always loaded into your context.
+- `memory/HISTORY.md` — Append-only event log. NOT loaded into context. Search it with grep. Each entry starts with [YYYY-MM-DD HH:MM].
+
+## Search Past Events
+
+```bash
+grep -i "keyword" memory/HISTORY.md
+```
+
+Use the `exec` tool to run grep. Combine patterns: `grep -iE "meeting|deadline" memory/HISTORY.md`
+
+## When to Update MEMORY.md
+
+Write important facts immediately using `edit_file` or `write_file`:
+- User preferences ("I prefer dark mode")
+- Project context ("The API uses OAuth2")
+- Relationships ("Alice is the project lead")
+
+## Auto-consolidation
+
+Old conversations are automatically summarized and appended to HISTORY.md when the session grows large. Long-term facts are extracted to MEMORY.md. You don't need to manage this.
diff --git a/workspace/skills/skill-creator/SKILL.md b/workspace/skills/skill-creator/SKILL.md
new file mode 100644
index 00000000..9b5eb6fe
--- /dev/null
+++ b/workspace/skills/skill-creator/SKILL.md
@@ -0,0 +1,371 @@
+---
+name: skill-creator
+description: Create or update AgentSkills. Use when designing, structuring, or packaging skills with scripts, references, and assets.
+---
+
+# Skill Creator
+
+This skill provides guidance for creating effective skills.
+
+## About Skills
+
+Skills are modular, self-contained packages that extend the agent's capabilities by providing
+specialized knowledge, workflows, and tools. Think of them as "onboarding guides" for specific
+domains or tasks—they transform the agent from a general-purpose agent into a specialized agent
+equipped with procedural knowledge that no model can fully possess.
+
+### What Skills Provide
+
+1. Specialized workflows - Multi-step procedures for specific domains
+2. Tool integrations - Instructions for working with specific file formats or APIs
+3. Domain expertise - Company-specific knowledge, schemas, business logic
+4. Bundled resources - Scripts, references, and assets for complex and repetitive tasks
+
+## Core Principles
+
+### Concise is Key
+
+The context window is a public good. Skills share the context window with everything else the agent needs: system prompt, conversation history, other Skills' metadata, and the actual user request.
+
+**Default assumption: the agent is already very smart.** Only add context the agent doesn't already have. Challenge each piece of information: "Does the agent really need this explanation?" and "Does this paragraph justify its token cost?"
+
+Prefer concise examples over verbose explanations.
+
+### Set Appropriate Degrees of Freedom
+
+Match the level of specificity to the task's fragility and variability:
+
+**High freedom (text-based instructions)**: Use when multiple approaches are valid, decisions depend on context, or heuristics guide the approach.
+
+**Medium freedom (pseudocode or scripts with parameters)**: Use when a preferred pattern exists, some variation is acceptable, or configuration affects behavior.
+
+**Low freedom (specific scripts, few parameters)**: Use when operations are fragile and error-prone, consistency is critical, or a specific sequence must be followed.
+
+Think of the agent as exploring a path: a narrow bridge with cliffs needs specific guardrails (low freedom), while an open field allows many routes (high freedom).
+
+### Anatomy of a Skill
+
+Every skill consists of a required SKILL.md file and optional bundled resources:
+
+```
+skill-name/
+├── SKILL.md (required)
+│ ├── YAML frontmatter metadata (required)
+│ │ ├── name: (required)
+│ │ └── description: (required)
+│ └── Markdown instructions (required)
+└── Bundled Resources (optional)
+ ├── scripts/ - Executable code (Python/Bash/etc.)
+ ├── references/ - Documentation intended to be loaded into context as needed
+ └── assets/ - Files used in output (templates, icons, fonts, etc.)
+```
+
+#### SKILL.md (required)
+
+Every SKILL.md consists of:
+
+- **Frontmatter** (YAML): Contains `name` and `description` fields. These are the only fields that the agent reads to determine when the skill gets used, thus it is very important to be clear and comprehensive in describing what the skill is, and when it should be used.
+- **Body** (Markdown): Instructions and guidance for using the skill. Only loaded AFTER the skill triggers (if at all).
+
+#### Bundled Resources (optional)
+
+##### Scripts (`scripts/`)
+
+Executable code (Python/Bash/etc.) for tasks that require deterministic reliability or are repeatedly rewritten.
+
+- **When to include**: When the same code is being rewritten repeatedly or deterministic reliability is needed
+- **Example**: `scripts/rotate_pdf.py` for PDF rotation tasks
+- **Benefits**: Token efficient, deterministic, may be executed without loading into context
+- **Note**: Scripts may still need to be read by the agent for patching or environment-specific adjustments
+
+##### References (`references/`)
+
+Documentation and reference material intended to be loaded as needed into context to inform the agent's process and thinking.
+
+- **When to include**: For documentation that the agent should reference while working
+- **Examples**: `references/finance.md` for financial schemas, `references/mnda.md` for company NDA template, `references/policies.md` for company policies, `references/api_docs.md` for API specifications
+- **Use cases**: Database schemas, API documentation, domain knowledge, company policies, detailed workflow guides
+- **Benefits**: Keeps SKILL.md lean, loaded only when the agent determines it's needed
+- **Best practice**: If files are large (>10k words), include grep search patterns in SKILL.md
+- **Avoid duplication**: Information should live in either SKILL.md or references files, not both. Prefer references files for detailed information unless it's truly core to the skill—this keeps SKILL.md lean while making information discoverable without hogging the context window. Keep only essential procedural instructions and workflow guidance in SKILL.md; move detailed reference material, schemas, and examples to references files.
+
+##### Assets (`assets/`)
+
+Files not intended to be loaded into context, but rather used within the output the agent produces.
+
+- **When to include**: When the skill needs files that will be used in the final output
+- **Examples**: `assets/logo.png` for brand assets, `assets/slides.pptx` for PowerPoint templates, `assets/frontend-template/` for HTML/React boilerplate, `assets/font.ttf` for typography
+- **Use cases**: Templates, images, icons, boilerplate code, fonts, sample documents that get copied or modified
+- **Benefits**: Separates output resources from documentation, enables the agent to use files without loading them into context
+
+#### What to Not Include in a Skill
+
+A skill should only contain essential files that directly support its functionality. Do NOT create extraneous documentation or auxiliary files, including:
+
+- README.md
+- INSTALLATION_GUIDE.md
+- QUICK_REFERENCE.md
+- CHANGELOG.md
+- etc.
+
+The skill should only contain the information needed for an AI agent to do the job at hand. It should not contain auxiliary context about the process that went into creating it, setup and testing procedures, user-facing documentation, etc. Creating additional documentation files just adds clutter and confusion.
+
+### Progressive Disclosure Design Principle
+
+Skills use a three-level loading system to manage context efficiently:
+
+1. **Metadata (name + description)** - Always in context (~100 words)
+2. **SKILL.md body** - When skill triggers (<5k words)
+3. **Bundled resources** - As needed by the agent (Unlimited because scripts can be executed without reading into context window)
+
+#### Progressive Disclosure Patterns
+
+Keep SKILL.md body to the essentials and under 500 lines to minimize context bloat. Split content into separate files when approaching this limit. When splitting out content into other files, it is very important to reference them from SKILL.md and describe clearly when to read them, to ensure the reader of the skill knows they exist and when to use them.
+
+**Key principle:** When a skill supports multiple variations, frameworks, or options, keep only the core workflow and selection guidance in SKILL.md. Move variant-specific details (patterns, examples, configuration) into separate reference files.
+
+**Pattern 1: High-level guide with references**
+
+```markdown
+# PDF Processing
+
+## Quick start
+
+Extract text with pdfplumber:
+[code example]
+
+## Advanced features
+
+- **Form filling**: See [FORMS.md](FORMS.md) for complete guide
+- **API reference**: See [REFERENCE.md](REFERENCE.md) for all methods
+- **Examples**: See [EXAMPLES.md](EXAMPLES.md) for common patterns
+```
+
+the agent loads FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed.
+
+**Pattern 2: Domain-specific organization**
+
+For Skills with multiple domains, organize content by domain to avoid loading irrelevant context:
+
+```
+bigquery-skill/
+├── SKILL.md (overview and navigation)
+└── reference/
+ ├── finance.md (revenue, billing metrics)
+ ├── sales.md (opportunities, pipeline)
+ ├── product.md (API usage, features)
+ └── marketing.md (campaigns, attribution)
+```
+
+When a user asks about sales metrics, the agent only reads sales.md.
+
+Similarly, for skills supporting multiple frameworks or variants, organize by variant:
+
+```
+cloud-deploy/
+├── SKILL.md (workflow + provider selection)
+└── references/
+ ├── aws.md (AWS deployment patterns)
+ ├── gcp.md (GCP deployment patterns)
+ └── azure.md (Azure deployment patterns)
+```
+
+When the user chooses AWS, the agent only reads aws.md.
+
+**Pattern 3: Conditional details**
+
+Show basic content, link to advanced content:
+
+```markdown
+# DOCX Processing
+
+## Creating documents
+
+Use docx-js for new documents. See [DOCX-JS.md](DOCX-JS.md).
+
+## Editing documents
+
+For simple edits, modify the XML directly.
+
+**For tracked changes**: See [REDLINING.md](REDLINING.md)
+**For OOXML details**: See [OOXML.md](OOXML.md)
+```
+
+the agent reads REDLINING.md or OOXML.md only when the user needs those features.
+
+**Important guidelines:**
+
+- **Avoid deeply nested references** - Keep references one level deep from SKILL.md. All reference files should link directly from SKILL.md.
+- **Structure longer reference files** - For files longer than 100 lines, include a table of contents at the top so the agent can see the full scope when previewing.
+
+## Skill Creation Process
+
+Skill creation involves these steps:
+
+1. Understand the skill with concrete examples
+2. Plan reusable skill contents (scripts, references, assets)
+3. Initialize the skill (run init_skill.py)
+4. Edit the skill (implement resources and write SKILL.md)
+5. Package the skill (run package_skill.py)
+6. Iterate based on real usage
+
+Follow these steps in order, skipping only if there is a clear reason why they are not applicable.
+
+### Skill Naming
+
+- Use lowercase letters, digits, and hyphens only; normalize user-provided titles to hyphen-case (e.g., "Plan Mode" -> `plan-mode`).
+- When generating names, generate a name under 64 characters (letters, digits, hyphens).
+- Prefer short, verb-led phrases that describe the action.
+- Namespace by tool when it improves clarity or triggering (e.g., `gh-address-comments`, `linear-address-issue`).
+- Name the skill folder exactly after the skill name.
+
+### Step 1: Understanding the Skill with Concrete Examples
+
+Skip this step only when the skill's usage patterns are already clearly understood. It remains valuable even when working with an existing skill.
+
+To create an effective skill, clearly understand concrete examples of how the skill will be used. This understanding can come from either direct user examples or generated examples that are validated with user feedback.
+
+For example, when building an image-editor skill, relevant questions include:
+
+- "What functionality should the image-editor skill support? Editing, rotating, anything else?"
+- "Can you give some examples of how this skill would be used?"
+- "I can imagine users asking for things like 'Remove the red-eye from this image' or 'Rotate this image'. Are there other ways you imagine this skill being used?"
+- "What would a user say that should trigger this skill?"
+
+To avoid overwhelming users, avoid asking too many questions in a single message. Start with the most important questions and follow up as needed for better effectiveness.
+
+Conclude this step when there is a clear sense of the functionality the skill should support.
+
+### Step 2: Planning the Reusable Skill Contents
+
+To turn concrete examples into an effective skill, analyze each example by:
+
+1. Considering how to execute on the example from scratch
+2. Identifying what scripts, references, and assets would be helpful when executing these workflows repeatedly
+
+Example: When building a `pdf-editor` skill to handle queries like "Help me rotate this PDF," the analysis shows:
+
+1. Rotating a PDF requires re-writing the same code each time
+2. A `scripts/rotate_pdf.py` script would be helpful to store in the skill
+
+Example: When designing a `frontend-webapp-builder` skill for queries like "Build me a todo app" or "Build me a dashboard to track my steps," the analysis shows:
+
+1. Writing a frontend webapp requires the same boilerplate HTML/React each time
+2. An `assets/hello-world/` template containing the boilerplate HTML/React project files would be helpful to store in the skill
+
+Example: When building a `big-query` skill to handle queries like "How many users have logged in today?" the analysis shows:
+
+1. Querying BigQuery requires re-discovering the table schemas and relationships each time
+2. A `references/schema.md` file documenting the table schemas would be helpful to store in the skill
+
+To establish the skill's contents, analyze each concrete example to create a list of the reusable resources to include: scripts, references, and assets.
+
+### Step 3: Initializing the Skill
+
+At this point, it is time to actually create the skill.
+
+Skip this step only if the skill being developed already exists, and iteration or packaging is needed. In this case, continue to the next step.
+
+When creating a new skill from scratch, always run the `init_skill.py` script. The script conveniently generates a new template skill directory that automatically includes everything a skill requires, making the skill creation process much more efficient and reliable.
+
+Usage:
+
+```bash
+scripts/init_skill.py --path [--resources scripts,references,assets] [--examples]
+```
+
+Examples:
+
+```bash
+scripts/init_skill.py my-skill --path skills/public
+scripts/init_skill.py my-skill --path skills/public --resources scripts,references
+scripts/init_skill.py my-skill --path skills/public --resources scripts --examples
+```
+
+The script:
+
+- Creates the skill directory at the specified path
+- Generates a SKILL.md template with proper frontmatter and TODO placeholders
+- Optionally creates resource directories based on `--resources`
+- Optionally adds example files when `--examples` is set
+
+After initialization, customize the SKILL.md and add resources as needed. If you used `--examples`, replace or delete placeholder files.
+
+### Step 4: Edit the Skill
+
+When editing the (newly-generated or existing) skill, remember that the skill is being created for another instance of the agent to use. Include information that would be beneficial and non-obvious to the agent. Consider what procedural knowledge, domain-specific details, or reusable assets would help another the agent instance execute these tasks more effectively.
+
+#### Learn Proven Design Patterns
+
+Consult these helpful guides based on your skill's needs:
+
+- **Multi-step processes**: See references/workflows.md for sequential workflows and conditional logic
+- **Specific output formats or quality standards**: See references/output-patterns.md for template and example patterns
+
+These files contain established best practices for effective skill design.
+
+#### Start with Reusable Skill Contents
+
+To begin implementation, start with the reusable resources identified above: `scripts/`, `references/`, and `assets/` files. Note that this step may require user input. For example, when implementing a `brand-guidelines` skill, the user may need to provide brand assets or templates to store in `assets/`, or documentation to store in `references/`.
+
+Added scripts must be tested by actually running them to ensure there are no bugs and that the output matches what is expected. If there are many similar scripts, only a representative sample needs to be tested to ensure confidence that they all work while balancing time to completion.
+
+If you used `--examples`, delete any placeholder files that are not needed for the skill. Only create resource directories that are actually required.
+
+#### Update SKILL.md
+
+**Writing Guidelines:** Always use imperative/infinitive form.
+
+##### Frontmatter
+
+Write the YAML frontmatter with `name` and `description`:
+
+- `name`: The skill name
+- `description`: This is the primary triggering mechanism for your skill, and helps the agent understand when to use the skill.
+ - Include both what the Skill does and specific triggers/contexts for when to use it.
+ - Include all "when to use" information here - Not in the body. The body is only loaded after triggering, so "When to Use This Skill" sections in the body are not helpful to the agent.
+ - Example description for a `docx` skill: "Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. Use when the agent needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks"
+
+Do not include any other fields in YAML frontmatter.
+
+##### Body
+
+Write instructions for using the skill and its bundled resources.
+
+### Step 5: Packaging a Skill
+
+Once development of the skill is complete, it must be packaged into a distributable .skill file that gets shared with the user. The packaging process automatically validates the skill first to ensure it meets all requirements:
+
+```bash
+scripts/package_skill.py
+```
+
+Optional output directory specification:
+
+```bash
+scripts/package_skill.py ./dist
+```
+
+The packaging script will:
+
+1. **Validate** the skill automatically, checking:
+
+ - YAML frontmatter format and required fields
+ - Skill naming conventions and directory structure
+ - Description completeness and quality
+ - File organization and resource references
+
+2. **Package** the skill if validation passes, creating a .skill file named after the skill (e.g., `my-skill.skill`) that includes all files and maintains the proper directory structure for distribution. The .skill file is a zip file with a .skill extension.
+
+If validation fails, the script will report the errors and exit without creating a package. Fix any validation errors and run the packaging command again.
+
+### Step 6: Iterate
+
+After testing the skill, users may request improvements. Often this happens right after using the skill, with fresh context of how the skill performed.
+
+**Iteration workflow:**
+
+1. Use the skill on real tasks
+2. Notice struggles or inefficiencies
+3. Identify how SKILL.md or bundled resources should be updated
+4. Implement changes and test again
diff --git a/workspace/skills/summarize/SKILL.md b/workspace/skills/summarize/SKILL.md
new file mode 100644
index 00000000..01805765
--- /dev/null
+++ b/workspace/skills/summarize/SKILL.md
@@ -0,0 +1,67 @@
+---
+name: summarize
+description: Summarize or extract text/transcripts from URLs, podcasts, and local files (great fallback for “transcribe this YouTube/video”).
+homepage: https://summarize.sh
+metadata: {"internclaw":{"emoji":"🧾","requires":{"bins":["summarize"]},"install":[{"id":"brew","kind":"brew","formula":"steipete/tap/summarize","bins":["summarize"],"label":"Install summarize (brew)"}]}}
+---
+
+# Summarize
+
+Fast CLI to summarize URLs, local files, and YouTube links.
+
+## When to use (trigger phrases)
+
+Use this skill immediately when the user asks any of:
+- “use summarize.sh”
+- “what’s this link/video about?”
+- “summarize this URL/article”
+- “transcribe this YouTube/video” (best-effort transcript extraction; no `yt-dlp` needed)
+
+## Quick start
+
+```bash
+summarize "https://example.com" --model google/gemini-3-flash-preview
+summarize "/path/to/file.pdf" --model google/gemini-3-flash-preview
+summarize "https://youtu.be/dQw4w9WgXcQ" --youtube auto
+```
+
+## YouTube: summary vs transcript
+
+Best-effort transcript (URLs only):
+
+```bash
+summarize "https://youtu.be/dQw4w9WgXcQ" --youtube auto --extract-only
+```
+
+If the user asked for a transcript but it’s huge, return a tight summary first, then ask which section/time range to expand.
+
+## Model + keys
+
+Set the API key for your chosen provider:
+- OpenAI: `OPENAI_API_KEY`
+- Anthropic: `ANTHROPIC_API_KEY`
+- xAI: `XAI_API_KEY`
+- Google: `GEMINI_API_KEY` (aliases: `GOOGLE_GENERATIVE_AI_API_KEY`, `GOOGLE_API_KEY`)
+
+Default model is `google/gemini-3-flash-preview` if none is set.
+
+## Useful flags
+
+- `--length short|medium|long|xl|xxl|`
+- `--max-output-tokens `
+- `--extract-only` (URLs only)
+- `--json` (machine readable)
+- `--firecrawl auto|off|always` (fallback extraction)
+- `--youtube auto` (Apify fallback if `APIFY_API_TOKEN` set)
+
+## Config
+
+Optional config file: `~/.summarize/config.json`
+
+```json
+{ "model": "openai/gpt-5.2" }
+```
+
+Optional services:
+- `FIRECRAWL_API_KEY` for blocked sites
+- `APIFY_API_TOKEN` for YouTube fallback
diff --git a/workspace/skills/task-synthesis/SKILL.md b/workspace/skills/task-synthesis/SKILL.md
new file mode 100644
index 00000000..83f1098d
--- /dev/null
+++ b/workspace/skills/task-synthesis/SKILL.md
@@ -0,0 +1,229 @@
+---
+name: task-synthesis
+description: Synthesize complete agent evaluation task packages from an OpenClaw skill. Use when given a skill (SKILL.md path, skill name, or skill content) and asked to generate a benchmark task — including task.md (with frontmatter, Prompt, Expected Behavior, Assets Preparation, Toolset Preparation, Environment Specification, Grading Criteria, Automated Checks, LLM Judge Rubric), an assets/ folder, a tools/ folder with OpenClaw-loadable tool definitions, and a scripts/ folder containing env_spec.sh and any helper scripts. Triggers on phrases like "generate a task from this skill", "synthesize a task for skill X", "create a benchmark task", "draft a task spec".
+---
+
+# Task Synthesis
+
+## Overview
+
+This skill produces a complete, ready-to-run evaluation task package derived from an existing OpenClaw skill. The input is a skill (SKILL.md, skill name, or a path to the skill); the output is a self-contained task directory:
+
+| Output | Description |
+|---|---|
+| `task.md` | Task definition following the standard template |
+| `assets/` | All input files and directories the agent needs at runtime |
+| `tools/` | OpenClaw-loadable tool definitions the agent can call |
+| `scripts/` | Executable scripts: `env_spec.sh` (required) + any helpers |
+
+**`tools/` vs `scripts/` distinction:**
+- `tools/` — tool definitions that OpenClaw loads and exposes to the agent as callable tools (e.g., `get_video_meta`, `query_database`). These are invoked by the agent during the task.
+- `scripts/` — shell or Python scripts that run in the execution environment, not called by the agent directly. Always includes `env_spec.sh` (even if empty). May include helper scripts like `init_db.py`, `mock_server.py`, etc.
+
+For the canonical task.md schema, field rules, and blank template — see `references/task-schema.md` and `references/task-blank-template.md`.
+See `references/synthesis-patterns.md` for asset, tool, script, and grading patterns.
+
+---
+
+## Workflow
+
+### Step 1 — Analyze the Skill
+
+Read the skill's SKILL.md (and any referenced files), then establish three things:
+
+**1. Classify family and archetype**
+
+**Skill Family** — where does this skill primarily operate? Choose one:
+
+`Documents & Files` / `Filesystem & Workspace` / `Code / Repo / IDE` / `Browser & Web` / `SaaS / API Integrations` / `Local Apps / OS Automation` / `Media` / `DevOps / System / Cloud` / `Search & Research` / `Agent Infra / Orchestration` / `Hybrid`
+
+**Skill Archetype** — what is the primary action this skill performs? Choose one:
+
+`retrieve` / `extract` / `transform` / `organize` / `analyze` / `generate` / `edit` / `execute` / `validate` / `monitor` / `automate` / `hybrid`
+
+Record both as `family` and `archetype` — they feed into the task.md frontmatter and inform query realism and grading focus in later steps.
+
+**2. Map capability boundary (Provides / Requires)**
+
+| Set | Question | Examples |
+|---|---|---|
+| **Provides** | What tools, APIs, or capabilities does this skill itself expose? | Custom OpenClaw tools it ships, helper scripts it installs, services it starts, APIs it wraps |
+| **Requires** | What must already exist for this skill to function? | CLI binaries (`ffmpeg`, `git`), language runtimes, API keys, OS packages, specific file formats |
+
+- **Provides** → comes with the skill; the task author does not need to prepare these
+- **Requires** → what the task author must prepare: install in `env_spec.sh`, synthesize in `tools/`, declare in `## Toolset Preparation`
+
+**3. Extract baseline information**
+
+- **Trigger scenarios**: what kinds of user requests does this skill handle?
+- **Required inputs**: what must the user provide for the skill to work?
+- **Optional / defaultable inputs**: what can be inferred or has a sensible default?
+- **Failure modes**: what can go wrong if inputs are missing or malformed?
+
+With these three established, you have everything needed to write a realistic query and design the execution environment and evaluation rules.
+
+### Step 2 — Write the Query
+
+With classification and capability boundary established, write the `## Prompt` as a real user message. Two principles govern both the prompt and its relationship to conditions in Step 3:
+
+1. **Natural first.** Write from the user's perspective — goal-oriented, at whatever level of detail feels realistic for the task. A simple request can be one line; a complex one may include steps or constraints. What it should never read like is a task spec or evaluation brief written by the task author.
+
+2. **Query and conditions are a pair.** Write the prompt with awareness of what conditions will (and will not) be provided. What the conditions omit shapes what the agent must do — ask, infer, or explore — and that shapes grading just as much as what the query says.
+
+Additional rules:
+- **Language: Chinese.** Write the prompt in Chinese (Simplified).
+- **No absolute paths, no `assets/`、`task/`、`workspace/` prefix.** Use only filenames or workspace-relative paths (e.g., `data.csv`, `reports/summary.md`). All assets are copied into the agent's workspace at runtime; `assets/` is invisible to the agent.
+
+### Step 3 — Prepare Environment
+
+Define the minimal runtime dependencies and write `scripts/env_spec.sh` to fulfill them.
+
+**Design:** Record what the runtime needs —
+- OS / base image assumptions (if any)
+- Language runtimes and version constraints
+- Required packages (system, pip, npm, etc.)
+- Environment variables (API keys, config values)
+- Background services and their initial state (e.g., "SQLite DB with schema X, seeded with N rows")
+
+**Implement:** Refer to `references/synthesis-patterns.md` (scripts/ section) for the base template; adapt to the dependencies above. Add auxiliary scripts as needed (e.g., `init_db.py`, `seed_data.py`, `start_server.py`).
+
+**Verify (static checklist):**
+- Every dependency checks for existence before installing (e.g., `command -v` for CLI tools, `python3 -c "import X"` for pip packages)
+- Language packages use user-local install (`pip install --user`, local `node_modules`)
+- System CLI tools (`ffmpeg`, `jq`, etc.) check existence first, then fall back to `sudo apt-get` only if missing, and fail with a clear error if sudo is unavailable
+- Every environment variable has a corresponding `export` line
+- Every background service has a start command and a readiness check
+- Every one-time setup step (DB init, data seeding) is present and runs after its dependency is ready
+- Script uses only `$WORKSPACE`, `$TASK_PATH`, `$HOME` — no hardcoded absolute paths
+- Script exits non-zero on any failure (`set -e` or explicit error handling)
+
+Do not proceed to Step 4 until verification passes.
+
+### Step 4 — Prepare Tools
+
+Decide which tools the agent can call, then synthesize and verify them.
+
+For each tool, record:
+- **Name**, **description**, **input schema**, **output schema**
+- **Type** — one of:
+ - `built-in`: already provided by OpenClaw; check `assets/built_in_tools.json` first; no plugin needed
+ - `static mock`: return value is hardcoded regardless of input; asset files for this tool do not need real content
+ - `input-derived mock`: return value is computed from the actual input file; asset files **must** be real (synthesized in Step 5)
+
+> **Format fidelity:** mock return values must look like what a real implementation would return — correct field names, envelope structure, data types. A bare string where structured JSON is expected misleads the agent.
+
+**Implement:** For each `static mock` / `input-derived mock` tool, synthesize a plugin under `tools/{tool_name}/` following the **`openclaw-plugin-creator`** skill. `built-in` tools need no plugin.
+
+**Verify (static checklist):**
+- Every `static mock` / `input-derived mock` tool has a plugin directory under `tools/{tool_name}/`
+
+Do not proceed to Step 5 until verification passes.
+
+### Step 5 — Prepare Assets
+
+Decide which input files the agent needs in its workspace, then create them.
+
+Keep assets consistent with the environment (Step 3) and tools (Step 4): exported files should match seeded DB records; files consumed by `input-derived mock` tools must be real files whose content drives those tools' return values.
+
+Create each file at `assets/{path}`. Generate realistic, self-consistent content. Fall back to a placeholder only when real content cannot be produced and the file will be consumed exclusively by `static mock` tools.
+
+### Step 6 — Expected Behavior & Grading
+
+**Expected behavior:** With query (Step 2) and conditions (Steps 3–5) fixed, reason through what a correct agent execution looks like. Document:
+- **Actions**: tool calls, file operations, decisions the agent should make
+- **Intermediate outputs**: transient artifacts needed for subsequent steps
+- **Final outputs**: the concrete deliverable — file(s) written, message sent, state changed — and expected format/content
+
+**Grading:** Convert expected behaviors into a complete grading spec (must sum to 100%).
+
+For any criterion whose expected value depends on synthesized assets, compute it now from the actual files (run the calculation, compute the hash, etc.).
+
+Classify each criterion:
+- *Can a script check this without ambiguity?* → `automated`
+- *Requires understanding intent, quality, or semantic correctness?* → `llm-judge`
+
+When the conditions lock down the output to a deterministic value, grade the value. When the output is non-deterministic or the task evaluates process/reasoning, grade behavior or quality instead.
+
+**Part 1 — Automated** *(omit if none)*
+- List each criterion as a bullet with percentage weight
+- Provide a fully implemented `grade()` function — no `...` or `pass`; criterion key strings must match bullet text exactly
+
+**Part 2 — LLM Judge** *(omit if none)*
+- List each criterion as a bullet with percentage weight
+- Provide rubrics: full-score, zero-score, and any partial-credit gradations
+
+### Step 7 — Write task.md
+
+With all assets, tools, environment, and grading values known, produce `task.md` following the schema in `references/task-schema.md`. Use `references/task-blank-template.md` as the starting template — fill in every section; leave no placeholders.
+
+Inputs:
+- Frontmatter values: status, difficulty, timeout, grading_mode, family, archetype
+- Prompt (Step 2)
+- Expected Behavior and Grading Criteria (Step 6)
+- Assets Preparation, Toolset Preparation, Environment Specification (Steps 3–5)
+
+**Do not proceed to Step 8 until `task.md` is written to disk.**
+
+### Step 8 — Validate
+
+**Hard gate. Do not deliver until every item passes. For each failure: fix it, then re-check before continuing.**
+
+- [ ] `task.md` exists at the task package root
+- [ ] All frontmatter fields present; no `TODO` placeholders remain
+- [ ] `workspace_files` lists every asset path correctly
+- [ ] Grading percentages sum to 100
+- [ ] Grading Criteria match query type (output-focused vs behavior-focused)
+- [ ] `grade()` is fully implemented — no `...` or `pass`
+- [ ] `grade()` criterion key strings match Grading Criteria bullets exactly
+- [ ] Every asset in **Assets Preparation** exists in `assets/`
+- [ ] Every tool in **Toolset Preparation** exists in `tools/`
+- [ ] `scripts/env_spec.sh` exists and runs without errors
+- [ ] All environment dependencies in **Environment Specification** are covered by `scripts/env_spec.sh`
+
+---
+
+## Output Directory Layout
+
+```
+{task-id}/
+├── task.md
+├── assets/
+│ ├── {file-or-dir-1}
+│ └── {file-or-dir-2}
+├── tools/
+│ └── {tool_name} ← OpenClaw-loadable tool definitions
+└── scripts/
+ ├── env_spec.sh ← always present
+ └── {helper_script}.py ← optional
+```
+
+---
+
+## Quick Reference
+
+**Difficulty heuristics:**
+- `low` — single-step, unambiguous output, < 60 s timeout
+- `medium` — multi-step, some judgment or environment inspection, 60–180 s timeout
+- `high` — open-ended, complex reasoning or multi-turn clarification, > 180 s timeout
+
+**Grading mode:**
+- `automated` — all criteria deterministic (file check, regex, unit test, transcript pattern match)
+- `llm-judge` — all criteria semantic (quality, reasoning, clarification behavior)
+- `hybrid` — mix; default for most tasks
+
+**Grading derives from query + conditions together:**
+- If conditions provide everything needed → grading is output-correctness → prefer automated
+- If conditions omit something the query implies → grading is agent behavior (did it ask? inspect? propose?) → prefer llm-judge + transcript checks
+- If query leaves the approach open → grading is reasoning quality and option coverage → prefer llm-judge
+
+**Step ordering rationale:**
+- Step 1: understand skill → 1.1 classify (family + archetype) → 1.2 capability boundary (Provides / Requires)
+- Step 2: write prompt (natural, goal-oriented; query and conditions designed as a pair)
+- Step 3: prepare environment — design deps, write `env_spec.sh`, verify
+- Step 4: prepare tools — design sufficient/actual sets, synthesize plugins, verify
+- Step 5: prepare assets — design sufficient/actual sets, synthesize files
+- Step 6: expected behavior + grading (precompute values, write `grade()`, LLM judge rubrics)
+- Step 7: write `task.md` via task-markdown-editor skill
+- Step 8: validate — hard gate, fix and re-check before delivering
+
+**Status lifecycle:** `preparing` → `ready` → `deprecated`
diff --git a/workspace/skills/task-synthesis/assets/built_in_tools.json b/workspace/skills/task-synthesis/assets/built_in_tools.json
new file mode 100644
index 00000000..55a7c4e1
--- /dev/null
+++ b/workspace/skills/task-synthesis/assets/built_in_tools.json
@@ -0,0 +1,519 @@
+{
+ "_source": "openclaw built-in tools extracted from dist/tool-catalog-BjSY4C4F.js",
+ "_version": "2026.3.23-2",
+ "sections": [
+ {
+ "id": "fs",
+ "label": "Files",
+ "tools": [
+ {
+ "id": "read",
+ "label": "read",
+ "description": "Read file contents",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "write",
+ "label": "write",
+ "description": "Create or overwrite files",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "edit",
+ "label": "edit",
+ "description": "Make precise edits",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "apply_patch",
+ "label": "apply_patch",
+ "description": "Patch files (OpenAI)",
+ "profiles": [
+ "coding"
+ ]
+ }
+ ]
+ },
+ {
+ "id": "runtime",
+ "label": "Runtime",
+ "tools": [
+ {
+ "id": "exec",
+ "label": "exec",
+ "description": "Run shell commands",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "process",
+ "label": "process",
+ "description": "Manage background processes",
+ "profiles": [
+ "coding"
+ ]
+ }
+ ]
+ },
+ {
+ "id": "web",
+ "label": "Web",
+ "tools": [
+ {
+ "id": "web_search",
+ "label": "web_search",
+ "description": "Search the web",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "web_fetch",
+ "label": "web_fetch",
+ "description": "Fetch web content",
+ "profiles": [
+ "coding"
+ ]
+ }
+ ]
+ },
+ {
+ "id": "memory",
+ "label": "Memory",
+ "tools": [
+ {
+ "id": "memory_search",
+ "label": "memory_search",
+ "description": "Semantic search",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "memory_get",
+ "label": "memory_get",
+ "description": "Read memory files",
+ "profiles": [
+ "coding"
+ ]
+ }
+ ]
+ },
+ {
+ "id": "sessions",
+ "label": "Sessions",
+ "tools": [
+ {
+ "id": "sessions_list",
+ "label": "sessions_list",
+ "description": "List sessions",
+ "profiles": [
+ "coding",
+ "messaging"
+ ]
+ },
+ {
+ "id": "sessions_history",
+ "label": "sessions_history",
+ "description": "Session history",
+ "profiles": [
+ "coding",
+ "messaging"
+ ]
+ },
+ {
+ "id": "sessions_send",
+ "label": "sessions_send",
+ "description": "Send to session",
+ "profiles": [
+ "coding",
+ "messaging"
+ ]
+ },
+ {
+ "id": "sessions_spawn",
+ "label": "sessions_spawn",
+ "description": "Spawn sub-agent",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "sessions_yield",
+ "label": "sessions_yield",
+ "description": "End turn to receive sub-agent results",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "subagents",
+ "label": "subagents",
+ "description": "Manage sub-agents",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "session_status",
+ "label": "session_status",
+ "description": "Session status",
+ "profiles": [
+ "minimal",
+ "coding",
+ "messaging"
+ ]
+ }
+ ]
+ },
+ {
+ "id": "ui",
+ "label": "UI",
+ "tools": [
+ {
+ "id": "browser",
+ "label": "browser",
+ "description": "Control web browser",
+ "profiles": []
+ },
+ {
+ "id": "canvas",
+ "label": "canvas",
+ "description": "Control canvases",
+ "profiles": []
+ }
+ ]
+ },
+ {
+ "id": "messaging",
+ "label": "Messaging",
+ "tools": [
+ {
+ "id": "message",
+ "label": "message",
+ "description": "Send messages",
+ "profiles": [
+ "messaging"
+ ]
+ }
+ ]
+ },
+ {
+ "id": "automation",
+ "label": "Automation",
+ "tools": [
+ {
+ "id": "cron",
+ "label": "cron",
+ "description": "Schedule tasks",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "gateway",
+ "label": "gateway",
+ "description": "Gateway control",
+ "profiles": []
+ }
+ ]
+ },
+ {
+ "id": "nodes",
+ "label": "Nodes",
+ "tools": [
+ {
+ "id": "nodes",
+ "label": "nodes",
+ "description": "Nodes + devices",
+ "profiles": []
+ }
+ ]
+ },
+ {
+ "id": "agents",
+ "label": "Agents",
+ "tools": [
+ {
+ "id": "agents_list",
+ "label": "agents_list",
+ "description": "List agents",
+ "profiles": []
+ }
+ ]
+ },
+ {
+ "id": "media",
+ "label": "Media",
+ "tools": [
+ {
+ "id": "image",
+ "label": "image",
+ "description": "Image understanding",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "image_generate",
+ "label": "image_generate",
+ "description": "Image generation",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "tts",
+ "label": "tts",
+ "description": "Text-to-speech conversion",
+ "profiles": []
+ }
+ ]
+ }
+ ],
+ "tools": [
+ {
+ "id": "read",
+ "label": "read",
+ "description": "Read file contents",
+ "section": "fs",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "write",
+ "label": "write",
+ "description": "Create or overwrite files",
+ "section": "fs",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "edit",
+ "label": "edit",
+ "description": "Make precise edits",
+ "section": "fs",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "apply_patch",
+ "label": "apply_patch",
+ "description": "Patch files (OpenAI)",
+ "section": "fs",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "exec",
+ "label": "exec",
+ "description": "Run shell commands",
+ "section": "runtime",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "process",
+ "label": "process",
+ "description": "Manage background processes",
+ "section": "runtime",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "web_search",
+ "label": "web_search",
+ "description": "Search the web",
+ "section": "web",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "web_fetch",
+ "label": "web_fetch",
+ "description": "Fetch web content",
+ "section": "web",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "memory_search",
+ "label": "memory_search",
+ "description": "Semantic search",
+ "section": "memory",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "memory_get",
+ "label": "memory_get",
+ "description": "Read memory files",
+ "section": "memory",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "sessions_list",
+ "label": "sessions_list",
+ "description": "List sessions",
+ "section": "sessions",
+ "profiles": [
+ "coding",
+ "messaging"
+ ]
+ },
+ {
+ "id": "sessions_history",
+ "label": "sessions_history",
+ "description": "Session history",
+ "section": "sessions",
+ "profiles": [
+ "coding",
+ "messaging"
+ ]
+ },
+ {
+ "id": "sessions_send",
+ "label": "sessions_send",
+ "description": "Send to session",
+ "section": "sessions",
+ "profiles": [
+ "coding",
+ "messaging"
+ ]
+ },
+ {
+ "id": "sessions_spawn",
+ "label": "sessions_spawn",
+ "description": "Spawn sub-agent",
+ "section": "sessions",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "sessions_yield",
+ "label": "sessions_yield",
+ "description": "End turn to receive sub-agent results",
+ "section": "sessions",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "subagents",
+ "label": "subagents",
+ "description": "Manage sub-agents",
+ "section": "sessions",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "session_status",
+ "label": "session_status",
+ "description": "Session status",
+ "section": "sessions",
+ "profiles": [
+ "minimal",
+ "coding",
+ "messaging"
+ ]
+ },
+ {
+ "id": "browser",
+ "label": "browser",
+ "description": "Control web browser",
+ "section": "ui",
+ "profiles": []
+ },
+ {
+ "id": "canvas",
+ "label": "canvas",
+ "description": "Control canvases",
+ "section": "ui",
+ "profiles": []
+ },
+ {
+ "id": "message",
+ "label": "message",
+ "description": "Send messages",
+ "section": "messaging",
+ "profiles": [
+ "messaging"
+ ]
+ },
+ {
+ "id": "cron",
+ "label": "cron",
+ "description": "Schedule tasks",
+ "section": "automation",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "gateway",
+ "label": "gateway",
+ "description": "Gateway control",
+ "section": "automation",
+ "profiles": []
+ },
+ {
+ "id": "nodes",
+ "label": "nodes",
+ "description": "Nodes + devices",
+ "section": "nodes",
+ "profiles": []
+ },
+ {
+ "id": "agents_list",
+ "label": "agents_list",
+ "description": "List agents",
+ "section": "agents",
+ "profiles": []
+ },
+ {
+ "id": "image",
+ "label": "image",
+ "description": "Image understanding",
+ "section": "media",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "image_generate",
+ "label": "image_generate",
+ "description": "Image generation",
+ "section": "media",
+ "profiles": [
+ "coding"
+ ]
+ },
+ {
+ "id": "tts",
+ "label": "tts",
+ "description": "Text-to-speech conversion",
+ "section": "media",
+ "profiles": []
+ }
+ ]
+}
diff --git a/workspace/skills/task-synthesis/references/synthesis-patterns.md b/workspace/skills/task-synthesis/references/synthesis-patterns.md
new file mode 100644
index 00000000..6cfe15b9
--- /dev/null
+++ b/workspace/skills/task-synthesis/references/synthesis-patterns.md
@@ -0,0 +1,473 @@
+# Synthesis Patterns
+
+Patterns for generating scripts/, tools/, assets/, and grading logic.
+
+## Table of Contents
+1. [scripts/ Template and Patterns](#scripts-template-and-patterns)
+2. [Tool Synthesis Patterns](#tool-synthesis-patterns)
+3. [Asset Synthesis Patterns](#asset-synthesis-patterns)
+4. [Automated Grader Code Patterns](#automated-grader-code-patterns)
+ - [Output-focused graders](#output-focused-graders)
+ - [Behavior-focused graders (agent asked / inspected / proposed)](#behavior-focused-graders)
+5. [LLM Judge Rubric Anchors](#llm-judge-rubric-anchors)
+ - [Output quality](#output-quality-content-correctness--completeness--format)
+ - [Reasoning / approach quality](#reasoning--approach-quality)
+ - [Clarification behavior](#clarification-behavior-underspecified--ambiguous-tasks)
+
+---
+
+## scripts/ Template and Patterns
+
+`scripts/env_spec.sh` is always required. Additional helper scripts (init_db.py, seed_data.py, start_server.py, etc.) also go in `scripts/`.
+
+### env_spec.sh base template
+
+> ⚠️ Replace all `{PLACEHOLDER}` values before using.
+>
+> **Installation rules (strictly enforced):**
+> - **Check before install.** Don't blindly install — first check if the binary/package already exists. Common runtimes (python3, node, git, curl) are usually pre-installed.
+> - **User-local first.** For language packages, always install to user-local paths: `pip install --user`, `npm install` (local `node_modules`), `$HOME/.local/bin`, etc.
+> - **System packages as fallback.** For tools that can only be installed system-wide (ffmpeg, imagemagick, etc.), check existence first, then try `sudo apt-get` only if missing. If sudo is unavailable, fail with a clear error message — never silently skip.
+>
+> **Path rules (strictly enforced):**
+> - Use `$WORKSPACE`, `$TASK_PATH`, `$HOME` — never hardcode absolute paths like `/tmp/`, `/workspace/`, `/usr/local/bin/`
+> - Output files and generated artifacts go under `$WORKSPACE`, not `/tmp/` or any fixed system path
+> - To make a mock script available on PATH, use `export PATH="$HOME/.local/bin:$PATH"` rather than copying to `/usr/local/bin/`
+> - Mock user-environment dependencies (e.g., a CLI tool that requires GPU/manual install) in `env_spec.sh` when the evaluation environment cannot satisfy them — but keep mocks path-clean using the rules above
+
+```bash
+#!/usr/bin/env bash
+# scripts/env_spec.sh — Task execution environment initialization
+# Task: {task-id}
+set -euo pipefail
+
+WORKSPACE="${WORKSPACE_PATH:?Error: WORKSPACE_PATH is not set}"
+TASK_PATH="${TASK_PATH:?Error: TASK_PATH is not set}"
+
+# User-local bin on PATH
+export PATH="$HOME/.local/bin:$PATH"
+
+echo "=== Copying assets to workspace ==="
+mkdir -p "$WORKSPACE"
+cp -r "$TASK_PATH/assets/." "$WORKSPACE/"
+
+echo "=== Checking & installing system CLI tools ==="
+# Check first; install only if missing and sudo is available
+require_cli() {
+ if command -v "$1" &>/dev/null; then
+ echo " ✓ $1 already installed"
+ elif sudo -n true 2>/dev/null; then
+ echo " ⟳ Installing $1 via apt-get..."
+ sudo apt-get update -qq && sudo apt-get install -y --no-install-recommends "$1"
+ else
+ echo " ✗ $1 not found and sudo not available" >&2; exit 1
+ fi
+}
+# require_cli "ffmpeg"
+# require_cli "jq"
+
+echo "=== Checking & installing Python packages ==="
+# Check before install; user-local only
+install_pip_pkg() {
+ python3 -c "import $1" 2>/dev/null || pip install --user --quiet "$2"
+}
+# install_pip_pkg "pandas" "pandas==2.1.0"
+# install_pip_pkg "requests" "requests==2.31.0"
+
+echo "=== Installing custom tool plugins ==="
+bash "$TASK_PATH/scripts/plugin_install.sh"
+
+echo "=== Environment variables ==="
+# export {ENV_VAR_NAME}="{value}"
+
+echo "=== One-time setup ==="
+# python3 "$TASK_PATH/scripts/init_db.py"
+
+echo "=== Environment ready ==="
+```
+
+### Patterns for common environments
+
+**System CLI tools (check → sudo fallback → fail):**
+```bash
+require_cli() {
+ if command -v "$1" &>/dev/null; then
+ echo " ✓ $1 already installed"
+ elif sudo -n true 2>/dev/null; then
+ sudo apt-get update -qq && sudo apt-get install -y --no-install-recommends "$1"
+ else
+ echo " ✗ $1 not found and sudo not available" >&2; exit 1
+ fi
+}
+require_cli "ffmpeg"
+require_cli "jq"
+```
+
+**Python packages (user-local, check first):**
+```bash
+install_pip_pkg() {
+ python3 -c "import $1" 2>/dev/null || pip install --user --quiet "$2"
+}
+install_pip_pkg "pandas" "pandas==2.1.0"
+install_pip_pkg "numpy" "numpy"
+install_pip_pkg "sklearn" "scikit-learn"
+```
+
+**Node.js packages (local node_modules):**
+```bash
+cd "$WORKSPACE" && npm install
+```
+
+**SQLite database init:**
+```bash
+python3 "$TASK_PATH/scripts/init_db.py"
+```
+
+**No network access (default):**
+```bash
+# No network setup needed — task runs offline
+```
+
+**Service startup (e.g., local API mock):**
+```bash
+python3 "$TASK_PATH/scripts/start_server.py" &
+sleep 1 # wait for server to start
+```
+
+---
+
+## Tool Synthesis Patterns
+
+Tool synthesis is governed by the **`openclaw-plugin-creator`** skill — load it for the complete structure (`openclaw.plugin.json`, `package.json`, `src/index.ts`) and implementation guidance.
+
+Task-specific note: tools in `tools/` are invoked **by the agent during evaluation**. When the task has a fixed expected output, implement a deterministic mock rather than a live integration — the mock should return realistic, precomputed data consistent with the synthesized assets.
+
+### plugin_install.sh
+
+After synthesizing all tool plugins, create `scripts/plugin_install.sh` to install them. This script is referenced in the Step 4 static checklist.
+
+```bash
+#!/usr/bin/env bash
+# scripts/plugin_install.sh — Install all custom tool plugins for this task
+# Task: {task-id}
+set -euo pipefail
+
+TASK_PATH="${TASK_PATH:?Error: TASK_PATH is not set}"
+
+echo "=== Installing custom tool plugins ==="
+for plugin_dir in "$TASK_PATH/tools"/*/; do
+ if [ -f "$plugin_dir/openclaw.plugin.json" ]; then
+ echo "Installing plugin: $(basename "$plugin_dir")"
+ cd "$plugin_dir" && openclaw plugins install .
+ fi
+done
+
+echo "=== Plugin installation complete ==="
+```
+
+---
+
+## Asset Synthesis Patterns
+
+### Principle: Realistic, Self-Consistent Content
+
+Assets must be realistic enough that an agent can complete the task without confusion. Avoid Lorem Ipsum unless the task is explicitly about text corpora.
+
+### Text / Markdown files
+
+Generate content matching the task topic. Include realistic structure (headings, lists, code blocks) if the task involves document processing.
+
+### CSV / JSON data files
+
+- Define a clear schema in the task's **Assets Preparation** section first
+- Generate 20–100 rows; more for data analysis tasks, fewer for simple parsing tasks
+- Ensure the "expected answer" in the grader is precomputed from the generated data
+- Use consistent column names and types; avoid nulls unless the task tests null handling
+
+```python
+# Example: generating a CSV asset programmatically
+import csv, random, io
+
+rows = [{"id": i, "name": f"Student_{i}", "score": round(random.uniform(40, 100), 2)}
+ for i in range(1, 51)]
+out = io.StringIO()
+writer = csv.DictWriter(out, fieldnames=["id", "name", "score"])
+writer.writeheader()
+writer.writerows(rows)
+csv_content = out.getvalue()
+```
+
+### Code / Project directories
+
+Scaffold a realistic project structure. Include:
+- A main entry point (e.g., `main.py`, `index.js`)
+- At least one module with meaningful stub functions
+- A README or docstring explaining what the project is supposed to do
+- Any config files the task requires (e.g., `requirements.txt`, `package.json`)
+
+### Database files (SQLite)
+
+Provide a Python script at `scripts/init_db.py` that creates and populates the database, and call it from `env_spec.sh`. Do not commit binary `.sqlite` files directly.
+
+```python
+# scripts/init_db.py
+import sqlite3, os
+
+# Use WORKSPACE_PATH env var (injected at runtime); fall back to cwd for local testing
+workspace = os.environ.get("WORKSPACE_PATH", os.getcwd())
+db_path = os.environ.get("DB_PATH", os.path.join(workspace, "db.sqlite"))
+conn = sqlite3.connect(db_path)
+conn.execute("""CREATE TABLE IF NOT EXISTS records (
+ id INTEGER PRIMARY KEY,
+ name TEXT,
+ value REAL
+)""")
+conn.executemany("INSERT INTO records VALUES (?, ?, ?)", [
+ (1, "alpha", 42.0),
+ (2, "beta", 17.5),
+ (3, "gamma", 99.1),
+])
+conn.commit()
+conn.close()
+print(f"Database initialized at {db_path}")
+```
+
+---
+
+## Automated Grader Code Patterns
+
+Two sub-types, pick based on what the task actually grades:
+
+- **Output-focused** — check files or values the agent produced (existence, content, format, numeric accuracy). Primary evidence is `workspace_path`.
+- **Behavior-focused** — check how the agent acted (which tools it called, what it said, whether it asked before proceeding). Primary evidence is `transcript`.
+
+Both types can coexist in a single `grade()` function; just mix snippets as needed.
+
+### Output-focused graders
+
+### File existence
+
+```python
+exists = os.path.isfile(os.path.join(workspace_path, "output.md"))
+scores["File `output.md` created"] = 1.0 if exists else 0.0
+```
+
+### Non-empty content
+
+```python
+content = open(path).read().strip() if os.path.isfile(path) else ""
+scores["Output is non-empty"] = 1.0 if content else 0.0
+```
+
+### Regex match
+
+```python
+import re
+scores["Output contains a Markdown heading"] = (
+ 1.0 if re.search(r"^#{1,6} \w", content, re.MULTILINE) else 0.0
+)
+```
+
+### JSON validity + schema check
+
+```python
+import json
+try:
+ data = json.loads(content)
+ scores["Output is valid JSON"] = 1.0
+ scores["JSON has required keys"] = 1.0 if {"name", "value"} <= data.keys() else 0.0
+except (json.JSONDecodeError, AttributeError):
+ scores["Output is valid JSON"] = 0.0
+ scores["JSON has required keys"] = 0.0
+```
+
+### Script execution
+
+```python
+import subprocess
+result = subprocess.run(
+ ["python3", os.path.join(workspace_path, "solution.py")],
+ capture_output=True, timeout=30, cwd=workspace_path
+)
+scores["Script runs without error"] = 1.0 if result.returncode == 0 else 0.0
+```
+
+### Numeric accuracy
+
+```python
+try:
+ val = float(open(os.path.join(workspace_path, "output.txt")).read().strip())
+ expected = 73.42
+ scores["Value within ±0.01 of expected"] = 1.0 if abs(val - expected) <= 0.01 else 0.0
+except (ValueError, FileNotFoundError):
+ scores["Value within ±0.01 of expected"] = 0.0
+```
+
+### Transcript-based check
+
+```python
+tool_calls = [m.get("tool_name") for m in transcript if m.get("type") == "tool_call"]
+scores["Agent used query_database tool"] = 1.0 if "query_database" in tool_calls else 0.0
+```
+
+### Partial credit
+
+```python
+lines = [l for l in content.splitlines() if l.strip()]
+scores["Output has sufficient lines"] = min(1.0, len(lines) / 10)
+```
+
+### Safe wrapper (always use in grade())
+
+```python
+def grade(transcript: list, workspace_path: str) -> dict:
+ import os, re
+ scores = {}
+ try:
+ # ... all grading logic here — no `pass` or `...` placeholders ...
+ scores["criterion text"] = 1.0 # replace with real logic
+ except Exception:
+ # Never let grade() raise — default unscored criteria to 0.0
+ for key in ["criterion text"]: # list all expected keys
+ scores.setdefault(key, 0.0)
+ return scores
+```
+
+---
+
+### Behavior-focused graders (agent asked / inspected / proposed)
+
+For tasks where conditions omit something the query implies, the agent's **transcript** is the primary evidence. Correct behavior is about how the agent responds to the gap, not what file it produces.
+
+#### Helper: extract all agent text turns
+
+```python
+def _agent_text(transcript: list) -> str:
+ """Concatenate all assistant/agent text messages into one searchable string."""
+ parts = []
+ for m in transcript:
+ if m.get("role") in ("assistant", "agent") and isinstance(m.get("content"), str):
+ parts.append(m["content"])
+ return "\n".join(parts).lower()
+```
+
+#### Check: agent named the missing parameter
+
+```python
+agent_text = _agent_text(transcript)
+# Replace "output format" with the actual missing parameter name
+named_gap = any(kw in agent_text for kw in ["output format", "format", "which format"])
+scores["Agent identified the missing parameter by name"] = 1.0 if named_gap else 0.0
+```
+
+#### Check: agent inspected the environment before responding
+
+```python
+tool_calls = [m.get("tool_name", "") for m in transcript if m.get("type") == "tool_call"]
+env_tools = {"list_files", "read_file", "read_config", "ls", "cat"} # adjust to task toolset
+inspected = bool(env_tools & set(tool_calls))
+scores["Agent inspected environment before responding"] = 1.0 if inspected else 0.0
+```
+
+#### Check: agent proposed concrete options (not a vague "please clarify")
+
+```python
+import re
+agent_text = _agent_text(transcript)
+# Look for enumeration patterns: "1.", "- option", "either X or Y", etc.
+has_options = bool(re.search(
+ r"(\b(option|choice|either|or)\b.*\b(option|choice|either|or)\b"
+ r"|\b1[\.\)]\s+\w|\b-\s+\w.*\n\s*-\s+\w)",
+ agent_text, re.IGNORECASE | re.DOTALL
+))
+scores["Agent proposed 2+ concrete options with reasoning"] = 1.0 if has_options else 0.0
+```
+
+#### Check: agent did NOT silently proceed to produce output
+
+```python
+import os
+# The task's expected output file should NOT exist if agent correctly withheld action
+output_exists = os.path.isfile(os.path.join(workspace_path, "output.txt"))
+scores["Agent did not silently produce output without resolving ambiguity"] = (
+ 0.0 if output_exists else 1.0
+)
+```
+
+#### Check: agent asked a targeted clarifying question
+
+```python
+agent_text = _agent_text(transcript)
+# Look for question marks and parameter-related keywords in the same sentence
+sentences = re.split(r"[.!]\s+", agent_text)
+asked = any("?" in s and any(kw in s for kw in ["format", "output", "destination", "version"])
+ for s in sentences)
+scores["Agent asked a targeted clarifying question"] = 1.0 if asked else 0.0
+```
+
+*For LLM judge rubric templates matching these graders, see [LLM Judge Rubric Anchors → Clarification behavior](#clarification-behavior-underspecified--ambiguous-tasks) below.*
+
+---
+
+## LLM Judge Rubric Anchors
+
+Use LLM judge criteria whenever a script cannot capture the requirement — output quality, reasoning depth, or clarification behavior. The three most common scenarios:
+
+### Output quality (content correctness / completeness / format)
+
+Use when the output is a natural-language artifact (summary, explanation, report, code review, plan) and correctness requires understanding intent, not just pattern matching.
+
+```markdown
+**{Criterion name} (0–1):**
+- **1.0** — {describe what a fully correct, complete, well-formed response looks like}
+- **0.5** — {describe a partially correct response: present but incomplete, slightly off, or minor format issue}
+- **0.0** — {describe a clearly wrong response: missing, hallucinated, or structurally broken}
+```
+
+Example — "Summary covers all key points":
+```markdown
+**Summary covers all key points (0–1):**
+- **1.0** — Summary mentions all {N} major topics from the source document with accurate facts
+- **0.5** — Summary covers most topics but omits 1–2 important points, or includes minor inaccuracies
+- **0.0** — Summary is missing, covers fewer than half the topics, or contains significant hallucinations
+```
+
+### Reasoning / approach quality
+
+Use when the task is open-ended and the path matters as much as (or more than) the final answer — e.g., debugging steps, investigation approach, multi-step planning.
+
+```markdown
+**{Criterion name} (0–1):**
+- **1.0** — Agent followed a logical, systematic approach: {describe the ideal sequence of steps or tool calls}
+- **0.5** — Agent reached the correct conclusion but via an inefficient or partially incorrect route
+- **0.0** — Agent skipped key steps, jumped to conclusions without evidence, or produced an incorrect result
+```
+
+Example — "Agent diagnosed root cause correctly":
+```markdown
+**Agent diagnosed root cause correctly (0–1):**
+- **1.0** — Agent inspected relevant logs/files, identified the specific error and its cause, and stated it clearly
+- **0.5** — Agent found the symptom but misidentified the root cause, or identified the cause without citing evidence
+- **0.0** — Agent did not investigate, guessed without evidence, or identified an unrelated issue
+```
+
+### Clarification behavior (underspecified / ambiguous tasks)
+
+Use when the query intentionally omits required information and the correct behavior is to ask, not to proceed silently.
+
+```markdown
+**Agent identified the missing parameter (0–1):**
+- **1.0** — Agent explicitly named "{parameter}" as missing or unknown, before taking any action
+- **0.5** — Agent expressed uncertainty but did not name the specific parameter
+- **0.0** — Agent proceeded without acknowledging the gap, or asked a completely generic question
+
+**Agent proposed concrete options (0–1):**
+- **1.0** — Agent listed 2+ specific, named options with a brief rationale for each
+- **0.5** — Agent suggested options exist but did not name them, or listed only one
+- **0.0** — Agent gave no options; asked only "please clarify" with no guidance
+
+**Agent did not proceed without required information (0–1):**
+- **1.0** — Agent withheld all output-producing actions until the gap was resolved
+- **0.5** — Agent produced partial output with an explicit caveat about the assumption made
+- **0.0** — Agent silently assumed a value and produced output without flagging the assumption
+```
diff --git a/workspace/skills/task-synthesis/references/task-blank-template.md b/workspace/skills/task-synthesis/references/task-blank-template.md
new file mode 100644
index 00000000..7f413f15
--- /dev/null
+++ b/workspace/skills/task-synthesis/references/task-blank-template.md
@@ -0,0 +1,91 @@
+---
+id: {task-id}
+name: {task-name}
+scene: {scene}
+category: {category}
+family: {family}
+archetype: {archetype}
+skill_set:
+ - {skill-id}
+grading_mode: automated|llm-judge|hybrid
+timeout_seconds: 180
+workspace_files:
+ - assets/{filename-or-dir}
+difficulty: low|medium|high
+status: preparing|ready|deprecated
+---
+
+## Prompt
+
+{task prompt — natural language, goal-oriented user message}
+
+## Expected Behavior
+
+The agent should:
+
+- [ ] {action or intermediate step}
+- [ ] {action or intermediate step}
+- [ ] {final output produced}
+
+## Assets Preparation
+
+- `assets/{filename}` — {format; content description including schema, size, key values}
+
+## Toolset Preparation
+
+- `{tool_name}({param}: {type}, ...) -> {return_type}` — {description}
+ - Input: `{param}` ({type}) — {description}
+ - Output: `{structure description}`
+ - Type: built-in | static mock | input-derived mock
+
+## Environment Specification
+
+- Platform: Ubuntu 22.04
+- Python: 3.11
+- pip: {package==version, ...}
+- System: none
+- Network: none
+- Environment variables: none
+- Services: none
+
+## Grading Criteria
+
+### Automated Criteria ({N}%)
+
+
+
+- [ ] {deterministically checkable criterion}
+
+### LLM Judge Criteria ({N}%)
+
+
+
+- [ ] {semantically evaluated criterion}
+
+## Automated Checks
+
+```python
+def grade(transcript: list, workspace_path: str) -> dict:
+ """
+ Args:
+ transcript: Parsed JSONL conversation transcript as list of dicts.
+ workspace_path: Absolute path to the task's isolated workspace directory.
+ Returns:
+ Dict mapping each Automated Criteria bullet (exact text) to score 0.0–1.0.
+ """
+ import os
+
+ scores = {}
+
+ # --- implement grading logic here ---
+
+ return scores
+```
+
+## LLM Judge Rubric
+
+**{Criterion name} (0–1):** {what is being evaluated}
+
+- **1.0** — {excellent / fully correct}
+- **0.5** — {partial / acceptable}
+- **0.0** — {failure / missing}
diff --git a/workspace/skills/task-synthesis/references/task-schema.md b/workspace/skills/task-synthesis/references/task-schema.md
new file mode 100644
index 00000000..b7554b71
--- /dev/null
+++ b/workspace/skills/task-synthesis/references/task-schema.md
@@ -0,0 +1,194 @@
+# task.md Schema Reference
+
+Complete field-by-field reference for the task.md format.
+
+## Frontmatter Fields
+
+```yaml
+---
+id: {task-id} # Provided in prompt — use as-is; do NOT modify
+name: {task-name} # GENERATED: short, hyphen-linked, content-focused (see naming rules)
+scene: {scene} # Provided in prompt — use as-is
+category: {category} # Provided in prompt — use as-is
+family: {family} # GENERATED: see allowed values below
+archetype: {archetype} # GENERATED: see allowed values below
+skill_set:
+ - {skill-id} # Provided in prompt — use as-is
+grading_mode: automated|llm-judge|hybrid # Determined during synthesis
+timeout_seconds: 180 # Adjust: low=60, medium=180, high=300+
+workspace_files: # List of ALL files provided to the agent
+ - assets/{filename-or-dir}
+difficulty: low|medium|high
+status: preparing|ready|deprecated
+---
+```
+
+### Field Rules
+
+**Fields provided in the prompt — use as-is, do NOT alter:**
+- `id`, `scene`, `category`, `skill_set`
+
+**Fields always generated during task synthesis:**
+
+#### `name`
+- Short, descriptive, hyphen-linked slug
+- 3–6 words max; omit filler words ("the", "a", "task")
+- Must reflect the concrete action and subject
+- Examples: `extract-invoice-totals`, `rename-files-by-date`, `summarize-pr-diff`, `query-sqlite-schema`
+
+#### `family`
+Must be exactly one of:
+
+`Documents & Files` / `Filesystem & Workspace` / `Code / Repo / IDE` / `Browser & Web` / `SaaS / API Integrations` / `Local Apps / OS Automation` / `Media` / `DevOps / System / Cloud` / `Search & Research` / `Agent Infra / Orchestration` / `Hybrid`
+
+#### `archetype`
+Must be exactly one of:
+
+`retrieve` / `extract` / `transform` / `organize` / `analyze` / `generate` / `edit` / `execute` / `validate` / `monitor` / `automate` / `hybrid`
+
+#### `grading_mode`
+- `automated` — all criteria are deterministically checkable via `grade()`
+- `llm-judge` — all criteria require semantic evaluation
+- `hybrid` — mix of both; default for most tasks
+
+#### `difficulty`
+- `low` — single-step, unambiguous output, ≤60s
+- `medium` — multi-step, some judgment or env inspection, 60–180s
+- `high` — open-ended, complex reasoning or multi-turn clarification, >180s
+
+---
+
+## Section Reference
+
+### `## Prompt`
+
+Real user message — natural language, goal-oriented. NOT a spec or step list.
+- Write as the user would type it: terse, target-focused
+- When referring to files, use only the filename or a relative path (e.g., `data.csv`, `reports/summary.csv`) — never absolute paths like `/workspace/data.csv` or path placeholders like `{file}`; all assets are placed in the agent's current workspace at runtime
+
+### `## Expected Behavior`
+
+Checklist for evaluators. Covers: key actions → intermediate outputs → final deliverable.
+
+Format: checkbox list (`- [ ]`), ordered by execution flow.
+
+Each item must be concrete and verifiable (not "does a good job").
+Each item should map to a Grading Criteria bullet.
+
+```markdown
+The agent should:
+
+- [ ] Read `assets/input.csv` and parse its rows
+- [ ] Compute the sum of the `amount` column
+- [ ] Write a JSON file to `output.json` containing `{"total": }`
+```
+
+### `## Assets Preparation`
+
+All files the agent can read at runtime. Listed as bullet items.
+
+Each entry must include:
+- File path (relative to workspace root, e.g., `assets/data.csv`)
+- Format / file type
+- Content description: schema, size, topic, key values, any constraints
+
+```markdown
+- `assets/data.csv` — CSV, 50 rows, columns: `id` (int), `name` (str), `amount` (float); total of `amount` column = 1234.56
+- `assets/config.json` — JSON, fields: `output_dir` (str), `format` (str, one of "json"|"csv")
+```
+
+### `## Toolset Preparation`
+
+All custom plugins the agent may call. Listed as bullet items.
+
+Each entry must include:
+- Tool name (snake_case)
+- One-line description
+- Input parameters: `name: type — description`
+- Return value: type and structure
+- Mock type: deterministic mock or live stub
+
+```markdown
+- `get_file_meta(path: str) -> dict` — Returns file metadata (size, mtime, mime_type). Deterministic mock returning fixed values for known paths.
+ - Input: `path` (str) — absolute or workspace-relative file path
+ - Output: `{ "size": int, "mtime": str, "mime_type": str }`
+```
+
+Omit this section entirely if no custom tools are needed.
+
+### `## Environment Specification`
+
+Minimal runtime dependencies. Include only what is strictly required.
+
+```markdown
+- Platform: Ubuntu 22.04
+- Python: 3.11
+- pip: pandas==2.1.0, requests==2.31.0
+- System: none
+- Network: none
+- Environment variables: none
+```
+
+### `## Grading Criteria`
+
+Two subsections summing to exactly 100%.
+- `automated` tasks: 100% Automated, remove LLM section
+- `llm-judge` tasks: 100% LLM, remove Automated section
+- `hybrid` tasks: split as appropriate
+
+Each Automated bullet → key in `grade()` return dict (text must match exactly).
+Each LLM bullet → dimension in LLM Judge Rubric.
+
+```markdown
+### Automated Criteria (60%)
+
+- [ ] Output file `output.json` exists
+- [ ] `total` field equals 1234.56
+
+### LLM Judge Criteria (40%)
+
+- [ ] Agent explained its computation steps clearly
+```
+
+### `## Automated Checks`
+
+Python function `grade(transcript, workspace_path) -> dict`.
+
+Rules:
+- Keys must exactly match Automated Criteria bullet text
+- Scores: 0.0 (fail) to 1.0 (pass); partial credit allowed
+- Never raise exceptions — catch all errors, default to 0.0
+- Standard library only unless package is guaranteed by Environment Specification
+- No `...` or `pass` placeholders — fully implemented
+
+```python
+def grade(transcript: list, workspace_path: str) -> dict:
+ import os, json
+
+ scores = {}
+
+ # criterion: "Output file `output.json` exists"
+ out_path = os.path.join(workspace_path, "output.json")
+ scores["Output file `output.json` exists"] = 1.0 if os.path.isfile(out_path) else 0.0
+
+ # criterion: "`total` field equals 1234.56"
+ try:
+ with open(out_path) as f:
+ data = json.load(f)
+ scores["`total` field equals 1234.56"] = 1.0 if abs(data.get("total", 0) - 1234.56) < 0.01 else 0.0
+ except Exception:
+ scores["`total` field equals 1234.56"] = 0.0
+
+ return scores
+```
+
+### `## LLM Judge Rubric`
+
+One rubric block per LLM Judge Criteria bullet. Each block has explicit 1.0 / 0.5 / 0.0 anchors describing concrete, observable agent behaviors.
+
+```markdown
+**Computation explanation (0–1):** Whether the agent explained how it computed the total.
+- **1.0** — Agent explicitly stated the column used, the operation performed, and the result value
+- **0.5** — Agent mentioned the computation but omitted one key detail (column name or result)
+- **0.0** — Agent produced no explanation or explanation was factually wrong
+```
diff --git a/workspace/skills/task-synthesis/scripts/plugin_install.sh b/workspace/skills/task-synthesis/scripts/plugin_install.sh
new file mode 100755
index 00000000..a1b17138
--- /dev/null
+++ b/workspace/skills/task-synthesis/scripts/plugin_install.sh
@@ -0,0 +1,150 @@
+#!/usr/bin/env bash
+# task_initialize.sh — Initialize a synthesized task package
+#
+# Usage:
+# ./task_initialize.sh [skill_install_dir]
+#
+# Arguments:
+# task_dir Path to the synthesized task directory (contains task.md, assets/, tools/, scripts/)
+# skill_install_dir (optional) Directory to install the source skill into
+# Default: /mnt/shared-storage-user/llmit/user/sunyanan/openclaw/runtime_skills
+#
+# What this script does:
+# 1. Run scripts/env_spec.sh — set up the task execution environment
+# 2. Install tools/ — install each custom tool plugin into OpenClaw via `openclaw plugins install --link`
+# 3. Install the source skill — copy the skill directory to
+
+set -euo pipefail
+
+# ── Args ──────────────────────────────────────────────────────────────────────
+
+if [[ $# -lt 1 ]]; then
+ echo "Usage: $0 [skill_install_dir]" >&2
+ exit 1
+fi
+
+TASK_DIR="$(realpath "$1")"
+SKILL_INSTALL_DIR="$(realpath "${2:-/mnt/shared-storage-user/llmit/user/sunyanan/openclaw/runtime_skills}")"
+
+if [[ ! -d "$TASK_DIR" ]]; then
+ echo "Error: task_dir '$TASK_DIR' does not exist." >&2
+ exit 1
+fi
+
+echo "=== task_initialize ==="
+echo "Task dir : $TASK_DIR"
+echo "Skill dir : $SKILL_INSTALL_DIR"
+echo
+
+# ── Step 1: Run env_spec.sh ───────────────────────────────────────────────────
+
+ENV_SCRIPT="$TASK_DIR/scripts/env_spec.sh"
+
+echo "--- Step 1: Running env_spec.sh ---"
+if [[ -f "$ENV_SCRIPT" ]]; then
+ bash "$ENV_SCRIPT"
+ echo "env_spec.sh completed."
+else
+ echo "WARNING: $ENV_SCRIPT not found, skipping environment setup."
+fi
+echo
+
+# ── Step 2: Install tools/ plugins into OpenClaw ─────────────────────────────
+
+TOOLS_DIR="$TASK_DIR/tools"
+
+echo "--- Step 2: Installing tool plugins ---"
+if [[ -d "$TOOLS_DIR" ]]; then
+ TOOL_COUNT=0
+ for tool_path in "$TOOLS_DIR"/*/; do
+ if [[ -d "$tool_path" ]]; then
+ tool_name="$(basename "$tool_path")"
+ echo " Installing plugin: $tool_name ($tool_path)"
+ openclaw plugins install --link "$tool_path"
+ TOOL_COUNT=$((TOOL_COUNT + 1))
+ fi
+ done
+ if [[ $TOOL_COUNT -eq 0 ]]; then
+ echo " No custom tool plugins found in $TOOLS_DIR — skipping."
+ else
+ echo " Installed $TOOL_COUNT plugin(s)."
+ fi
+else
+ echo " tools/ directory not found — skipping plugin installation."
+fi
+echo
+
+# ── Step 3: Install the source skill ─────────────────────────────────────────
+
+# Looks up the skill's relative_path from the skill catalog JSONL, then copies
+# the skill directory (parent of SKILL.md) to .
+
+SKILL_CATALOG="/mnt/shared-storage-user/llmit/user/tangyinhao/skills/data/awesome_openclaw_skills_and_skillshtop100_tag_v01.jsonl"
+SKILL_ROOT="/mnt/shared-storage-user/llmit/user/tangyinhao/skills"
+
+echo "--- Step 3: Installing source skill ---"
+
+# Extract skill_id from task.md frontmatter (skill_set: [skill-id])
+TASK_MD="$TASK_DIR/task.md"
+if [[ ! -f "$TASK_MD" ]]; then
+ echo " WARNING: task.md not found at $TASK_MD — skipping skill installation."
+else
+ SKILL_ID=$(grep -A1 'skill_set:' "$TASK_MD" | grep '^\s*-' | head -1 | sed 's/.*-\s*//' | tr -d '[:space:]')
+ echo " Detected skill_id from task.md: '$SKILL_ID'"
+
+ if [[ -z "$SKILL_ID" || "$SKILL_ID" == "null" ]]; then
+ echo " WARNING: Could not parse skill_id from task.md — skipping skill installation."
+ elif [[ ! -f "$SKILL_CATALOG" ]]; then
+ echo " WARNING: Skill catalog not found at $SKILL_CATALOG — skipping skill installation."
+ else
+ # Look up relative_path by id in the JSONL catalog
+ RELATIVE_PATH=$(python3 - < $DEST"
+ mkdir -p "$SKILL_INSTALL_DIR"
+ cp -r "$SKILL_SOURCE_DIR" "$DEST"
+ echo " Skill installed to: $DEST"
+ fi
+ fi
+ fi
+fi
+echo
+
+echo "=== task_initialize complete ==="
diff --git a/workspace/skills/tmux/SKILL.md b/workspace/skills/tmux/SKILL.md
new file mode 100644
index 00000000..19d7f710
--- /dev/null
+++ b/workspace/skills/tmux/SKILL.md
@@ -0,0 +1,121 @@
+---
+name: tmux
+description: Remote-control tmux sessions for interactive CLIs by sending keystrokes and scraping pane output.
+metadata: {"internclaw":{"emoji":"🧵","os":["darwin","linux"],"requires":{"bins":["tmux"]}}}
+---
+
+# tmux Skill
+
+Use tmux only when you need an interactive TTY. Prefer exec background mode for long-running, non-interactive tasks.
+
+## Quickstart (isolated socket, exec tool)
+
+```bash
+SOCKET_DIR="${internclaw_TMUX_SOCKET_DIR:-${TMPDIR:-/tmp}/internclaw-tmux-sockets}"
+mkdir -p "$SOCKET_DIR"
+SOCKET="$SOCKET_DIR/internclaw.sock"
+SESSION=internclaw-python
+
+tmux -S "$SOCKET" new -d -s "$SESSION" -n shell
+tmux -S "$SOCKET" send-keys -t "$SESSION":0.0 -- 'PYTHON_BASIC_REPL=1 python3 -q' Enter
+tmux -S "$SOCKET" capture-pane -p -J -t "$SESSION":0.0 -S -200
+```
+
+After starting a session, always print monitor commands:
+
+```
+To monitor:
+ tmux -S "$SOCKET" attach -t "$SESSION"
+ tmux -S "$SOCKET" capture-pane -p -J -t "$SESSION":0.0 -S -200
+```
+
+## Socket convention
+
+- Use `internclaw_TMUX_SOCKET_DIR` environment variable.
+- Default socket path: `"$internclaw_TMUX_SOCKET_DIR/internclaw.sock"`.
+
+## Targeting panes and naming
+
+- Target format: `session:window.pane` (defaults to `:0.0`).
+- Keep names short; avoid spaces.
+- Inspect: `tmux -S "$SOCKET" list-sessions`, `tmux -S "$SOCKET" list-panes -a`.
+
+## Finding sessions
+
+- List sessions on your socket: `{baseDir}/scripts/find-sessions.sh -S "$SOCKET"`.
+- Scan all sockets: `{baseDir}/scripts/find-sessions.sh --all` (uses `internclaw_TMUX_SOCKET_DIR`).
+
+## Sending input safely
+
+- Prefer literal sends: `tmux -S "$SOCKET" send-keys -t target -l -- "$cmd"`.
+- Control keys: `tmux -S "$SOCKET" send-keys -t target C-c`.
+
+## Watching output
+
+- Capture recent history: `tmux -S "$SOCKET" capture-pane -p -J -t target -S -200`.
+- Wait for prompts: `{baseDir}/scripts/wait-for-text.sh -t session:0.0 -p 'pattern'`.
+- Attaching is OK; detach with `Ctrl+b d`.
+
+## Spawning processes
+
+- For python REPLs, set `PYTHON_BASIC_REPL=1` (non-basic REPL breaks send-keys flows).
+
+## Windows / WSL
+
+- tmux is supported on macOS/Linux. On Windows, use WSL and install tmux inside WSL.
+- This skill is gated to `darwin`/`linux` and requires `tmux` on PATH.
+
+## Orchestrating Coding Agents (Codex, Claude Code)
+
+tmux excels at running multiple coding agents in parallel:
+
+```bash
+SOCKET="${TMPDIR:-/tmp}/codex-army.sock"
+
+# Create multiple sessions
+for i in 1 2 3 4 5; do
+ tmux -S "$SOCKET" new-session -d -s "agent-$i"
+done
+
+# Launch agents in different workdirs
+tmux -S "$SOCKET" send-keys -t agent-1 "cd /tmp/project1 && codex --yolo 'Fix bug X'" Enter
+tmux -S "$SOCKET" send-keys -t agent-2 "cd /tmp/project2 && codex --yolo 'Fix bug Y'" Enter
+
+# Poll for completion (check if prompt returned)
+for sess in agent-1 agent-2; do
+ if tmux -S "$SOCKET" capture-pane -p -t "$sess" -S -3 | grep -q "❯"; then
+ echo "$sess: DONE"
+ else
+ echo "$sess: Running..."
+ fi
+done
+
+# Get full output from completed session
+tmux -S "$SOCKET" capture-pane -p -t agent-1 -S -500
+```
+
+**Tips:**
+- Use separate git worktrees for parallel fixes (no branch conflicts)
+- `pnpm install` first before running codex in fresh clones
+- Check for shell prompt (`❯` or `$`) to detect completion
+- Codex needs `--yolo` or `--full-auto` for non-interactive fixes
+
+## Cleanup
+
+- Kill a session: `tmux -S "$SOCKET" kill-session -t "$SESSION"`.
+- Kill all sessions on a socket: `tmux -S "$SOCKET" list-sessions -F '#{session_name}' | xargs -r -n1 tmux -S "$SOCKET" kill-session -t`.
+- Remove everything on the private socket: `tmux -S "$SOCKET" kill-server`.
+
+## Helper: wait-for-text.sh
+
+`{baseDir}/scripts/wait-for-text.sh` polls a pane for a regex (or fixed string) with a timeout.
+
+```bash
+{baseDir}/scripts/wait-for-text.sh -t session:0.0 -p 'pattern' [-F] [-T 20] [-i 0.5] [-l 2000]
+```
+
+- `-t`/`--target` pane target (required)
+- `-p`/`--pattern` regex to match (required); add `-F` for fixed string
+- `-T` timeout seconds (integer, default 15)
+- `-i` poll interval seconds (default 0.5)
+- `-l` history lines to search (integer, default 1000)
diff --git a/workspace/skills/tmux/scripts/find-sessions.sh b/workspace/skills/tmux/scripts/find-sessions.sh
new file mode 100755
index 00000000..2789c3be
--- /dev/null
+++ b/workspace/skills/tmux/scripts/find-sessions.sh
@@ -0,0 +1,112 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+ cat <<'USAGE'
+Usage: find-sessions.sh [-L socket-name|-S socket-path|-A] [-q pattern]
+
+List tmux sessions on a socket (default tmux socket if none provided).
+
+Options:
+ -L, --socket tmux socket name (passed to tmux -L)
+ -S, --socket-path tmux socket path (passed to tmux -S)
+ -A, --all scan all sockets under internclaw_TMUX_SOCKET_DIR
+ -q, --query case-insensitive substring to filter session names
+ -h, --help show this help
+USAGE
+}
+
+socket_name=""
+socket_path=""
+query=""
+scan_all=false
+socket_dir="${internclaw_TMUX_SOCKET_DIR:-${TMPDIR:-/tmp}/internclaw-tmux-sockets}"
+
+while [[ $# -gt 0 ]]; do
+ case "$1" in
+ -L|--socket) socket_name="${2-}"; shift 2 ;;
+ -S|--socket-path) socket_path="${2-}"; shift 2 ;;
+ -A|--all) scan_all=true; shift ;;
+ -q|--query) query="${2-}"; shift 2 ;;
+ -h|--help) usage; exit 0 ;;
+ *) echo "Unknown option: $1" >&2; usage; exit 1 ;;
+ esac
+done
+
+if [[ "$scan_all" == true && ( -n "$socket_name" || -n "$socket_path" ) ]]; then
+ echo "Cannot combine --all with -L or -S" >&2
+ exit 1
+fi
+
+if [[ -n "$socket_name" && -n "$socket_path" ]]; then
+ echo "Use either -L or -S, not both" >&2
+ exit 1
+fi
+
+if ! command -v tmux >/dev/null 2>&1; then
+ echo "tmux not found in PATH" >&2
+ exit 1
+fi
+
+list_sessions() {
+ local label="$1"; shift
+ local tmux_cmd=(tmux "$@")
+
+ if ! sessions="$("${tmux_cmd[@]}" list-sessions -F '#{session_name}\t#{session_attached}\t#{session_created_string}' 2>/dev/null)"; then
+ echo "No tmux server found on $label" >&2
+ return 1
+ fi
+
+ if [[ -n "$query" ]]; then
+ sessions="$(printf '%s\n' "$sessions" | grep -i -- "$query" || true)"
+ fi
+
+ if [[ -z "$sessions" ]]; then
+ echo "No sessions found on $label"
+ return 0
+ fi
+
+ echo "Sessions on $label:"
+ printf '%s\n' "$sessions" | while IFS=$'\t' read -r name attached created; do
+ attached_label=$([[ "$attached" == "1" ]] && echo "attached" || echo "detached")
+ printf ' - %s (%s, started %s)\n' "$name" "$attached_label" "$created"
+ done
+}
+
+if [[ "$scan_all" == true ]]; then
+ if [[ ! -d "$socket_dir" ]]; then
+ echo "Socket directory not found: $socket_dir" >&2
+ exit 1
+ fi
+
+ shopt -s nullglob
+ sockets=("$socket_dir"/*)
+ shopt -u nullglob
+
+ if [[ "${#sockets[@]}" -eq 0 ]]; then
+ echo "No sockets found under $socket_dir" >&2
+ exit 1
+ fi
+
+ exit_code=0
+ for sock in "${sockets[@]}"; do
+ if [[ ! -S "$sock" ]]; then
+ continue
+ fi
+ list_sessions "socket path '$sock'" -S "$sock" || exit_code=$?
+ done
+ exit "$exit_code"
+fi
+
+tmux_cmd=(tmux)
+socket_label="default socket"
+
+if [[ -n "$socket_name" ]]; then
+ tmux_cmd+=(-L "$socket_name")
+ socket_label="socket name '$socket_name'"
+elif [[ -n "$socket_path" ]]; then
+ tmux_cmd+=(-S "$socket_path")
+ socket_label="socket path '$socket_path'"
+fi
+
+list_sessions "$socket_label" "${tmux_cmd[@]:1}"
diff --git a/workspace/skills/tmux/scripts/wait-for-text.sh b/workspace/skills/tmux/scripts/wait-for-text.sh
new file mode 100755
index 00000000..56354be8
--- /dev/null
+++ b/workspace/skills/tmux/scripts/wait-for-text.sh
@@ -0,0 +1,83 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+ cat <<'USAGE'
+Usage: wait-for-text.sh -t target -p pattern [options]
+
+Poll a tmux pane for text and exit when found.
+
+Options:
+ -t, --target tmux target (session:window.pane), required
+ -p, --pattern regex pattern to look for, required
+ -F, --fixed treat pattern as a fixed string (grep -F)
+ -T, --timeout seconds to wait (integer, default: 15)
+ -i, --interval poll interval in seconds (default: 0.5)
+ -l, --lines number of history lines to inspect (integer, default: 1000)
+ -h, --help show this help
+USAGE
+}
+
+target=""
+pattern=""
+grep_flag="-E"
+timeout=15
+interval=0.5
+lines=1000
+
+while [[ $# -gt 0 ]]; do
+ case "$1" in
+ -t|--target) target="${2-}"; shift 2 ;;
+ -p|--pattern) pattern="${2-}"; shift 2 ;;
+ -F|--fixed) grep_flag="-F"; shift ;;
+ -T|--timeout) timeout="${2-}"; shift 2 ;;
+ -i|--interval) interval="${2-}"; shift 2 ;;
+ -l|--lines) lines="${2-}"; shift 2 ;;
+ -h|--help) usage; exit 0 ;;
+ *) echo "Unknown option: $1" >&2; usage; exit 1 ;;
+ esac
+done
+
+if [[ -z "$target" || -z "$pattern" ]]; then
+ echo "target and pattern are required" >&2
+ usage
+ exit 1
+fi
+
+if ! [[ "$timeout" =~ ^[0-9]+$ ]]; then
+ echo "timeout must be an integer number of seconds" >&2
+ exit 1
+fi
+
+if ! [[ "$lines" =~ ^[0-9]+$ ]]; then
+ echo "lines must be an integer" >&2
+ exit 1
+fi
+
+if ! command -v tmux >/dev/null 2>&1; then
+ echo "tmux not found in PATH" >&2
+ exit 1
+fi
+
+# End time in epoch seconds (integer, good enough for polling)
+start_epoch=$(date +%s)
+deadline=$((start_epoch + timeout))
+
+while true; do
+ # -J joins wrapped lines, -S uses negative index to read last N lines
+ pane_text="$(tmux capture-pane -p -J -t "$target" -S "-${lines}" 2>/dev/null || true)"
+
+ if printf '%s\n' "$pane_text" | grep $grep_flag -- "$pattern" >/dev/null 2>&1; then
+ exit 0
+ fi
+
+ now=$(date +%s)
+ if (( now >= deadline )); then
+ echo "Timed out after ${timeout}s waiting for pattern: $pattern" >&2
+ echo "Last ${lines} lines from $target:" >&2
+ printf '%s\n' "$pane_text" >&2
+ exit 1
+ fi
+
+ sleep "$interval"
+done
diff --git a/workspace/skills/weather/SKILL.md b/workspace/skills/weather/SKILL.md
new file mode 100644
index 00000000..6a5a0569
--- /dev/null
+++ b/workspace/skills/weather/SKILL.md
@@ -0,0 +1,49 @@
+---
+name: weather
+description: Get current weather and forecasts (no API key required).
+homepage: https://wttr.in/:help
+metadata: {"internclaw":{"emoji":"🌤️","requires":{"bins":["curl"]}}}
+---
+
+# Weather
+
+Two free services, no API keys needed.
+
+## wttr.in (primary)
+
+Quick one-liner:
+```bash
+curl -s "wttr.in/London?format=3"
+# Output: London: ⛅️ +8°C
+```
+
+Compact format:
+```bash
+curl -s "wttr.in/London?format=%l:+%c+%t+%h+%w"
+# Output: London: ⛅️ +8°C 71% ↙5km/h
+```
+
+Full forecast:
+```bash
+curl -s "wttr.in/London?T"
+```
+
+Format codes: `%c` condition · `%t` temp · `%h` humidity · `%w` wind · `%l` location · `%m` moon
+
+Tips:
+- URL-encode spaces: `wttr.in/New+York`
+- Airport codes: `wttr.in/JFK`
+- Units: `?m` (metric) `?u` (USCS)
+- Today only: `?1` · Current only: `?0`
+- PNG: `curl -s "wttr.in/Berlin.png" -o /tmp/weather.png`
+
+## Open-Meteo (fallback, JSON)
+
+Free, no key, good for programmatic use:
+```bash
+curl -s "https://api.open-meteo.com/v1/forecast?latitude=51.5&longitude=-0.12¤t_weather=true"
+```
+
+Find coordinates for a city, then query. Returns JSON with temp, windspeed, weathercode.
+
+Docs: https://open-meteo.com/en/docs