add skills

2026-05-28 10:17:41 +09:00
parent c526552957
commit 51715816fd
81 changed files with 17935 additions and 2 deletions
@@ -0,0 +1,254 @@
+---
+name: openclaw-history-ingest
+description: >
+  Ingest OpenClaw agent history into the Obsidian wiki. Use this skill when the user wants to mine
+  their past OpenClaw sessions for knowledge, import their ~/.openclaw folder, extract insights from
+  previous OpenClaw conversations, or says things like "process my OpenClaw history", "add my OpenClaw
+  sessions to the wiki", "ingest ~/.openclaw", or "what have I worked on in OpenClaw". Also triggers
+  when the user mentions OpenClaw session logs, MEMORY.md, daily notes, or ~/.openclaw/workspace.
+---
+
+# OpenClaw History Ingest — Session & Memory Mining
+
+You are extracting knowledge from the user's OpenClaw agent history and distilling it into the Obsidian wiki. OpenClaw stores both a structured long-term MEMORY.md and per-session JSONL transcripts — focus on durable knowledge, not operational telemetry.
+
+This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest openclaw`).
+
+## Before You Start
+
+1. **Resolve config** — follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up CWD for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH` and `OPENCLAW_HISTORY_PATH` (defaults to `~/.openclaw`)
+2. Read `.manifest.json` at the vault root to check what has already been ingested
+3. Read `index.md` at the vault root to understand what the wiki already contains
+
+## Ingest Modes
+
+### Append Mode (default)
+
+Check `.manifest.json` for each source file. Only process:
+
+- Files not in the manifest (new session logs, new daily notes)
+- Files whose modification time is newer than `ingested_at` in the manifest (for example, an updated MEMORY.md or daily note)
+
+Use this mode for regular syncs.
+
+### Full Mode
+
+Process everything regardless of the manifest. Use after `wiki-rebuild` or if the user explicitly asks for a full re-ingest.
+
+## OpenClaw Data Layout
+
+OpenClaw stores all local artifacts under `~/.openclaw/`.
+
+```
+~/.openclaw/
+├── openclaw.json                          # Global config
+├── credentials/                           # Auth tokens (skip entirely)
+├── workspace/                             # Agent workspace
+│   ├── MEMORY.md                          # Long-term memory (loaded every session)
+│   ├── DREAMS.md                          # Optional dream diary / summaries
+│   └── memory/
+│       ├── YYYY-MM-DD.md                  # Daily notes (today + yesterday auto-loaded)
+│       └── ...
+└── agents/
+    └── <agentId>/
+        ├── agent/
+        │   └── models.json                # Agent config (skip)
+        └── sessions/
+            ├── sessions.json              # Session index
+            └── <sessionId>.jsonl          # Session transcript (JSONL, append-only)
+```
+
+### Key data sources ranked by value
+
+1. `workspace/MEMORY.md` — highest signal; long-term durable facts the agent accumulated
+2. `workspace/memory/YYYY-MM-DD.md` — daily notes; recent entries often contain active project context
+3. `agents/*/sessions/<id>.jsonl` — session transcripts; rich but noisy
+4. `agents/*/sessions/sessions.json` — session index for inventory and timestamps
+5. `workspace/DREAMS.md` — optional summaries; ingest if present
+
+Skip `credentials/` entirely. Skip `agents/*/agent/models.json` (runtime config, not user knowledge).
+
+## Step 1: Survey and Compute Delta
+
+Scan these paths under `OPENCLAW_HISTORY_PATH` (shown here for the default `~/.openclaw`) and compare each file against `.manifest.json`:
+
+- `~/.openclaw/workspace/MEMORY.md`
+- `~/.openclaw/workspace/DREAMS.md` (if present)
+- `~/.openclaw/workspace/memory/*.md`
+- `~/.openclaw/agents/*/sessions/sessions.json`
+- `~/.openclaw/agents/*/sessions/*.jsonl`
+
+Classify each file:
+
+- **New** — not in manifest
+- **Modified** — in manifest but file is newer than `ingested_at`
+- **Unchanged** — already ingested and unchanged
+
+Report a concise delta summary before deep parsing.
+
+## Step 2: Parse MEMORY.md First
+
+`MEMORY.md` is the highest-value source. It is plain markdown, human-readable and human-editable. It typically contains:
+
+- Durable facts about the user's preferences, environment, and recurring patterns
+- Decisions and context the agent was told to remember
+- Project-specific notes the agent accumulated over many sessions
+
+Read it in full and extract concept-level knowledge. Do not create one wiki page per MEMORY.md entry — cluster by topic.
+
+## Step 3: Parse Daily Notes
+
+`workspace/memory/YYYY-MM-DD.md` files contain time-stamped notes from that day's sessions. Prioritize recent files (last 30–90 days). Extract:
+
+- Active project context and decisions made
+- Patterns or techniques discovered
+- Recurring blockers or solved problems
+
+Older daily notes have diminishing signal — summarize in bulk rather than extracting line-by-line.
+
+## Step 4: Parse Session JSONL Safely
+
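+Each session file is JSONL (append-only, one JSON object per line). The shapes below are simplified; `references/openclaw-data-format.md` also describes separate `tool_result` records: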
+
+```json
+{"role": "user",      "content": "...", "timestamp": "..."}
+{"role": "assistant", "content": "...", "timestamp": "..."}
+{"role": "tool",      "name": "...",   "content": "...", "timestamp": "..."}
+```
+
+### Extraction rules
+
+- Prioritize assistant turns that state conclusions, decisions, or patterns
+- Extract user intent from high-signal turns; skip low-information follow-ups
+- Tool calls are context, not primary knowledge — only extract if the result contains a reusable insight
+- Cross-reference the `sessions.json` index to get session names/labels before opening individual transcripts
+
+### Critical privacy filter
+
+Session transcripts can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim.
+
+- Remove API keys, tokens, passwords, credentials
+- Redact private identifiers unless relevant and user-approved
+- Summarize; do not quote raw transcripts verbatim
+
+## Step 5: Cluster by Topic
+
+Do not create one wiki page per session or per MEMORY.md entry.
+
+- Group by stable topic (concept, tool, project, technique)
+- Split mixed sessions into separate themes
+- Merge recurring patterns across dates and agents
+- Use session `cwd` or workspace path to infer project scope when available
+
+## Step 6: Distill into Wiki Pages
+
+Route extracted knowledge using existing wiki conventions:
+
+- Project-specific architecture/process → `projects/<name>/...`
+- General concepts → `concepts/`
+- Recurring techniques/debug playbooks → `skills/`
+- Tools/services/frameworks → `entities/`
+- Cross-session patterns → `synthesis/`
+
+For each impacted project, create/update `projects/<name>/<name>.md`.
+
+### Writing rules
+
+- Distill knowledge, not chronology
+- Avoid "on date X we discussed..." unless date context is essential
+- Add `summary:` frontmatter on each new/updated page (1–2 sentences, ≤ 200 chars)
+- Add confidence and lifecycle fields to every new page:
+  ```yaml
+  base_confidence: 0.42
+  lifecycle: draft
+  lifecycle_changed: <ISO date today>
+  ```
+  Leave `lifecycle` unchanged on update.
+- Add provenance markers:
+  - `^[extracted]` when directly grounded in explicit session/memory content
+  - `^[inferred]` when synthesizing patterns across multiple sessions
+  - `^[ambiguous]` when sessions conflict
+- Add or update the `provenance:` frontmatter on each changed page so it reflects the page's mix of extracted, inferred, and ambiguous claims
+
+## Step 7: Update Manifest, Log, and Index
+
+### Update `.manifest.json`
+
+For each processed source file, record:
+
+- `ingested_at`, `size_bytes`, `modified_at`
+- `source_type`: `openclaw_memory` | `openclaw_daily_note` | `openclaw_session` | `openclaw_dreams`
+- `agent_id`: agent directory name (when applicable)
+- `pages_created`, `pages_updated`
+
+Add/update a top-level summary block:
+
+```json
+{
+  "openclaw": {
+    "source_path": "~/.openclaw/",
+    "last_ingested": "TIMESTAMP",
+    "memory_updated_at": "TIMESTAMP",
+    "daily_notes_ingested": 14,
+    "sessions_ingested": 23,
+    "pages_created": 6,
+    "pages_updated": 18
+  }
+}
+```
+
+### Update special files
+
+Update `index.md` to reflect new or changed pages, and append an entry to `log.md` in this format:
+
+```
+- [TIMESTAMP] OPENCLAW_HISTORY_INGEST memory=updated daily_notes=N sessions=M pages_updated=X pages_created=Y mode=append|full
+```
+
+**`hot.md`** — Read `$OBSIDIAN_VAULT_PATH/hot.md` (create from the template in `wiki-ingest` if missing). Update **Recent Activity** with a one-line summary — e.g. "Ingested OpenClaw MEMORY.md and 14 daily notes; surfaced automation patterns and multi-agent coordination knowledge." Keep the last 3 operations. Update `updated` timestamp.
+
+## Privacy and Compliance
+
+- Distill and synthesize; avoid raw memory or transcript dumps
+- Default to redaction for anything that looks sensitive
+- Ask the user before storing personal or sensitive details
+- Keep references to other people minimal and purpose-bound
+
+## Reference
+
+See `references/openclaw-data-format.md` for field-level notes and parsing guidance.
+
+## QMD Refresh After Vault Writes
+
+QMD is a search index, not the source of truth. If `$QMD_WIKI_COLLECTION` is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If QMD refresh fails, do not roll back the vault changes; report the QMD status separately.
+
+Use `$QMD_CLI` if set; otherwise use `qmd`.
+
+```bash
+${QMD_CLI:-qmd} update
+```
+
+If the output says vectors are needed or embeddings may be stale, run:
+
+```bash
+${QMD_CLI:-qmd} embed
+```
+
+Verify the collection with either:
+
+```bash
+${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"
+```
+
+or, when a specific page path is known:
+
+```bash
+${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5
+```
+
+Record one of:
+- `QMD refreshed: update + embed + verified`
+- `QMD refreshed: update only + verified`
+- `QMD skipped: QMD_WIKI_COLLECTION unset`
+- `QMD skipped: qmd CLI unavailable`
+- `QMD failed: <short error summary>`
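A minimal sketch of the Step 1 delta classification (New / Modified / Unchanged) from the skill above, in Python. It is not part of the committed file. The manifest layout (a mapping from source path to an entry holding an ISO 8601 `ingested_at`) and the names `SOURCE_GLOBS`, `classify`, and `survey` are assumptions for illustration; the skill does not specify the real `.manifest.json` schema beyond its field names.

```python
from datetime import datetime, timezone
from pathlib import Path

# Source locations listed in Step 1, relative to OPENCLAW_HISTORY_PATH.
# credentials/ and agents/*/agent/models.json are deliberately absent.
SOURCE_GLOBS = (
    "workspace/MEMORY.md",
    "workspace/DREAMS.md",
    "workspace/memory/*.md",
    "agents/*/sessions/sessions.json",
    "agents/*/sessions/*.jsonl",
)


def classify(path: Path, manifest: dict) -> str:
    """Return 'new', 'modified', or 'unchanged' for one source file."""
    # Assumed manifest layout: {"<path>": {"ingested_at": "<ISO 8601>"}}
    entry = manifest.get(str(path))
    if entry is None:
        return "new"
    # Accept a trailing "Z" on Python versions older than 3.11.
    ingested_at = datetime.fromisoformat(entry["ingested_at"].replace("Z", "+00:00"))
    if ingested_at.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        ingested_at = ingested_at.replace(tzinfo=timezone.utc)
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return "modified" if mtime > ingested_at else "unchanged"


def survey(root: Path, manifest: dict) -> dict:
    """Group every known source file under root by its delta status."""
    delta = {"new": [], "modified": [], "unchanged": []}
    for pattern in SOURCE_GLOBS:
        for path in sorted(root.glob(pattern)):
            delta[classify(path, manifest)].append(path)
    return delta
```

In append mode only the `new` and `modified` lists would be parsed further; in full mode all three would be. The list lengths are enough for the concise delta summary that Step 1 asks for.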
@@ -0,0 +1,154 @@
+# OpenClaw Agent — Data Format Reference
+
+Field-level notes for parsing `~/.openclaw/` artifacts during wiki ingest.
+
+## Cache Root
+
+`~/.openclaw/` — all paths below are relative to this root.
+
+## workspace/MEMORY.md
+
+Plain markdown. No required frontmatter — structure varies by user and agent configuration. Typically looks like:
+
+```markdown
+# Memory
+
+## User Preferences
+- Prefers concise responses without trailing summaries
+- Uses pnpm over npm
+
+## Projects
+### my-api
+- FastAPI app, deployed on Fly.io
+- Uses Postgres via Supabase
+
+## Patterns
+- Debugging: always check logs before code changes
+```
+
+This is the single most valuable source in the entire `~/.openclaw/` tree. Read it fully before touching session logs.
+
+## workspace/memory/YYYY-MM-DD.md
+
+Daily note files. Auto-generated by OpenClaw at the start of each day. Format:
+
+```markdown
+# 2026-04-15
+
+## Session: my-api refactor
+- Rewrote auth middleware to use JWT instead of sessions
+- Decision: keep refresh tokens in httpOnly cookies
+
+## Session: obsidian-wiki
+- Added cross-linker skill
+- Fixed broken wikilinks in concepts/
+```
+
+Today's and yesterday's files are loaded into every session automatically. Files older than ~7 days have sharply diminishing signal.
+
+## workspace/DREAMS.md
+
+Optional. Some OpenClaw configurations generate end-of-day summaries here. Plain markdown. Treat as a lower-priority supplement to MEMORY.md — skim for novel insights not already captured in MEMORY.md.
+
+## agents/\<agentId\>/sessions/sessions.json
+
+Session index. JSON array:
+
+```json
+[
+  {
+    "id": "abc123",
+    "name": "my-api refactor",
+    "created_at": "2026-04-15T10:00:00Z",
+    "updated_at": "2026-04-15T12:30:00Z",
+    "message_count": 47,
+    "agent_id": "default"
+  }
+]
+```
+
+Use this to:
+- Build a session inventory before opening JSONL files
+- Prioritize by `updated_at` (most recent = highest signal)
+- Map session IDs to human-readable names
+
+## agents/\<agentId\>/sessions/\<sessionId\>.jsonl
+
+Per-session transcript. JSONL, append-only. One JSON object per line:
+
+**User turn:**
+```json
+{"role": "user", "content": "How do I debounce a React input?", "timestamp": "2026-04-15T10:01:00Z"}
+```
+
+**Assistant turn:**
+```json
+{"role": "assistant", "content": "Use useCallback + useEffect with a clearTimeout...", "timestamp": "2026-04-15T10:01:02Z"}
+```
+
+**Tool call:**
+```json
+{"role": "tool", "name": "read_file", "input": {"path": "/home/ubuntu/app/src/App.tsx"}, "timestamp": "2026-04-15T10:01:05Z"}
+```
+
+**Tool result:**
+```json
+{"role": "tool_result", "name": "read_file", "content": "...", "timestamp": "2026-04-15T10:01:05Z"}
+```
+
+`role` is the primary dispatch field. `timestamp` is ISO 8601. `content` may be a string or a structured object (for multi-part responses).
+
+## agents/\<agentId\>/sessions/\<sessionId\>-topic-\<threadId\>.jsonl
+
+Telegram topic variant — same schema as the base session JSONL. The `-topic-<threadId>` suffix identifies which Telegram thread generated the session. Parse identically to a regular session file.
+
+## openclaw.json
+
+Global config. Rarely useful for ingest. Fields of interest if needed:
+
+```json
+{
+  "agents": {
+    "defaults": {
+      "workspace": "~/.openclaw/workspace",
+      "bootstrapMaxChars": 20000
+    }
+  },
+  "skills": {
+    "load": {
+      "extraDirs": []
+    }
+  }
+}
+```
+
+If `agents.defaults.workspace` is set to a non-default path, treat it as the canonical location of MEMORY.md and daily notes.
+
+## Bootstrap file priority (for reference)
+
+OpenClaw loads context files in this order at session start:
+
+| Priority | File | Notes |
+|---|---|---|
+| 10 | `AGENTS.md` | Always-on project instructions |
+| 20 | `SOUL.md` | Agent identity |
+| 30 | `IDENTITY.md` | Agent persona |
+| 40 | `USER.md` | User profile |
+| 50 | `TOOLS.md` | Tool config |
+| 60 | `BOOTSTRAP.md` | Custom bootstrap |
+| 70 | `MEMORY.md` | Long-term memory (workspace copy) |
+
+Each file is truncated at `bootstrapMaxChars` (default 20,000 characters).
+
+## Extraction Priority
+
+| Source | Signal | Noise |
+|---|---|---|
+| `workspace/MEMORY.md` | Very high — curated, durable | Very low |
+| `workspace/memory/YYYY-MM-DD.md` (recent) | High — active context | Low |
+| `workspace/DREAMS.md` | Medium — summaries | Low |
+| `workspace/memory/YYYY-MM-DD.md` (old) | Low — stale | Medium |
+| `sessions/*.jsonl` — assistant turns | Medium | Medium |
+| `sessions/*.jsonl` — tool pairs | Low | High |
+| `openclaw.json` | Very low | — |
+| `credentials/` | None — skip | — |
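A minimal sketch of reading a session transcript along the lines of the reference above, in Python: dispatch on `role`, keep only user and assistant turns, flatten `content` whether it is a string or a structured object, and redact key-like tokens before anything is summarized. It is not part of the committed file. The names `iter_turns`, `content_text`, and `redact` are hypothetical, and `SECRET_RE` is an illustrative pattern only, not a complete secret detector. Skipping unparseable lines is also an assumption, made because an append-only file may end in a partially written record.

```python
import json
import re
from typing import Iterable, Iterator

# Illustrative only: long tokens with a few common secret prefixes.
SECRET_RE = re.compile(r"\b(?:sk|ghp|xoxb)[-_][A-Za-z0-9_-]{16,}\b")


def iter_turns(lines: Iterable[str]) -> Iterator[dict]:
    """Yield user and assistant records, skipping tool pairs and bad lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
        if isinstance(record, dict) and record.get("role") in ("user", "assistant"):
            yield record


def content_text(content) -> str:
    """Return content as text; structured content is serialized as JSON."""
    if isinstance(content, str):
        return content
    return json.dumps(content, ensure_ascii=False)


def redact(text: str) -> str:
    """Replace key-like tokens with a placeholder."""
    return SECRET_RE.sub("[REDACTED]", text)
```

A caller could pass an open file handle straight to `iter_turns`, since iterating a text file yields lines. Tool calls and results are dropped here because the table above rates them low signal and high noise; a real ingest might still sample them for reusable insights.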