add skills

2026-05-28 10:17:41 +09:00
parent c526552957
commit 51715816fd
81 changed files with 17935 additions and 2 deletions
@@ -0,0 +1,245 @@
+---
+name: codex-history-ingest
+description: >
+  Ingest Codex CLI conversation history into the Obsidian wiki. Use this skill when the user wants to mine
+  their past Codex sessions for knowledge, import their ~/.codex folder, extract insights from previous coding
+  sessions, or says things like "process my Codex history", "add my Codex conversations to the wiki", or
+  "what have I discussed in Codex before". Also triggers when the user mentions .codex sessions, rollout files,
+  session_index.jsonl, or Codex transcript logs.
+---
+
+# Codex History Ingest — Conversation Mining
+
+You are extracting knowledge from the user's past Codex sessions and distilling it into the Obsidian wiki. Session logs are rich but noisy: focus on durable knowledge, not operational telemetry.
+
+This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest codex`).
+
+## Before You Start
+
+1. **Resolve config** — follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up CWD for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH` and `CODEX_HISTORY_PATH` (defaults to `~/.codex`)
+2. Read `.manifest.json` at the vault root to check what has already been ingested
+3. Read `index.md` at the vault root to understand what the wiki already contains
+
+## Ingest Modes
+
+### Append Mode (default)
+
+Check `.manifest.json` for each source file. Only process:
+
+- Files not in the manifest (new session rollouts, new index files)
+- Files whose modification time is newer than `ingested_at` in the manifest
+
+Use this mode for regular syncs.
+
+### Full Mode
+
+Process everything regardless of manifest. Use after `wiki-rebuild` or if the user explicitly asks for a full re-ingest.
+
+## Codex Data Layout
+
+Codex stores local artifacts under `~/.codex/`.
+
+```
+~/.codex/
+├── sessions/                          # Session rollout logs by date
+│   └── YYYY/MM/DD/
+│       └── rollout-<timestamp>-<id>.jsonl
+├── archived_sessions/                 # Archived rollout logs
+├── session_index.jsonl                # Lightweight index of thread id/name/updated_at
+├── history.jsonl                      # Local transcript history (if persistence enabled)
+├── config.toml                        # User config (contains history settings)
+└── state_*.sqlite / logs_*.sqlite     # Runtime DBs (usually skip)
+```
+
+### Key data sources ranked by value
+
+1. `session_index.jsonl` — best inventory source for IDs, titles, and freshness
+2. `sessions/**/rollout-*.jsonl` — rich structured transcript events
+3. `history.jsonl` — useful fallback/timeline aid if enabled
+
+Avoid ingesting SQLite internals unless the user explicitly asks.
+
+## Step 1: Survey and Compute Delta
+
+Scan `CODEX_HISTORY_PATH` and compare against `.manifest.json`:
+
+- `~/.codex/session_index.jsonl`
+- `~/.codex/sessions/**/rollout-*.jsonl`
+- `~/.codex/archived_sessions/**` (optional; only if user asks for archived history)
+- `~/.codex/history.jsonl` (optional fallback)
+
+Classify each file:
+
+- **New** — not in manifest
+- **Modified** — in manifest but file is newer than `ingested_at`
+- **Unchanged** — already ingested and unchanged
+
+Report a concise delta summary before deep parsing.
+
+## Step 2: Parse Session Index First
+
+`session_index.jsonl` typically has entries like:
+
+```json
+{"id":"...","thread_name":"...","updated_at":"..."}
+```
+
+Use it to:
+
+- Build a canonical session inventory
+- Prioritize recent/high-signal sessions
+- Map rollout IDs to human-readable thread names
+
+## Step 3: Parse Rollout JSONL Safely
+
+Each `rollout-*.jsonl` line is an event envelope with:
+
+```json
+{
+  "timestamp": "...",
+  "type": "session_meta|turn_context|event_msg|response_item",
+  "payload": { ... }
+}
+```
+
+### Extraction rules
+
+- Prioritize user intent and assistant-visible outputs
+- Favor `response_item` records with user/assistant message content
+- Use `event_msg` selectively for meaningful milestones; ignore pure telemetry
+- Treat `session_meta` as metadata (cwd, model, ids), not user knowledge
+
+### Skip/noise filters
+
+- Token accounting events
+- Tool plumbing with no semantic content
+- Raw command output unless it contains reusable decisions/patterns
+- Repeated plan snapshots unless they add novel decisions
+
+### Critical privacy filter
+
+Rollout logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim system/developer prompts or secrets.
+
+- Remove API keys, tokens, passwords, credentials
+- Redact private identifiers unless relevant and approved
+- Summarize instead of quoting raw transcripts
+
+## Step 4: Cluster by Topic
+
+Do not create one wiki page per session.
+
+- Group by stable topics across many sessions
+- Split mixed sessions into separate themes
+- Merge recurring concepts across dates/projects
+- Use `cwd` from metadata to infer project scope
+
+## Step 5: Distill into Wiki Pages
+
+Route extracted knowledge using existing wiki conventions:
+
+- Project-specific architecture/process -> `projects/<name>/...`
+- General concepts -> `concepts/`
+- Recurring techniques/debug playbooks -> `skills/`
+- Tools/services -> `entities/`
+- Cross-session patterns -> `synthesis/`
+
+For each impacted project, create/update `projects/<name>/<name>.md` (project name as filename, never `_project.md`).
+
+### Writing rules
+
+- Distill knowledge, not chronology
+- Avoid "on date X we discussed..." unless date context is essential
+- Add `summary:` frontmatter on each new/updated page (1-2 sentences, <= 200 chars)
+- Add confidence and lifecycle fields to every new page:
+  ```yaml
+  base_confidence: 0.42
+  lifecycle: draft
+  lifecycle_changed: <ISO date today>
+  ```
+  Leave `lifecycle` unchanged on update.
+- Add provenance markers:
+  - `^[extracted]` when directly grounded in explicit session content
+  - `^[inferred]` when synthesizing patterns across events/sessions
+  - `^[ambiguous]` when sessions conflict
+- Add/update `provenance:` frontmatter mix for each changed page
+
+## Step 6: Update Manifest, Log, and Index
+
+### Update `.manifest.json`
+
+For each processed source file:
+
+- `ingested_at`, `size_bytes`, `modified_at`
+- `source_type`: `codex_rollout` | `codex_index` | `codex_history`
+- `project`: inferred project name (when applicable)
+- `pages_created`, `pages_updated`
+
+Add/update a top-level project/session summary block:
+
+```json
+{
+  "project-name": {
+    "source_path": "~/.codex/sessions/...",
+    "last_ingested": "TIMESTAMP",
+    "sessions_ingested": 12,
+    "sessions_total": 40,
+    "index_updated_at": "TIMESTAMP"
+  }
+}
+```
+
+### Update special files
+
+Update `index.md` and `log.md`:
+
+```
+- [TIMESTAMP] CODEX_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full
+```
+
+**`hot.md`** — Read `$OBSIDIAN_VAULT_PATH/hot.md` (create from the template in `wiki-ingest` if missing). Update **Recent Activity** with a one-line summary — e.g. "Ingested 12 Codex sessions; surfaced recurring patterns in CLI tooling and shell scripting." Keep the last 3 operations. Update `updated` timestamp.
+
+## Privacy and Compliance
+
+- Distill and synthesize; avoid raw transcript dumps
+- Default to redaction for anything that looks sensitive
+- Ask the user before storing personal/sensitive details
+- Keep references to other people minimal and purpose-bound
+
+## Reference
+
+See `references/codex-data-format.md` for field-level parsing notes and extraction guidance.
+
+## QMD Refresh After Vault Writes
+
+QMD is a search index, not the source of truth. If `$QMD_WIKI_COLLECTION` is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If QMD refresh fails, do not roll back the vault changes; report the QMD status separately.
+
+Use `$QMD_CLI` if set; otherwise use `qmd`.
+
+```bash
+${QMD_CLI:-qmd} update
+```
+
+If the output says vectors are needed or embeddings may be stale, run:
+
+```bash
+${QMD_CLI:-qmd} embed
+```
+
+Verify the collection with either:
+
+```bash
+${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"
+```
+
+or, when a specific page path is known:
+
+```bash
+${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5
+```
+
+Record one of:
+- `QMD refreshed: update + embed + verified`
+- `QMD refreshed: update only + verified`
+- `QMD skipped: QMD_WIKI_COLLECTION unset`
+- `QMD skipped: qmd CLI unavailable`
+- `QMD failed: <short error summary>`