add skills
This commit is contained in:
@@ -0,0 +1,280 @@
|
||||
---
|
||||
name: pi-history-ingest
|
||||
description: >
|
||||
Ingest Pi coding agent session history into the Obsidian wiki. Use this skill when the user wants to mine
|
||||
their past Pi sessions for knowledge, import their ~/.pi/agent/sessions folder, extract insights from
|
||||
previous coding sessions, or says things like "process my Pi history", "add my Pi sessions to the wiki",
|
||||
"ingest ~/.pi", or "what have I worked on in Pi". Also triggers when the user mentions Pi sessions,
|
||||
Pi agent history, ~/.pi/agent/sessions, or Pi conversation logs.
|
||||
---
|
||||
|
||||
# Pi History Ingest — Session Mining
|
||||
|
||||
You are extracting knowledge from the user's Pi coding agent sessions and distilling it into the Obsidian wiki. Pi sessions are stored as structured JSONL with a tree layout — your job is to follow the active branch, extract durable knowledge, and compile it.
|
||||
|
||||
This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest pi`).
|
||||
|
||||
## Before You Start
|
||||
|
||||
1. **Resolve config** — follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up CWD for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH` and `PI_HISTORY_PATH` (defaults to `~/.pi/agent/sessions`)
|
||||
2. Read `.manifest.json` at the vault root to check what has already been ingested
|
||||
3. Read `index.md` at the vault root to understand what the wiki already contains
|
||||
|
||||
## Ingest Modes
|
||||
|
||||
### Append Mode (default)
|
||||
|
||||
Check `.manifest.json` for each source file. Only process:
|
||||
|
||||
- Files not in the manifest (new sessions)
|
||||
- Files whose modification time is newer than `ingested_at` in the manifest
|
||||
|
||||
Use this mode for regular syncs.
|
||||
|
||||
### Full Mode
|
||||
|
||||
Process everything regardless of manifest. Use after `wiki-rebuild` or if the user explicitly asks for a full re-ingest.
|
||||
|
||||
## Pi Data Layout
|
||||
|
||||
Pi stores sessions under `~/.pi/agent/sessions/` (or the path set by `PI_CODING_AGENT_SESSION_DIR`).
|
||||
|
||||
```
|
||||
~/.pi/agent/sessions/
|
||||
├── --<cwd-path>--/ # Working directory with / replaced by -
|
||||
│ └── <timestamp>_<uuid>.jsonl # Session JSONL file
|
||||
└── ...
|
||||
```
|
||||
|
||||
The session filename contains an ISO timestamp and UUID. The parent directory encodes the working directory where the session was created.
|
||||
|
||||
### Session JSONL Format
|
||||
|
||||
Each `.jsonl` file is a sequence of JSON objects. The first line is always a `session` header; subsequent lines are tree entries with `id` and `parentId`.
|
||||
|
||||
Key entry types:
|
||||
|
||||
| `type` | Purpose | Ingest? |
|
||||
|---|---|---|
|
||||
| `session` | Header with `cwd`, `version`, `id`, `timestamp` | Metadata only |
|
||||
| `message` | Conversation turn (`user`, `assistant`, `toolResult`, `bashExecution`, etc.) | **Primary source** |
|
||||
| `session_info` | Display name set via `/name` | For session title |
|
||||
| `compaction` | Context compaction summary | **High signal** |
|
||||
| `branch_summary` | Summary when switching branches via `/tree` | **High signal** |
|
||||
| `model_change` | Model switch event | Skip |
|
||||
| `thinking_level_change` | Thinking level change | Skip |
|
||||
| `custom` | Extension state (not in LLM context) | Skip |
|
||||
| `custom_message` | Extension-injected message | Context only |
|
||||
| `label` | User bookmark/label | Skip |
|
||||
|
||||
### Message roles inside `message` entries
|
||||
|
||||
- `user` — user input; `content` is string or `(TextContent \| ImageContent)[]`
|
||||
- `assistant` — assistant response; `content` is `(TextContent \| ThinkingContent \| ToolCall)[]`
|
||||
- `toolResult` — tool execution result; `content` is `(TextContent \| ImageContent)[]`
|
||||
- `bashExecution` — bash command + output; `command`, `output`, `exitCode`
|
||||
- `branchSummary` — branch switch summary; `summary` string
|
||||
- `compactionSummary` — compaction summary; `summary` string
|
||||
|
||||
### Key data sources ranked by value
|
||||
|
||||
1. **`message` entries (`user` + `assistant`)** — full conversation transcripts; rich but noisy
|
||||
2. **`compaction` entries** — pre-synthesized summaries of older context; gold
|
||||
3. **`branch_summary` entries** — summaries of abandoned branches; good signal
|
||||
4. **`bashExecution` entries** — concrete commands run; useful for workflow patterns
|
||||
5. **`session_info` entries** — session name for topic inference
|
||||
|
||||
Skip `model_change`, `thinking_level_change`, `custom` (extension state), and `label` entries.
|
||||
|
||||
## Step 1: Survey and Compute Delta
|
||||
|
||||
Scan `PI_HISTORY_PATH` and compare against `.manifest.json`:
|
||||
|
||||
```bash
|
||||
# List all session files
|
||||
find ~/.pi/agent/sessions -name "*.jsonl" -type f
|
||||
|
||||
# Or with custom path
|
||||
find "$PI_HISTORY_PATH" -name "*.jsonl" -type f
|
||||
```
|
||||
|
||||
Build an inventory. For each session file, record:
|
||||
- `path` — absolute path
|
||||
- `cwd` — decoded from parent directory name (`--<path>--` → `/path`)
|
||||
- `session_name` — from the latest `session_info` entry (if any)
|
||||
- `modified_at` — file mtime
|
||||
- `already_ingested` — presence in `.manifest.json`
|
||||
|
||||
Classify each file:
|
||||
- **New** — not in manifest
|
||||
- **Modified** — in manifest but file is newer than `ingested_at`
|
||||
- **Unchanged** — already ingested and unchanged
|
||||
|
||||
Report a concise delta summary before deep parsing:
|
||||
> "Found N Pi sessions across K projects. Delta: X new, Y modified."
|
||||
|
||||
## Step 2: Parse Session JSONL
|
||||
|
||||
For each selected session file, read it line by line. Because sessions use a tree structure, build the active branch first:
|
||||
|
||||
1. Parse all entries into a map by `id`
|
||||
2. Find the current leaf (the entry with no children, or the last `message` entry)
|
||||
3. Walk `parentId` chain from leaf to root to get the active path
|
||||
4. Reverse the path so it's chronological
|
||||
|
||||
### Extraction rules
|
||||
|
||||
From the active path, extract:
|
||||
|
||||
- **`session` header** — `cwd`, `timestamp`, `parentSession` (if forked)
|
||||
- **`session_info`** — `name` field for session title/topic inference
|
||||
- **`message` entries with `role: "user"`** — extract `content` text (skip images)
|
||||
- **`message` entries with `role: "assistant"`** — extract `text` content blocks; skip `thinking` blocks (noise); note `toolCall` blocks (they reveal what the agent actually did)
|
||||
- **`message` entries with `role: "toolResult"`** — summarize outcomes, not full output
|
||||
- **`message` entries with `role: "bashExecution"`** — extract command + exit code; recurring commands reveal build/test/deploy workflows
|
||||
- **`compaction` entries** — read `summary` verbatim; it's already distilled
|
||||
- **`branch_summary` entries** — read `summary` verbatim; captures abandoned approaches
|
||||
|
||||
### Skip / noise filters
|
||||
|
||||
- `thinking` content blocks — internal reasoning, not durable knowledge
|
||||
- Image content blocks — skip unless the user explicitly asks for image transcription
|
||||
- Raw tool outputs longer than 500 chars — summarize the outcome
|
||||
- Token accounting (`usage` fields) — metadata only
|
||||
- Repeated plan echoes or status updates
|
||||
|
||||
### Critical privacy filter
|
||||
|
||||
Session logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim.
|
||||
|
||||
- Remove API keys, tokens, passwords, credentials
|
||||
- Redact private identifiers unless relevant and user-approved
|
||||
- Summarize bash outputs that contain paths, environment variables, or secrets
|
||||
- Do not quote raw `toolCall` arguments verbatim if they contain sensitive data
|
||||
|
||||
## Step 3: Cluster by Topic
|
||||
|
||||
Do not create one wiki page per session.
|
||||
|
||||
- Group knowledge by stable topic across many sessions
|
||||
- Split mixed sessions into separate themes
|
||||
- Merge recurring patterns across dates and projects
|
||||
- Use the `cwd` from the session header to infer project scope
|
||||
- Use `session_info.name` as a topic hint when available
|
||||
|
||||
## Step 4: Distill into Wiki Pages
|
||||
|
||||
Route extracted knowledge using existing wiki conventions:
|
||||
|
||||
- Project-specific architecture/process → `projects/<name>/...`
|
||||
- General concepts → `concepts/`
|
||||
- Recurring techniques/debug playbooks → `skills/`
|
||||
- Tools/services/frameworks → `entities/`
|
||||
- Cross-session patterns → `synthesis/`
|
||||
|
||||
For each impacted project, create/update `projects/<name>/<name>.md`.
|
||||
|
||||
### Writing rules
|
||||
|
||||
- Distill knowledge, not chronology
|
||||
- Avoid "on date X we discussed..." unless date context is essential
|
||||
- Add `summary:` frontmatter on each new/updated page (1–2 sentences, ≤ 200 chars)
|
||||
- Add confidence and lifecycle fields to every new page:
|
||||
```yaml
|
||||
base_confidence: 0.42
|
||||
lifecycle: draft
|
||||
lifecycle_changed: <ISO date today>
|
||||
```
|
||||
Leave `lifecycle` unchanged on update.
|
||||
- Add provenance markers:
|
||||
- `^[extracted]` when directly grounded in explicit session content (compaction/branch summaries, explicit assistant statements)
|
||||
- `^[inferred]` when synthesizing patterns across multiple sessions or inferring from tool calls
|
||||
- `^[ambiguous]` when sessions conflict or a compaction summary contradicts later turns
|
||||
- Add/update `provenance:` frontmatter mix for each changed page
|
||||
|
||||
**Mark provenance** per the convention in `llm-wiki`:
|
||||
- `compaction` and `branch_summary` entries are pre-distilled — treat as mostly `^[extracted]`
|
||||
- Conversation distillation is mostly `^[inferred]` — you're synthesizing from dialogue
|
||||
- Use `^[ambiguous]` when the user changed their mind across sessions or when compaction summaries disagree with later conversation turns
|
||||
|
||||
## Step 5: Update Manifest, Log, and Index
|
||||
|
||||
### Update `.manifest.json`
|
||||
|
||||
For each processed source file:
|
||||
|
||||
- `ingested_at`, `size_bytes`, `modified_at`
|
||||
- `source_type`: `pi_session`
|
||||
- `project`: inferred project name from decoded `cwd`
|
||||
- `pages_created`, `pages_updated`
|
||||
|
||||
Add/update a top-level summary block:
|
||||
|
||||
```json
|
||||
{
|
||||
"pi": {
|
||||
"source_path": "~/.pi/agent/sessions/",
|
||||
"last_ingested": "TIMESTAMP",
|
||||
"sessions_ingested": 12,
|
||||
"sessions_total": 40,
|
||||
"pages_created": 5,
|
||||
"pages_updated": 12
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Update special files
|
||||
|
||||
Update `index.md` and `log.md`:
|
||||
|
||||
```
|
||||
- [TIMESTAMP] PI_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full
|
||||
```
|
||||
|
||||
**`hot.md`** — Read `$OBSIDIAN_VAULT_PATH/hot.md` (create from the template in `wiki-ingest` if missing). Update **Recent Activity** with a one-line summary — e.g. "Ingested 12 Pi sessions across 3 projects; surfaced patterns in CLI tooling and API design." Keep the last 3 operations. Update `updated` timestamp.
|
||||
|
||||
## Privacy and Compliance
|
||||
|
||||
- Distill and synthesize; avoid raw transcript dumps
|
||||
- Default to redaction for anything that looks sensitive
|
||||
- Ask the user before storing personal or sensitive details
|
||||
- Keep references to other people minimal and purpose-bound
|
||||
|
||||
## Reference
|
||||
|
||||
See `references/pi-data-format.md` for field-level parsing notes and extraction guidance.
|
||||
|
||||
## QMD Refresh After Vault Writes
|
||||
|
||||
QMD is a search index, not the source of truth. If `$QMD_WIKI_COLLECTION` is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If QMD refresh fails, do not roll back the vault changes; report the QMD status separately.
|
||||
|
||||
Use `$QMD_CLI` if set; otherwise use `qmd`.
|
||||
|
||||
```bash
|
||||
${QMD_CLI:-qmd} update
|
||||
```
|
||||
|
||||
If the output says vectors are needed or embeddings may be stale, run:
|
||||
|
||||
```bash
|
||||
${QMD_CLI:-qmd} embed
|
||||
```
|
||||
|
||||
Verify the collection with either:
|
||||
|
||||
```bash
|
||||
${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"
|
||||
```
|
||||
|
||||
or, when a specific page path is known:
|
||||
|
||||
```bash
|
||||
${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5
|
||||
```
|
||||
|
||||
Record one of:
|
||||
- `QMD refreshed: update + embed + verified`
|
||||
- `QMD refreshed: update only + verified`
|
||||
- `QMD skipped: QMD_WIKI_COLLECTION unset`
|
||||
- `QMD skipped: qmd CLI unavailable`
|
||||
- `QMD failed: <short error summary>`
|
||||
Reference in New Issue
Block a user