Files
MultiPhysicsVault/.agents/skills/pi-history-ingest/SKILL.md
T
2026-05-28 10:17:41 +09:00

11 KiB
Raw Blame History

name, description
name description
pi-history-ingest Ingest Pi coding agent session history into the Obsidian wiki. Use this skill when the user wants to mine their past Pi sessions for knowledge, import their ~/.pi/agent/sessions folder, extract insights from previous coding sessions, or says things like "process my Pi history", "add my Pi sessions to the wiki", "ingest ~/.pi", or "what have I worked on in Pi". Also triggers when the user mentions Pi sessions, Pi agent history, ~/.pi/agent/sessions, or Pi conversation logs.

Pi History Ingest — Session Mining

You are extracting knowledge from the user's Pi coding agent sessions and distilling it into the Obsidian wiki. Pi sessions are stored as structured JSONL with a tree layout — your job is to follow the active branch, extract durable knowledge, and compile it.

This skill can be invoked directly or via the wiki-history-ingest router (/wiki-history-ingest pi).

Before You Start

  1. Resolve config — follow the Config Resolution Protocol in llm-wiki/SKILL.md (walk up CWD for .env~/.obsidian-wiki/config → prompt setup). This gives OBSIDIAN_VAULT_PATH and PI_HISTORY_PATH (defaults to ~/.pi/agent/sessions)
  2. Read .manifest.json at the vault root to check what has already been ingested
  3. Read index.md at the vault root to understand what the wiki already contains

Ingest Modes

Append Mode (default)

Check .manifest.json for each source file. Only process:

  • Files not in the manifest (new sessions)
  • Files whose modification time is newer than ingested_at in the manifest

Use this mode for regular syncs.

Full Mode

Process everything regardless of manifest. Use after wiki-rebuild or if the user explicitly asks for a full re-ingest.

Pi Data Layout

Pi stores sessions under ~/.pi/agent/sessions/ (or the path set by PI_CODING_AGENT_SESSION_DIR).

~/.pi/agent/sessions/
├── --<cwd-path>--/                    # Working directory with / replaced by -
│   └── <timestamp>_<uuid>.jsonl       # Session JSONL file
└── ...

The session filename contains an ISO timestamp and UUID. The parent directory encodes the working directory where the session was created.

Session JSONL Format

Each .jsonl file is a sequence of JSON objects. The first line is always a session header; subsequent lines are tree entries with id and parentId.

Key entry types:

type Purpose Ingest?
session Header with cwd, version, id, timestamp Metadata only
message Conversation turn (user, assistant, toolResult, bashExecution, etc.) Primary source
session_info Display name set via /name For session title
compaction Context compaction summary High signal
branch_summary Summary when switching branches via /tree High signal
model_change Model switch event Skip
thinking_level_change Thinking level change Skip
custom Extension state (not in LLM context) Skip
custom_message Extension-injected message Context only
label User bookmark/label Skip

Message roles inside message entries

  • user — user input; content is string or (TextContent \| ImageContent)[]
  • assistant — assistant response; content is (TextContent \| ThinkingContent \| ToolCall)[]
  • toolResult — tool execution result; content is (TextContent \| ImageContent)[]
  • bashExecution — bash command + output; command, output, exitCode
  • branchSummary — branch switch summary; summary string
  • compactionSummary — compaction summary; summary string

Key data sources ranked by value

  1. message entries (user + assistant) — full conversation transcripts; rich but noisy
  2. compaction entries — pre-synthesized summaries of older context; gold
  3. branch_summary entries — summaries of abandoned branches; good signal
  4. bashExecution entries — concrete commands run; useful for workflow patterns
  5. session_info entries — session name for topic inference

Skip model_change, thinking_level_change, custom (extension state), and label entries.

Step 1: Survey and Compute Delta

Scan PI_HISTORY_PATH and compare against .manifest.json:

# List all session files
find ~/.pi/agent/sessions -name "*.jsonl" -type f

# Or with custom path
find "$PI_HISTORY_PATH" -name "*.jsonl" -type f

Build an inventory. For each session file, record:

  • path — absolute path
  • cwd — decoded from parent directory name (--<path>--/path)
  • session_name — from the latest session_info entry (if any)
  • modified_at — file mtime
  • already_ingested — presence in .manifest.json

Classify each file:

  • New — not in manifest
  • Modified — in manifest but file is newer than ingested_at
  • Unchanged — already ingested and unchanged

Report a concise delta summary before deep parsing:

"Found N Pi sessions across K projects. Delta: X new, Y modified."

Step 2: Parse Session JSONL

For each selected session file, read it line by line. Because sessions use a tree structure, build the active branch first:

  1. Parse all entries into a map by id
  2. Find the current leaf (the entry with no children, or the last message entry)
  3. Walk parentId chain from leaf to root to get the active path
  4. Reverse the path so it's chronological

Extraction rules

From the active path, extract:

  • session headercwd, timestamp, parentSession (if forked)
  • session_infoname field for session title/topic inference
  • message entries with role: "user" — extract content text (skip images)
  • message entries with role: "assistant" — extract text content blocks; skip thinking blocks (noise); note toolCall blocks (they reveal what the agent actually did)
  • message entries with role: "toolResult" — summarize outcomes, not full output
  • message entries with role: "bashExecution" — extract command + exit code; recurring commands reveal build/test/deploy workflows
  • compaction entries — read summary verbatim; it's already distilled
  • branch_summary entries — read summary verbatim; captures abandoned approaches

Skip / noise filters

  • thinking content blocks — internal reasoning, not durable knowledge
  • Image content blocks — skip unless the user explicitly asks for image transcription
  • Raw tool outputs longer than 500 chars — summarize the outcome
  • Token accounting (usage fields) — metadata only
  • Repeated plan echoes or status updates

Critical privacy filter

Session logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim.

  • Remove API keys, tokens, passwords, credentials
  • Redact private identifiers unless relevant and user-approved
  • Summarize bash outputs that contain paths, environment variables, or secrets
  • Do not quote raw toolCall arguments verbatim if they contain sensitive data

Step 3: Cluster by Topic

Do not create one wiki page per session.

  • Group knowledge by stable topic across many sessions
  • Split mixed sessions into separate themes
  • Merge recurring patterns across dates and projects
  • Use the cwd from the session header to infer project scope
  • Use session_info.name as a topic hint when available

Step 4: Distill into Wiki Pages

Route extracted knowledge using existing wiki conventions:

  • Project-specific architecture/process → projects/<name>/...
  • General concepts → concepts/
  • Recurring techniques/debug playbooks → skills/
  • Tools/services/frameworks → entities/
  • Cross-session patterns → synthesis/

For each impacted project, create/update projects/<name>/<name>.md.

Writing rules

  • Distill knowledge, not chronology
  • Avoid "on date X we discussed..." unless date context is essential
  • Add summary: frontmatter on each new/updated page (12 sentences, ≤ 200 chars)
  • Add confidence and lifecycle fields to every new page:
    base_confidence: 0.42
    lifecycle: draft
    lifecycle_changed: <ISO date today>
    
    Leave lifecycle unchanged on update.
  • Add provenance markers:
    • ^[extracted] when directly grounded in explicit session content (compaction/branch summaries, explicit assistant statements)
    • ^[inferred] when synthesizing patterns across multiple sessions or inferring from tool calls
    • ^[ambiguous] when sessions conflict or a compaction summary contradicts later turns
  • Add/update provenance: frontmatter mix for each changed page

Mark provenance per the convention in llm-wiki:

  • compaction and branch_summary entries are pre-distilled — treat as mostly ^[extracted]
  • Conversation distillation is mostly ^[inferred] — you're synthesizing from dialogue
  • Use ^[ambiguous] when the user changed their mind across sessions or when compaction summaries disagree with later conversation turns

Step 5: Update Manifest, Log, and Index

Update .manifest.json

For each processed source file:

  • ingested_at, size_bytes, modified_at
  • source_type: pi_session
  • project: inferred project name from decoded cwd
  • pages_created, pages_updated

Add/update a top-level summary block:

{
  "pi": {
    "source_path": "~/.pi/agent/sessions/",
    "last_ingested": "TIMESTAMP",
    "sessions_ingested": 12,
    "sessions_total": 40,
    "pages_created": 5,
    "pages_updated": 12
  }
}

Update special files

Update index.md and log.md:

- [TIMESTAMP] PI_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full

hot.md — Read $OBSIDIAN_VAULT_PATH/hot.md (create from the template in wiki-ingest if missing). Update Recent Activity with a one-line summary — e.g. "Ingested 12 Pi sessions across 3 projects; surfaced patterns in CLI tooling and API design." Keep the last 3 operations. Update updated timestamp.

Privacy and Compliance

  • Distill and synthesize; avoid raw transcript dumps
  • Default to redaction for anything that looks sensitive
  • Ask the user before storing personal or sensitive details
  • Keep references to other people minimal and purpose-bound

Reference

See references/pi-data-format.md for field-level parsing notes and extraction guidance.

QMD Refresh After Vault Writes

QMD is a search index, not the source of truth. If $QMD_WIKI_COLLECTION is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If QMD refresh fails, do not roll back the vault changes; report the QMD status separately.

Use $QMD_CLI if set; otherwise use qmd.

${QMD_CLI:-qmd} update

If the output says vectors are needed or embeddings may be stale, run:

${QMD_CLI:-qmd} embed

Verify the collection with either:

${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"

or, when a specific page path is known:

${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5

Record one of:

  • QMD refreshed: update + embed + verified
  • QMD refreshed: update only + verified
  • QMD skipped: QMD_WIKI_COLLECTION unset
  • QMD skipped: qmd CLI unavailable
  • QMD failed: <short error summary>