add skills

2026-05-28 10:17:41 +09:00
parent c526552957
commit 51715816fd
81 changed files with 17935 additions and 2 deletions
@@ -0,0 +1,373 @@
+---
+name: copilot-history-ingest
+description: >
+  Ingest GitHub Copilot CLI session history into an Obsidian wiki as distilled knowledge pages. Use this skill
+  when the user wants to capture their Copilot CLI sessions into a personal wiki — extracting architecture
+  decisions, debug notes, and patterns into searchable Obsidian pages. Triggers on phrases like "ingest my
+  copilot sessions into obsidian", "add my copilot history to my wiki", "pull my copilot session history into
+  the vault", "capture what I've learned from copilot into obsidian", "just the new sessions since last time",
+  or "mine patterns across my copilot sessions". Also triggers when the user mentions session-store.db,
+  ~/.copilot/session-state, or VS Code copilot-chat transcripts in the context of building a wiki or knowledge
+  base. Does NOT trigger for general copilot usage questions, searching sessions, or backing up history.
+---
+
+# Copilot History Ingest — Conversation Mining
+
+You are extracting knowledge from the user's past GitHub Copilot CLI conversations and distilling it into the Obsidian wiki. Conversations are rich but messy — your job is to find the signal and compile it.
+
+This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest copilot`).
+
+## Before You Start
+
+1. **Resolve config** — follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up CWD for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH`, `COPILOT_HISTORY_PATH` (defaults to `~/.copilot/session-state`), and `COPILOT_VSCODE_STORAGE_PATH` (VS Code `workspaceStorage`; platform-specific — ask the user if absent)
+2. Read `.manifest.json` at the vault root to check what's already been ingested
+3. Read `index.md` at the vault root to know what the wiki already contains
+
+## Ingest Modes
+
+### Append Mode (default)
+
+Check `.manifest.json` for each source file (events JSONL, transcript JSONL, checkpoint, session-store DB). Only process:
+
+- Sessions not in the manifest (new sessions)
+- Sessions whose `updated_at` is newer than their `ingested_at` in the manifest
+
+This is usually what you want — the user ran a few new sessions and wants to capture the delta.
+
+### Full Mode
+
+Process everything regardless of manifest. Use after a `wiki-rebuild` or if the user explicitly asks.
+
+## GitHub Copilot Data Layout
+
+Copilot stores data in three locations. Scan **all three**.
+
+### Source 1: `~/.copilot/session-state/` (CLI sessions)
+
+```
+~/.copilot/session-state/
+├── <session-uuid>/
+│   ├── workspace.yaml           # Session metadata (id, cwd, summary_count, created_at, updated_at)
+│   ├── vscode.metadata.json     # VS Code context (workspaceFolder, repositoryProperties, customTitle)
+│   ├── events.jsonl             # Full event log — all turns, tool calls, reasoning
+│   ├── session.db               # Per-session SQLite (todos/todo_deps only — skip for ingestion)
+│   ├── index.md                 # Session summary written at session end
+│   ├── checkpoints/             # Checkpoint JSON files (mid-session summaries)
+│   │   └── <uuid>.json          # title, overview, history, work_done, technical_details,
+│   │                            #   important_files, next_steps
+│   ├── files/                   # Artifacts produced during session (plans, diagrams, etc.)
+│   └── research/                # Research artifacts
+└── ...
+```
+
+### Source 2: `~/.copilot/session-store.db` (Global SQLite)
+
+The canonical cross-session database. This is the **highest-value** source: structured, queryable, and pre-summarised.
+
+```
+sessions       — id, cwd, repository, branch, summary, created_at, updated_at, host_type
+turns          — session_id, turn_index, user_message, assistant_response, timestamp
+checkpoints    — session_id, checkpoint_number, title, overview, history, work_done,
+                 technical_details, important_files, next_steps, created_at
+session_files  — session_id, file_path, tool_name, turn_index, first_seen_at
+session_refs   — session_id, ref_type (commit/pr/issue), ref_value, turn_index, created_at
+search_index   — FTS5 virtual table (content, session_id, source_type, source_id)
+```
+
+### Source 3: VS Code Workspace Storage (`<workspaceStorage>/<hash>/GitHub.copilot-chat/`)
+
+VS Code extension data, keyed by workspace hash. The path is platform-specific and must come from `.env` or user input.
+
+```
+<hash>/GitHub.copilot-chat/
+├── transcripts/
+│   └── <session-uuid>.jsonl     # Conversation transcripts (same JSONL format as events.jsonl)
+├── memory-tool/
+│   └── memories/
+│       └── <base64-session-id>/ # Per-session saved artifacts (plan.md, etc.)
+│           └── plan.md
+└── codebase-external.sqlite     # Codebase index (skip — no conversation knowledge)
+```
+
+### Key data sources ranked by value:
+
+1. **Checkpoints** (`session-store.db` `checkpoints` table + per-session `checkpoints/*.json`) — Pre-distilled summaries with `overview`, `work_done`, `technical_details`, `important_files`, `next_steps`. Gold.
+2. **Session summaries** (`session-store.db` `sessions.summary` + `index.md`) — One-paragraph synopsis per session.
+3. **Turns** (`session-store.db` `turns` table + `events.jsonl` / transcript JSONL) — Full conversation. Rich but verbose.
+4. **Memory artifacts** (`memory-tool/memories/<id>/plan.md` etc.) — Pre-written plans and structured notes the user saved explicitly. Worth importing verbatim (or lightly summarised).
+5. **File access patterns** (`session_files` table + `tool.execution_*` events) — Which files the agent repeatedly touched — reveals high-value project files.
+6. **Session refs** (`session_refs` table) — Commits, PRs, and issues linked to sessions.
+7. **`vscode.metadata.json`** — Workspace folder path, branch, `customTitle` (user-set session label). Useful for grouping and naming.
+
+## Step 1: Survey and Compute Delta
+
+Scan all three data locations and compare against `.manifest.json`:
+
+```bash
+# --- Source 1: per-session directories ---
+# Find all session directories (each has workspace.yaml)
+ls ~/.copilot/session-state/
+
+# For each session, read workspace.yaml for id/cwd/updated_at
+# and vscode.metadata.json for customTitle / repositoryProperties
+
+# --- Source 2: global database ---
+# Query session-store.db with the sqlite3 CLI (or Python's sqlite3 module)
+sqlite3 ~/.copilot/session-store.db <<'SQL'
+SELECT s.id, s.cwd, s.repository, s.branch, s.summary, s.updated_at,
+       COUNT(DISTINCT t.turn_index) AS turn_count,
+       COUNT(DISTINCT c.id)         AS checkpoint_count
+FROM sessions s
+LEFT JOIN turns t ON t.session_id = s.id
+LEFT JOIN checkpoints c ON c.session_id = s.id
+GROUP BY s.id
+ORDER BY s.updated_at DESC;
+SQL
+
+# --- Source 3: VS Code workspace storage ---
+# For each <hash> directory under workspaceStorage, check for GitHub.copilot-chat/
+# and list its transcript files (assumes COPILOT_VSCODE_STORAGE_PATH is exported)
+ls "$COPILOT_VSCODE_STORAGE_PATH"/*/GitHub.copilot-chat/transcripts/
+```
+
+Build a unified inventory — one entry per session UUID — and classify:
+
+- **New** — not in manifest → needs ingesting
+- **Modified** — in manifest but `updated_at` is newer than the recorded `ingested_at` → needs re-ingesting
+- **Unchanged** — in manifest and not modified → skip in append mode
+
+Report to the user: "Found X sessions in session-state, Y in session-store.db, Z VS Code transcript files. Checkpoints: A. Delta: B new, C modified."
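
The three-way classification above can be sketched in Python. This is a minimal sketch under stated assumptions, not the skill's actual implementation: `classify_sessions` and `parse_ts` are hypothetical helpers, and the manifest is assumed to map each session UUID to an entry carrying `ingested_at`, a shape the source does not spell out.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def classify_sessions(inventory: dict, manifest_sessions: dict) -> dict:
    """Split session IDs into new / modified / unchanged.

    inventory:         {session_id: updated_at} gathered from all three sources
    manifest_sessions: {session_id: {"ingested_at": ...}} (assumed manifest shape)
    """
    delta = {"new": [], "modified": [], "unchanged": []}
    for session_id, updated_at in sorted(inventory.items()):
        entry = manifest_sessions.get(session_id)
        if entry is None:
            delta["new"].append(session_id)
        elif parse_ts(updated_at) > parse_ts(entry["ingested_at"]):
            delta["modified"].append(session_id)
        else:
            delta["unchanged"].append(session_id)
    return delta
```

Comparing parsed timestamps rather than raw strings avoids surprises if the sources format `updated_at` with differing precision.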
+
+## Step 2: Ingest Checkpoints and Summaries First
+
+Checkpoints are already distilled — process them before touching raw turns.
+
+### From `session-store.db`:
+
+```sql
+SELECT s.id, s.cwd, s.repository, s.branch, s.summary,
+       c.checkpoint_number, c.title, c.overview, c.work_done,
+       c.technical_details, c.important_files, c.next_steps,
+       c.created_at
+FROM checkpoints c
+JOIN sessions s ON c.session_id = s.id
+ORDER BY s.updated_at DESC, c.checkpoint_number ASC;
+```
+
+### From per-session `checkpoints/*.json`:
+
+Each checkpoint file has: `title`, `overview`, `history`, `work_done`, `technical_details`, `important_files`, `next_steps`.
+
+Read `index.md` (if present) as a session-level summary — it's typically written at session end and is already concise.
+
+### What to extract:
+
+- `overview` → high-level description of what the session accomplished
+- `work_done` → concrete tasks completed (good for skills / project pages)
+- `technical_details` → implementation specifics (good for concepts pages)
+- `important_files` → high-value files in the project (good for project pages)
+- `next_steps` → open threads (good for linking to ongoing project work)
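
One way to run the checkpoint query from Python is the standard-library `sqlite3` module. The sketch below assumes the table and column names listed under Source 2; `load_checkpoints` is a hypothetical helper, and it selects a subset of the columns for brevity.

```python
import sqlite3

CHECKPOINT_QUERY = """
SELECT s.id AS session_id, s.repository, c.checkpoint_number, c.title,
       c.overview, c.work_done, c.technical_details,
       c.important_files, c.next_steps
FROM checkpoints c
JOIN sessions s ON c.session_id = s.id
ORDER BY s.updated_at DESC, c.checkpoint_number ASC
"""


def load_checkpoints(conn: sqlite3.Connection) -> dict:
    """Return {session_id: [checkpoint dicts in checkpoint_number order]}."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    grouped = {}
    for row in conn.execute(CHECKPOINT_QUERY):
        grouped.setdefault(row["session_id"], []).append(dict(row))
    return grouped
```

Opening the database read-only, for example with `sqlite3.connect("file:/path/to/session-store.db?mode=ro", uri=True)`, is a reasonable precaution because Copilot may still be writing to it.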
+
+## Step 3: Parse Session Turns
+
+Read turns from `session-store.db` (preferred — already parsed) or from `events.jsonl` / transcript JSONL.
+
+### From `session-store.db`:
+
+```sql
+SELECT turn_index, user_message, assistant_response, timestamp
+FROM turns
+WHERE session_id = '<uuid>'
+ORDER BY turn_index ASC;
+```
+
+### From `events.jsonl` / transcript JSONL:
+
+Each file is one session. Each line is a JSON event. See `references/copilot-data-format.md` for the full schema.
+
+**Relevant event types:**
+
+| `type`                | What it is                              | Worth reading?                            |
+| --------------------- | --------------------------------------- | ----------------------------------------- |
+| `session.start`       | Session metadata (cwd, branch, version) | Yes — establishes project context         |
+| `user.message`        | User turn                               | Yes — `data.content`                      |
+| `assistant.message`   | Assistant turn                          | Yes — `data.content` (text) + `data.toolRequests` |
+| `tool.execution_start`| Tool call                               | Skim — reveals what files/commands were used |
+| `tool.execution_end`  | Tool result                             | No — usually noise                        |
+
+**Extraction strategy for `assistant.message`:**
+
+- `data.content` is the assistant's text response — extract this
+- `data.reasoningText` is internal reasoning — skip (it's the unpacked `reasoningOpaque` field)
+- `data.toolRequests` lists tool calls — skim tool names and arguments for file access patterns
+- Skip `type: "tool.execution_end"` entirely
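
The extraction strategy above could look roughly like this in Python. It is a sketch that assumes the event shapes described in this skill (`type`, `data.content`); `extract_turns` is a hypothetical helper name. Reasoning fields and tool output are never read, so they cannot leak into the wiki.

```python
import json


def extract_turns(lines):
    """Yield (role, text) pairs from events.jsonl or transcript lines.

    Only user.message and assistant.message content is kept; reasoning
    fields and tool.execution_end output are ignored.
    """
    roles = {"user.message": "user", "assistant.message": "assistant"}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupt line
        if not isinstance(event, dict):
            continue
        role = roles.get(event.get("type"))
        content = (event.get("data") or {}).get("content")
        if role and content:
            yield role, content
```

Because the function takes any iterable of lines, an open file handle can be passed directly and large logs are streamed rather than loaded whole.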
+
+## Step 3b: Process Memory Artifacts
+
+For each session that has a `memory-tool/memories/<base64-id>/` directory in VS Code workspace storage, read any markdown files saved there (typically `plan.md`). These are documents the user explicitly saved — treat them as high-quality, user-authored content.
+
+Decode the base64 directory name to get the session UUID:
+
+```python
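+import base64
+# dir_name is the directory name under memory-tool/memories/; restore any stripped '=' padding first
+padded = dir_name + '=' * (-len(dir_name) % 4)
+session_id = base64.b64decode(padded).decode('utf-8')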
+```
+
+Memory artifacts map to project `skills/` or `concepts/` pages, depending on content type.
+
+## Step 3c: Extract File and Ref Patterns
+
+From `session-store.db`:
+
+```sql
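+-- Most-touched files per project
+-- (session_files has no repository column, so join through sessions)
+SELECT s.repository, sf.file_path, COUNT(*) AS touch_count
+FROM session_files sf
+JOIN sessions s ON s.id = sf.session_id
+GROUP BY s.repository, sf.file_path
+ORDER BY touch_count DESC;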
+
+-- Linked commits/PRs/issues per session
+SELECT session_id, ref_type, ref_value, turn_index
+FROM session_refs
+ORDER BY session_id, turn_index;
+```
+
+**File access patterns** reveal which files are architecturally important — note them on project pages.
+
+**Session refs** link Copilot sessions to git history — useful for connecting wiki knowledge to concrete code changes.
+
+## Step 4: Cluster by Topic
+
+Don't create one wiki page per session. Instead:
+
+- Group extracted knowledge **by topic** across sessions
+- A single session about "debugging auth + setting up CI" → two separate topics
+- Three sessions across different days about "React performance" → one merged topic
+- `cwd` / `repository` give you a natural first-level grouping; `vscode.metadata.json`'s `customTitle` gives a human-readable session label
+
+## Step 5: Distill into Wiki Pages
+
+Each Copilot project maps to a project directory in the vault. Derive the project name from `cwd` or `repository`:
+
+```
+C:\Users\name\git\my-project   → my-project
+/Users/name/code/another-app   → another-app
+```
+
+Prefer `repository` (e.g., `owner/repo`) from `session-store.db` over raw `cwd` when available.
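
A small sketch of this derivation follows. It assumes the project name is the final segment of `owner/repo` or of the working directory, which matches the two examples above but is not stated as a rule in the source; `project_name` is a hypothetical helper.

```python
from pathlib import PurePosixPath, PureWindowsPath


def project_name(cwd=None, repository=None):
    """Derive a vault project name, preferring `repository` over `cwd`."""
    if repository:
        return repository.rstrip("/").split("/")[-1]
    if not cwd:
        return None
    # Backslashes or a drive letter suggest a Windows-style path.
    is_windows = "\\" in cwd or (len(cwd) > 1 and cwd[1] == ":")
    path = PureWindowsPath(cwd) if is_windows else PurePosixPath(cwd)
    return path.name or None
```

The pure path classes parse Windows and POSIX paths on any host OS, which matters when a vault is shared across machines.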
+
+### Project-specific vs. global knowledge
+
+| What you found                      | Where it goes               | Example                                              |
+| ----------------------------------- | --------------------------- | ---------------------------------------------------- |
+| Project architecture decisions      | `projects/<name>/concepts/` | `projects/my-project/concepts/main-architecture.md`  |
+| Project-specific debugging patterns | `projects/<name>/skills/`   | `projects/my-project/skills/api-rate-limiting.md`    |
+| General concept the user learned    | `concepts/` (global)        | `concepts/react-server-components.md`                |
+| Recurring problem across projects   | `skills/` (global)          | `skills/debugging-hydration-errors.md`               |
+| A tool/service used                 | `entities/` (global)        | `entities/vercel-functions.md`                       |
+| Patterns across many sessions       | `synthesis/` (global)       | `synthesis/common-debugging-patterns.md`             |
+
+For each project with content, create or update the project overview page at `projects/<name>/<name>.md` — **named after the project, not `_project.md`**. Obsidian's graph view uses the filename as the node label, so `_project.md` makes every project show up as `_project` in the graph. Naming it `<name>.md` gives each project a distinct, readable node name.
+
+**Important:** Distill the _knowledge_, not the conversation. Don't write "In a session on March 15, the user asked about X." Write the knowledge itself, with the session as a source attribution.
+
+**Write a `summary:` frontmatter field** on every new/updated page — 1–2 sentences, ≤200 chars, answering "what is this page about?" for a reader who hasn't opened it. `wiki-query`'s cheap retrieval path reads this field to avoid opening page bodies.
+
+**Add confidence and lifecycle fields** to every new page's frontmatter:
+```yaml
+base_confidence: 0.42
+lifecycle: draft
+lifecycle_changed: <ISO date today>
+```
+Leave `lifecycle` unchanged on update.
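
As an illustration, the frontmatter for a new page might be assembled as below. This sketch covers only the fields named in this step; real pages will likely carry additional fields defined by the `llm-wiki` conventions, and `new_page_frontmatter` is a hypothetical helper.

```python
import json
from datetime import date


def new_page_frontmatter(summary: str, today: date) -> str:
    """Build frontmatter for a NEW page (on update, lifecycle fields stay as-is)."""
    summary = " ".join(summary.split())  # collapse whitespace to a single line
    if len(summary) > 200:
        raise ValueError("summary must be at most 200 characters")
    lines = [
        "---",
        # A JSON string literal is also a valid YAML double-quoted scalar.
        f"summary: {json.dumps(summary, ensure_ascii=False)}",
        "base_confidence: 0.42",
        "lifecycle: draft",
        f"lifecycle_changed: {today.isoformat()}",
        "---",
    ]
    return "\n".join(lines) + "\n"
```

Rejecting over-long summaries, rather than truncating them silently, keeps the cheap retrieval path from serving a sentence cut off mid-thought.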
+
+**Mark provenance** per the convention in `llm-wiki` (Provenance Markers section):
+
+- **Checkpoints and index.md** are pre-distilled by the system — treat checkpoint-derived claims as extracted (the system wrote them from observed actions).
+- **Memory artifacts** are user-authored — treat as extracted.
+- **Conversation turn distillation** is mostly inferred. You're synthesizing a coherent claim from many turns. Apply `^[inferred]` liberally to synthesized patterns, generalizations across sessions, and "what the user really meant" interpretations.
+- Use `^[ambiguous]` when the user changed direction mid-session or when the session ended unresolved.
+- Write a `provenance:` frontmatter block on every new/updated page summarizing the rough mix.
+
+## Step 6: Update Manifest, Journal, and Special Files
+
+### Update `.manifest.json`
+
+For each session processed, add/update its entry with:
+
+- `ingested_at`, `session_id`, `updated_at`
+- `source_type`: one of `"copilot_session"`, `"copilot_checkpoint"`, `"copilot_transcript"`, `"copilot_memory_artifact"`
+- `project`: the project name derived from `repository` or `cwd`
+- `pages_created` and `pages_updated` lists
+
+Also update the `projects` section of the manifest:
+
+```json
+{
+  "project-name": {
+    "repository": "owner/repo",
+    "cwd": "C:\\Users\\name\\git\\project-name",
+    "vault_path": "projects/project-name",
+    "last_ingested": "TIMESTAMP",
+    "sessions_ingested": 5,
+    "sessions_total": 8,
+    "checkpoints_ingested": 12,
+    "memory_artifacts_ingested": 3
+  }
+}
+```
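
The manifest update might be sketched as follows. The top-level `sessions` key is an assumption (the source only names the `projects` section), and `record_session` is a hypothetical helper. Counting from the stored entries keeps `sessions_ingested` correct when a session is re-ingested.

```python
def record_session(manifest, session_id, project, updated_at, ingested_at,
                   source_type, pages_created, pages_updated):
    """Add or update one session entry and refresh its project's counters."""
    sessions = manifest.setdefault("sessions", {})  # assumed top-level key
    sessions[session_id] = {
        "session_id": session_id,
        "ingested_at": ingested_at,
        "updated_at": updated_at,
        "source_type": source_type,
        "project": project,
        "pages_created": list(pages_created),
        "pages_updated": list(pages_updated),
    }
    proj = manifest.setdefault("projects", {}).setdefault(project, {})
    proj["last_ingested"] = ingested_at
    # Derive the count from entries so re-ingesting never double counts.
    proj["sessions_ingested"] = sum(
        1 for entry in sessions.values() if entry["project"] == project
    )
    return manifest
```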
+
+### Create journal entry + update special files
+
+Update `index.md` and `log.md` per the standard process:
+
+```
+- [TIMESTAMP] COPILOT_HISTORY_INGEST projects=N sessions=M checkpoints=C pages_updated=X pages_created=Y mode=append|full
+```
+
+**`hot.md`** — Read `$OBSIDIAN_VAULT_PATH/hot.md` (create from the template in `wiki-ingest` if missing). Update **Recent Activity** with a one-line summary — e.g. "Ingested 5 Copilot sessions across 2 projects; surfaced patterns in API design and testing strategy." Keep the last 3 operations. Update **Active Threads** if any ongoing project is now better understood. Update `updated` timestamp.
+
+## Privacy
+
+- Distill and synthesize — don't copy raw conversation text verbatim
+- Skip anything that looks like secrets, API keys, passwords, tokens
+- `data.reasoningOpaque` and `data.reasoningText` in assistant events are internal reasoning: skip them entirely and never copy them to the wiki
+- If you encounter personal/sensitive content, ask the user before including it
+- The user's conversations may reference other people — be thoughtful about what goes in the wiki
+
+## Reference
+
+See `references/copilot-data-format.md` for detailed data structure documentation.
+
+## QMD Refresh After Vault Writes
+
+QMD is a search index, not the source of truth. If `$QMD_WIKI_COLLECTION` is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If QMD refresh fails, do not roll back the vault changes; report the QMD status separately.
+
+Use `$QMD_CLI` if set; otherwise use `qmd`.
+
+```bash
+${QMD_CLI:-qmd} update
+```
+
+If the output says vectors are needed or embeddings may be stale, run:
+
+```bash
+${QMD_CLI:-qmd} embed
+```
+
+Verify the collection with either:
+
+```bash
+${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"
+```
+
+or, when a specific page path is known:
+
+```bash
+${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5
+```
+
+Record one of:
+- `QMD refreshed: update + embed + verified`
+- `QMD refreshed: update only + verified`
+- `QMD skipped: QMD_WIKI_COLLECTION unset`
+- `QMD skipped: qmd CLI unavailable`
+- `QMD failed: <short error summary>`
@@ -0,0 +1,321 @@
+# GitHub Copilot CLI Data Format — Detailed Reference
+
+## Session-State Directory
+
+`~/.copilot/session-state/` contains one directory per session the user has run with GitHub Copilot CLI. Each directory is named with a UUID.
+
+### `workspace.yaml`
+
+Minimal session metadata file, always present:
+
+```yaml
+id: <session-uuid>
+cwd: /path/to/project
+summary_count: 3
+created_at: 2026-04-02T14:28:13.304Z
+updated_at: 2026-04-29T12:00:00.000Z
+```
+
+`summary_count` reflects how many checkpoints were written. Sessions with `summary_count: 0` were either very short or completed without checkpointing — check `events.jsonl` for content anyway.
+
+### `vscode.metadata.json`
+
+VS Code context, written when the session is associated with a VS Code workspace:
+
+```json
+{
+  "workspaceFolder": {
+    "folderPath": "c:\\Users\\name\\git\\my-project",
+    "timestamp": 1773245818098
+  },
+  "writtenToDisc": true,
+  "repositoryProperties": {
+    "repositoryPath": "c:\\Users\\name\\git\\my-project",
+    "branchName": "feature/my-branch",
+    "baseBranchName": "origin/main"
+  },
+  "customTitle": "User-written session label or system-set title"
+}
+```
+
+`customTitle` is the most human-readable session label — use it as a heading when creating session-derived wiki content. May be absent on older sessions.
+
+### `events.jsonl`
+
+The full event log for one session. Each line is a JSON object representing one event in the session.
+
+#### Event: `session.start`
+
+```json
+{
+  "type": "session.start",
+  "data": {
+    "sessionId": "09371a50-9a50-484a-8743-5c696de1623a",
+    "version": 1,
+    "producer": "copilot-agent",
+    "copilotVersion": "0.0.420",
+    "startTime": "2026-03-02T15:10:04.678Z",
+    "context": {
+      "cwd": "C:\\Users\\name\\git\\my-project",
+      "gitRoot": "C:\\Users\\name\\git\\my-project",
+      "branch": "master"
+    }
+  },
+  "id": "<event-uuid>",
+  "timestamp": "2026-03-02T15:10:04.817Z",
+  "parentId": null
+}
+```
+
+`data.context.cwd` and `data.context.branch` establish the project context. Always read `session.start` first.
+
+#### Event: `user.message`
+
+```json
+{
+  "type": "user.message",
+  "data": {
+    "content": "review my staged but uncommitted changes for issues",
+    "transformedContent": "<current_datetime>...</current_datetime>\n\nreview my staged...",
+    "attachments": [],
+    "interactionId": "9352571e-a0b9-4774-8ecb-40bc58f86e94"
+  },
+  "id": "<event-uuid>",
+  "timestamp": "2026-03-02T15:10:45.058Z",
+  "parentId": "<parent-event-uuid>"
+}
+```
+
+Use `data.content` (not `data.transformedContent`) — the transformed version includes injected system context that's noise for wiki purposes.
+
+#### Event: `assistant.message`
+
+```json
+{
+  "type": "assistant.message",
+  "data": {
+    "messageId": "<uuid>",
+    "content": "I'll review the staged changes in those three files.",
+    "toolRequests": [
+      {
+        "toolCallId": "tooluse_...",
+        "name": "report_intent",
+        "arguments": { "intent": "Reviewing staged changes" },
+        "type": "function"
+      },
+      {
+        "toolCallId": "tooluse_...",
+        "name": "powershell",
+        "arguments": {
+          "command": "git --no-pager diff --cached --stat",
+          "description": "Show staged diff"
+        },
+        "type": "function"
+      }
+    ],
+    "interactionId": "9352571e-a0b9-4774-8ecb-40bc58f86e94",
+    "reasoningOpaque": "<base64-encrypted-reasoning>",
+    "reasoningText": "The user wants me to review staged git changes..."
+  },
+  "id": "<event-uuid>",
+  "timestamp": "2026-03-02T15:10:50.235Z",
+  "parentId": "<parent-event-uuid>"
+}
+```
+
+**Extraction strategy:**
+
+- Extract `data.content` — the assistant's visible text response
+- `data.toolRequests` — skim tool names and description arguments for file/command patterns; ignore `report_intent` calls
+- **Skip `data.reasoningOpaque` entirely** — encrypted/encoded internal reasoning
+- **Skip `data.reasoningText` entirely** — decrypted reasoning; internal only, never user-visible
+
+#### Event: `assistant.turn_start`
+
+```json
+{
+  "type": "assistant.turn_start",
+  "data": { "turnId": "0", "interactionId": "..." },
+  "id": "...",
+  "timestamp": "..."
+}
+```
+
+Marks the start of an assistant turn. Useful for turn boundary detection; no content to extract.
+
+#### Event: `tool.execution_start`
+
+```json
+{
+  "type": "tool.execution_start",
+  "data": {
+    "toolCallId": "tooluse_...",
+    "toolName": "powershell",
+    "arguments": { "command": "dotnet build ...", "description": "Build project" }
+  },
+  "id": "...",
+  "timestamp": "..."
+}
+```
+
+Reveals what tools (file reads, commands, searches) were invoked. File-related tools (`view`, `edit`, `create`) with their paths are worth noting for the `session_files` equivalent when reading events directly.
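
When only the event log is available, a rough stand-in for `session_files` can be built by counting file-tool invocations. In the sketch below, the `path` argument key is an assumption, since the source shows arguments only for `powershell`; adjust `path_key` to whatever the file tools actually record. `count_file_touches` is a hypothetical helper.

```python
import json
from collections import Counter

FILE_TOOLS = {"view", "edit", "create"}


def count_file_touches(lines, path_key="path"):
    """Count tool.execution_start events per file path for file-related tools."""
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or corrupt lines
        if not isinstance(event, dict) or event.get("type") != "tool.execution_start":
            continue
        data = event.get("data") or {}
        args = data.get("arguments") or {}
        if data.get("toolName") in FILE_TOOLS and isinstance(args, dict):
            path = args.get(path_key)
            if path:
                counts[path] += 1
    return counts
```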
+
+#### Event: `tool.execution_end`
+
+Contains the raw tool output. Usually noise — skip unless diagnosing errors.
+
+### `checkpoints/<uuid>.json`
+
+Mid-session progress summaries, written automatically as the session progresses:
+
+```json
+{
+  "title": "Implementing auth module",
+  "overview": "Working on JWT authentication for the API...",
+  "history": "1. Analyzed existing auth code\n2. Created IAuthService...",
+  "work_done": "- Created IAuthService interface\n- Implemented JwtAuthService",
+  "technical_details": "Uses RS256 signing. Token expiry configurable via settings...",
+  "important_files": "- src/Auth/IAuthService.cs\n- src/Auth/JwtAuthService.cs",
+  "next_steps": "- Wire up to DI container\n- Add refresh token support"
+}
+```
+
+This is the highest-value structured content in the per-session directory — equivalent to Claude's memory files.
+
+### `index.md`
+
+Session-end summary written as a markdown file. Typically 1–3 paragraphs summarizing what was accomplished. Content varies by session length and complexity. Read this before opening `events.jsonl` to decide if the session is worth deep-processing.
+
+---
+
+## Global Session Store (`session-store.db`)
+
+SQLite database at `~/.copilot/session-store.db`. The canonical cross-session record.
+
+### Schema
+
+#### `sessions`
+
+| Column      | Type | Notes                                      |
+| ----------- | ---- | ------------------------------------------ |
+| `id`        | TEXT | Session UUID (PK)                          |
+| `cwd`       | TEXT | Working directory                          |
+| `repository`| TEXT | `owner/repo` format when available         |
+| `branch`    | TEXT | Git branch name                            |
+| `summary`   | TEXT | One-paragraph session summary              |
+| `created_at`| TEXT | ISO 8601 timestamp                         |
+| `updated_at`| TEXT | ISO 8601 timestamp — use for delta checks  |
+| `host_type` | TEXT | `"vscode"`, `"cli"`, or similar            |
+
+#### `turns`
+
+| Column              | Type    | Notes                          |
+| ------------------- | ------- | ------------------------------ |
+| `id`                | INTEGER | PK                             |
+| `session_id`        | TEXT    | FK → `sessions.id`             |
+| `turn_index`        | INTEGER | 0-based turn sequence          |
+| `user_message`      | TEXT    | Raw user message               |
+| `assistant_response`| TEXT    | Assistant's text response      |
+| `timestamp`         | TEXT    | ISO 8601 timestamp             |
+
+Note: `user_message` here is the pre-transformation content — use this, not `transformedContent` from `events.jsonl`.
+
+#### `checkpoints`
+
+| Column              | Type    | Notes                          |
+| ------------------- | ------- | ------------------------------ |
+| `id`                | INTEGER | PK                             |
+| `session_id`        | TEXT    | FK → `sessions.id`             |
+| `checkpoint_number` | INTEGER | 1-based                        |
+| `title`             | TEXT    | Short title                    |
+| `overview`          | TEXT    | High-level summary             |
+| `history`           | TEXT    | Step-by-step of what happened  |
+| `work_done`         | TEXT    | Completed items                |
+| `technical_details` | TEXT    | Implementation specifics       |
+| `important_files`   | TEXT    | Key files touched              |
+| `next_steps`        | TEXT    | Open threads                   |
+| `created_at`        | TEXT    | ISO 8601 timestamp             |
+
+#### `session_files`
+
+| Column         | Type    | Notes                                              |
+| -------------- | ------- | -------------------------------------------------- |
+| `session_id`   | TEXT    | FK → `sessions.id`                                 |
+| `file_path`    | TEXT    | Absolute path to the file                          |
+| `tool_name`    | TEXT    | `"edit"`, `"create"`, `"view"`, etc.               |
+| `turn_index`   | INTEGER | Which turn touched the file                        |
+| `first_seen_at`| TEXT    | ISO 8601 timestamp                                 |
+
+> ⚠️ No `id` column — use `COUNT(DISTINCT sf.file_path)` not `COUNT(DISTINCT sf.id)`.
+
+Aggregate by `file_path` across sessions to identify architecturally important files.
+
+#### `session_refs`
+
+| Column       | Type    | Notes                                      |
+| ------------ | ------- | ------------------------------------------ |
+| `id`         | INTEGER | PK                                         |
+| `session_id` | TEXT    | FK → `sessions.id`                         |
+| `ref_type`   | TEXT    | `"commit"`, `"pr"`, `"issue"`              |
+| `ref_value`  | TEXT    | Commit SHA, PR number, issue number        |
+| `turn_index` | INTEGER | Which turn referenced it                   |
+| `created_at` | TEXT    | ISO 8601 timestamp                         |
+
+#### `search_index` (FTS5)
+
+Full-text search index. Use for keyword discovery when surveying a large history:
+
+```sql
+SELECT content, session_id, source_type
+FROM search_index
+WHERE search_index MATCH 'auth OR authentication OR login'
+LIMIT 20;
+```
+
+`source_type` values: `"turn"`, `"checkpoint_overview"`, `"checkpoint_history"`, `"checkpoint_work_done"`, `"checkpoint_technical"`, `"checkpoint_files"`, `"checkpoint_next_steps"`, `"workspace_artifact"`.
+
+---
+
+## VS Code Workspace Storage
+
+### Location
+
+The `workspaceStorage` directory is platform-specific:
+
+| Platform | Default path                                                       |
+| -------- | ------------------------------------------------------------------ |
+| Windows  | `%APPDATA%\Code\User\workspaceStorage\`                            |
+| macOS    | `~/Library/Application Support/Code/User/workspaceStorage/`       |
+| Linux    | `~/.config/Code/User/workspaceStorage/`                           |
+
+Each `<hash>/` subdirectory corresponds to a specific workspace (VS Code folder). The hash is derived from the workspace path — there is no human-readable mapping, so enumerate all `<hash>/GitHub.copilot-chat/` directories and use the `transcripts/` JSONL files' `session.start` events to identify which project each belongs to.
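
The enumeration described above could be sketched like this. It assumes the directory layout shown for Source 3 and the `session.start` shape documented earlier; `transcript_projects` is a hypothetical helper.

```python
import json
from pathlib import Path


def transcript_projects(workspace_storage):
    """Map each transcript file to the cwd recorded in its session.start event."""
    mapping = {}
    pattern = "*/GitHub.copilot-chat/transcripts/*.jsonl"
    for transcript in sorted(Path(workspace_storage).glob(pattern)):
        with transcript.open(encoding="utf-8") as handle:
            for line in handle:
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if isinstance(event, dict) and event.get("type") == "session.start":
                    context = (event.get("data") or {}).get("context") or {}
                    mapping[transcript] = context.get("cwd")
                    break  # session.start is enough; stop reading this file
    return mapping
```

Transcripts without a `session.start` event are simply left out of the mapping, so they can be reported to the user instead of being guessed at.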
+
+### Transcript JSONL (`transcripts/<uuid>.jsonl`)
+
+Identical format to `events.jsonl` from Source 1. Parse using the same event type handlers. The `session.start` event's `data.context.cwd` tells you which project this belongs to.
+
+### Memory Artifacts (`memory-tool/memories/<base64-session-id>/`)
+
+Directory name is the session UUID encoded as base64. Files inside are markdown documents explicitly saved by the user or system during the session — typically `plan.md` containing the session plan.
+
+Decode the directory name to link it to a session:
+
+```python
+import base64
+# dir_name is the memory directory name; pad to a multiple of 4 before decoding
+padded = dir_name + '=' * (-len(dir_name) % 4)
+session_id = base64.b64decode(padded).decode('utf-8')
+```
+
+---
+
+## Processing Order
+
+For maximum efficiency and signal-to-noise:
+
+1. **`session-store.db` checkpoints** — Fastest, highest signal. Query all at once.
+2. **`session-store.db` sessions.summary** — One-paragraph synopsis per session.
+3. **Per-session `checkpoints/*.json` + `index.md`** — For sessions not yet in `session-store.db` or for additional detail.
+4. **Memory artifacts** (`memory-tool/memories/`) — User-authored, high quality.
+5. **`session-store.db` turns** — Full conversation, process selectively by topic.
+6. **`events.jsonl` / transcript JSONL** — Only if `session-store.db` is absent or incomplete.
+7. **`session_files` / `session_refs`** — For file pattern and git linkage metadata.