add skills

김경종
2026-05-28 10:17:41 +09:00
parent c526552957
commit 51715816fd
81 changed files with 17935 additions and 2 deletions
@@ -0,0 +1,373 @@
---
name: copilot-history-ingest
description: >
Ingest GitHub Copilot CLI session history into an Obsidian wiki as distilled knowledge pages. Use this skill
when the user wants to capture their Copilot CLI sessions into a personal wiki — extracting architecture
decisions, debug notes, and patterns into searchable Obsidian pages. Triggers on phrases like "ingest my
copilot sessions into obsidian", "add my copilot history to my wiki", "pull my copilot session history into
the vault", "capture what I've learned from copilot into obsidian", "just the new sessions since last time",
or "mine patterns across my copilot sessions". Also triggers when the user mentions session-store.db,
~/.copilot/session-state, or VS Code copilot-chat transcripts in the context of building a wiki or knowledge
base. Does NOT trigger for general copilot usage questions, searching sessions, or backing up history.
---
# Copilot History Ingest — Conversation Mining
You are extracting knowledge from the user's past GitHub Copilot CLI conversations and distilling it into the Obsidian wiki. Conversations are rich but messy — your job is to find the signal and compile it.
This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest copilot`).
## Before You Start
1. **Resolve config**: follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up the CWD for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH`, `COPILOT_HISTORY_PATH` (defaults to `~/.copilot/session-state`), and `COPILOT_VSCODE_STORAGE_PATH` (the VS Code `workspaceStorage` directory, which is platform-specific; ask the user if it is absent).
2. Read `.manifest.json` at the vault root to check what's already been ingested
3. Read `index.md` at the vault root to know what the wiki already contains
## Ingest Modes
### Append Mode (default)
Check `.manifest.json` for each source file (events JSONL, transcript JSONL, checkpoint, session-store DB). Only process:
- Sessions not in the manifest (new sessions)
- Sessions whose `updated_at` is newer than their `ingested_at` in the manifest
This is usually what you want — the user ran a few new sessions and wants to capture the delta.
### Full Mode
Process everything regardless of manifest. Use after a `wiki-rebuild` or if the user explicitly asks.
## GitHub Copilot Data Layout
Copilot stores data in three locations. Scan **all three**.
### Source 1: `~/.copilot/session-state/` (CLI sessions)
```
~/.copilot/session-state/
├── <session-uuid>/
│ ├── workspace.yaml # Session metadata (id, cwd, summary_count, created_at, updated_at)
│ ├── vscode.metadata.json # VS Code context (workspaceFolder, repositoryProperties, customTitle)
│ ├── events.jsonl # Full event log — all turns, tool calls, reasoning
│ ├── session.db # Per-session SQLite (todos/todo_deps only — skip for ingestion)
│ ├── index.md # Session summary written at session end
│ ├── checkpoints/ # Checkpoint JSON files (mid-session summaries)
│ │ └── <uuid>.json # title, overview, history, work_done, technical_details,
│ │ # important_files, next_steps
│ ├── files/ # Artifacts produced during session (plans, diagrams, etc.)
│ └── research/ # Research artifacts
└── ...
```
### Source 2: `~/.copilot/session-store.db` (Global SQLite)
The canonical cross-session database. This is the **highest-value** source: structured, queryable, and pre-summarised.
```
sessions — id, cwd, repository, branch, summary, created_at, updated_at, host_type
turns — session_id, turn_index, user_message, assistant_response, timestamp
checkpoints — session_id, checkpoint_number, title, overview, history, work_done,
technical_details, important_files, next_steps, created_at
session_files — session_id, file_path, tool_name, turn_index, first_seen_at
session_refs — session_id, ref_type (commit/pr/issue), ref_value, turn_index, created_at
search_index — FTS5 virtual table (content, session_id, source_type, source_id)
```
### Source 3: VS Code Workspace Storage (`<workspaceStorage>/<hash>/GitHub.copilot-chat/`)
VS Code extension data, keyed by workspace hash. The path is platform-specific and must come from `.env` or user input.
```
<hash>/GitHub.copilot-chat/
├── transcripts/
│ └── <session-uuid>.jsonl # Conversation transcripts (same JSONL format as events.jsonl)
├── memory-tool/
│ └── memories/
│ └── <base64-session-id>/ # Per-session saved artifacts (plan.md, etc.)
│ └── plan.md
└── codebase-external.sqlite # Codebase index (skip — no conversation knowledge)
```
### Key data sources ranked by value:
1. **Checkpoints** (`session-store.db` `checkpoints` table + per-session `checkpoints/*.json`) — Pre-distilled summaries with `overview`, `work_done`, `technical_details`, `important_files`, `next_steps`. Gold.
2. **Session summaries** (`session-store.db` `sessions.summary` + `index.md`) — One-paragraph synopsis per session.
3. **Turns** (`session-store.db` `turns` table + `events.jsonl` / transcript JSONL) — Full conversation. Rich but verbose.
4. **Memory artifacts** (`memory-tool/memories/<id>/plan.md` etc.) — Pre-written plans and structured notes the user saved explicitly. Worth importing verbatim (or lightly summarised).
5. **File access patterns** (`session_files` table + `tool.execution_*` events) — Which files the agent repeatedly touched — reveals high-value project files.
6. **Session refs** (`session_refs` table) — Commits, PRs, and issues linked to sessions.
7. **`vscode.metadata.json`** — Workspace folder path, branch, `customTitle` (user-set session label). Useful for grouping and naming.
## Step 1: Survey and Compute Delta
Scan all three data locations and compare against `.manifest.json`:
```bash
# --- Source 1: per-session directories ---
# Each session directory contains workspace.yaml (id, cwd, updated_at);
# vscode.metadata.json, when present, adds customTitle / repositoryProperties
ls ~/.copilot/session-state/
# --- Source 2: global database ---
# Query session-store.db with the sqlite3 CLI (or Python's sqlite3 module).
# COUNT(DISTINCT ...) keeps the two LEFT JOINs from inflating the counts.
sqlite3 ~/.copilot/session-store.db <<'SQL'
SELECT s.id, s.cwd, s.repository, s.branch, s.summary, s.updated_at,
       COUNT(DISTINCT t.turn_index) AS turn_count,
       COUNT(DISTINCT c.id) AS checkpoint_count
FROM sessions s
LEFT JOIN turns t ON t.session_id = s.id
LEFT JOIN checkpoints c ON c.session_id = s.id
GROUP BY s.id
ORDER BY s.updated_at DESC;
SQL
# --- Source 3: VS Code workspace storage ---
# Every <hash> directory with a GitHub.copilot-chat/ folder may hold transcripts
ls "$COPILOT_VSCODE_STORAGE_PATH"/*/GitHub.copilot-chat/transcripts/
```
Build a unified inventory — one entry per session UUID — and classify:
- **New** — not in manifest → needs ingesting
- **Modified** — in manifest but `updated_at` is newer → needs re-ingesting
- **Unchanged** — in manifest and not modified → skip in append mode
Report to the user: "Found X sessions in session-state, Y in session-store.db, Z VS Code transcript files. Checkpoints: A. Delta: B new, C modified."
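The classification can be sketched in Python. This is a minimal sketch. It assumes the manifest keeps per-session entries under a `sessions` key, which is a hypothetical layout that should be adapted to the real `.manifest.json`. It also assumes `inventory` already maps each session UUID to its latest `updated_at` across the three sources:

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalise it
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def classify_sessions(inventory: dict, manifest: dict) -> dict:
    """Split session UUIDs into new / modified / unchanged.

    inventory: {session_id: updated_at} gathered from all three sources.
    manifest:  parsed .manifest.json; the "sessions" key is an assumption.
    """
    known = manifest.get("sessions", {})
    result = {"new": [], "modified": [], "unchanged": []}
    for session_id, updated_at in sorted(inventory.items()):
        entry = known.get(session_id)
        if entry is None:
            result["new"].append(session_id)
        elif parse_ts(updated_at) > parse_ts(entry["ingested_at"]):
            result["modified"].append(session_id)
        else:
            result["unchanged"].append(session_id)
    return result
```

Comparing parsed datetimes instead of raw strings avoids false "modified" hits when one source writes `Z` and another writes `+00:00`.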
## Step 2: Ingest Checkpoints and Summaries First
Checkpoints are already distilled — process them before touching raw turns.
### From `session-store.db`:
```sql
SELECT s.id, s.cwd, s.repository, s.branch, s.summary,
c.checkpoint_number, c.title, c.overview, c.work_done,
c.technical_details, c.important_files, c.next_steps,
c.created_at
FROM checkpoints c
JOIN sessions s ON c.session_id = s.id
ORDER BY s.updated_at DESC, c.checkpoint_number ASC;
```
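Here is a minimal Python sketch for running the query above with the standard `sqlite3` module. Opening the database read-only through a `mode=ro` URI is a precaution added here; the source does not require it. `load_checkpoints` and `open_store` are hypothetical helper names:

```python
import sqlite3
from pathlib import Path

CHECKPOINT_QUERY = """
SELECT s.id, s.cwd, s.repository, s.branch, s.summary,
       c.checkpoint_number, c.title, c.overview, c.work_done,
       c.technical_details, c.important_files, c.next_steps,
       c.created_at
FROM checkpoints c
JOIN sessions s ON c.session_id = s.id
ORDER BY s.updated_at DESC, c.checkpoint_number ASC
"""


def open_store(path: str = "~/.copilot/session-store.db") -> sqlite3.Connection:
    # Read-only URI so the ingest can never modify Copilot's own database
    uri = Path(path).expanduser().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def load_checkpoints(conn: sqlite3.Connection) -> list:
    # A cursor-level row factory yields name-addressable rows
    # without changing the connection's default behaviour
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row
    cur.execute(CHECKPOINT_QUERY)
    return [dict(row) for row in cur.fetchall()]
```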
### From per-session `checkpoints/*.json`:
Each checkpoint file has: `title`, `overview`, `history`, `work_done`, `technical_details`, `important_files`, `next_steps`.
Read `index.md` (if present) as a session-level summary — it's typically written at session end and is already concise.
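The sketch below reads one session directory's checkpoint files and `index.md`. It assumes each `checkpoints/*.json` file is a single JSON object with the fields listed above. The files are named by UUID, so the sorted order is by filename, not by time. `read_session_summaries` is a hypothetical helper:

```python
import json
from pathlib import Path

CHECKPOINT_FIELDS = ("title", "overview", "history", "work_done",
                     "technical_details", "important_files", "next_steps")


def read_session_summaries(session_dir: Path) -> dict:
    """Collect checkpoint JSON files and index.md from one session-state directory."""
    checkpoints = []
    # glob() on a missing checkpoints/ directory simply yields nothing
    for path in sorted((session_dir / "checkpoints").glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Keep only the documented fields; missing ones become empty strings
        checkpoints.append({field: data.get(field, "") for field in CHECKPOINT_FIELDS})
    index_md = session_dir / "index.md"
    return {
        "checkpoints": checkpoints,
        "index_md": index_md.read_text(encoding="utf-8") if index_md.exists() else None,
    }
```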
### What to extract:
- `overview` → high-level description of what the session accomplished
- `work_done` → concrete tasks completed (good for skills / project pages)
- `technical_details` → implementation specifics (good for concepts pages)
- `important_files` → high-value files in the project (good for project pages)
- `next_steps` → open threads (good for linking to ongoing project work)
## Step 3: Parse Session Turns
Read turns from `session-store.db` (preferred — already parsed) or from `events.jsonl` / transcript JSONL.
### From `session-store.db`:
```sql
SELECT turn_index, user_message, assistant_response, timestamp
FROM turns
WHERE session_id = '<uuid>'
ORDER BY turn_index ASC;
```
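When you run the turn query from code, bind the session UUID as a parameter instead of substituting it into `'<uuid>'` by hand. Here is a minimal sketch; the helper name and the dict keys it returns are illustrative:

```python
import sqlite3


def fetch_turns(conn: sqlite3.Connection, session_id: str) -> list:
    # The "?" placeholder lets sqlite3 quote the UUID safely
    rows = conn.execute(
        "SELECT turn_index, user_message, assistant_response, timestamp "
        "FROM turns WHERE session_id = ? ORDER BY turn_index ASC",
        (session_id,),
    )
    # NULL messages become empty strings so later text processing is uniform
    return [
        {"turn_index": i, "user": u or "", "assistant": a or "", "timestamp": ts}
        for i, u, a, ts in rows
    ]
```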
### From `events.jsonl` / transcript JSONL:
Each file is one session. Each line is a JSON event. See `references/copilot-data-format.md` for the full schema.
**Relevant event types:**
| `type` | What it is | Worth reading? |
| --------------------- | --------------------------------------- | ----------------------------------------- |
| `session.start` | Session metadata (cwd, branch, version) | Yes — establishes project context |
| `user.message` | User turn | Yes — `data.content` |
| `assistant.message` | Assistant turn | Yes — `data.content` (text) + `data.toolRequests` |
| `tool.execution_start`| Tool call | Skim — reveals what files/commands were used |
| `tool.execution_end` | Tool result | No — usually noise |
**Extraction strategy for `assistant.message`:**
- `data.content` is the assistant's text response — extract this
- `data.reasoningText` is internal reasoning — skip (it's the unpacked `reasoningOpaque` field)
- `data.toolRequests` lists tool calls — skim tool names and arguments for file access patterns
- Skip `type: "tool.execution_end"` entirely
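The table and extraction rules above can be sketched as a single pass over a JSONL file. The sketch assumes one JSON object per line, with the `type`/`data` shape shown in `references/copilot-data-format.md`. It does not handle malformed lines. It takes tool names from `toolRequests` only and ignores `tool.execution_*` events. `parse_events` is a hypothetical helper:

```python
import json
from pathlib import Path

SKIP_TOOLS = {"report_intent"}  # intent announcements carry no knowledge


def parse_events(path: Path) -> dict:
    """Reduce an events.jsonl / transcript JSONL file to the parts worth distilling."""
    session = {"context": {}, "messages": [], "tools": []}
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            kind, data = event.get("type"), event.get("data", {})
            if kind == "session.start":
                session["context"] = data.get("context", {})
            elif kind == "user.message":
                # data.content, not transformedContent (which carries injected context)
                session["messages"].append(("user", data.get("content", "")))
            elif kind == "assistant.message":
                # reasoningText / reasoningOpaque are deliberately never read
                if data.get("content"):
                    session["messages"].append(("assistant", data["content"]))
                for req in data.get("toolRequests", []):
                    if req.get("name") not in SKIP_TOOLS:
                        session["tools"].append(req.get("name"))
            # tool.execution_end and every other event type is ignored
    return session
```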
## Step 3b: Process Memory Artifacts
For each session that has a `memory-tool/memories/<base64-id>/` directory in VS Code workspace storage, read any markdown files saved there (typically `plan.md`). These are documents the user explicitly saved — treat them as high-quality, user-authored content.
Decode the base64 directory name to get the session UUID:
```python
import base64
# Restore any stripped "=" padding (to a multiple of 4), then decode
session_id = base64.b64decode(dir_name + '=' * (-len(dir_name) % 4)).decode('utf-8')
```
Memory artifacts map to project `skills/` or `concepts/` pages, depending on content type.
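The sketch below collects every memory artifact under one `GitHub.copilot-chat/` directory and keys it by decoded session UUID. It assumes standard (not URL-safe) base64 directory names and skips names that do not decode to UTF-8 text. Both helper names are hypothetical:

```python
import base64
import binascii
from pathlib import Path


def decode_session_dir(name: str):
    # Restore any stripped "=" padding; return None if the name is not base64 text
    try:
        return base64.b64decode(name + "=" * (-len(name) % 4)).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


def collect_memory_artifacts(chat_dir: Path) -> dict:
    """Map session UUID -> {filename: markdown text} under memory-tool/memories/."""
    artifacts = {}
    memories = chat_dir / "memory-tool" / "memories"
    if not memories.is_dir():
        return artifacts
    for sub in sorted(p for p in memories.iterdir() if p.is_dir()):
        session_id = decode_session_dir(sub.name)
        if session_id is None:
            continue
        artifacts[session_id] = {
            md.name: md.read_text(encoding="utf-8") for md in sorted(sub.glob("*.md"))
        }
    return artifacts
```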
## Step 3c: Extract File and Ref Patterns
From `session-store.db`:
```sql
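-- Most-touched files per project (session_files has no repository column, so join sessions)
SELECT s.repository, sf.file_path, COUNT(*) AS touch_count
FROM session_files sf
JOIN sessions s ON sf.session_id = s.id
GROUP BY s.repository, sf.file_path
ORDER BY touch_count DESC;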
-- Linked commits/PRs/issues per session
SELECT session_id, ref_type, ref_value, turn_index
FROM session_refs
ORDER BY session_id, turn_index;
```
**File access patterns** reveal which files are architecturally important — note them on project pages.
**Session refs** link Copilot sessions to git history — useful for connecting wiki knowledge to concrete code changes.
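The sketch below turns the file query into a per-repository shortlist for project pages. `session_files` has no `repository` column (see the schema), so the query joins `sessions`. Counting distinct sessions alongside raw touches and the `top_n` cutoff are illustrative choices, not part of the skill. `hot_files` is a hypothetical helper:

```python
import sqlite3
from collections import defaultdict

FILE_QUERY = """
SELECT s.repository, sf.file_path,
       COUNT(*) AS touch_count,
       COUNT(DISTINCT sf.session_id) AS session_count
FROM session_files sf
JOIN sessions s ON sf.session_id = s.id
GROUP BY s.repository, sf.file_path
ORDER BY touch_count DESC, sf.file_path
"""


def hot_files(conn: sqlite3.Connection, top_n: int = 10) -> dict:
    """Return {repository: [(file_path, touch_count, session_count), ...]} capped at top_n."""
    result = defaultdict(list)
    # Rows arrive in descending touch order, so each bucket fills with its top files first
    for repo, path, touches, sessions in conn.execute(FILE_QUERY):
        bucket = result[repo or "(no repository)"]
        if len(bucket) < top_n:
            bucket.append((path, touches, sessions))
    return dict(result)
```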
## Step 4: Cluster by Topic
Don't create one wiki page per session. Instead:
- Group extracted knowledge **by topic** across sessions
- A single session about "debugging auth + setting up CI" → two separate topics
- Three sessions across different days about "React performance" → one merged topic
- `cwd` / `repository` give you a natural first-level grouping; `vscode.metadata.json`'s `customTitle` gives a human-readable session label
## Step 5: Distill into Wiki Pages
Each Copilot project maps to a project directory in the vault. Derive the project name from `cwd` or `repository`:
```
C:\Users\name\git\my-project → my-project
/Users/name/code/another-app → another-app
```
Prefer `repository` (e.g., `owner/repo`) from `session-store.db` over raw `cwd` when available.
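Here is a minimal sketch of the naming rule. Mapping `owner/repo` to just `repo` is an assumption, since the source only says to prefer `repository`. `project_name` is a hypothetical helper:

```python
from pathlib import PureWindowsPath


def project_name(repository=None, cwd=None) -> str:
    """Pick a vault project name: the repository if known, else the last cwd segment."""
    if repository:
        # Assumption: "owner/repo" maps to the repo part only
        return repository.rstrip("/").split("/")[-1]
    if cwd:
        # PureWindowsPath accepts both "\" and "/" separators, so one parser
        # covers Windows and POSIX-style paths
        return PureWindowsPath(cwd).name
    raise ValueError("need a repository or a cwd to name a project")
```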
### Project-specific vs. global knowledge
| What you found | Where it goes | Example |
| ----------------------------------- | --------------------------- | ---------------------------------------------------- |
| Project architecture decisions | `projects/<name>/concepts/` | `projects/my-project/concepts/main-architecture.md` |
| Project-specific debugging patterns | `projects/<name>/skills/` | `projects/my-project/skills/api-rate-limiting.md` |
| General concept the user learned | `concepts/` (global) | `concepts/react-server-components.md` |
| Recurring problem across projects | `skills/` (global) | `skills/debugging-hydration-errors.md` |
| A tool/service used | `entities/` (global) | `entities/vercel-functions.md` |
| Patterns across many sessions | `synthesis/` (global) | `synthesis/common-debugging-patterns.md` |
For each project with content, create or update the project overview page at `projects/<name>/<name>.md`, **named after the project, not `_project.md`**. Obsidian's graph view uses the filename as the node label, so `_project.md` makes every project show up as `_project` in the graph. Naming it `<name>.md` gives each project a distinct, readable node name.
**Important:** Distill the _knowledge_, not the conversation. Don't write "In a session on March 15, the user asked about X." Write the knowledge itself, with the session as a source attribution.
**Write a `summary:` frontmatter field** on every new/updated page: 1-2 sentences, ≤200 chars, answering "what is this page about?" for a reader who hasn't opened it. `wiki-query`'s cheap retrieval path reads this field to avoid opening page bodies.
**Add confidence and lifecycle fields** to every new page's frontmatter:
```yaml
base_confidence: 0.42
lifecycle: draft
lifecycle_changed: <ISO date today>
```
Leave `lifecycle` unchanged on update.
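The sketch below renders the required frontmatter for a new page and enforces the 200-character `summary` limit. The `provenance:` block is left out because its format is defined in `llm-wiki`. `new_page_frontmatter` is a hypothetical helper:

```python
import json
from datetime import date


def new_page_frontmatter(summary: str, today=None) -> str:
    """Render the summary / confidence / lifecycle fields for a newly created page."""
    summary = " ".join(summary.split())  # collapse newlines so it stays one YAML line
    if len(summary) > 200:
        raise ValueError(f"summary is {len(summary)} chars; the limit is 200")
    today = today or date.today()
    return "\n".join([
        "---",
        # A JSON string literal is also a valid YAML double-quoted scalar
        f"summary: {json.dumps(summary, ensure_ascii=False)}",
        "base_confidence: 0.42",
        "lifecycle: draft",
        f"lifecycle_changed: {today.isoformat()}",
        "---",
    ])
```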
**Mark provenance** per the convention in `llm-wiki` (Provenance Markers section):
- **Checkpoints and index.md** are pre-distilled by the system — treat checkpoint-derived claims as extracted (the system wrote them from observed actions).
- **Memory artifacts** are user-authored — treat as extracted.
- **Conversation turn distillation** is mostly inferred. You're synthesizing a coherent claim from many turns. Apply `^[inferred]` liberally to synthesized patterns, generalizations across sessions, and "what the user really meant" interpretations.
- Use `^[ambiguous]` when the user changed direction mid-session or when the session ended unresolved.
- Write a `provenance:` frontmatter block on every new/updated page summarizing the rough mix.
## Step 6: Update Manifest, Journal, and Special Files
### Update `.manifest.json`
For each session processed, add/update its entry with:
- `ingested_at`, `session_id`, `updated_at`
- `source_type`: one of `"copilot_session"`, `"copilot_checkpoint"`, `"copilot_transcript"`, `"copilot_memory_artifact"`
- `project`: the decoded project name
- `pages_created` and `pages_updated` lists
Also update the `projects` section of the manifest:
```json
{
"project-name": {
"repository": "owner/repo",
"cwd": "C:\\Users\\name\\git\\project-name",
"vault_path": "projects/project-name",
"last_ingested": "TIMESTAMP",
"sessions_ingested": 5,
"sessions_total": 8,
"checkpoints_ingested": 12,
"memory_artifacts_ingested": 3
}
}
```
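The sketch below records one processed session in `.manifest.json`. It assumes per-session entries live under a `sessions` key, since only the `projects` section is shown above. Updating `projects` is left to the caller, and `record_session` is a hypothetical helper:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

SOURCE_TYPES = {"copilot_session", "copilot_checkpoint",
                "copilot_transcript", "copilot_memory_artifact"}


def record_session(manifest_path: Path, session_id: str, *, updated_at: str,
                   source_type: str, project: str,
                   pages_created=(), pages_updated=()) -> dict:
    """Add or refresh one session entry in .manifest.json and write the file back."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source_type: {source_type}")
    manifest = (json.loads(manifest_path.read_text(encoding="utf-8"))
                if manifest_path.exists() else {})
    # Assumption: per-session entries are keyed by UUID under "sessions"
    manifest.setdefault("sessions", {})[session_id] = {
        "session_id": session_id,
        "ingested_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "updated_at": updated_at,
        "source_type": source_type,
        "project": project,
        "pages_created": list(pages_created),
        "pages_updated": list(pages_updated),
    }
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```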
### Create journal entry + update special files
Update `index.md` and `log.md` per the standard process:
```
- [TIMESTAMP] COPILOT_HISTORY_INGEST projects=N sessions=M checkpoints=C pages_updated=X pages_created=Y mode=append|full
```
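A small sketch that formats the log line above. Rendering `TIMESTAMP` as ISO 8601 in UTC is an assumption, and `log_line` is a hypothetical helper:

```python
from datetime import datetime, timezone


def log_line(*, projects: int, sessions: int, checkpoints: int,
             pages_updated: int, pages_created: int, mode: str, when=None) -> str:
    """Format one COPILOT_HISTORY_INGEST entry for log.md."""
    if mode not in ("append", "full"):
        raise ValueError("mode must be 'append' or 'full'")
    stamp = (when or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return (f"- [{stamp}] COPILOT_HISTORY_INGEST projects={projects} sessions={sessions} "
            f"checkpoints={checkpoints} pages_updated={pages_updated} "
            f"pages_created={pages_created} mode={mode}")
```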
**`hot.md`** — Read `$OBSIDIAN_VAULT_PATH/hot.md` (create from the template in `wiki-ingest` if missing). Update **Recent Activity** with a one-line summary — e.g. "Ingested 5 Copilot sessions across 2 projects; surfaced patterns in API design and testing strategy." Keep the last 3 operations. Update **Active Threads** if any ongoing project is now better understood. Update `updated` timestamp.
## Privacy
- Distill and synthesize — don't copy raw conversation text verbatim
- Skip anything that looks like secrets, API keys, passwords, tokens
- `data.reasoningOpaque` / `data.reasoningText` in assistant events is internal reasoning — skip entirely, never copy to wiki
- If you encounter personal/sensitive content, ask the user before including it
- The user's conversations may reference other people — be thoughtful about what goes in the wiki
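Secret detection can start from a few regular expressions applied before any text reaches the vault. This is a heuristic sketch, not a complete scanner. The patterns cover common GitHub token prefixes, AWS access key IDs, PEM private keys, and `key=value` style assignments, and they will miss other formats. `redact` is a hypothetical helper:

```python
import re

# Heuristic patterns only: they will miss some secrets and may flag harmless text
SECRET_PATTERNS = [
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),   # GitHub token prefixes
    re.compile(r"AKIA[0-9A-Z]{16}"),             # AWS access key IDs
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)\b(api[_-]?key|password|passwd|secret|token)\b(\s*[:=]\s*)\S+"),
]


def redact(text: str) -> str:
    """Replace anything matching SECRET_PATTERNS with [REDACTED]."""
    for pattern in SECRET_PATTERNS:
        if pattern.groups >= 2:
            # Keep the key name and separator; mask only the value
            text = pattern.sub(lambda m: m.group(1) + m.group(2) + "[REDACTED]", text)
        else:
            text = pattern.sub("[REDACTED]", text)
    return text
```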
## Reference
See `references/copilot-data-format.md` for detailed data structure documentation.
## QMD Refresh After Vault Writes
QMD is a search index, not the source of truth. If `$QMD_WIKI_COLLECTION` is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If QMD refresh fails, do not roll back the vault changes; report the QMD status separately.
Use `$QMD_CLI` if set; otherwise use `qmd`.
```bash
${QMD_CLI:-qmd} update
```
If the output says vectors are needed or embeddings may be stale, run:
```bash
${QMD_CLI:-qmd} embed
```
Verify the collection with either:
```bash
${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"
```
or, when a specific page path is known:
```bash
${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5
```
Record one of:
- `QMD refreshed: update + embed + verified`
- `QMD refreshed: update only + verified`
- `QMD skipped: QMD_WIKI_COLLECTION unset`
- `QMD skipped: qmd CLI unavailable`
- `QMD failed: <short error summary>`
@@ -0,0 +1,321 @@
# GitHub Copilot CLI Data Format — Detailed Reference
## Session-State Directory
`~/.copilot/session-state/` contains one directory per session the user has run with GitHub Copilot CLI. Each directory is named with a UUID.
### `workspace.yaml`
Minimal session metadata file, always present:
```yaml
id: <session-uuid>
cwd: /path/to/project
summary_count: 3
created_at: 2026-04-02T14:28:13.304Z
updated_at: 2026-04-29T12:00:00.000Z
```
`summary_count` reflects how many checkpoints were written. Sessions with `summary_count: 0` were either very short or completed without checkpointing — check `events.jsonl` for content anyway.
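The sketch below reads `workspace.yaml` without a YAML dependency. It assumes the file stays flat, like the example above. If nested keys ever appear, switch to a real YAML parser. `read_workspace_yaml` is a hypothetical helper:

```python
from pathlib import Path


def read_workspace_yaml(path: Path) -> dict:
    """Parse the flat `key: value` lines of workspace.yaml into a dict."""
    meta = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        if not raw.strip() or raw.lstrip().startswith("#") or ":" not in raw:
            continue
        # Split on the first colon only, so timestamps and C:\ paths stay intact
        key, value = raw.split(":", 1)
        meta[key.strip()] = value.strip().strip("\"'")
    if "summary_count" in meta:
        meta["summary_count"] = int(meta["summary_count"])
    return meta
```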
### `vscode.metadata.json`
VS Code context, written when the session is associated with a VS Code workspace:
```json
{
"workspaceFolder": {
"folderPath": "c:\\Users\\name\\git\\my-project",
"timestamp": 1773245818098
},
"writtenToDisc": true,
"repositoryProperties": {
"repositoryPath": "c:\\Users\\name\\git\\my-project",
"branchName": "feature/my-branch",
"baseBranchName": "origin/main"
},
"customTitle": "User-written session label or system-set title"
}
```
`customTitle` is the most human-readable session label — use it as a heading when creating session-derived wiki content. May be absent on older sessions.
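The sketch below chooses a session label. It tries `customTitle` first, then the workspace folder name, then a caller-supplied fallback. This order is a suggestion, not documented Copilot behavior, and `session_label` is a hypothetical helper:

```python
import json
from pathlib import Path, PureWindowsPath


def session_label(session_dir: Path, fallback: str) -> str:
    """Prefer customTitle; else the workspace folder name; else the given fallback."""
    meta_path = session_dir / "vscode.metadata.json"
    if not meta_path.exists():
        return fallback
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    title = (meta.get("customTitle") or "").strip()
    if title:
        return title
    folder = meta.get("workspaceFolder", {}).get("folderPath")
    # folderPath is a Windows path in the example; PureWindowsPath also accepts "/"
    return PureWindowsPath(folder).name if folder else fallback
```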
### `events.jsonl`
The full event log for one session. Each line is a JSON object representing one event in the session.
#### Event: `session.start`
```json
{
"type": "session.start",
"data": {
"sessionId": "09371a50-9a50-484a-8743-5c696de1623a",
"version": 1,
"producer": "copilot-agent",
"copilotVersion": "0.0.420",
"startTime": "2026-03-02T15:10:04.678Z",
"context": {
"cwd": "C:\\Users\\name\\git\\my-project",
"gitRoot": "C:\\Users\\name\\git\\my-project",
"branch": "master"
}
},
"id": "<event-uuid>",
"timestamp": "2026-03-02T15:10:04.817Z",
"parentId": null
}
```
`data.context.cwd` and `data.context.branch` establish the project context. Always read `session.start` first.
#### Event: `user.message`
```json
{
"type": "user.message",
"data": {
"content": "review my staged but uncommitted changes for issues",
"transformedContent": "<current_datetime>...</current_datetime>\n\nreview my staged...",
"attachments": [],
"interactionId": "9352571e-a0b9-4774-8ecb-40bc58f86e94"
},
"id": "<event-uuid>",
"timestamp": "2026-03-02T15:10:45.058Z",
"parentId": "<parent-event-uuid>"
}
```
Use `data.content` (not `data.transformedContent`) — the transformed version includes injected system context that's noise for wiki purposes.
#### Event: `assistant.message`
```json
{
"type": "assistant.message",
"data": {
"messageId": "<uuid>",
"content": "I'll review the staged changes in those three files.",
"toolRequests": [
{
"toolCallId": "tooluse_...",
"name": "report_intent",
"arguments": { "intent": "Reviewing staged changes" },
"type": "function"
},
{
"toolCallId": "tooluse_...",
"name": "powershell",
"arguments": {
"command": "git --no-pager diff --cached --stat",
"description": "Show staged diff"
},
"type": "function"
}
],
"interactionId": "9352571e-a0b9-4774-8ecb-40bc58f86e94",
"reasoningOpaque": "<base64-encrypted-reasoning>",
"reasoningText": "The user wants me to review staged git changes..."
},
"id": "<event-uuid>",
"timestamp": "2026-03-02T15:10:50.235Z",
"parentId": "<parent-event-uuid>"
}
```
**Extraction strategy:**
- Extract `data.content` — the assistant's visible text response
- `data.toolRequests` — skim tool names and description arguments for file/command patterns; ignore `report_intent` calls
- **Skip `data.reasoningOpaque` entirely** — encrypted/encoded internal reasoning
- **Skip `data.reasoningText` entirely** — decrypted reasoning; internal only, never user-visible
#### Event: `assistant.turn_start`
```json
{
"type": "assistant.turn_start",
"data": { "turnId": "0", "interactionId": "..." },
"id": "...",
"timestamp": "..."
}
```
Marks the start of an assistant turn. Useful for turn boundary detection; no content to extract.
#### Event: `tool.execution_start`
```json
{
"type": "tool.execution_start",
"data": {
"toolCallId": "tooluse_...",
"toolName": "powershell",
"arguments": { "command": "dotnet build ...", "description": "Build project" }
},
"id": "...",
"timestamp": "..."
}
```
Reveals what tools (file reads, commands, searches) were invoked. File-related tools (`view`, `edit`, `create`) with their paths are worth noting for the `session_files` equivalent when reading events directly.
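The sketch below builds a `session_files`-style list from raw events. The argument key that holds the file path is not documented here, so `PATH_KEYS` assumes `path`. Check a few real `view`/`edit`/`create` events and adjust before relying on it. `touched_files` is a hypothetical helper:

```python
import json

FILE_TOOLS = {"view", "edit", "create"}
# Assumption: file tools carry their target in an argument named "path";
# this key is not documented above, so verify it against real events
PATH_KEYS = ("path",)


def touched_files(lines) -> list:
    """Return (tool_name, file_path) pairs from tool.execution_start events."""
    touched = []
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") != "tool.execution_start":
            continue
        data = event.get("data", {})
        if data.get("toolName") not in FILE_TOOLS:
            continue
        args = data.get("arguments") or {}
        for key in PATH_KEYS:
            if isinstance(args.get(key), str):
                touched.append((data["toolName"], args[key]))
                break
    return touched
```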
#### Event: `tool.execution_end`
Contains the raw tool output. Usually noise — skip unless diagnosing errors.
### `checkpoints/<uuid>.json`
Mid-session progress summaries, written automatically as the session progresses:
```json
{
"title": "Implementing auth module",
"overview": "Working on JWT authentication for the API...",
"history": "1. Analyzed existing auth code\n2. Created IAuthService...",
"work_done": "- Created IAuthService interface\n- Implemented JwtAuthService",
"technical_details": "Uses RS256 signing. Token expiry configurable via settings...",
"important_files": "- src/Auth/IAuthService.cs\n- src/Auth/JwtAuthService.cs",
"next_steps": "- Wire up to DI container\n- Add refresh token support"
}
```
This is the highest-value structured content in the per-session directory — equivalent to Claude's memory files.
### `index.md`
Session-end summary written as a markdown file. Typically 1-3 paragraphs summarizing what was accomplished; content varies with session length and complexity. Read this before opening `events.jsonl` to decide whether the session is worth deep-processing.
---
## Global Session Store (`session-store.db`)
SQLite database at `~/.copilot/session-store.db`. The canonical cross-session record.
### Schema
#### `sessions`
| Column | Type | Notes |
| ----------- | ---- | ------------------------------------------ |
| `id` | TEXT | Session UUID (PK) |
| `cwd` | TEXT | Working directory |
| `repository`| TEXT | `owner/repo` format when available |
| `branch` | TEXT | Git branch name |
| `summary` | TEXT | One-paragraph session summary |
| `created_at`| TEXT | ISO 8601 timestamp |
| `updated_at`| TEXT | ISO 8601 timestamp — use for delta checks |
| `host_type` | TEXT | `"vscode"`, `"cli"`, or similar |
#### `turns`
| Column | Type | Notes |
| ------------------- | ------- | ------------------------------ |
| `id` | INTEGER | PK |
| `session_id` | TEXT | FK → `sessions.id` |
| `turn_index` | INTEGER | 0-based turn sequence |
| `user_message` | TEXT | Raw user message |
| `assistant_response`| TEXT | Assistant's text response |
| `timestamp` | TEXT | ISO 8601 timestamp |
Note: `user_message` here is the pre-transformation content — use this, not `transformedContent` from `events.jsonl`.
#### `checkpoints`
| Column | Type | Notes |
| ------------------- | ------- | ------------------------------ |
| `id` | INTEGER | PK |
| `session_id` | TEXT | FK → `sessions.id` |
| `checkpoint_number` | INTEGER | 1-based |
| `title` | TEXT | Short title |
| `overview` | TEXT | High-level summary |
| `history` | TEXT | Step-by-step of what happened |
| `work_done` | TEXT | Completed items |
| `technical_details` | TEXT | Implementation specifics |
| `important_files` | TEXT | Key files touched |
| `next_steps` | TEXT | Open threads |
| `created_at` | TEXT | ISO 8601 timestamp |
#### `session_files`
| Column | Type | Notes |
| -------------- | ------- | -------------------------------------------------- |
| `session_id` | TEXT | FK → `sessions.id` |
| `file_path` | TEXT | Absolute path to the file |
| `tool_name` | TEXT | `"edit"`, `"create"`, `"view"`, etc. |
| `turn_index` | INTEGER | Which turn touched the file |
| `first_seen_at`| TEXT | ISO 8601 timestamp |
> ⚠️ No `id` column — use `COUNT(DISTINCT sf.file_path)` not `COUNT(DISTINCT sf.id)`.
Aggregate by `file_path` across sessions to identify architecturally important files.
#### `session_refs`
| Column | Type | Notes |
| ------------ | ------- | ------------------------------------------ |
| `id` | INTEGER | PK |
| `session_id` | TEXT | FK → `sessions.id` |
| `ref_type` | TEXT | `"commit"`, `"pr"`, `"issue"` |
| `ref_value` | TEXT | Commit SHA, PR number, issue number |
| `turn_index` | INTEGER | Which turn referenced it |
| `created_at` | TEXT | ISO 8601 timestamp |
#### `search_index` (FTS5)
Full-text search index. Use for keyword discovery when surveying a large history:
```sql
SELECT content, session_id, source_type
FROM search_index
WHERE search_index MATCH 'auth OR authentication OR login'
LIMIT 20;
```
`source_type` values: `"turn"`, `"checkpoint_overview"`, `"checkpoint_history"`, `"checkpoint_work_done"`, `"checkpoint_technical"`, `"checkpoint_files"`, `"checkpoint_next_steps"`, `"workspace_artifact"`.
---
## VS Code Workspace Storage
### Location
The `workspaceStorage` directory is platform-specific:
| Platform | Default path |
| -------- | ------------------------------------------------------------------ |
| Windows | `%APPDATA%\Code\User\workspaceStorage\` |
| macOS | `~/Library/Application Support/Code/User/workspaceStorage/` |
| Linux | `~/.config/Code/User/workspaceStorage/` |
Each `<hash>/` subdirectory corresponds to a specific workspace (VS Code folder). The hash is derived from the workspace path — there is no human-readable mapping, so enumerate all `<hash>/GitHub.copilot-chat/` directories and use the `transcripts/` JSONL files' `session.start` events to identify which project each belongs to.
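The sketch below resolves the storage root from the table above and enumerates the per-workspace Copilot directories. It covers only the stable VS Code build; Insiders uses a separate `Code - Insiders` directory. It lets `COPILOT_VSCODE_STORAGE_PATH` override the guess. Both helper names are hypothetical:

```python
import os
import sys
from pathlib import Path


def default_workspace_storage(platform=None, env=None):
    """Best-guess workspaceStorage for stable VS Code; an explicit override wins."""
    platform = platform or sys.platform
    env = os.environ if env is None else env
    override = env.get("COPILOT_VSCODE_STORAGE_PATH")
    if override:
        return Path(override).expanduser()
    if platform.startswith("win"):
        appdata = env.get("APPDATA")
        return Path(appdata, "Code", "User", "workspaceStorage") if appdata else None
    if platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Code" / "User" / "workspaceStorage"
    return Path.home() / ".config" / "Code" / "User" / "workspaceStorage"


def copilot_chat_dirs(storage: Path) -> list:
    # One GitHub.copilot-chat directory per workspace hash that has Copilot data
    return sorted(p for p in storage.glob("*/GitHub.copilot-chat") if p.is_dir())
```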
### Transcript JSONL (`transcripts/<uuid>.jsonl`)
Identical format to `events.jsonl` from Source 1. Parse using the same event type handlers. The `session.start` event's `data.context.cwd` tells you which project this belongs to.
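The sketch below reads a transcript's project context. It stops at the first `session.start` event instead of loading the whole file. `transcript_context` is a hypothetical helper:

```python
import json
from pathlib import Path


def transcript_context(path: Path):
    """Return the session.start context (cwd, gitRoot, branch) of one transcript, or None."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            event = json.loads(line)
            if event.get("type") == "session.start":
                return event.get("data", {}).get("context", {})
    return None
```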
### Memory Artifacts (`memory-tool/memories/<base64-session-id>/`)
Directory name is the session UUID encoded as base64. Files inside are markdown documents explicitly saved by the user or system during the session — typically `plan.md` containing the session plan.
Decode the directory name to link it to a session:
```python
import base64
# Pad to a multiple of 4 before decoding (adds nothing if padding is already present)
session_id = base64.b64decode(dir_name + '=' * (-len(dir_name) % 4)).decode('utf-8')
```
---
## Processing Order
For maximum efficiency and signal-to-noise:
1. **`session-store.db` checkpoints** — Fastest, highest signal. Query all at once.
2. **`session-store.db` sessions.summary** — One-paragraph synopsis per session.
3. **Per-session `checkpoints/*.json` + `index.md`** — For sessions not yet in `session-store.db` or for additional detail.
4. **Memory artifacts** (`memory-tool/memories/`) — User-authored, high quality.
5. **`session-store.db` turns** — Full conversation, process selectively by topic.
6. **`events.jsonl` / transcript JSONL** — Only if `session-store.db` is absent or incomplete.
7. **`session_files` / `session_refs`** — For file pattern and git linkage metadata.