add skills
@@ -0,0 +1,373 @@
|
||||
---
|
||||
name: copilot-history-ingest
|
||||
description: >
|
||||
Ingest GitHub Copilot CLI session history into an Obsidian wiki as distilled knowledge pages. Use this skill
|
||||
when the user wants to capture their Copilot CLI sessions into a personal wiki — extracting architecture
|
||||
decisions, debug notes, and patterns into searchable Obsidian pages. Triggers on phrases like "ingest my
|
||||
copilot sessions into obsidian", "add my copilot history to my wiki", "pull my copilot session history into
|
||||
the vault", "capture what I've learned from copilot into obsidian", "just the new sessions since last time",
|
||||
or "mine patterns across my copilot sessions". Also triggers when the user mentions session-store.db,
|
||||
~/.copilot/session-state, or VS Code copilot-chat transcripts in the context of building a wiki or knowledge
|
||||
base. Does NOT trigger for general copilot usage questions, searching sessions, or backing up history.
|
||||
---
|
||||
|
||||
# Copilot History Ingest — Conversation Mining
|
||||
|
||||
You are extracting knowledge from the user's past GitHub Copilot CLI conversations and distilling it into the Obsidian wiki. Conversations are rich but messy — your job is to find the signal and compile it.
|
||||
|
||||
This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest copilot`).
|
||||
|
||||
## Before You Start
|
||||
|
||||
1. **Resolve config** — follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up from the CWD looking for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH`, `COPILOT_HISTORY_PATH` (defaults to `~/.copilot/session-state`), and `COPILOT_VSCODE_STORAGE_PATH` (the VS Code `workspaceStorage` directory; platform-specific, so ask the user if it is absent).
|
||||
2. Read `.manifest.json` at the vault root to check what's already been ingested
|
||||
3. Read `index.md` at the vault root to know what the wiki already contains
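The lookup order in step 1 can be sketched roughly as follows. This is only an illustration: the authoritative protocol lives in `llm-wiki/SKILL.md`, and the function names `parse_env_file` and `resolve_config`, the `KEY=VALUE` file format for both files, and the "first `.env` found wins" rule are assumptions made for the example.

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_config(start: Path, home: Path) -> dict[str, str]:
    """Walk up from `start` looking for .env, then fall back to the home config."""
    config: dict[str, str] = {}
    for directory in [start, *start.parents]:
        candidate = directory / ".env"
        if candidate.is_file():
            config = parse_env_file(candidate)  # assumption: nearest .env wins
            break
    else:
        fallback = home / ".obsidian-wiki" / "config"
        if fallback.is_file():
            config = parse_env_file(fallback)
    # Only this default is stated in the skill text; other keys have none.
    config.setdefault("COPILOT_HISTORY_PATH", str(home / ".copilot" / "session-state"))
    return config
```

If `OBSIDIAN_VAULT_PATH` is still missing after this lookup, the remaining step in the protocol is to prompt the user rather than guess.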
|
||||
|
||||
## Ingest Modes
|
||||
|
||||
### Append Mode (default)
|
||||
|
||||
Check `.manifest.json` for each source file (events JSONL, transcript JSONL, checkpoint, session-store DB). Only process:
|
||||
|
||||
- Sessions not in the manifest (new sessions)
|
||||
- Sessions whose `updated_at` is newer than their `ingested_at` in the manifest
|
||||
|
||||
This is usually what you want — the user ran a few new sessions and wants to capture the delta.
|
||||
|
||||
### Full Mode
|
||||
|
||||
Process everything regardless of manifest. Use after a `wiki-rebuild` or if the user explicitly asks.
|
||||
|
||||
## GitHub Copilot Data Layout
|
||||
|
||||
Copilot stores data in three locations. Scan **all three**.
|
||||
|
||||
### Source 1: `~/.copilot/session-state/` (CLI sessions)
|
||||
|
||||
```
|
||||
~/.copilot/session-state/
|
||||
├── <session-uuid>/
|
||||
│ ├── workspace.yaml # Session metadata (id, cwd, summary_count, created_at, updated_at)
|
||||
│ ├── vscode.metadata.json # VS Code context (workspaceFolder, repositoryProperties, customTitle)
|
||||
│ ├── events.jsonl # Full event log — all turns, tool calls, reasoning
|
||||
│ ├── session.db # Per-session SQLite (todos/todo_deps only — skip for ingestion)
|
||||
│ ├── index.md # Session summary written at session end
|
||||
│ ├── checkpoints/ # Checkpoint JSON files (mid-session summaries)
|
||||
│ │ └── <uuid>.json # title, overview, history, work_done, technical_details,
|
||||
│ │ # important_files, next_steps
|
||||
│ ├── files/ # Artifacts produced during session (plans, diagrams, etc.)
|
||||
│ └── research/ # Research artifacts
|
||||
└── ...
|
||||
```
|
||||
|
||||
### Source 2: `~/.copilot/session-store.db` (Global SQLite)
|
||||
|
||||
The canonical cross-session database. This is the **highest-value** source: structured, queryable, and pre-summarised.
|
||||
|
||||
```
|
||||
sessions — id, cwd, repository, branch, summary, created_at, updated_at, host_type
|
||||
turns — session_id, turn_index, user_message, assistant_response, timestamp
|
||||
checkpoints — session_id, checkpoint_number, title, overview, history, work_done,
|
||||
technical_details, important_files, next_steps, created_at
|
||||
session_files — session_id, file_path, tool_name, turn_index, first_seen_at
|
||||
session_refs — session_id, ref_type (commit/pr/issue), ref_value, turn_index, created_at
|
||||
search_index — FTS5 virtual table (content, session_id, source_type, source_id)
|
||||
```
|
||||
|
||||
### Source 3: VS Code Workspace Storage (`<workspaceStorage>/<hash>/GitHub.copilot-chat/`)
|
||||
|
||||
VS Code extension data, keyed by workspace hash. The path is platform-specific and must come from `.env` or user input.
|
||||
|
||||
```
|
||||
<hash>/GitHub.copilot-chat/
|
||||
├── transcripts/
|
||||
│ └── <session-uuid>.jsonl # Conversation transcripts (same JSONL format as events.jsonl)
|
||||
├── memory-tool/
|
||||
│ └── memories/
|
||||
│ └── <base64-session-id>/ # Per-session saved artifacts (plan.md, etc.)
|
||||
│ └── plan.md
|
||||
└── codebase-external.sqlite # Codebase index (skip — no conversation knowledge)
|
||||
```
|
||||
|
||||
### Key data sources ranked by value:
|
||||
|
||||
1. **Checkpoints** (`session-store.db` `checkpoints` table + per-session `checkpoints/*.json`) — Pre-distilled summaries with `overview`, `work_done`, `technical_details`, `important_files`, `next_steps`. Gold.
|
||||
2. **Session summaries** (`session-store.db` `sessions.summary` + `index.md`) — One-paragraph synopsis per session.
|
||||
3. **Turns** (`session-store.db` `turns` table + `events.jsonl` / transcript JSONL) — Full conversation. Rich but verbose.
|
||||
4. **Memory artifacts** (`memory-tool/memories/<id>/plan.md` etc.) — Pre-written plans and structured notes the user saved explicitly. Worth importing verbatim (or lightly summarised).
|
||||
5. **File access patterns** (`session_files` table + `tool.execution_*` events) — Which files the agent repeatedly touched — reveals high-value project files.
|
||||
6. **Session refs** (`session_refs` table) — Commits, PRs, and issues linked to sessions.
|
||||
7. **`vscode.metadata.json`** — Workspace folder path, branch, `customTitle` (user-set session label). Useful for grouping and naming.
|
||||
|
||||
## Step 1: Survey and Compute Delta
|
||||
|
||||
Scan all three data locations and compare against `.manifest.json`:
|
||||
|
||||
```bash
|
||||
# --- Source 1: per-session directories ---
|
||||
# Find all session directories (each has workspace.yaml)
|
||||
ls ~/.copilot/session-state/
|
||||
|
||||
# For each session, read workspace.yaml for id/cwd/updated_at
|
||||
# and vscode.metadata.json for customTitle / repositoryProperties
|
||||
|
||||
# --- Source 2: global database ---
|
||||
# Query session-store.db with the sqlite3 CLI (Python's sqlite3 module works too)
sqlite3 ~/.copilot/session-store.db <<'SQL'
SELECT s.id, s.cwd, s.repository, s.branch, s.summary, s.updated_at,
       COUNT(DISTINCT t.turn_index) AS turn_count,
       COUNT(DISTINCT c.id) AS checkpoint_count
FROM sessions s
LEFT JOIN turns t ON t.session_id = s.id
LEFT JOIN checkpoints c ON c.session_id = s.id
GROUP BY s.id
ORDER BY s.updated_at DESC;
SQL
|
||||
|
||||
# --- Source 3: VS Code workspace storage ---
|
||||
# For each <hash> directory under workspaceStorage, check for GitHub.copilot-chat/
|
||||
# Find transcript files
|
||||
ls <workspaceStorage>/<hash>/GitHub.copilot-chat/transcripts/
|
||||
```
|
||||
|
||||
Build a unified inventory — one entry per session UUID — and classify:
|
||||
|
||||
- **New** — not in manifest → needs ingesting
|
||||
- **Modified** — in manifest, but the session's `updated_at` is newer than its recorded `ingested_at` → needs re-ingesting
|
||||
- **Unchanged** — in manifest and not modified → skip in append mode
|
||||
|
||||
Report to the user: "Found X sessions in session-state, Y in session-store.db, Z VS Code transcript files. Checkpoints: A. Delta: B new, C modified."
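The New / Modified / Unchanged classification can be sketched as below. The manifest layout used here (a mapping from session id to an entry with an `ingested_at` field) is an assumption for illustration; adapt the lookup to whatever shape `.manifest.json` actually has. `parse_ts` and `classify` are hypothetical helper names.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def classify(sessions: dict[str, str], manifest: dict[str, dict]) -> dict[str, list[str]]:
    """Bucket sessions by comparing updated_at with the manifest's ingested_at.

    sessions maps session id -> updated_at.
    manifest maps session id -> {"ingested_at": ...} (assumed layout).
    """
    delta: dict[str, list[str]] = {"new": [], "modified": [], "unchanged": []}
    for session_id, updated_at in sessions.items():
        entry = manifest.get(session_id)
        if entry is None:
            delta["new"].append(session_id)
        elif parse_ts(updated_at) > parse_ts(entry["ingested_at"]):
            delta["modified"].append(session_id)
        else:
            delta["unchanged"].append(session_id)
    return delta
```

Comparing parsed timestamps rather than raw strings avoids surprises when the two sources format fractional seconds or time zones differently.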
|
||||
|
||||
## Step 2: Ingest Checkpoints and Summaries First
|
||||
|
||||
Checkpoints are already distilled — process them before touching raw turns.
|
||||
|
||||
### From `session-store.db`:
|
||||
|
||||
```sql
|
||||
SELECT s.id, s.cwd, s.repository, s.branch, s.summary,
|
||||
c.checkpoint_number, c.title, c.overview, c.work_done,
|
||||
c.technical_details, c.important_files, c.next_steps,
|
||||
c.created_at
|
||||
FROM checkpoints c
|
||||
JOIN sessions s ON c.session_id = s.id
|
||||
ORDER BY s.updated_at DESC, c.checkpoint_number ASC;
|
||||
```
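When running this query from Python instead of the `sqlite3` CLI, a minimal sketch might look like the following. It opens the database read-only so an ingest run cannot modify Copilot's own data; `load_checkpoints` is a hypothetical helper name, and the column list simply mirrors the query above.

```python
import sqlite3
from pathlib import Path

CHECKPOINT_SQL = """
SELECT s.id AS session_id, s.repository, s.cwd,
       c.checkpoint_number, c.title, c.overview, c.work_done,
       c.technical_details, c.important_files, c.next_steps
FROM checkpoints c
JOIN sessions s ON c.session_id = s.id
ORDER BY s.updated_at DESC, c.checkpoint_number ASC
"""


def load_checkpoints(db_path: Path) -> dict[str, list[dict]]:
    """Return checkpoints grouped by session id, reading the database read-only."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row
    try:
        grouped: dict[str, list[dict]] = {}
        for row in conn.execute(CHECKPOINT_SQL):
            grouped.setdefault(row["session_id"], []).append(dict(row))
        return grouped
    finally:
        conn.close()
```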
|
||||
|
||||
### From per-session `checkpoints/*.json`:
|
||||
|
||||
Each checkpoint file has: `title`, `overview`, `history`, `work_done`, `technical_details`, `important_files`, `next_steps`.
|
||||
|
||||
Read `index.md` (if present) as a session-level summary — it's typically written at session end and is already concise.
|
||||
|
||||
### What to extract:
|
||||
|
||||
- `overview` → high-level description of what the session accomplished
|
||||
- `work_done` → concrete tasks completed (good for skills / project pages)
|
||||
- `technical_details` → implementation specifics (good for concepts pages)
|
||||
- `important_files` → high-value files in the project (good for project pages)
|
||||
- `next_steps` → open threads (good for linking to ongoing project work)
|
||||
|
||||
## Step 3: Parse Session Turns
|
||||
|
||||
Read turns from `session-store.db` (preferred — already parsed) or from `events.jsonl` / transcript JSONL.
|
||||
|
||||
### From `session-store.db`:
|
||||
|
||||
```sql
|
||||
SELECT turn_index, user_message, assistant_response, timestamp
|
||||
FROM turns
|
||||
WHERE session_id = '<uuid>'
|
||||
ORDER BY turn_index ASC;
|
||||
```
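From Python, the `<uuid>` placeholder is best supplied as a bound parameter rather than pasted into the SQL string. A small sketch, where `load_turns` is a hypothetical helper name:

```python
import sqlite3


def load_turns(conn: sqlite3.Connection, session_id: str) -> list[tuple[int, str, str]]:
    """Return (turn_index, user_message, assistant_response) for one session."""
    rows = conn.execute(
        "SELECT turn_index, user_message, assistant_response "
        "FROM turns WHERE session_id = ? ORDER BY turn_index ASC",
        (session_id,),
    )
    # NULL columns become empty strings so later text handling stays simple.
    return [(idx, user or "", reply or "") for idx, user, reply in rows]
```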
|
||||
|
||||
### From `events.jsonl` / transcript JSONL:
|
||||
|
||||
Each file is one session. Each line is a JSON event. See `references/copilot-data-format.md` for the full schema.
|
||||
|
||||
**Relevant event types:**
|
||||
|
||||
| `type` | What it is | Worth reading? |
|
||||
| --------------------- | --------------------------------------- | ----------------------------------------- |
|
||||
| `session.start` | Session metadata (cwd, branch, version) | Yes — establishes project context |
|
||||
| `user.message` | User turn | Yes — `data.content` |
|
||||
| `assistant.message` | Assistant turn | Yes — `data.content` (text) + `data.toolRequests` |
|
||||
| `tool.execution_start`| Tool call | Skim — reveals what files/commands were used |
|
||||
| `tool.execution_end` | Tool result | No — usually noise |
|
||||
|
||||
**Extraction strategy for `assistant.message`:**
|
||||
|
||||
- `data.content` is the assistant's text response — extract this
|
||||
- `data.reasoningText` is internal reasoning — skip (it's the unpacked `reasoningOpaque` field)
|
||||
- `data.toolRequests` lists tool calls — skim tool names and arguments for file access patterns
|
||||
- Skip `type: "tool.execution_end"` entirely
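The event table and extraction strategy above can be sketched as a single pass over one JSONL file. `read_conversation` is a hypothetical helper name; the field names come from the event descriptions in this skill, and ignoring `report_intent` tool requests follows the reference file's advice.

```python
import json
from pathlib import Path


def read_conversation(events_path: Path) -> dict:
    """Collect project context, visible text, and tool names from one JSONL log."""
    result: dict = {"context": {}, "messages": [], "tools": []}
    with events_path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            event = json.loads(line)
            kind = event.get("type")
            data = event.get("data") or {}
            if kind == "session.start":
                result["context"] = data.get("context", {})
            elif kind == "user.message":
                # data.content only; transformedContent carries injected context.
                result["messages"].append(("user", data.get("content", "")))
            elif kind == "assistant.message":
                # reasoningText and reasoningOpaque are deliberately never read.
                if data.get("content"):
                    result["messages"].append(("assistant", data["content"]))
                for request in data.get("toolRequests", []):
                    if request.get("name") != "report_intent":
                        result["tools"].append(request.get("name"))
            # tool.execution_end and all other event types are ignored.
    return result
```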
|
||||
|
||||
## Step 3b: Process Memory Artifacts
|
||||
|
||||
For each session that has a `memory-tool/memories/<base64-id>/` directory in VS Code workspace storage, read any markdown files saved there (typically `plan.md`). These are documents the user explicitly saved — treat them as high-quality, user-authored content.
|
||||
|
||||
Decode the base64 directory name to get the session UUID:
|
||||
|
||||
```python
|
||||
import base64
|
||||
# Restore any stripped '=' padding before decoding
session_id = base64.b64decode(dir_name + '=' * (-len(dir_name) % 4)).decode('utf-8')
|
||||
```
|
||||
|
||||
Memory artifacts map to project `skills/` or `concepts/` pages, depending on content type.
|
||||
|
||||
## Step 3c: Extract File and Ref Patterns
|
||||
|
||||
From `session-store.db`:
|
||||
|
||||
```sql
|
||||
-- Most-touched files per project
|
||||
SELECT s.repository, sf.file_path, COUNT(*) AS touch_count
FROM session_files sf
JOIN sessions s ON s.id = sf.session_id
GROUP BY s.repository, sf.file_path
ORDER BY touch_count DESC;
|
||||
|
||||
-- Linked commits/PRs/issues per session
|
||||
SELECT session_id, ref_type, ref_value, turn_index
|
||||
FROM session_refs
|
||||
ORDER BY session_id, turn_index;
|
||||
```
|
||||
|
||||
**File access patterns** reveal which files are architecturally important — note them on project pages.
|
||||
|
||||
**Session refs** link Copilot sessions to git history — useful for connecting wiki knowledge to concrete code changes.
|
||||
|
||||
## Step 4: Cluster by Topic
|
||||
|
||||
Don't create one wiki page per session. Instead:
|
||||
|
||||
- Group extracted knowledge **by topic** across sessions
|
||||
- A single session about "debugging auth + setting up CI" → two separate topics
|
||||
- Three sessions across different days about "React performance" → one merged topic
|
||||
- `cwd` / `repository` give you a natural first-level grouping; `vscode.metadata.json`'s `customTitle` gives a human-readable session label
|
||||
|
||||
## Step 5: Distill into Wiki Pages
|
||||
|
||||
Each Copilot project maps to a project directory in the vault. Derive the project name from `cwd` or `repository`:
|
||||
|
||||
```
|
||||
C:\Users\name\git\my-project → my-project
|
||||
/Users/name/code/another-app → another-app
|
||||
```
|
||||
|
||||
Prefer `repository` (e.g., `owner/repo`) from `session-store.db` over raw `cwd` when available.
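A minimal sketch of this derivation, assuming the project name is the last path segment of `cwd` and, when `repository` is available, the `repo` part of `owner/repo`. That second choice is an assumption (the skill does not say whether to keep the owner), and `project_name` is a hypothetical helper name.

```python
from pathlib import PureWindowsPath
from typing import Optional


def project_name(cwd: str, repository: Optional[str] = None) -> str:
    """Derive a vault project name, preferring `repository` over `cwd`."""
    if repository:
        # Assumption: keep only the repo part of "owner/repo".
        return repository.rstrip("/").rsplit("/", 1)[-1]
    # PureWindowsPath splits on both "\" and "/", so one code path
    # handles Windows and POSIX working directories.
    return PureWindowsPath(cwd).name
```

Two different owners with the same repo name would collide under this scheme; if that matters, include the owner in the name instead.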
|
||||
|
||||
### Project-specific vs. global knowledge
|
||||
|
||||
| What you found | Where it goes | Example |
|
||||
| ----------------------------------- | --------------------------- | ---------------------------------------------------- |
|
||||
| Project architecture decisions | `projects/<name>/concepts/` | `projects/my-project/concepts/main-architecture.md` |
|
||||
| Project-specific debugging patterns | `projects/<name>/skills/` | `projects/my-project/skills/api-rate-limiting.md` |
|
||||
| General concept the user learned | `concepts/` (global) | `concepts/react-server-components.md` |
|
||||
| Recurring problem across projects | `skills/` (global) | `skills/debugging-hydration-errors.md` |
|
||||
| A tool/service used | `entities/` (global) | `entities/vercel-functions.md` |
|
||||
| Patterns across many sessions | `synthesis/` (global) | `synthesis/common-debugging-patterns.md` |
|
||||
|
||||
For each project with content, create or update the project overview page at `projects/<name>/<name>.md` — **named after the project, not `_project.md`**. Obsidian's graph view uses the filename as the node label, so `_project.md` makes every project show up as `_project` in the graph. Naming it `<name>.md` gives each project a distinct, readable node name.
|
||||
|
||||
**Important:** Distill the _knowledge_, not the conversation. Don't write "In a session on March 15, the user asked about X." Write the knowledge itself, with the session as a source attribution.
|
||||
|
||||
**Write a `summary:` frontmatter field** on every new/updated page — 1–2 sentences, ≤200 chars, answering "what is this page about?" for a reader who hasn't opened it. `wiki-query`'s cheap retrieval path reads this field to avoid opening page bodies.
|
||||
|
||||
**Add confidence and lifecycle fields** to every new page's frontmatter:
|
||||
```yaml
|
||||
base_confidence: 0.42
|
||||
lifecycle: draft
|
||||
lifecycle_changed: <ISO date today>
|
||||
```
|
||||
Leave `lifecycle` unchanged on update.
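As an illustration, the `summary`, confidence, and lifecycle fields for a new page could be rendered like this. `new_page_frontmatter` is a hypothetical helper, and it intentionally omits the `provenance:` block, whose format is defined in `llm-wiki` rather than here.

```python
import json
from datetime import date


def new_page_frontmatter(summary: str, today: date) -> str:
    """Render the frontmatter fields this skill asks for on a new page."""
    if len(summary) > 200:
        raise ValueError("summary must be 200 characters or fewer")
    lines = [
        "---",
        # json.dumps yields a double-quoted string, which YAML also accepts.
        f"summary: {json.dumps(summary, ensure_ascii=False)}",
        "base_confidence: 0.42",
        "lifecycle: draft",
        f"lifecycle_changed: {today.isoformat()}",
        "---",
    ]
    return "\n".join(lines) + "\n"
```

Quoting the summary keeps colons and other YAML-significant characters in the text from breaking the frontmatter.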
|
||||
|
||||
**Mark provenance** per the convention in `llm-wiki` (Provenance Markers section):
|
||||
|
||||
- **Checkpoints and index.md** are pre-distilled by the system — treat checkpoint-derived claims as extracted (the system wrote them from observed actions).
|
||||
- **Memory artifacts** are user-authored — treat as extracted.
|
||||
- **Conversation turn distillation** is mostly inferred. You're synthesizing a coherent claim from many turns. Apply `^[inferred]` liberally to synthesized patterns, generalizations across sessions, and "what the user really meant" interpretations.
|
||||
- Use `^[ambiguous]` when the user changed direction mid-session or when the session ended unresolved.
|
||||
- Write a `provenance:` frontmatter block on every new/updated page summarizing the rough mix.
|
||||
|
||||
## Step 6: Update Manifest, Journal, and Special Files
|
||||
|
||||
### Update `.manifest.json`
|
||||
|
||||
For each session processed, add/update its entry with:
|
||||
|
||||
- `ingested_at`, `session_id`, `updated_at`
|
||||
- `source_type`: one of `"copilot_session"`, `"copilot_checkpoint"`, `"copilot_transcript"`, `"copilot_memory_artifact"`
|
||||
- `project`: the project name derived in Step 5
|
||||
- `pages_created` and `pages_updated` lists
|
||||
|
||||
Also update the `projects` section of the manifest:
|
||||
|
||||
```json
|
||||
{
|
||||
"project-name": {
|
||||
"repository": "owner/repo",
|
||||
"cwd": "C:\\Users\\name\\git\\project-name",
|
||||
"vault_path": "projects/project-name",
|
||||
"last_ingested": "TIMESTAMP",
|
||||
"sessions_ingested": 5,
|
||||
"sessions_total": 8,
|
||||
"checkpoints_ingested": 12,
|
||||
"memory_artifacts_ingested": 3
|
||||
}
|
||||
}
|
||||
```
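A sketch of the bookkeeping for one processed session follows. The top-level `"sessions"` key and the way `sessions_ingested` is recomputed are assumptions made for this example; only the field names and the `projects` section come from the text above. `record_session` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def record_session(manifest: dict, session_id: str, updated_at: str, project: str,
                   pages_created: list[str], pages_updated: list[str]) -> dict:
    """Add or refresh one session entry and its project rollup in the manifest."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # "sessions" is a hypothetical container key; use the manifest's real layout.
    manifest.setdefault("sessions", {})[session_id] = {
        "session_id": session_id,
        "source_type": "copilot_session",
        "ingested_at": now,
        "updated_at": updated_at,
        "project": project,
        "pages_created": pages_created,
        "pages_updated": pages_updated,
    }
    entry = manifest.setdefault("projects", {}).setdefault(
        project, {"vault_path": f"projects/{project}"}
    )
    entry["last_ingested"] = now
    # Recount instead of incrementing, so re-ingesting a session is idempotent.
    entry["sessions_ingested"] = sum(
        1 for s in manifest["sessions"].values() if s["project"] == project
    )
    return manifest
```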
|
||||
|
||||
### Create journal entry + update special files
|
||||
|
||||
Update `index.md` and `log.md` per the standard process:
|
||||
|
||||
```
|
||||
- [TIMESTAMP] COPILOT_HISTORY_INGEST projects=N sessions=M checkpoints=C pages_updated=X pages_created=Y mode=append|full
|
||||
```
|
||||
|
||||
**`hot.md`** — Read `$OBSIDIAN_VAULT_PATH/hot.md` (create from the template in `wiki-ingest` if missing). Update **Recent Activity** with a one-line summary — e.g. "Ingested 5 Copilot sessions across 2 projects; surfaced patterns in API design and testing strategy." Keep the last 3 operations. Update **Active Threads** if any ongoing project is now better understood. Update `updated` timestamp.
|
||||
|
||||
## Privacy
|
||||
|
||||
- Distill and synthesize — don't copy raw conversation text verbatim
|
||||
- Skip anything that looks like secrets, API keys, passwords, tokens
|
||||
- `data.reasoningOpaque` / `data.reasoningText` in assistant events is internal reasoning — skip entirely, never copy to wiki
|
||||
- If you encounter personal/sensitive content, ask the user before including it
|
||||
- The user's conversations may reference other people — be thoughtful about what goes in the wiki
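As a first-pass aid for the "skip secrets" rule, a heuristic screen along these lines can flag text for exclusion or review. The patterns are illustrative assumptions, not a complete detector, and `looks_sensitive` is a hypothetical helper; a clean result does not mean the text is safe to copy.

```python
import re

SECRET_PATTERNS = [
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}\b"),      # GitHub token prefixes
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                # AWS access key id shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    # key/value style assignments such as "password = hunter2"
    re.compile(r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"),
]


def looks_sensitive(text: str) -> bool:
    """Return True if any heuristic pattern matches the text."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)
```

Anything this flags should be dropped or raised with the user, in line with the rules above.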
|
||||
|
||||
## Reference
|
||||
|
||||
See `references/copilot-data-format.md` for detailed data structure documentation.
|
||||
|
||||
## QMD Refresh After Vault Writes
|
||||
|
||||
QMD is a search index, not the source of truth. If `$QMD_WIKI_COLLECTION` is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If QMD refresh fails, do not roll back the vault changes; report the QMD status separately.
|
||||
|
||||
Use `$QMD_CLI` if set; otherwise use `qmd`.
|
||||
|
||||
```bash
|
||||
${QMD_CLI:-qmd} update
|
||||
```
|
||||
|
||||
If the output says vectors are needed or embeddings may be stale, run:
|
||||
|
||||
```bash
|
||||
${QMD_CLI:-qmd} embed
|
||||
```
|
||||
|
||||
Verify the collection with either:
|
||||
|
||||
```bash
|
||||
${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"
|
||||
```
|
||||
|
||||
or, when a specific page path is known:
|
||||
|
||||
```bash
|
||||
${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5
|
||||
```
|
||||
|
||||
Record one of:
|
||||
- `QMD refreshed: update + embed + verified`
|
||||
- `QMD refreshed: update only + verified`
|
||||
- `QMD skipped: QMD_WIKI_COLLECTION unset`
|
||||
- `QMD skipped: qmd CLI unavailable`
|
||||
- `QMD failed: <short error summary>`
|
||||
@@ -0,0 +1,321 @@
|
||||
# GitHub Copilot CLI Data Format — Detailed Reference
|
||||
|
||||
## Session-State Directory
|
||||
|
||||
`~/.copilot/session-state/` contains one directory per session the user has run with GitHub Copilot CLI. Each directory is named with a UUID.
|
||||
|
||||
### `workspace.yaml`
|
||||
|
||||
Minimal session metadata file, always present:
|
||||
|
||||
```yaml
|
||||
id: <session-uuid>
|
||||
cwd: /path/to/project
|
||||
summary_count: 3
|
||||
created_at: 2026-04-02T14:28:13.304Z
|
||||
updated_at: 2026-04-29T12:00:00.000Z
|
||||
```
|
||||
|
||||
`summary_count` reflects how many checkpoints were written. Sessions with `summary_count: 0` were either very short or completed without checkpointing — check `events.jsonl` for content anyway.
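Because the file shown above is a flat list of `key: value` pairs, it can be read without a YAML library. The sketch below assumes that flat shape holds for every session; if nested values ever appear, a real YAML parser is the safer choice. `read_workspace_yaml` is a hypothetical helper name.

```python
from pathlib import Path


def read_workspace_yaml(path: Path) -> dict[str, str]:
    """Read flat `key: value` lines; all values are returned as strings."""
    meta: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if ":" not in line or line.lstrip().startswith("#"):
            continue
        # Split on the first colon only, so timestamps and Windows
        # drive letters in the value survive intact.
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip().strip('"').strip("'")
    return meta
```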
|
||||
|
||||
### `vscode.metadata.json`
|
||||
|
||||
VS Code context, written when the session is associated with a VS Code workspace:
|
||||
|
||||
```json
|
||||
{
|
||||
"workspaceFolder": {
|
||||
"folderPath": "c:\\Users\\name\\git\\my-project",
|
||||
"timestamp": 1773245818098
|
||||
},
|
||||
"writtenToDisc": true,
|
||||
"repositoryProperties": {
|
||||
"repositoryPath": "c:\\Users\\name\\git\\my-project",
|
||||
"branchName": "feature/my-branch",
|
||||
"baseBranchName": "origin/main"
|
||||
},
|
||||
"customTitle": "User-written session label or system-set title"
|
||||
}
|
||||
```
|
||||
|
||||
`customTitle` is the most human-readable session label — use it as a heading when creating session-derived wiki content. May be absent on older sessions.
|
||||
|
||||
### `events.jsonl`
|
||||
|
||||
The full event log for one session. Each line is a JSON object representing one event in the session.
|
||||
|
||||
#### Event: `session.start`
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "session.start",
|
||||
"data": {
|
||||
"sessionId": "09371a50-9a50-484a-8743-5c696de1623a",
|
||||
"version": 1,
|
||||
"producer": "copilot-agent",
|
||||
"copilotVersion": "0.0.420",
|
||||
"startTime": "2026-03-02T15:10:04.678Z",
|
||||
"context": {
|
||||
"cwd": "C:\\Users\\name\\git\\my-project",
|
||||
"gitRoot": "C:\\Users\\name\\git\\my-project",
|
||||
"branch": "master"
|
||||
}
|
||||
},
|
||||
"id": "<event-uuid>",
|
||||
"timestamp": "2026-03-02T15:10:04.817Z",
|
||||
"parentId": null
|
||||
}
|
||||
```
|
||||
|
||||
`data.context.cwd` and `data.context.branch` establish the project context. Always read `session.start` first.
|
||||
|
||||
#### Event: `user.message`
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "user.message",
|
||||
"data": {
|
||||
"content": "review my staged but uncommitted changes for issues",
|
||||
"transformedContent": "<current_datetime>...</current_datetime>\n\nreview my staged...",
|
||||
"attachments": [],
|
||||
"interactionId": "9352571e-a0b9-4774-8ecb-40bc58f86e94"
|
||||
},
|
||||
"id": "<event-uuid>",
|
||||
"timestamp": "2026-03-02T15:10:45.058Z",
|
||||
"parentId": "<parent-event-uuid>"
|
||||
}
|
||||
```
|
||||
|
||||
Use `data.content` (not `data.transformedContent`) — the transformed version includes injected system context that's noise for wiki purposes.
|
||||
|
||||
#### Event: `assistant.message`
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "assistant.message",
|
||||
"data": {
|
||||
"messageId": "<uuid>",
|
||||
"content": "I'll review the staged changes in those three files.",
|
||||
"toolRequests": [
|
||||
{
|
||||
"toolCallId": "tooluse_...",
|
||||
"name": "report_intent",
|
||||
"arguments": { "intent": "Reviewing staged changes" },
|
||||
"type": "function"
|
||||
},
|
||||
{
|
||||
"toolCallId": "tooluse_...",
|
||||
"name": "powershell",
|
||||
"arguments": {
|
||||
"command": "git --no-pager diff --cached --stat",
|
||||
"description": "Show staged diff"
|
||||
},
|
||||
"type": "function"
|
||||
}
|
||||
],
|
||||
"interactionId": "9352571e-a0b9-4774-8ecb-40bc58f86e94",
|
||||
"reasoningOpaque": "<base64-encrypted-reasoning>",
|
||||
"reasoningText": "The user wants me to review staged git changes..."
|
||||
},
|
||||
"id": "<event-uuid>",
|
||||
"timestamp": "2026-03-02T15:10:50.235Z",
|
||||
"parentId": "<parent-event-uuid>"
|
||||
}
|
||||
```
|
||||
|
||||
**Extraction strategy:**
|
||||
|
||||
- Extract `data.content` — the assistant's visible text response
|
||||
- `data.toolRequests` — skim tool names and description arguments for file/command patterns; ignore `report_intent` calls
|
||||
- **Skip `data.reasoningOpaque` entirely** — encrypted/encoded internal reasoning
|
||||
- **Skip `data.reasoningText` entirely** — decrypted reasoning; internal only, never user-visible
|
||||
|
||||
#### Event: `assistant.turn_start`
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "assistant.turn_start",
|
||||
"data": { "turnId": "0", "interactionId": "..." },
|
||||
"id": "...",
|
||||
"timestamp": "..."
|
||||
}
|
||||
```
|
||||
|
||||
Marks the start of an assistant turn. Useful for turn boundary detection; no content to extract.
|
||||
|
||||
#### Event: `tool.execution_start`
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "tool.execution_start",
|
||||
"data": {
|
||||
"toolCallId": "tooluse_...",
|
||||
"toolName": "powershell",
|
||||
"arguments": { "command": "dotnet build ...", "description": "Build project" }
|
||||
},
|
||||
"id": "...",
|
||||
"timestamp": "..."
|
||||
}
|
||||
```
|
||||
|
||||
Reveals what tools (file reads, commands, searches) were invoked. File-related tools (`view`, `edit`, `create`) and their paths are worth recording: when reading events directly, they let you rebuild the equivalent of the `session_files` table.
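A sketch of tallying file touches straight from the event log is shown below. The argument key that holds the file path is not documented here, so `path_key="path"` is purely an assumption and should be checked against real events before use. `count_file_touches` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path

FILE_TOOLS = {"view", "edit", "create"}


def count_file_touches(events_path: Path, path_key: str = "path") -> Counter:
    """Count how often each file appears in file-related tool.execution_start events."""
    touches: Counter = Counter()
    with events_path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            event = json.loads(line)
            if event.get("type") != "tool.execution_start":
                continue
            data = event.get("data") or {}
            if data.get("toolName") in FILE_TOOLS:
                # path_key is an assumed argument name, not a documented one.
                target = (data.get("arguments") or {}).get(path_key)
                if target:
                    touches[target] += 1
    return touches
```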
|
||||
|
||||
#### Event: `tool.execution_end`
|
||||
|
||||
Contains the raw tool output. Usually noise — skip unless diagnosing errors.
|
||||
|
||||
### `checkpoints/<uuid>.json`
|
||||
|
||||
Mid-session progress summaries, written automatically as the session progresses:
|
||||
|
||||
```json
|
||||
{
|
||||
"title": "Implementing auth module",
|
||||
"overview": "Working on JWT authentication for the API...",
|
||||
"history": "1. Analyzed existing auth code\n2. Created IAuthService...",
|
||||
"work_done": "- Created IAuthService interface\n- Implemented JwtAuthService",
|
||||
"technical_details": "Uses RS256 signing. Token expiry configurable via settings...",
|
||||
"important_files": "- src/Auth/IAuthService.cs\n- src/Auth/JwtAuthService.cs",
|
||||
"next_steps": "- Wire up to DI container\n- Add refresh token support"
|
||||
}
|
||||
```
|
||||
|
||||
This is the highest-value structured content in the per-session directory — equivalent to Claude's memory files.
|
||||
|
||||
### `index.md`
|
||||
|
||||
Session-end summary written as a markdown file. Typically 1–3 paragraphs summarizing what was accomplished. Content varies by session length and complexity. Read this before opening `events.jsonl` to decide if the session is worth deep-processing.
|
||||
|
||||
---
|
||||
|
||||
## Global Session Store (`session-store.db`)
|
||||
|
||||
SQLite database at `~/.copilot/session-store.db`. The canonical cross-session record.
|
||||
|
||||
### Schema
|
||||
|
||||
#### `sessions`
|
||||
|
||||
| Column | Type | Notes |
|
||||
| ----------- | ---- | ------------------------------------------ |
|
||||
| `id` | TEXT | Session UUID (PK) |
|
||||
| `cwd` | TEXT | Working directory |
|
||||
| `repository`| TEXT | `owner/repo` format when available |
|
||||
| `branch` | TEXT | Git branch name |
|
||||
| `summary` | TEXT | One-paragraph session summary |
|
||||
| `created_at`| TEXT | ISO 8601 timestamp |
|
||||
| `updated_at`| TEXT | ISO 8601 timestamp — use for delta checks |
|
||||
| `host_type` | TEXT | `"vscode"`, `"cli"`, or similar |
|
||||
|
||||
#### `turns`
|
||||
|
||||
| Column | Type | Notes |
|
||||
| ------------------- | ------- | ------------------------------ |
|
||||
| `id` | INTEGER | PK |
|
||||
| `session_id` | TEXT | FK → `sessions.id` |
|
||||
| `turn_index` | INTEGER | 0-based turn sequence |
|
||||
| `user_message` | TEXT | Raw user message |
|
||||
| `assistant_response`| TEXT | Assistant's text response |
|
||||
| `timestamp` | TEXT | ISO 8601 timestamp |
|
||||
|
||||
Note: `user_message` here is the pre-transformation content — use this, not `transformedContent` from `events.jsonl`.
|
||||
|
||||
#### `checkpoints`
|
||||
|
||||
| Column | Type | Notes |
|
||||
| ------------------- | ------- | ------------------------------ |
|
||||
| `id` | INTEGER | PK |
|
||||
| `session_id` | TEXT | FK → `sessions.id` |
|
||||
| `checkpoint_number` | INTEGER | 1-based |
|
||||
| `title` | TEXT | Short title |
|
||||
| `overview` | TEXT | High-level summary |
|
||||
| `history` | TEXT | Step-by-step of what happened |
|
||||
| `work_done` | TEXT | Completed items |
|
||||
| `technical_details` | TEXT | Implementation specifics |
|
||||
| `important_files` | TEXT | Key files touched |
|
||||
| `next_steps` | TEXT | Open threads |
|
||||
| `created_at` | TEXT | ISO 8601 timestamp |
|
||||
|
||||
#### `session_files`
|
||||
|
||||
| Column | Type | Notes |
|
||||
| -------------- | ------- | -------------------------------------------------- |
|
||||
| `session_id` | TEXT | FK → `sessions.id` |
|
||||
| `file_path` | TEXT | Absolute path to the file |
|
||||
| `tool_name` | TEXT | `"edit"`, `"create"`, `"view"`, etc. |
|
||||
| `turn_index` | INTEGER | Which turn touched the file |
|
||||
| `first_seen_at`| TEXT | ISO 8601 timestamp |
|
||||
|
||||
> ⚠️ No `id` column — use `COUNT(DISTINCT sf.file_path)` not `COUNT(DISTINCT sf.id)`.
|
||||
|
||||
Aggregate by `file_path` across sessions to identify architecturally important files.
|
||||
|
||||
#### `session_refs`
|
||||
|
||||
| Column | Type | Notes |
|
||||
| ------------ | ------- | ------------------------------------------ |
|
||||
| `id` | INTEGER | PK |
|
||||
| `session_id` | TEXT | FK → `sessions.id` |
|
||||
| `ref_type` | TEXT | `"commit"`, `"pr"`, `"issue"` |
|
||||
| `ref_value` | TEXT | Commit SHA, PR number, issue number |
|
||||
| `turn_index` | INTEGER | Which turn referenced it |
|
||||
| `created_at` | TEXT | ISO 8601 timestamp |
|
||||
|
||||
#### `search_index` (FTS5)
|
||||
|
||||
Full-text search index. Use for keyword discovery when surveying a large history:
|
||||
|
||||
```sql
|
||||
SELECT content, session_id, source_type
|
||||
FROM search_index
|
||||
WHERE search_index MATCH 'auth OR authentication OR login'
|
||||
LIMIT 20;
|
||||
```
|
||||
|
||||
`source_type` values: `"turn"`, `"checkpoint_overview"`, `"checkpoint_history"`, `"checkpoint_work_done"`, `"checkpoint_technical"`, `"checkpoint_files"`, `"checkpoint_next_steps"`, `"workspace_artifact"`.
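When the search terms come from user input, it helps to build the `MATCH` expression programmatically and bind it as a parameter. In this sketch each term is wrapped as an FTS5 string literal (double quotes, with embedded quotes doubled) and the terms are joined with `OR`; `build_match` and `search_history` are hypothetical helper names.

```python
import sqlite3


def build_match(terms: list[str]) -> str:
    """Quote each term as an FTS5 string and join the terms with OR."""
    quoted = ['"' + term.replace('"', '""') + '"' for term in terms if term.strip()]
    return " OR ".join(quoted)


def search_history(conn: sqlite3.Connection, terms: list[str], limit: int = 20) -> list[tuple]:
    """Run a keyword search against search_index using bound parameters."""
    return conn.execute(
        "SELECT session_id, source_type, content FROM search_index "
        "WHERE search_index MATCH ? LIMIT ?",
        (build_match(terms), limit),
    ).fetchall()
```

Quoting each term means a multi-word entry such as `log in` is matched as a phrase rather than parsed as query syntax.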
|
||||
|
||||
---
|
||||
|
||||
## VS Code Workspace Storage
|
||||
|
||||
### Location
|
||||
|
||||
The `workspaceStorage` directory is platform-specific:
|
||||
|
||||
| Platform | Default path |
|
||||
| -------- | ------------------------------------------------------------------ |
|
||||
| Windows | `%APPDATA%\Code\User\workspaceStorage\` |
|
||||
| macOS | `~/Library/Application Support/Code/User/workspaceStorage/` |
|
||||
| Linux | `~/.config/Code/User/workspaceStorage/` |
|
||||
|
||||
Each `<hash>/` subdirectory corresponds to a specific workspace (VS Code folder). The hash is derived from the workspace path — there is no human-readable mapping, so enumerate all `<hash>/GitHub.copilot-chat/` directories and use the `transcripts/` JSONL files' `session.start` events to identify which project each belongs to.
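The enumeration described above can be sketched as follows: for every `<hash>/GitHub.copilot-chat/transcripts/` directory, read transcripts until one yields a `session.start` event with a `cwd`. The helper names `first_cwd` and `map_workspaces` are hypothetical, and taking the first transcript that has a `cwd` as representative of the whole workspace is an assumption.

```python
import json
from pathlib import Path
from typing import Optional


def first_cwd(transcript: Path) -> Optional[str]:
    """Return data.context.cwd from the first session.start event, if any."""
    with transcript.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            event = json.loads(line)
            if event.get("type") == "session.start":
                return (event.get("data") or {}).get("context", {}).get("cwd")
    return None


def map_workspaces(storage_root: Path) -> dict[str, str]:
    """Map each workspace hash to the cwd recorded in one of its transcripts."""
    mapping: dict[str, str] = {}
    for transcripts in sorted(storage_root.glob("*/GitHub.copilot-chat/transcripts")):
        workspace_hash = transcripts.parent.parent.name
        for transcript in sorted(transcripts.glob("*.jsonl")):
            cwd = first_cwd(transcript)
            if cwd:
                mapping[workspace_hash] = cwd
                break
    return mapping
```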
|
||||
|
||||
### Transcript JSONL (`transcripts/<uuid>.jsonl`)
|
||||
|
||||
Identical format to `events.jsonl` from Source 1. Parse using the same event type handlers. The `session.start` event's `data.context.cwd` tells you which project this belongs to.
|
||||
|
||||
### Memory Artifacts (`memory-tool/memories/<base64-session-id>/`)
|
||||
|
||||
Directory name is the session UUID encoded as base64. Files inside are markdown documents explicitly saved by the user or system during the session — typically `plan.md` containing the session plan.
|
||||
|
||||
Decode the directory name to link it to a session:
|
||||
|
||||
```python
|
||||
import base64
|
||||
# Pad to a multiple of 4 before decoding
session_id = base64.b64decode(dir_name + '=' * (-len(dir_name) % 4)).decode('utf-8')
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Processing Order
|
||||
|
||||
For maximum efficiency and signal-to-noise:
|
||||
|
||||
1. **`session-store.db` checkpoints** — Fastest, highest signal. Query all at once.
|
||||
2. **`session-store.db` sessions.summary** — One-paragraph synopsis per session.
|
||||
3. **Per-session `checkpoints/*.json` + `index.md`** — For sessions not yet in `session-store.db` or for additional detail.
|
||||
4. **Memory artifacts** (`memory-tool/memories/`) — User-authored, high quality.
|
||||
5. **`session-store.db` turns** — Full conversation, process selectively by topic.
|
||||
6. **`events.jsonl` / transcript JSONL** — Only if `session-store.db` is absent or incomplete.
|
||||
7. **`session_files` / `session_refs`** — For file pattern and git linkage metadata.
|
||||