12 KiB
GitHub Copilot CLI Data Format — Detailed Reference
Session-State Directory
~/.copilot/session-state/ contains one directory per session the user has run with GitHub Copilot CLI. Each directory is named with a UUID.
workspace.yaml
Minimal session metadata file, always present:
id: <session-uuid>
cwd: /path/to/project
summary_count: 3
created_at: 2026-04-02T14:28:13.304Z
updated_at: 2026-04-29T12:00:00.000Z
summary_count reflects how many checkpoints were written. Sessions with summary_count: 0 were either very short or completed without checkpointing — check events.jsonl for content anyway.
vscode.metadata.json
VS Code context, written when the session is associated with a VS Code workspace:
{
"workspaceFolder": {
"folderPath": "c:\\Users\\name\\git\\my-project",
"timestamp": 1773245818098
},
"writtenToDisc": true,
"repositoryProperties": {
"repositoryPath": "c:\\Users\\name\\git\\my-project",
"branchName": "feature/my-branch",
"baseBranchName": "origin/main"
},
"customTitle": "User-written session label or system-set title"
}
customTitle is the most human-readable session label — use it as a heading when creating session-derived wiki content. May be absent on older sessions.
events.jsonl
The full event log for one session. Each line is a JSON object representing one event in the session.
Event: session.start
{
"type": "session.start",
"data": {
"sessionId": "09371a50-9a50-484a-8743-5c696de1623a",
"version": 1,
"producer": "copilot-agent",
"copilotVersion": "0.0.420",
"startTime": "2026-03-02T15:10:04.678Z",
"context": {
"cwd": "C:\\Users\\name\\git\\my-project",
"gitRoot": "C:\\Users\\name\\git\\my-project",
"branch": "master"
}
},
"id": "<event-uuid>",
"timestamp": "2026-03-02T15:10:04.817Z",
"parentId": null
}
data.context.cwd and data.context.branch establish the project context. Always read session.start first.
Event: user.message
{
"type": "user.message",
"data": {
"content": "review my staged but uncommitted changes for issues",
"transformedContent": "<current_datetime>...</current_datetime>\n\nreview my staged...",
"attachments": [],
"interactionId": "9352571e-a0b9-4774-8ecb-40bc58f86e94"
},
"id": "<event-uuid>",
"timestamp": "2026-03-02T15:10:45.058Z",
"parentId": "<parent-event-uuid>"
}
Use data.content (not data.transformedContent) — the transformed version includes injected system context that's noise for wiki purposes.
Event: assistant.message
{
"type": "assistant.message",
"data": {
"messageId": "<uuid>",
"content": "I'll review the staged changes in those three files.",
"toolRequests": [
{
"toolCallId": "tooluse_...",
"name": "report_intent",
"arguments": { "intent": "Reviewing staged changes" },
"type": "function"
},
{
"toolCallId": "tooluse_...",
"name": "powershell",
"arguments": {
"command": "git --no-pager diff --cached --stat",
"description": "Show staged diff"
},
"type": "function"
}
],
"interactionId": "9352571e-a0b9-4774-8ecb-40bc58f86e94",
"reasoningOpaque": "<base64-encrypted-reasoning>",
"reasoningText": "The user wants me to review staged git changes..."
},
"id": "<event-uuid>",
"timestamp": "2026-03-02T15:10:50.235Z",
"parentId": "<parent-event-uuid>"
}
Extraction strategy:
- Extract
data.content— the assistant's visible text response data.toolRequests— skim tool names and description arguments for file/command patterns; ignorereport_intentcalls- Skip
data.reasoningOpaqueentirely — encrypted/encoded internal reasoning - Skip
data.reasoningTextentirely — decrypted reasoning; internal only, never user-visible
Event: assistant.turn_start
{
"type": "assistant.turn_start",
"data": { "turnId": "0", "interactionId": "..." },
"id": "...",
"timestamp": "..."
}
Marks the start of an assistant turn. Useful for turn boundary detection; no content to extract.
Event: tool.execution_start
{
"type": "tool.execution_start",
"data": {
"toolCallId": "tooluse_...",
"toolName": "powershell",
"arguments": { "command": "dotnet build ...", "description": "Build project" }
},
"id": "...",
"timestamp": "..."
}
Reveals what tools (file reads, commands, searches) were invoked. File-related tools (view, edit, create) with their paths are worth noting for the session_files equivalent when reading events directly.
Event: tool.execution_end
Contains the raw tool output. Usually noise — skip unless diagnosing errors.
checkpoints/<uuid>.json
Mid-session progress summaries, written automatically as the session progresses:
{
"title": "Implementing auth module",
"overview": "Working on JWT authentication for the API...",
"history": "1. Analyzed existing auth code\n2. Created IAuthService...",
"work_done": "- Created IAuthService interface\n- Implemented JwtAuthService",
"technical_details": "Uses RS256 signing. Token expiry configurable via settings...",
"important_files": "- src/Auth/IAuthService.cs\n- src/Auth/JwtAuthService.cs",
"next_steps": "- Wire up to DI container\n- Add refresh token support"
}
This is the highest-value structured content in the per-session directory — equivalent to Claude's memory files.
index.md
Session-end summary written as a markdown file. Typically 1–3 paragraphs summarizing what was accomplished. Content varies by session length and complexity. Read this before opening events.jsonl to decide if the session is worth deep-processing.
Global Session Store (session-store.db)
SQLite database at ~/.copilot/session-store.db. The canonical cross-session record.
Schema
sessions
| Column | Type | Notes |
|---|---|---|
id |
TEXT | Session UUID (PK) |
cwd |
TEXT | Working directory |
repository |
TEXT | owner/repo format when available |
branch |
TEXT | Git branch name |
summary |
TEXT | One-paragraph session summary |
created_at |
TEXT | ISO 8601 timestamp |
updated_at |
TEXT | ISO 8601 timestamp — use for delta checks |
host_type |
TEXT | "vscode", "cli", or similar |
turns
| Column | Type | Notes |
|---|---|---|
id |
INTEGER | PK |
session_id |
TEXT | FK → sessions.id |
turn_index |
INTEGER | 0-based turn sequence |
user_message |
TEXT | Raw user message |
assistant_response |
TEXT | Assistant's text response |
timestamp |
TEXT | ISO 8601 timestamp |
Note: user_message here is the pre-transformation content — use this, not transformedContent from events.jsonl.
checkpoints
| Column | Type | Notes |
|---|---|---|
id |
INTEGER | PK |
session_id |
TEXT | FK → sessions.id |
checkpoint_number |
INTEGER | 1-based |
title |
TEXT | Short title |
overview |
TEXT | High-level summary |
history |
TEXT | Step-by-step of what happened |
work_done |
TEXT | Completed items |
technical_details |
TEXT | Implementation specifics |
important_files |
TEXT | Key files touched |
next_steps |
TEXT | Open threads |
created_at |
TEXT | ISO 8601 timestamp |
session_files
| Column | Type | Notes |
|---|---|---|
session_id |
TEXT | FK → sessions.id |
file_path |
TEXT | Absolute path to the file |
tool_name |
TEXT | "edit", "create", "view", etc. |
turn_index |
INTEGER | Which turn touched the file |
first_seen_at |
TEXT | ISO 8601 timestamp |
⚠️ No
idcolumn — useCOUNT(DISTINCT sf.file_path)notCOUNT(DISTINCT sf.id).
Aggregate by file_path across sessions to identify architecturally important files.
session_refs
| Column | Type | Notes |
|---|---|---|
id |
INTEGER | PK |
session_id |
TEXT | FK → sessions.id |
ref_type |
TEXT | "commit", "pr", "issue" |
ref_value |
TEXT | Commit SHA, PR number, issue number |
turn_index |
INTEGER | Which turn referenced it |
created_at |
TEXT | ISO 8601 timestamp |
search_index (FTS5)
Full-text search index. Use for keyword discovery when surveying a large history:
SELECT content, session_id, source_type
FROM search_index
WHERE search_index MATCH 'auth OR authentication OR login'
LIMIT 20;
source_type values: "turn", "checkpoint_overview", "checkpoint_history", "checkpoint_work_done", "checkpoint_technical", "checkpoint_files", "checkpoint_next_steps", "workspace_artifact".
VS Code Workspace Storage
Location
The workspaceStorage directory is platform-specific:
| Platform | Default path |
|---|---|
| Windows | %APPDATA%\Code\User\workspaceStorage\ |
| macOS | ~/Library/Application Support/Code/User/workspaceStorage/ |
| Linux | ~/.config/Code/User/workspaceStorage/ |
Each <hash>/ subdirectory corresponds to a specific workspace (VS Code folder). The hash is derived from the workspace path — there is no human-readable mapping, so enumerate all <hash>/GitHub.copilot-chat/ directories and use the transcripts/ JSONL files' session.start events to identify which project each belongs to.
Transcript JSONL (transcripts/<uuid>.jsonl)
Identical format to events.jsonl from Source 1. Parse using the same event type handlers. The session.start event's data.context.cwd tells you which project this belongs to.
Memory Artifacts (memory-tool/memories/<base64-session-id>/)
Directory name is the session UUID encoded as base64. Files inside are markdown documents explicitly saved by the user or system during the session — typically plan.md containing the session plan.
Decode the directory name to link it to a session:
import base64
# Pad to multiple of 4 before decoding
session_id = base64.b64decode(dir_name + '==').decode('utf-8')
Processing Order
For maximum efficiency and signal-to-noise:
session-store.dbcheckpoints — Fastest, highest signal. Query all at once.session-store.dbsessions.summary — One-paragraph synopsis per session.- Per-session
checkpoints/*.json+index.md— For sessions not yet insession-store.dbor for additional detail. - Memory artifacts (
memory-tool/memories/) — User-authored, high quality. session-store.dbturns — Full conversation, process selectively by topic.events.jsonl/ transcript JSONL — Only ifsession-store.dbis absent or incomplete.session_files/session_refs— For file pattern and git linkage metadata.