add claude-obsidian

2026-05-28 10:57:16 +09:00
parent 1b07531a45
commit 72dad72703
205 changed files with 41703 additions and 80 deletions
@@ -0,0 +1,69 @@
+---
+type: concept
+title: "Compounding Knowledge"
+complexity: basic
+domain: knowledge-management
+aliases:
+  - "Knowledge Compounding"
+  - "Persistent Synthesis"
+created: 2026-04-07
+updated: 2026-04-07
+tags:
+  - concept
+  - knowledge-management
+status: mature
+related:
+  - "[[LLM Wiki Pattern]]"
+  - "[[Hot Cache]]"
+  - "[[Andrej Karpathy]]"
+  - "[[concepts/_index]]"
+sources:
+---
+
+# Compounding Knowledge
+
+The central insight behind the [[LLM Wiki Pattern]]: knowledge in a wiki compounds like interest in a bank. Every source added, every question answered, every analysis filed makes the wiki more valuable — not just by adding pages, but by enriching the connections between existing pages.
+
+---
+
+## Why Normal AI Chats Don't Compound
+
+In a standard chat, knowledge is ephemeral. Each session starts fresh. Even if you upload the same documents repeatedly, the LLM re-derives the same insights from scratch. Nothing accumulates.
+
+The same is true of most RAG systems: they index raw documents and retrieve chunks at query time. The retrieval gets the right fragments, but no synthesis is built up. Nothing is compiled. Ask the same complex question twice and you get the same assembly process twice.
+
+---
+
+## How Wiki Knowledge Compounds
+
+When a new source arrives, the LLM doesn't just index it. It integrates it:
+- Updates entity pages with new information
+- Flags contradictions with existing claims
+- Strengthens or challenges the evolving synthesis
+- Adds cross-references from the new source to existing pages and back
+
+The cross-references are already there next time. The contradictions have already been flagged. The synthesis already reflects everything that was read.
+
+**The wiki is pre-compiled knowledge.** RAG re-compiles on every query.
+
+---
+
+## The Maintenance Problem
+
+Wikis maintained by humans decay. The maintenance burden grows faster than the value — updating cross-references, keeping summaries current, noting when new data contradicts old claims. Humans abandon wikis because no one wants to do the bookkeeping.
+
+LLMs don't get bored. They don't forget to update a cross-reference. The cost of maintenance is near zero. This is the practical reason the wiki pattern works: the entity that's best at the tedious maintenance work is the same entity that reads and writes the wiki.
+
+---
+
+## In Practice
+
+One X user turned 383 scattered files and over 100 meeting transcripts into a compact wiki and dropped token usage by 95% when querying with Claude. The drop came from two sources: better navigation (index + hot cache vs. full document search) and pre-compiled synthesis (no re-deriving the same insights from scratch).
+
+---
+
+## Connections
+
+See [[LLM Wiki Pattern]] for the full architecture.
+See [[Hot Cache]] for the session context mechanism.
+See [[Andrej Karpathy]] for the origin of this framing.
@@ -0,0 +1,295 @@
+---
+type: concept
+title: "DragonScale Memory"
+address: c-000001
+complexity: advanced
+domain: knowledge-management
+aliases:
+  - "DragonScale"
+  - "DragonScale Architecture"
+  - "Fractal Memory"
+created: 2026-04-23
+updated: 2026-04-24
+tags:
+  - concept
+  - knowledge-management
+  - memory
+  - architecture
+  - fractal
+status: proposed
+related:
+  - "[[LLM Wiki Pattern]]"
+  - "[[Compounding Knowledge]]"
+  - "[[Hot Cache]]"
+  - "[[concepts/_index]]"
+sources:
+---
+
+# DragonScale Memory
+
+A memory-layer design for LLM wiki vaults, inspired by the Heighway dragon curve. Four mechanisms (fold operator, deterministic page addresses, semantic tiling, boundary-first autoresearch) give an LLM-maintained wiki a principled way to grow, compact, and stay coherent. The dragon curve is a design-justification device, not a reasoning architecture.
+
+> **Status: v0.4 2026-04-24.** All four mechanisms shipped as opt-in features. Phase 0 (spec) + Phase 1 (wiki-fold skill, dry-run verified) + Phase 2 (address MVP) + Phase 3 (semantic tiling) + Phase 3.5/3.6 (hardening) + Phase 4 (boundary-first autoresearch). See Review History for the progression.
+
+---
+
+## Scope
+
+DragonScale is a **memory architecture**: it governs how a wiki grows, compacts, addresses its pages, and checks for duplicates. It is **not a search, planning, or reasoning algorithm.** Agent reasoning uses existing patterns (Tree of Thoughts with BFS/DFS/beam search; Yao et al. 2023).
+
+**Honest disclaimer**: memory-layer choices are never neutral with respect to reasoning. What the vault surfaces, and in what order, shapes what the model sees. Long-context performance is position-sensitive (Liu et al. 2023, *Lost in the Middle*), and MemGPT's premise is that paging policy affects task success (Packer et al. 2023). One of the four mechanisms below (boundary-first autoresearch) explicitly crosses into agenda control; it is included deliberately and marked as such.
+
+---
+
+## The Core Analogy
+
+Four dragon-curve properties map onto memory-system patterns already validated in adjacent fields. The word is *analogue*, not *identity*.
+
+| Dragon curve property | Memory analogue | Strength of analogy |
+|---|---|---|
+| Paper-folding recursion: `D_{n+1} = D_n · R · swap(reverse(D_n))` | Hierarchical rollup / materialized summary with exponential fanout | Loose. Shares exponential batch structure, not compaction semantics. |
+| Turn derivable from bits of `n` (regular paperfolding sequence, OEIS A014577) | Deterministic page addresses as organizational convention (MVP is a creation-order counter, not a true content hash) | Loose. Deterministic addressing is useful independent of the dragon. |
+| Tiling / no self-intersection | Canonical-home coverage: one concept, one page | Medium. Dedup lint enforces this mechanically. |
+| Boundary dim ≈ 1.523627 vs interior dim 2 | Agent attention weighted toward frontier pages | Aesthetic. The fractal dimension number does no load-bearing work. |
+
+The curve is useful for deciding *which knobs to tighten and why*, not as a math proof that any given mechanism is optimal.
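
The first two rows of the table can be checked directly. The sketch below is illustrative only: the function names `dragon_turns` and `turn_from_bits` are hypothetical and not part of DragonScale. It builds the turn sequence with the paper-folding recursion, then recomputes each turn from the bits of its index, as in the regular paperfolding sequence.

```python
def dragon_turns(n_folds: int) -> str:
    """Turn sequence after n_folds folds: D_{n+1} = D_n + R + swap(reverse(D_n))."""
    seq = ""
    for _ in range(n_folds):
        # swap(reverse(D_n)): read the previous sequence backwards, flipping each turn.
        mirrored = "".join("L" if turn == "R" else "R" for turn in reversed(seq))
        seq = seq + "R" + mirrored
    return seq


def turn_from_bits(n: int) -> str:
    """Turn at 1-based position n, derived from the bits of n alone.

    Write n = 2**j * m with m odd; the turn is R when m % 4 == 1, else L.
    """
    if n < 1:
        raise ValueError("positions are 1-based")
    while n % 2 == 0:  # strip trailing zero bits
        n //= 2
    return "R" if n % 4 == 1 else "L"
```

With three folds the recursion yields `RRLRRLL`, which matches the first seven terms of OEIS A014577 read as 1 = R and 0 = L. Both functions agree position by position, which is the sense in which a turn is "derivable from bits of `n`".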
+
+---
+
+## Mechanism 1 — Fold Operator
+
+After a batch of ingests, run a fold: produce a meta-page summarizing the batch, link children back, update the index. Folds stack: after enough level-`k` folds accumulate, a level-`k+1` fold produces a super-summary.
+
+This is a **hierarchical rollup**, loosely similar to LSM-tree compaction but with important differences.
+
+**What it shares with LSM compaction:**
+- Exponential batch fanout across levels (like LevelDB's fixed level-size ratio, typically 10× per level in leveled mode)
+- Periodic consolidation rather than per-write work
+
+**What it does NOT inherit from LSM:**
+- No sorted-key semantics (pages have semantic, not key-ordered, identity)
+- No SSTable/memtable distinction, no tombstones, no Bloom filters
+- No write-amplification arithmetic; no read-path acceleration
+- **Folds are additive**: children remain in place. LSM compaction rewrites and deletes. A DragonScale fold is closer to a materialized view than a compaction.
+
+**Trigger options:**
+- `2^k` entry count (k=4 ⇒ every 16 log entries). Simple to implement; straightforward level math; ignores page size and novelty.
+- **Adaptive trigger (preferred for production)**: token budget (e.g., fold when the unfolded batch exceeds N tokens), novelty score (average embedding distance from existing summaries), or staleness age (last fold > T days). The Phase 1 MVP implements the entry-count trigger; adaptive triggers are a follow-up.
+
+**Invariants:**
+- Idempotent on the same range (re-running is a no-op).
+- Reversible (children stay; a fold is additive).
+- Level-bounded: with entry-count trigger `2^k`, fold depth is at most `⌈log₂(N)⌉` above leaf pages. Derived, not empirical.
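
The entry-count trigger and the depth bound can be sketched as follows. This is a minimal illustration, not the `wiki-fold` skill: it assumes every level reuses the same `2^k` fanout (the spec only fixes the leaf-level trigger), and the names `fold_levels` and `depth_bound` are hypothetical.

```python
import math


def fold_levels(n_entries: int, k: int = 4) -> list:
    """Completed folds per level, lowest level first.

    Assumption: each level folds once 2**k units of the level below exist.
    """
    batch = 2 ** k
    levels = []
    units = n_entries
    while units >= batch:
        units //= batch  # every full batch of `batch` units yields one fold
        levels.append(units)
    return levels


def depth_bound(n_entries: int) -> int:
    """The ceil(log2(N)) bound on fold depth stated in the invariants."""
    return math.ceil(math.log2(n_entries)) if n_entries > 1 else 0
```

For 1000 log entries and `k=4` this gives 62 level-1 folds and 3 level-2 folds, a depth of 2 against a bound of 10. The bound is loose for any `k > 1`, which is consistent with it being labeled derived rather than empirical.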
+
+---
+
+## Mechanism 2 — Deterministic Page Addresses
+
+Every new page gets a stable `address` field in frontmatter. The Phase 2 MVP uses a simple creation-order counter:
+
+```yaml
+address: c-000042
+```
+
+Format: `c-<6-digit-counter>`. `c-` means "creation-order counter." Zero-padded.
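
A small sketch of the address format, assuming only what the format line above states. The helper names are hypothetical; the shipped allocator is the shell script `scripts/allocate-address.sh`, and this snippet does not reproduce its locking.

```python
import re

# "c-" followed by exactly six ASCII digits, e.g. c-000042.
ADDRESS_RE = re.compile(r"c-[0-9]{6}")


def format_address(counter: int) -> str:
    """Render a creation-order counter as a zero-padded address."""
    if not 0 <= counter <= 999_999:
        raise ValueError("counter does not fit in six digits")
    return f"c-{counter:06d}"


def is_valid_address(value: str) -> bool:
    """Format check only; uniqueness and counter drift need vault state."""
    return ADDRESS_RE.fullmatch(value) is not None
```

`format_address(42)` returns `c-000042`, and `is_valid_address` rejects both unpadded values such as `c-42` and other prefixes such as `l-000042`.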
+
+**Future extension** (documented, not shipped in Phase 2):
+- Fold-relative path: `f1.2/c-000042` once folds exist, where `f1.2` encodes the fold-tree lineage.
+- Content hash suffix: `c-000042:h7f3c2` once the hash-rotation policy is decided.
+
+**What Phase 2 MVP gives:**
+- Uniqueness: counter is monotonically increasing; deleted pages' addresses are retired, never reused.
+- Stability: never changes across content edits.
+- Determinism: derivable from the counter state at `.vault-meta/address-counter.txt`.
+- Ordering: preserves creation sequence.
+
+**What this does NOT give (the v0.1 name "content-addressable paths" was misleading):**
+- **No content-addressability in the MVP.** The Phase 2 address is a sequence counter, not a content hash. Renaming this mechanism from "content-addressable paths" to "deterministic page addresses" is more honest about what actually ships.
+- **No prompt cache benefit** (already corrected in v0.1 → v0.2). Per Anthropic docs, cache hits require byte-identical prefixes; an address field in frontmatter only helps if the frontmatter itself is inside a cached block AND stays byte-identical. Stable prefixes, not addresses, drive cache hits.
+
+**Phase 2 exclusions** (all deferred):
+- Backfill of legacy pre-Phase-2 pages (will use `l-` prefix with its own counter).
+- Fold-ancestry bit prefix (requires committed folds from a future fold-of-folds skill).
+- Content hash suffix (rotation policy unresolved; see limitations).
+
+**Implementation** (Phase 2, shipped):
+- `scripts/allocate-address.sh`: flock-guarded atomic allocator. All counter reads/writes go through this script; direct Write/Edit on `.vault-meta/address-counter.txt` is prohibited (would fire PostToolUse hook).
+- `skills/wiki-ingest/SKILL.md` → Address Assignment section: opt-in feature detection; delegates allocation to the helper; records path-to-address mapping in `.raw/.manifest.json` `address_map` for re-ingest stability.
+- `skills/wiki-lint/SKILL.md` → Address Validation section: format check, uniqueness check, counter-drift check, address-map consistency check.
+
+**Lint severity model** (matches `skills/wiki-lint/SKILL.md` Address Validation behavior):
+- Post-rollout pages (frontmatter `created:` >= 2026-04-23, or any page newly created after DragonScale adoption) that lack an address are **errors**. This is the silent-regression guard.
+- Legacy pages (`created:` < 2026-04-23) without addresses are **informational**. The optional `.vault-meta/legacy-pages.txt` manifest can grandfather pages whose `created:` metadata is wrong or missing.
+- Meta pages (`_index.md`, `index.md`, `log.md`, `hot.md`, etc.) and fold pages are excluded entirely.
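
The severity model above amounts to a small decision function. The sketch below is one possible reading, not the `wiki-lint` implementation. It assumes fold and meta pages are recognizable by frontmatter `type`, it lists only the four meta filenames named above (the source list ends in "etc."), and it treats a page with no `created:` value and no manifest entry as an error. That last choice is an assumption the spec does not settle.

```python
from datetime import date

ROLLOUT_DATE = date(2026, 4, 23)
META_FILENAMES = {"_index.md", "index.md", "log.md", "hot.md"}


def missing_address_severity(filename, page_type, created, grandfathered=False):
    """Severity for a page that lacks an `address` field.

    `created` is a date or None; `grandfathered` means the page is listed
    in the optional .vault-meta/legacy-pages.txt manifest.
    """
    if filename in META_FILENAMES or page_type in {"meta", "fold"}:
        return "excluded"
    if grandfathered:
        return "info"
    if created is None or created >= ROLLOUT_DATE:
        return "error"  # silent-regression guard (None -> error is an assumption)
    return "info"
```

A concept page created on 2026-04-24 without an address is reported as `error`, while the same page dated 2026-04-07 is only `info`.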
+
+---
+
+## Mechanism 3 — Semantic Tiling Lint
+
+The tiling property says the same concept should live in one canonical page. Enforce it with an embedding-based dedup check in `wiki-lint`.
+
+**Procedure (calibrated, not a guess):**
+1. Compute embeddings for every page. Default model: local `nomic-embed-text` via ollama on `http://127.0.0.1:11434`. Cost: local hardware time only (no API fees). The script supports a remote override under `--allow-remote-ollama`; remote endpoints may incur provider API fees.
+2. Compute pairwise cosine similarities for all page pairs.
+3. **Calibration** (one-time, before first use): label 50-100 in-vault page pairs as duplicate/near/distinct; find the thresholds that optimize target precision for each band.
+4. **Default bands** (used before calibration, then refined):
+   - `≥ 0.90` — near-duplicate, lint error
+   - `0.80 – 0.90` — review bucket, lint warning
+   - `< 0.80` — distinct, no flag
+5. Never auto-merge. Output a review list.
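
Steps 2, 4, and 5 can be sketched in plain Python. This is an illustration under stated assumptions, not `scripts/tiling-check.py`: embeddings are taken as already computed and passed in as lists of floats, and `cosine`, `band`, and `review_list` are hypothetical names.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def band(similarity, error_at=0.90, warn_at=0.80):
    """Map a similarity to the default bands; thresholds are meant to be recalibrated."""
    if similarity >= error_at:
        return "error"
    if similarity >= warn_at:
        return "warning"
    return "distinct"


def review_list(embeddings):
    """Return (page_a, page_b, similarity, band) for flagged pairs, most similar first.

    Nothing is merged; the output is only a list for human review.
    """
    flagged = []
    for (page_a, vec_a), (page_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        similarity = cosine(vec_a, vec_b)
        label = band(similarity)
        if label != "distinct":
            flagged.append((page_a, page_b, similarity, label))
    return sorted(flagged, key=lambda row: row[2], reverse=True)
```

The pairwise loop is quadratic in the number of pages, which is acceptable for a sketch but is one reason a real check might cache embeddings or prune candidate pairs.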
+
+**Why not a fixed 0.85?** v0.1 used 0.85 with no justification. Published thresholds in the embeddings literature span a wide range (Sentence Transformers' `community_detection` defaults to 0.75; Quora-duplicate calibrations land around 0.77–0.83; sparse-model defaults differ again). Thresholds are model-, corpus-, and objective-dependent, so calibration is required.
+
+---
+
+## Mechanism 4 — Boundary-First Autoresearch
+
+> **Status: shipped (Phase 4, opt-in)** as of 2026-04-24. Implementation: `scripts/boundary-score.py`. Integration: `skills/autoresearch/SKILL.md` Topic Selection section B. Tests: `tests/test_boundary_score.py`.
+
+Boundary pages (high out-degree relative to in-degree, recency-weighted) are the vault's frontier. `/autoresearch` invoked without a topic reads the top-5 boundary pages and offers them as research candidates; the user selects one (or types a free-text topic, or declines all and falls back to the original ask-user mode).
+
+**Formula (exact)**:
+
+```
+out_degree(p) = count of distinct filename-stem wikilinks in body of p that resolve to scoreable pages
+in_degree(p)  = count of distinct scoreable pages whose body contains a wikilink to p
+recency_weight(p) = exp(-days_since_updated / 30)      # no floor; old pages approach 0
+boundary_score(p) = (out_degree - in_degree) * recency_weight
+```
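
A minimal Python rendering of the formula, assuming the wikilink graph has already been resolved to scoreable pages. It is a sketch rather than the contents of `scripts/boundary-score.py`; `boundary_scores`, `top_candidates`, and `TAU_DAYS` are hypothetical names. With the formula as written, 30 is an e-folding constant: a page's weight falls to 1/e after 30 days and to one half after about 20.8 days (30 · ln 2).

```python
import math

TAU_DAYS = 30.0  # the 30 in exp(-days_since_updated / 30)


def recency_weight(days_since_updated, tau_days=TAU_DAYS):
    """Exponential decay with no floor; old pages approach 0."""
    return math.exp(-days_since_updated / tau_days)


def boundary_scores(links, days_since_updated):
    """links: page -> set of wikilink targets; days_since_updated: page -> days.

    Only targets that are themselves keys of `links` count, and self-links
    are ignored.
    """
    pages = set(links)
    out_degree = {}
    in_degree = {page: 0 for page in pages}
    for page, targets in links.items():
        resolved = {t for t in targets if t in pages and t != page}
        out_degree[page] = len(resolved)
        for target in resolved:
            in_degree[target] += 1
    return {
        page: (out_degree[page] - in_degree[page]) * recency_weight(days_since_updated[page])
        for page in pages
    }


def top_candidates(scores, n=5):
    """Highest-scoring pages first, as offered to the user for selection."""
    return sorted(scores, key=lambda page: scores[page], reverse=True)[:n]
```

Because the score is a difference multiplied by a decaying weight, a heavily linked-to hub page goes negative and an old frontier page drifts toward zero, so neither is likely to appear among the top candidates.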
+
+**Link resolution**: filename-stem only. `[[Foo]]` resolves to `Foo.md` anywhere in the vault. Aliases declared via frontmatter `aliases:` are NOT parsed. Folder-qualified links (e.g. `[[notes/Foo]]`) are resolved by stem alone. This matches Obsidian's default behavior for unique filenames but does not implement full alias resolution.
+
+**Scoreable** = any page NOT excluded by any of:
+- frontmatter `type: meta` or `type: fold`
+- filename in `{_index.md, index.md, log.md, hot.md, overview.md, dashboard.md, Wiki Map.md, getting-started.md}`
+- path prefix in `wiki/folds/` or `wiki/meta/`
+- symlinks or paths whose resolved target escapes the vault root (rejected at scan time)
+
+**Code-block filtering**: triple-backtick AND triple-tilde fenced code blocks are skipped, with CommonMark-like length tracking so a longer opening fence is not closed by a shorter inner fence. Indented code blocks (4+ spaces) are NOT filtered because Obsidian bullet lists commonly use 4-space indentation and contain real wikilinks. See `scripts/boundary-score.py:RECENCY_HALFLIFE_DAYS` for the sole tunable constant.
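
The fence handling described above can be sketched as a single pass over the lines. This is an assumption-laden illustration, not the shipped parser: the regular expressions and the name `extract_wikilinks` are hypothetical, and link targets are reduced to their filename stem in line with the resolution rule.

```python
import re

# An opening or closing fence: up to 3 spaces, then 3+ backticks or 3+ tildes.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
# Link target up to the first ']', '|' (display text) or '#' (heading anchor).
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")


def extract_wikilinks(text):
    """Return filename stems of wikilinks found outside fenced code blocks."""
    stems = []
    open_fence = None  # (fence character, fence length) while inside a block
    for line in text.splitlines():
        match = FENCE_RE.match(line)
        if open_fence is None:
            if match:
                run = match.group(1)
                open_fence = (run[0], len(run))
                continue
            for target in WIKILINK_RE.findall(line):
                stems.append(target.strip().split("/")[-1])
        elif (
            match
            and match.group(1)[0] == open_fence[0]
            and len(match.group(1)) >= open_fence[1]
            and line.strip() == match.group(1)
        ):
            # Same character, at least as long, nothing else on the line.
            open_fence = None
    return stems
```

A four-backtick block therefore stays open across an inner three-backtick line, and a bullet indented by four spaces is still scanned for links.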
+
+**Honest labeling**: this mechanism is **agenda control**, not pure memory. It shapes what the agent researches next. It is included in DragonScale because it is a direct consequence of the dragon-curve boundary analogy, and because it pairs naturally with folds (freshly folded pages have low out-degree; frontier pages are pre-fold). But the "memory only, not reasoning" framing does not cover it. Users who want a strict memory-layer subset should omit this mechanism (simply do not invoke `/autoresearch` without a topic, or do not set up `scripts/boundary-score.py`).
+
+**What is NOT included**:
+- No auto-triggering. `/autoresearch` is still user-invoked.
+- No persistent boundary-score cache. Scoring is O(N * avg_links) and runs on every invocation from fresh wiki/ state.
+- No integration with folds or addresses. Pure graph analysis on the wikilink graph.
+- No automatic topic selection without user confirmation. The helper presents choices; the user picks.
+
+---
+
+## Operational Policies (required before implementation)
+
+Adversarial review flagged these gaps in v0.1. Each must be decided before the corresponding phase ships.
+
+| Policy | Phase 0 position | Decision point |
+|---|---|---|
+| **Retention / GC** | No automatic deletion. Pages are permanent. | Revisit if vault exceeds ~5000 pages. |
+| **Tombstones** | None. Deleted pages are removed via git revert. | Revisit if delete events become common. |
+| **Versioning** | Relies on git history, not in-vault versioning. | Address-hash rotation policy doubles as a coarse version signal. |
+| **Conflict resolution for contradictory folds** | Meta-page must quote both sources with explicit "conflict" callout. No automatic resolution. | Phase 1 spec required. |
+| **Concurrency / atomicity** | Single-writer assumption (one Claude session at a time). PostToolUse auto-commit serializes. | Multi-writer case deferred. |
+| **Provenance for meta-pages** | Every fold page must include frontmatter listing children and fold level. | Phase 1 must enforce. |
+| **Access control** | Out of scope. This is a single-user vault. | Revisit only if shared. |
+
+---
+
+## Mapping to Claude-Obsidian
+
+| Mechanism | Status | New | Extends |
+|---|---|---|---|
+| Fold operator | shipped (Phase 1, dry-run verified) | `skills/wiki-fold/` | reads `log.md`, writes `wiki/folds/`, updates `index.md` on commit |
+| Address anchors | shipped (Phase 2, opt-in) | `scripts/allocate-address.sh`, new frontmatter field | `wiki-ingest` (assignment), `wiki-lint` (validation) |
+| Semantic tiling | shipped (Phase 3, opt-in) | `scripts/tiling-check.py`, `.vault-meta/tiling-thresholds.json` | `wiki-lint` with banded thresholds, calibration procedure documented |
+| Boundary-first | shipped (Phase 4, opt-in) | `scripts/boundary-score.py`, `tests/test_boundary_score.py` | `skills/autoresearch/SKILL.md` Topic Selection section B; `commands/autoresearch.md` no-topic path |
+
+The existing hot → index → domain → page hierarchy already implements self-similarity across scales. That's the one dragon-curve property this vault had before DragonScale.
+
+---
+
+## Why This Over Alternatives
+
+| Pattern | What it gives | What DragonScale adds |
+|---|---|---|
+| MemGPT virtual context (two-tier paging) | Main context ↔ external context swap | More than two levels; explicit fold triggers; dedup lint |
+| Pure LSM compaction | High write throughput from batched, level-by-level merges | Semantic-layer mechanisms (tiling, boundary); additive rollups over destructive merges |
+| Ad-hoc `/save` | Human-triggered filing | Rule-based fold cadence |
+| Vector-only RAG | Retrieval | Canonical-home structure; lineage addresses |
+
+DragonScale composes patterns validated in adjacent systems: LSM *batching* (databases), MemGPT *paging* (agents), Anthropic *cache ordering* (prompt engineering), and embedding *dedup* (knowledge graphs).
+
+---
+
+## Known Limitations (v0.3)
+
+- **Unvalidated at scale.** All four mechanisms are theoretical; none tested on a multi-thousand-page vault.
+- **Fold cadence is a knob, not a theorem.** `k=4` is a starting guess. Adaptive triggers are likely better.
+- **Address stability is unsolved.** Hash rotation on edits is a known issue; deferred.
+- **Boundary-first crosses scope.** Included with a warning, not quietly.
+- **Calibration load.** Tiling requires a one-time labeling pass; without it, only defaults apply.
+
+---
+
+## Primary Sources
+
+Verified against primary sources on 2026-04-23. **Scope of tagging**: the specific numeric values, formulas, and named patterns below are tagged **[sourced]** when directly citable, **[derived]** when derivable from sourced material, or **[conjecture]** when based on reasoning without a specific source. **Not tagged** (and readers should treat as interpretive synthesis): framing sentences in the body such as "composes patterns validated," "self-similarity already exists," and the design rationale tying the four mechanisms together. These are editorial, not source-backed.
+
+**Dragon curve math [sourced]**
+- Boundary dimension `2·log₂(λ)` where `λ³ − λ² − 2 = 0`, giving 1.523627086: [Dragon curve, Wikipedia](https://en.wikipedia.org/wiki/Dragon_curve)
+- Paper-folding construction and OEIS A014577: [Regular paperfolding sequence, Wikipedia](https://en.wikipedia.org/wiki/Regular_paperfolding_sequence); [OEIS A014577](https://oeis.org/A014577)
+- Tiling and rep-tiles: [Wolfram Demonstrations: Tiling Dragons and Rep-tiles of Order Two](https://demonstrations.wolfram.com/TilingDragonsAndRepTilesOfOrderTwo/)
+
+**LSM trees [sourced]**
+- Level size ratios and compaction semantics: [RocksDB Compaction wiki](https://github.com/facebook/rocksdb/wiki/Compaction), [RocksDB Tuning Guide](https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide), [How to Grow an LSM-tree? (2025)](https://arxiv.org/abs/2504.17178)
+- LevelDB 10× level ratio: referenced in the arXiv paper above. Treat as *typical*, not required.
+
+**LLM memory architectures [sourced]**
+- OS-inspired paging: [MemGPT: Towards LLMs as Operating Systems (Packer et al. 2023)](https://arxiv.org/abs/2310.08560)
+- Position sensitivity: [Lost in the Middle (Liu et al. 2023)](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00638/119630/Lost-in-the-Middle-How-Language-Models-Use-Long)
+- Note-based agentic memory: [A-Mem (2025)](https://arxiv.org/abs/2502.12110)
+
+**Prompt caching [sourced]**
+- Byte-identical prefix requirement, breakpoint mechanics, TTL options: [Anthropic Prompt Caching docs](https://platform.claude.com/docs/en/build-with-claude/prompt-caching)
+
+**Embedding thresholds [sourced]**
+- Sentence Transformers defaults and calibration examples: [Sentence Transformers util](https://sbert.net/docs/package_reference/util.html), [SBERT evaluation docs](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html)
+
+**Reasoning search (out of scope, cited only to justify the scope boundary) [sourced]**
+- [Tree of Thoughts (Yao et al. 2023)](https://arxiv.org/abs/2305.10601)
+
+**Items marked [conjecture] in this doc:**
+- `k=4`/`k=5` starting value for fold cadence (needs empirical tuning)
+- `~30s` full-vault embedding-pass time (needs measurement)
+- `boundary_score` formula exact weighting (a plausible starting form; not validated against retrieval metrics)
+
+**Items marked [derived]:**
+- `⌈log₂(N)⌉` fold-depth bound (trivially derivable from the entry-count trigger)
+- Default tiling bands `{≥0.90, 0.80-0.90, <0.80}` before calibration (interpolated from cited ranges in Sentence Transformers examples; not optimal by construction)
+
+---
+
+## Review History
+
+**v0.4 (2026-04-24, Phase 4 shipped)**: Mechanism 4 (boundary-first autoresearch) implemented as `scripts/boundary-score.py` with `tests/test_boundary_score.py` covering parsing, recency weight, wikilink extraction (with fence-length + tilde + indented-block tests), graph construction (self-loop/unresolved/meta-target exclusion), symlink rejection, and CLI surface (`--top`, `--page`, `--json`). Integrated into `skills/autoresearch/SKILL.md` as an opt-in Topic Selection mode with explicit helper-failure fallback. Spec's "NOT IMPLEMENTED" marker removed; exact scoring formula (no recency floor), filename-stem-only resolution disclosure, scope, and "what is NOT included" section added. Phase 3.6 pre-Phase-4 hardening shipped concurrently (5 fixes: `--report` path confinement, rollout baseline, AGENTS.md consistency, wiki-ingest .raw contradiction, install-guide version).
+
+**v0.3 (2026-04-23, Phase 2 alignment)**: Mechanism 2 rewritten to match the actual Phase 2 MVP shipped in `wiki-ingest` and `wiki-lint`. Renamed from "Content-Addressable Paths" to "Deterministic Page Addresses" (the MVP is a creation-order counter, not a content hash). Documented the extension path for fold-ancestry bits and content-hash suffix, both explicitly deferred.
+
+**v0.2 (2026-04-23, post-adversarial review)**: after `codex exec` adversarial review. All 7 critiques accepted:
+
+1. *LSM "structurally identical"* → weakened to "loosely analogous to hierarchical rollup"; non-inherited properties listed explicitly.
+2. *Prompt cache address benefit* → removed strong claim; narrowed to organizational convention.
+3. *0.85 threshold* → replaced with calibration procedure and banded defaults.
+4. *2^k cadence* → justified as implementation convenience; adaptive trigger flagged as preferred for production.
+5. *Scope boundary contradiction* → acknowledged; boundary-first explicitly labeled as agenda control.
+6. *Missing production mechanisms* → added Operational Policies section (retention, versioning, conflict resolution, concurrency, provenance).
+7. *Unverified claims* → tagged specific numeric values, formulas, and named patterns as [sourced], [derived], or [conjecture]. Editorial synthesis in the body explicitly flagged as not tagged (see scope note under Primary Sources).
+
+**v0.1 (2026-04-23, initial draft)**: written after a verification pass against Wikipedia, arXiv, and Anthropic docs. Four mechanisms proposed.
+
+---
+
+## Connections
+
+See [[LLM Wiki Pattern]] for the broader pattern this extends.
+See [[Compounding Knowledge]] for why persistent state is the precondition for DragonScale.
+See [[Hot Cache]] for the existing 500-word session context, which is a level-0 manual fold.
+See [[Andrej Karpathy]] for the intellectual lineage.
@@ -0,0 +1,95 @@
+---
+type: concept
+title: "Hot Cache"
+complexity: basic
+domain: knowledge-management
+aliases:
+  - "hot.md"
+  - "Session Cache"
+  - "Context Cache"
+created: 2026-04-07
+updated: 2026-04-07
+tags:
+  - concept
+  - knowledge-management
+  - context
+status: mature
+related:
+  - "[[LLM Wiki Pattern]]"
+  - "[[Compounding Knowledge]]"
+  - "[[index]]"
+  - "[[hot]]"
+  - "[[concepts/_index]]"
+sources:
+---
+
+# Hot Cache
+
+A ~500-word summary of the most recent context in the wiki vault. Stored in `wiki/hot.md`. Updated at the end of every session and after every significant ingest or query.
+
+The hot cache exists to answer one question: "where did we leave off?" A new session reads `hot.md` first. If the answer is there, it skips crawling the rest of the wiki.
+
+---
+
+## What It Stores
+
+- What was most recently ingested or discussed
+- Key recent facts and takeaways
+- Pages recently created or updated
+- Active threads and open questions
+- What the user is currently focused on
+
+---
+
+## Format
+
+```markdown
+---
+type: meta
+title: "Hot Cache"
+updated: YYYY-MM-DDTHH:MM:SS
+---
+
+# Recent Context
+
+## Last Updated
+YYYY-MM-DD — [what happened]
+
+## Key Recent Facts
+- [Most important recent takeaway]
+- [Second]
+
+## Recent Changes
+- Created: new wiki pages from this ingest
+- Updated: existing pages with new connections
+- Flagged: contradictions between sources where found
+
+## Active Threads
+- User is researching [topic]
+- Open question: [thing being investigated]
+```
+
+---
+
+## Rules
+
+- Keep it under 500 words. It is a cache, not a journal.
+- Overwrite it completely each time. Not append-only.
+- One file. Not split by date.
+- Updated after every ingest, significant query, and at the end of each session.
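
The size rule is easy to check mechanically. The sketch below assumes the limit applies to the body only, with YAML frontmatter excluded, and that words are whitespace-separated tokens. Neither detail is specified above, and the function names are hypothetical.

```python
WORD_LIMIT = 500


def strip_frontmatter(text):
    """Drop a leading frontmatter block delimited by '---' lines, if present."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[i + 1:])
    return text


def hot_cache_word_count(text):
    """Whitespace-separated word count of the body (frontmatter excluded)."""
    return len(strip_frontmatter(text).split())


def within_limit(text, limit=WORD_LIMIT):
    """True when the body is strictly under the word limit."""
    return hot_cache_word_count(text) < limit
```

A check like this could run after each overwrite of `hot.md`, so the cache is trimmed before it turns into a journal.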
+
+---
+
+## Why It Matters
+
+Without the hot cache, every session starts cold: read the index (about 1000 tokens), then several domain sub-indexes, then several individual pages. With the hot cache, the first ~500 words often have everything needed.
+
+In practice, adding `hot.md` to an executive assistant vault dramatically reduces the token cost of session startup compared to crawling multiple wiki pages.
+
+The hot cache is especially valuable in cross-project setups: another Claude Code project can point at this vault and read `hot.md` first to get recent context at minimal token cost.
+
+---
+
+## Connections
+
+The hot cache is part of the [[LLM Wiki Pattern]] token discipline strategy. See [[index]] for how the broader navigation works.
@@ -0,0 +1,97 @@
+---
+type: concept
+title: "LLM Wiki Pattern"
+complexity: intermediate
+domain: knowledge-management
+aliases:
+  - "LLM Knowledge Base"
+  - "Karpathy Wiki"
+  - "Persistent Wiki"
+created: 2026-04-07
+updated: 2026-04-07
+tags:
+  - concept
+  - knowledge-management
+  - llm
+  - obsidian
+status: mature
+related:
+  - "[[Hot Cache]]"
+  - "[[Compounding Knowledge]]"
+  - "[[Andrej Karpathy]]"
+  - "[[index]]"
+  - "[[concepts/_index]]"
+sources:
+---
+
+# LLM Wiki Pattern
+
+A pattern for building persistent, compounding knowledge bases using LLMs. Originated by [[Andrej Karpathy]]. The key insight: instead of re-deriving knowledge from raw documents on every query (RAG), the LLM incrementally builds and maintains a structured wiki that gets richer with every source added.
+
+---
+
+## The Core Idea
+
+Most AI knowledge tools work like RAG: index raw documents, retrieve chunks at query time, generate an answer. Nothing accumulates. Ask a question that needs five documents and the LLM reassembles fragments every time.
+
+The wiki pattern is different. When a new source arrives, the LLM reads it, extracts what matters, and integrates it into the wiki: updating entity pages, noting contradictions, strengthening the synthesis. The cross-references are already there. The knowledge is compiled once and kept current.
+
+**The wiki is a persistent, compounding artifact.** The human curates sources and asks questions. The LLM writes and maintains everything.
+
+---
+
+## Three Layers
+
+```
+.raw/       Layer 1 — immutable source documents
+wiki/       Layer 2 — LLM-generated knowledge base
+CLAUDE.md   Layer 3 — schema that tells the LLM how to maintain it
+```
+
+The LLM owns Layer 2 entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. The human reads; the LLM writes.
+
+---
+
+## Operations
+
+**Ingest** — drop a source into `.raw/`, tell the LLM to process it. The LLM reads the source, discusses key takeaways, writes a summary page, updates entity and concept pages, and logs the operation. One source typically touches 8-15 wiki pages.
+
+**Query** — ask a question. The LLM reads the index to find relevant pages, synthesizes an answer with citations. Good answers get filed back into the wiki.
+
+**Lint** — periodic health check. Find orphan pages, dead links, stale claims, missing cross-references.
+
+---
+
+## Index and Log
+
+**index.md** — content-oriented. A catalog of all pages with one-line summaries, organized by category. The LLM reads this first on every query to find relevant pages.
+
+**log.md** — chronological. Append-only record of every ingest, query, and lint pass. Parseable: `grep "^## \[" log.md | head -10`
+
+---
+
+## Why It Works
+
+The tedious part of maintaining a knowledge base is bookkeeping: updating cross-references, noting when new data contradicts old claims, keeping summaries current. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored. The wiki stays maintained because the cost of maintenance is near zero.
+
+At small scale (~100 sources, a few hundred pages), the index file is sufficient. No vector database, no embeddings, no infrastructure. Just markdown files.
+
+---
+
+## Comparison to RAG
+
+| Dimension | LLM Wiki | Semantic RAG |
+|-----------|----------|-------------|
+| Finding | Reads index, follows links | Similarity search over embeddings |
+| Infrastructure | Just markdown files | Embedding model + vector DB |
+| Cost | Tokens only | Ongoing compute + storage |
+| Maintenance | Run a lint | Re-embed when content changes |
+| Scale limit | Hundreds of pages | Millions of documents |
+
+---
+
+## Connections
+
+See [[Compounding Knowledge]] for why the pattern produces more value over time.
+See [[Hot Cache]] for the session context optimization.
+See [[Andrej Karpathy]] for the pattern's origin.
@@ -0,0 +1,44 @@
+---
+type: concept
+title: "Persistent Wiki Artifact"
+created: 2026-04-24
+updated: 2026-04-24
+tags:
+  - llm-wiki
+  - knowledge-management
+  - agent-memory
+status: developing
+related:
+  - "[[How does the LLM Wiki pattern work?]]"
+  - "[[LLM Wiki Pattern]]"
+  - "[[Compounding Knowledge]]"
+  - "[[Source-First Synthesis]]"
+  - "[[Query-Time Retrieval]]"
+---
+
+# Persistent Wiki Artifact
+
+A persistent wiki artifact is the maintained Markdown layer between raw sources and future questions. In Karpathy's LLM Wiki description, the agent reads source material, extracts key information, and integrates it into an interlinked wiki instead of only retrieving chunks at answer time: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+
+## Boundary Filled
+
+The selected question explains that an LLM Wiki compounds knowledge, but it does not isolate the artifact as the unit of memory. This page makes that boundary explicit: memory is stored in files that can be browsed, linked, reviewed, and revised.
+
+## Extracted Claims
+
+- The LLM Wiki pattern defines raw sources, the generated wiki, and a schema document as separate layers: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- In that pattern, the raw source collection is treated as immutable, while the wiki layer is owned and maintained by the LLM: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- The pattern frames the wiki as a compounding artifact whose cross-references, contradiction flags, and synthesis persist across later questions: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- Obsidian supports Wikilinks such as `[[Three laws of motion]]`, which lets Markdown files form an internal network of notes: https://obsidian.md/help/links
+- Obsidian can automatically update internal links when a file is renamed, depending on the vault setting: https://obsidian.md/help/links
+
+## Implications for This Vault
+
+- The durable memory object is the page, not the chat turn.
+- The page needs frontmatter, a stable title, wikilinks, and source URLs so later agents can inspect provenance.
+- The page should remain small enough to revise directly, because the LLM Wiki pattern depends on updating existing synthesis when new sources arrive: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+
+## Primary Sources
+
+- https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- https://obsidian.md/help/links
@@ -0,0 +1,60 @@
+---
+type: concept
+title: "Pro Hub Challenge"
+created: 2026-04-14
+updated: 2026-04-14
+tags:
+  - concept
+  - community
+  - ai-marketing-hub
+  - claude-seo
+  - open-source
+status: evergreen
+related:
+  - "[[Claude SEO]]"
+  - "[[2026-04-14-claude-seo-v190-session]]"
+  - "[[Semantic Topic Clustering]]"
+  - "[[Search Experience Optimization]]"
+---
+
+# Pro Hub Challenge
+
+A community challenge hosted in the [AI Marketing Hub Pro](https://www.skool.com/ai-marketing-hub-pro) Skool community where members build extensions for Claude SEO or Claude Blog, competing for $600 in Claude Credits.
+
+## First Challenge (v1.9.0, April 2026)
+
+**6 submissions, 5 scored Proficient or above**
+
+| Contributor | Submission | Score | Integrated? |
+|------------|------------|-------|-------------|
+| Lutfiya Miller | Semantic Cluster Engine | Winner | Yes — `seo-cluster` |
+| Florian Schmitz | SXO Skill | Proficient | Yes — `seo-sxo` |
+| Dan Colta | SEO Drift Monitor | Proficient | Yes — `seo-drift` |
+| Chris Muller | Multi-lingual Blog | Proficient | Partial — SEO parts into `seo-hreflang` |
+| Matej Marjanovic | E-commerce + Cost Config | Proficient | Yes — `seo-ecommerce` + cost guardrails |
+| Benjamin Samar | SEO Dungeon | Reviewed | No — not integrated in v1.9.0 |
+
+## Integration Pattern
+
+Community submissions go through:
+1. **Full code review** — architecture, quality, security
+2. **Security audit** — SSRF, injection, credential handling
+3. **Cherry-pick** — only SEO-relevant parts go into claude-seo; blog parts stay in claude-blog
+4. **De-brand** — remove contributor-specific branding (e.g., ScienceExperts.ai)
+5. **Attribution** — `original_author` in SKILL.md frontmatter, HTML comments in agents, CONTRIBUTORS.md
+
+## Submission Guidelines (from CONTRIBUTING.md)
+
+1. SKILL.md under 500 lines, references under 200 lines
+2. All scripts must import `validate_url()` for SSRF protection
+3. Include `original_author` in SKILL.md frontmatter metadata
+4. Submit via PR or post in AI Marketing Hub community
+
+## Second Challenge (April 2026)
+
+**Keyword**: LEADS
+**Prize pool**: $600 ($400 first place, $200 second place) in Claude Credits
+**Deadline**: April 28, 2026
+**Scope**: Anything touching lead generation — Claude Code skills, n8n workflows, MCP servers, scrapers, dashboards, pipelines. If it helps someone capture, qualify, nurture, or convert leads, it counts.
+**Rules**: GitHub repo or .zip file + 1-2 minute demo video. Must be functional (not a concept). Solo and team entries are both welcome.
+**Previous winner**: Lutfiya Miller (seo-cluster, integrated in v1.9.0)
@@ -0,0 +1,44 @@
+---
+type: concept
+title: "Query-Time Retrieval"
+created: 2026-04-24
+updated: 2026-04-24
+tags:
+  - rag
+  - retrieval
+  - llm-wiki
+status: developing
+related:
+  - "[[How does the LLM Wiki pattern work?]]"
+  - "[[Wiki vs RAG]]"
+  - "[[LLM Wiki Pattern]]"
+  - "[[Persistent Wiki Artifact]]"
+  - "[[Source-First Synthesis]]"
+---
+
+# Query-Time Retrieval
+
+Query-time retrieval is the baseline memory pattern that LLM Wiki is contrasted against: relevant material is retrieved when the user asks a question, and the answer is generated from the retrieved context.
+
+## Boundary Filled
+
+The selected question contrasts wiki accumulation with RAG, but it does not define the retrieval side precisely. This page anchors the contrast in the original RAG paper and in the LLM Wiki gist.
+
+## Extracted Claims
+
+- The RAG paper defines retrieval-augmented generation as combining parametric memory with non-parametric memory for language generation: https://arxiv.org/abs/2005.11401
+- The RAG paper describes the non-parametric memory as a dense vector index of Wikipedia accessed with a neural retriever: https://arxiv.org/abs/2005.11401
+- The paper reports that RAG models generated more specific, diverse, and factual language than a parametric-only seq2seq baseline in its evaluated generation tasks: https://arxiv.org/abs/2005.11401
+- Karpathy's LLM Wiki gist describes common document workflows as uploading files, retrieving relevant chunks at query time, and generating an answer: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- Karpathy's LLM Wiki gist states that this query-time pattern makes the model rediscover and assemble knowledge on each question instead of accumulating synthesis: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- The MemGPT paper frames limited LLM context windows as a constraint for extended conversations and document analysis, then proposes virtual context management across memory tiers: https://arxiv.org/abs/2310.08560
+
+## Contrast With Wiki Memory
+
+Query-time retrieval can provide external evidence at answer time. The LLM Wiki pattern shifts part of the work earlier by compiling source material into maintained pages before later queries arrive: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+
+## Primary Sources
+
+- https://arxiv.org/abs/2005.11401
+- https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- https://arxiv.org/abs/2310.08560
@@ -0,0 +1,43 @@
+---
+type: concept
+title: "SEO Drift Monitoring"
+created: 2026-04-14
+updated: 2026-04-14
+tags:
+  - concept
+  - seo
+  - monitoring
+  - change-detection
+status: evergreen
+related:
+  - "[[Claude SEO]]"
+  - "[[Pro Hub Challenge]]"
+---
+
+# SEO Drift Monitoring
+
+"Git for SEO" — captures baselines of SEO-critical page elements, then diffs against current state to detect regressions. Contributed to [[Claude SEO]] v1.9.0 by Dan Colta.
+
+## What It Tracks
+
+17 comparison rules across 3 severity levels:
+
+| Severity | Examples |
+|----------|----------|
+| CRITICAL | Schema removed, canonical changed, noindex added, H1 removed |
+| WARNING | Title changed, CWV regression >20%, meta description changed |
+| INFO | H2 structure changed, content hash changed, image count changed |
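+
+As a rough illustration of how rules like these could be evaluated, the sketch below compares two page snapshots and sorts the findings by severity. It is not the code in `drift_compare.py`: the `Rule` class, the snapshot field names, and the five sample rules are assumptions chosen to mirror the table above.
+
+```python
+from dataclasses import dataclass
+from typing import Callable
+
+# Hypothetical snapshot shape: a flat dict of SEO-critical fields.
+Snapshot = dict
+
+
+@dataclass(frozen=True)
+class Rule:
+    severity: str  # "CRITICAL", "WARNING", or "INFO"
+    message: str
+    triggered: Callable[[Snapshot, Snapshot], bool]  # (baseline, current) -> bool
+
+
+# A small illustrative subset; the real skill is described as having 17 rules.
+RULES = [
+    Rule("CRITICAL", "noindex added",
+         lambda old, new: not old.get("noindex") and bool(new.get("noindex"))),
+    Rule("CRITICAL", "canonical changed",
+         lambda old, new: old.get("canonical") != new.get("canonical")),
+    Rule("CRITICAL", "H1 removed",
+         lambda old, new: bool(old.get("h1")) and not new.get("h1")),
+    Rule("WARNING", "title changed",
+         lambda old, new: old.get("title") != new.get("title")),
+    Rule("INFO", "image count changed",
+         lambda old, new: old.get("image_count") != new.get("image_count")),
+]
+
+SEVERITY_ORDER = {"CRITICAL": 0, "WARNING": 1, "INFO": 2}
+
+
+def compare(baseline: Snapshot, current: Snapshot) -> list:
+    """Return (severity, message) pairs for every triggered rule, most severe first."""
+    findings = [(r.severity, r.message) for r in RULES if r.triggered(baseline, current)]
+    return sorted(findings, key=lambda finding: SEVERITY_ORDER[finding[0]])
+```
+
+Keeping each rule as data (severity, message, predicate) would make it cheap to add rules without touching the comparison loop.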
+
+## Architecture
+
+- **SQLite persistence** at `~/.cache/claude-seo/drift/baselines.db`
+- **4 Python scripts**: `drift_baseline.py` (capture), `drift_compare.py` (diff), `drift_report.py` (HTML report), `drift_history.py` (timeline)
+- **Security-hardened**: uses only `fetch_page.py` for URL fetching (SSRF-protected). Original submission had a curl fallback that bypassed SSRF protection — completely removed during integration.
+
+## Commands
+
+```
+/seo drift baseline <url>    # Capture current state
+/seo drift compare <url>     # Compare against baseline
+/seo drift history <url>     # Show all checks over time
+```
@@ -0,0 +1,159 @@
+---
+type: concept
+title: "SVG Diagram Style Guide"
+created: 2026-04-14
+updated: 2026-04-14
+tags:
+  - design
+  - svg
+  - brand
+  - diagrams
+status: evergreen
+related:
+  - "[[index]]"
+sources:
+  - "claude-ads/assets/diagrams/ (17 SVGs, v1.5.0)"
+---
+
+# SVG Diagram Style Guide
+
+The canonical visual style for all diagrams across agricidaniel's Claude Code skill repos. Extracted from the 17 production SVGs in claude-ads. Use this as the reference when creating or updating diagrams for any skill repo.
+
+## Font
+
+```
+font-family: 'Space Grotesk', system-ui, -apple-system, sans-serif
+```
+
+Space Grotesk is the only typeface. The rest of the stack is a system sans-serif fallback; never fall back to serif or monospace.
+
+## Color Palette
+
+### Core (use these in every diagram)
+
+| Token | Hex | Role |
+|-------|-----|------|
+| bg | #0A0A0A | Canvas background (near-black) |
+| card | #111111 | Card/container fill |
+| card-inner | #1A1A1A | Nested element fill |
+| border | #2D2D2D | Card borders, dividers |
+| text-primary | #F5F5F0 | Headings, labels (off-white) |
+| text-secondary | #888888 | Descriptions, captions |
+| text-tertiary | #6a6a6a | De-emphasized metadata |
+| accent | #E07850 | Primary accent, arrows, highlights (warm rust-orange) |
+| accent-bright | #FF6B35 | Secondary accent, hover states (brighter orange) |
+
+### Platform/Category Colors (use for variety within a diagram)
+
+| Token | Hex | Typical use |
+|-------|-----|-------------|
+| blue | #60A5FA | Google, data, information |
+| purple | #8b5cf6 | Meta, strategy, creative |
+| cyan | #06b6d4 | LinkedIn, networking |
+| green | #4ADE80 | Success, validation, TikTok |
+| rose | #F43F5E | YouTube, alerts |
+| orange | #FF6B35 | Microsoft, secondary accent |
+| gray | #888888 | Neutral, generic platforms |
+
+### Status Colors (for pass/warn/fail indicators)
+
+| Token | Hex | Role |
+|-------|-----|------|
+| pass | #16a34a | Pass, success |
+| warn | #f59e0b | Warning, attention |
+| fail | #dc2626 | Fail, critical |
+
+## Typography Scale
+
+| Element | Size | Weight | Color | Extra |
+|---------|------|--------|-------|-------|
+| Diagram title | 16-17px | 700 | #F5F5F0 | text-anchor: middle |
+| Subtitle | 11px | 400 | #888888 | text-anchor: middle |
+| Section label | 13px | 700 | accent color | letter-spacing: 2 |
+| Card heading | 12-15px | 600-700 | #F5F5F0 | text-anchor: middle |
+| Card subtext | 9-11px | 400 | accent color | Skill/agent name |
+| Body text | 10px | 400 | #888888 | Descriptions |
+| Tiny label | 9px | 400 | #6a6a6a | Metadata, counts |
+
+## Layout Primitives
+
+### Outer Container
+```xml
+<rect width="800" height="500" fill="#0A0A0A"/>
+```
+Standard canvas is 800x500. Some diagrams use 900x250 or 900x350 depending on content.
+
+### Card
+```xml
+<rect x="40" y="20" width="720" height="120" rx="16" fill="#111111" stroke="#2D2D2D" stroke-width="1.5"/>
+```
+- Corner radius: `rx="16"` for outer containers
+- Border: `#2D2D2D`, `stroke-width="1.5"`
+
+### Colored Top Bar (card accent)
+```xml
+<rect x="40" y="20" width="720" height="4" rx="2" fill="#E07850"/>
+```
+4px height, sits at the top edge of the card. Color indicates category.
+
+### Inner Card (nested element)
+```xml
+<rect x="60" y="230" width="105" height="60" rx="6" fill="#1A1A1A" stroke="#2D2D2D" stroke-width="1"/>
+```
+- Corner radius: `rx="6"` for small inner cards, `rx="9"` for medium
+- Fill: `#1A1A1A` (slightly lighter than parent card)
+
+### Numbered Circle (for sequences)
+```xml
+<circle cx="138" cy="60" r="14" fill="#0A0A0A" stroke="#60A5FA" stroke-width="1.5"/>
+<text x="138" y="60" font-size="12" fill="#60A5FA" text-anchor="middle" font-weight="bold" dominant-baseline="central">1</text>
+```
+Circle stroke color matches the step's category color.
+
+### Arrow Connector
+```xml
+<line x1="400" y1="140" x2="400" y2="170" stroke="#E07850" stroke-width="1.5"/>
+<polygon points="394,167 400,177 406,167" fill="#E07850"/>
+```
+Always `#E07850`. Vertical for flow-down, horizontal for left-to-right pipelines.
+
+### Horizontal Divider (title underline)
+```xml
+<line x1="380" y1="36" x2="520" y2="36" stroke="#E07850" stroke-width="2.5" stroke-linecap="round"/>
+```
+Short centered line under diagram title. Always accent color.
+
+## Diagram Types (from claude-ads)
+
+| # | Name | Layout | Size |
+|---|------|--------|------|
+| 01 | Architecture | 3-layer vertical stack | 800x500 |
+| 02 | Parallel Audit | Agent grid with flow | 800x500 |
+| 04 | Platform Checks | Checklist columns | 800x500 |
+| 05 | Quality Gates | Rule cards | 800x500 |
+| 06 | How It Works | Step sequence | 900x250 |
+| 07 | Data Flow | Horizontal pipeline | 900x250 |
+| 08 | Industry Templates | Card grid | 900x350 |
+| 10 | MCP Integration | Connection diagram | 800x500 |
+| 12 | Privacy Flow | Vertical flow | 800x500 |
+| 13 | Scoring Algorithm | Formula breakdown | 800x500 |
+| 14 | Creative Pipeline | 5-step horizontal | 900x250 |
+| 15 | Platform Grid | 2-row card grid | 900x350 |
+| 16 | PDF Pipeline | Process flow | 900x250 |
+| 17 | A/B Testing | Split comparison | 800x500 |
+| 18 | PPC Calculators | Tool cards | 900x350 |
+| 19 | Audit Lifecycle | Circular flow | 800x500 |
+| 20 | Install Methods | Option cards | 900x250 |
+
+## Rules
+
+1. Always dark theme. Never white or light backgrounds.
+2. Space Grotesk only. No other fonts.
+3. #E07850 is the signature accent. Use it for arrows, highlights, and the primary visual element.
+4. Cards always have #2D2D2D borders. Never borderless cards.
+5. Colored top bars (4px) identify categories. One color per category, consistent across the diagram.
+6. Text is always left-aligned or center-aligned. Never right-aligned.
+7. No gradients, shadows, or blur filters. Flat design only.
+8. Numbered circles for sequential steps. Color matches category.
+9. Arrow connectors are always #E07850 with triangle tips.
+10. File naming: zero-padded number prefix (01-, 02-, etc.) + kebab-case description.
@@ -0,0 +1,44 @@
+---
+type: concept
+title: "Search Experience Optimization (SXO)"
+created: 2026-04-14
+updated: 2026-04-14
+tags:
+  - concept
+  - seo
+  - ux
+  - serp-analysis
+status: evergreen
+related:
+  - "[[Claude SEO]]"
+  - "[[Pro Hub Challenge]]"
+  - "[[Semantic Topic Clustering]]"
+---
+
+# Search Experience Optimization (SXO)
+
+A methodology that reads SERPs backwards to detect page-type mismatches, derives user stories from search features, and scores pages from persona perspectives. Contributed to [[Claude SEO]] v1.9.0 by Florian Schmitz.
+
+## Core Insight
+
+> "Read SERPs backwards" — instead of optimizing content FOR the SERP, analyze WHAT the SERP tells you about user expectations, then check if your page meets them.
+
+## Process
+
+1. **Page-type detection** — classify the URL as one of 8 types (Landing, Blog, Product, Hybrid, Service, Comparison, Local, Tool)
+2. **SERP pattern matching** — compare what Google shows (featured snippets, PAA, ads, related searches) against what the page provides
+3. **Mismatch detection** — if SERP says "users want comparison" but page is "product page", that's a mismatch
+4. **User story derivation** — from SERP features, derive 4-7 personas with emotional states, barriers, goals
+5. **Persona scoring** — score the page from each persona's perspective (0-100 across 4 dimensions)
+6. **Wireframe generation** — IST (current) vs SOLL (ideal) wireframes with ultra-concrete placeholders
+
+## Key Innovation
+
+Most SEO tools analyze pages in isolation. SXO uses the SERP as a proxy for user intent — the SERP IS the research that Google already did about what users want. This makes the analysis data-driven without needing user testing.
+
+## Command
+
+```
+/seo sxo <url>
+/seo sxo wireframe <url>
+```
@@ -0,0 +1,46 @@
+---
+type: concept
+title: "Semantic Topic Clustering"
+created: 2026-04-14
+updated: 2026-04-14
+tags:
+  - concept
+  - seo
+  - content-strategy
+  - clustering
+status: evergreen
+related:
+  - "[[Claude SEO]]"
+  - "[[Pro Hub Challenge]]"
+  - "[[Search Experience Optimization]]"
+---
+
+# Semantic Topic Clustering
+
+SERP-based keyword grouping that replaces paid tools ($50-200/month) with Claude's reasoning. Contributed to [[Claude SEO]] v1.9.0 by Lutfiya Miller (Pro Hub Challenge Winner).
+
+## How It Works
+
+1. **Seed keyword** provided by user
+2. **SERP fetching** — get Google results for the seed and related terms (via WebSearch or DataForSEO)
+3. **Overlap scoring** — compare top-10 results between keyword pairs:
+   - 7-10 overlapping URLs = same post (keyword cannibalization)
+   - 4-6 overlapping = same cluster (supporting content)
+   - 2-3 overlapping = interlink opportunity
+   - 0-1 overlapping = separate clusters
+4. **Hub-spoke architecture** — 1 pillar page (2500-4000 words) + 2-5 clusters + 2-4 posts each
+5. **Internal link matrix** — bidirectional linking plan with backward link injection
+6. **Visualization** — interactive cluster-map.html (SVG, dark mode, keyboard accessible)
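+
+The overlap thresholds in step 3 can be written down as a small function. The skill itself is prompt-driven, so this Python is only an illustration of the rule, not part of `seo-cluster`; the function names and the exact-string URL comparison (no normalization of trailing slashes or query strings) are assumptions.
+
+```python
+def overlap_count(serp_a: list, serp_b: list) -> int:
+    """Count URLs that appear in the top 10 results of both keywords."""
+    return len(set(serp_a[:10]) & set(serp_b[:10]))
+
+
+def classify_pair(serp_a: list, serp_b: list) -> str:
+    """Map the overlap count onto the four bands listed in step 3."""
+    shared = overlap_count(serp_a, serp_b)
+    if shared >= 7:
+        return "same post"          # 7-10 shared URLs: cannibalization risk
+    if shared >= 4:
+        return "same cluster"       # 4-6 shared URLs: supporting content
+    if shared >= 2:
+        return "interlink"          # 2-3 shared URLs: link opportunity
+    return "separate clusters"      # 0-1 shared URLs
+```
+
+Running `classify_pair` over every keyword pair would yield the labels that steps 4 and 5 then turn into a hub-spoke plan and link matrix.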
+
+## Key Design Decisions
+
+- **No Python scripts** — clustering is prompt-driven (Claude's reasoning + WebSearch)
+- **Optional execution** — outputs content briefs when claude-blog isn't installed, full pipeline when it is
+- **Resume capability** — for long multi-post execution runs
+- **DataForSEO integration** — uses `serp_organic_live_advanced` for live SERP data when available (with cost check)
+
+## Command
+
+```
+/seo cluster <seed-keyword>
+```
@@ -0,0 +1,42 @@
+---
+type: concept
+title: "Source-First Synthesis"
+created: 2026-04-24
+updated: 2026-04-24
+tags:
+  - llm-wiki
+  - synthesis
+  - provenance
+status: developing
+related:
+  - "[[How does the LLM Wiki pattern work?]]"
+  - "[[LLM Wiki Pattern]]"
+  - "[[Compounding Knowledge]]"
+  - "[[Persistent Wiki Artifact]]"
+  - "[[Query-Time Retrieval]]"
+---
+
+# Source-First Synthesis
+
+Source-first synthesis is the LLM Wiki practice of keeping raw sources separate from the generated wiki while requiring the wiki to cite and integrate those sources. Karpathy's pattern describes raw sources as the source of truth and the generated wiki as the maintained synthesis layer: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+
+## Boundary Filled
+
+The selected question says the wiki pattern integrates sources, but it does not spell out the provenance discipline. This page records the rule: synthesis is allowed to be rewritten, but source material remains the cited anchor.
+
+## Extracted Claims
+
+- Karpathy's LLM Wiki pattern says raw sources can include articles, papers, images, and data files, and that the LLM reads them without modifying them: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- The same source describes the wiki as summaries, entity pages, concept pages, comparisons, overview, and synthesis maintained by the LLM: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- The ingest operation can create a source summary, update indexes, update relevant entity and concept pages, and append a log entry: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- The query operation reads relevant wiki pages and synthesizes answers with citations, and useful answers can be filed back into the wiki: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- The RAG paper identifies provenance and updating world knowledge as open problems for knowledge-intensive generation systems: https://arxiv.org/abs/2005.11401
+
+## Operating Rule
+
+Source-first synthesis is stricter than unsourced summarization. A new concept page should identify the sources it used, state what was extracted from them, and avoid treating the generated page as a replacement for the source document.
+
+## Primary Sources
+
+- https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
+- https://arxiv.org/abs/2005.11401
@@ -0,0 +1,43 @@
+---
+type: meta
+title: "Concepts Index"
+updated: 2026-04-07
+tags:
+  - meta
+  - index
+  - concept
+domain: knowledge-management
+status: evergreen
+related:
+  - "[[index]]"
+  - "[[dashboard]]"
+  - "[[Wiki Map]]"
+  - "[[Hot Cache]]"
+  - "[[LLM Wiki Pattern]]"
+  - "[[Compounding Knowledge]]"
+---
+
+# Concepts Index
+
+Navigation: [[index]] | [[entities/_index|Entities]] | [[sources/_index|Sources]]
+
+All concept pages — ideas, patterns, and frameworks extracted from sources.
+
+---
+
+## Knowledge Management
+
+- [[LLM Wiki Pattern]] — the core architecture for persistent, compounding knowledge bases
+- [[Hot Cache]] — ~500-word session context file, updated after every ingest
+- [[Compounding Knowledge]] — why the wiki grows more valuable over time, unlike RAG
+- [[DragonScale Memory]] — memory-layer spec: fold operator, deterministic page addresses, semantic tiling, boundary-first autoresearch (status: shipped v0.4, all four mechanisms opt-in)
+- [[Persistent Wiki Artifact]]: durable Markdown page as the LLM's memory object (developing)
+- [[Source-First Synthesis]]: provenance discipline for LLM wiki layers (developing)
+- [[Query-Time Retrieval]]: baseline RAG pattern of retrieving source chunks at question time, which the LLM Wiki is contrasted against (developing)
+
+---
+
+## Add new concepts here as they are extracted from sources.
@@ -0,0 +1,156 @@
+---
+type: concept
+title: "Cherry-Picks: Feature Backlog from Ecosystem Research"
+created: 2026-04-08
+updated: 2026-04-08
+tags:
+  - backlog
+  - cherry-picks
+  - product-roadmap
+  - claude-obsidian
+status: current
+related:
+  - "[[claude-obsidian-ecosystem]]"
+  - "[[LLM Wiki Pattern]]"
+sources:
+  - "[[claude-obsidian-ecosystem-research]]"
+---
+
+# Cherry-Picks: Feature Backlog
+
+> Sourced from ecosystem research 2026-04-08 | 16+ projects analyzed
+> Prioritized by: impact × implementation ease × uniqueness
+
+---
+
+## Tier 1 — Quick Wins (High Impact, Low Effort)
+
+### 1. URL Ingestion in /wiki-ingest
+**Source**: ekadetov/llm-wiki, Ar9av/obsidian-wiki
+**What it is**: Pass a URL directly to ingest instead of a file path. Agent fetches the page, cleans it, saves to `.raw/`, then ingests.
+**Current state**: Users must manually copy-paste web content.
+**How to add**: Detect `https://` prefix in ingest skill → WebFetch → save to `.raw/articles/` → proceed with normal ingest.
+**Bonus**: Pair with **defuddle** (kepano's web cleaner) for clean token-efficient extraction.
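+
+A minimal sketch of the detection and save-path steps, assuming the ingest argument arrives as a plain string. The function names, the slug scheme, and the `.md` extension are hypothetical, and accepting `http://` as well as `https://` goes slightly beyond the prefix check described above. Fetching and cleaning are left to the agent (WebFetch, defuddle) and are not shown.
+
+```python
+import re
+from pathlib import PurePosixPath
+from urllib.parse import urlparse
+
+
+def is_url(arg: str) -> bool:
+    """True when the ingest argument looks like a web URL rather than a file path."""
+    return urlparse(arg).scheme in ("http", "https")
+
+
+def raw_article_path(url: str) -> PurePosixPath:
+    """Derive a target path under .raw/articles/ from the URL's host and path."""
+    parsed = urlparse(url)
+    slug_source = f"{parsed.netloc}{parsed.path}".lower()
+    # Collapse every run of non-alphanumeric characters into a single hyphen.
+    slug = re.sub(r"[^a-z0-9]+", "-", slug_source).strip("-")
+    return PurePosixPath(".raw/articles") / f"{slug}.md"
+```
+
+Deriving the filename from the URL keeps repeated ingests of the same page pointed at the same raw file, which would also suit hash-based change detection.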
+
+### 2. Auto-Commit PostToolUse Hook
+**Source**: ballred/obsidian-claude-pkm, ekadetov/llm-wiki
+**What it is**: Every Write/Edit tool call in the vault triggers `git add -A && git commit -m "auto: [filename] [timestamp]"`.
+**Current state**: No auto-commit. Users must commit manually.
+**How to add**: PostToolUse hook in hooks.json targeting Write + Edit tools, scoped to wiki/ directory.
+**Note**: Makes vault a proper version-controlled knowledge base automatically.
+
+### 3. defuddle Web Cleaning Skill
+**Source**: kepano/obsidian-skills
+**What it is**: A skill that wraps `defuddle-cli` — strips ads, nav, clutter from web pages before ingest. Reduces token usage ~40-60% on typical web articles.
+**How to add**: New `defuddle` sub-skill or reference in wiki-ingest. Requires `defuddle-cli` npm package.
+
+---
+
+## Tier 2 — Medium Effort, High Value
+
+### 4. Delta Tracking Manifest
+**Source**: Ar9av/obsidian-wiki
+**What it is**: `.raw/.manifest.json` tracking every ingested source — path, hash, timestamp, which wiki pages it produced. Re-ingest only processes new/changed files.
+**Current state**: Every `/wiki-ingest` call re-processes everything.
+**How to add**:
+  - On ingest: compute MD5 hash of source → check manifest → skip if unchanged
+  - On ingest: record `{path, hash, ingested_at, pages_created}` in manifest
+  - On update: re-process if hash changed, merge changes into existing pages
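+
+The three steps above could look roughly like this. It is a sketch under stated assumptions, not the Ar9av/obsidian-wiki implementation: the manifest is assumed to be a JSON object keyed by source path, and all function names are hypothetical. MD5 is used only for change detection, as in the description, not for security.
+
+```python
+import hashlib
+import json
+import time
+from pathlib import Path
+
+
+def file_hash(source: Path) -> str:
+    """MD5 hex digest of the file contents (change detection only)."""
+    return hashlib.md5(source.read_bytes()).hexdigest()
+
+
+def load_manifest(manifest_path: Path) -> dict:
+    """Read the manifest, or start with an empty one if it does not exist yet."""
+    if manifest_path.exists():
+        return json.loads(manifest_path.read_text(encoding="utf-8"))
+    return {}
+
+
+def needs_ingest(manifest: dict, source: Path) -> bool:
+    """True for new sources and for sources whose hash differs from the recorded one."""
+    entry = manifest.get(str(source))
+    return entry is None or entry["hash"] != file_hash(source)
+
+
+def record_ingest(manifest: dict, source: Path, pages_created: list) -> None:
+    """Store {path, hash, ingested_at, pages_created} for a processed source."""
+    manifest[str(source)] = {
+        "path": str(source),
+        "hash": file_hash(source),
+        "ingested_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
+        "pages_created": pages_created,
+    }
+
+
+def save_manifest(manifest: dict, manifest_path: Path) -> None:
+    """Write the manifest back to disk as indented JSON."""
+    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
+```
+
+An ingest loop would call `needs_ingest` for each file in `.raw/`, skip the unchanged ones, and call `record_ingest` followed by `save_manifest` after each successful run. Merging changes into existing pages is left to the ingest skill and is not covered here.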
+
+### 5. Multi-Depth Query Modes
+**Source**: rvk7895/llm-knowledge-bases
+**What it is**: 3 query tiers in `/wiki-query`:
+  - **Quick** — hot.md + index.md only (~3 pages read)
+  - **Standard** — full wiki cross-reference + optional web search supplement
+  - **Deep** — parallel sub-agents, each researching a different angle
+**Current state**: One depth level.
+**How to add**: `/wiki-query quick <question>`, `/wiki-query deep <question>` flags in SKILL.md.
+
+### 6. /wiki-ingest Vision Support
+**Source**: Ar9av/obsidian-wiki
+**What it is**: Ingest images, screenshots, whiteboard photos by passing the image to a vision-capable model.
+**How to add**: Detect image extension → read as base64 → pass to Claude with vision prompt asking for transcription/description → treat result as text source → standard ingest pipeline.
+**Useful for**: Whiteboard photos from meetings, screenshots of web content, diagrams.
+
+---
+
+## Tier 3 — Bigger Features Worth Planning
+
+### 7. /adopt — Import Existing Vault
+**Source**: heyitsnoah/claudesidian, ballred/obsidian-claude-pkm
+**What it is**: `/adopt` analyzes an existing Obsidian vault, detects its organization method (PARA, Zettelkasten, LYT, plain), and wraps the LLM Wiki pattern around it without destroying existing structure.
+**Why it matters**: Currently, users must start fresh. This unlocks adoption by people with existing vaults.
+**Implementation**: Scan folder structure → classify patterns → generate CLAUDE.md mapping existing folders to wiki roles → non-destructive.
+
+### 8. Productivity Wrapper (Daily/Weekly Reviews)
+**Source**: ballred/obsidian-claude-pkm
+**What it is**: Optional `/daily` and `/weekly` skills that connect goal tracking to the knowledge base.
+**Could be a separate plugin** rather than bundled into claude-obsidian.
+**Goal cascade**: 3-Year Vision → Yearly Goals → Projects → Weekly → Daily.
+
+### 9. Multi-Agent Compatibility (Cursor, Windsurf, Codex)
+**Source**: Ar9av/obsidian-wiki, kepano/obsidian-skills
+**What it is**: A `setup.sh` or `/wiki-convert` command that generates `.cursor/rules/`, `AGENTS.md`, `GEMINI.md` equivalents so the wiki skills work in other coding agents.
+**Note**: kepano already published skills in Agent Skills format — claude-obsidian is already in that format. Just needs the adapter files.
+
+### 10. Marp Presentation Output
+**Source**: rvk7895/llm-knowledge-bases, ekadetov/llm-wiki
+**What it is**: `/wiki-query --slides <topic>` generates a Marp presentation from wiki content, saved to `output/`.
+**Requires**: `marp-cli` npm package.
+
+---
+
+## Tier 4 — Research / Ecosystem Plays
+
+### 11. obsidian-memory-mcp Integration
+**Source**: YuNaga224/obsidian-memory-mcp
+**What it is**: Connect the MCP server that stores Claude's memories as Markdown entities with `[[wikilinks]]` → they appear in Obsidian graph view automatically.
+**How to add**: Point MEMORY_DIR to the wiki/entities/ directory — entity memory pages become proper wiki pages.
+
+### 12. obsidian-bases Skill (from kepano)
+**Source**: kepano/obsidian-skills
+**What it is**: Teach Claude how to create and edit Obsidian Bases (.base files) for dynamic tables, views, and filters.
+**Why**: Obsidian Bases is a new core feature — no other LLM Wiki project teaches Claude about it yet.
+
+### 13. Schema-Emergent Vault Mode
+**Source**: Ar9av/obsidian-wiki
+**What it is**: Alternative /wiki mode where the vault structure is not scaffolded upfront but emerges from ingested content. Good for exploratory knowledge building vs. structured domains.
+**How**: Skip the scaffold step; let wiki-ingest create folders/categories organically based on source content.
+
+---
+
+## Competitive Positioning
+
+After this research, claude-obsidian's unique advantages remain:
+- **Hot cache** — no one else has this session context mechanism
+- **Canvas visual layer** — unique in the LLM Wiki category
+- **/save conversation** — filing chat → wiki is a distinct workflow
+- **Marketplace polish** — best install experience in category
+- **Community distribution** (avalonreset-pro)
+
+The ecosystem is maturing fast. Tier 1 items (URL ingest, auto-commit, defuddle) should ship in v1.3.0 to stay ahead.
+
+---
+
+## Implementation Priority
+
+```
+v1.3.0 (quick wins):
+  - URL ingestion (#1)
+  - Auto-commit hook (#2)
+  - defuddle integration (#3)
+
+v1.4.0 (quality):
+  - Delta tracking (#4)
+  - Multi-depth query (#5)
+
+v1.5.0 (expansion):
+  - Vision ingest (#6)
+  - /adopt command (#7)
+  - Multi-agent compat (#9)
+
+Future:
+  - Productivity wrapper (#8)
+  - Marp output (#10)
+  - Memory MCP integration (#11)
+```