Files
MultiPhysicsVault/wiki/concepts/DragonScale Memory.md
T
김경종 72dad72703
Tests / Hermetic test suite (push) Has been cancelled
Tests / Skill frontmatter validation (push) Has been cancelled
add claude-obsidian
2026-05-28 10:57:16 +09:00

21 KiB
Raw Blame History

type, title, address, complexity, domain, aliases, created, updated, tags, status, related, sources
type title address complexity domain aliases created updated tags status related sources
concept DragonScale Memory c-000001 advanced knowledge-management
DragonScale
DragonScale Architecture
Fractal Memory
2026-04-23 2026-04-24
concept
knowledge-management
memory
architecture
fractal
proposed
LLM Wiki Pattern
Compounding Knowledge
Hot Cache
concepts/_index

DragonScale Memory

A memory-layer design for LLM wiki vaults, inspired by the Heighway dragon curve. Four mechanisms (fold operator, deterministic page addresses, semantic tiling, boundary-first autoresearch) give an LLM-maintained wiki a principled way to grow, compact, and stay coherent. The dragon curve is a design-justification device, not a reasoning architecture.

Status: v0.4 2026-04-24. All four mechanisms shipped as opt-in features. Phase 0 (spec) + Phase 1 (wiki-fold skill, dry-run verified) + Phase 2 (address MVP) + Phase 3 (semantic tiling) + Phase 3.5/3.6 (hardening) + Phase 4 (boundary-first autoresearch). See Review History for the progression.


Scope

DragonScale is a memory architecture: it governs how a wiki grows, compacts, addresses its pages, and checks for duplicates. It is not a search, planning, or reasoning algorithm. Agent reasoning uses existing patterns (Tree of Thoughts with BFS/DFS/beam search; Yao et al. 2023).

Honest disclaimer: memory-layer choices are never neutral with respect to reasoning. What the vault surfaces, and in what order, shapes what the model sees. Long-context performance is position-sensitive (Liu et al. 2023, Lost in the Middle), and MemGPT's premise is that paging policy affects task success (Packer et al. 2023). One of the four mechanisms below (boundary-first autoresearch) explicitly crosses into agenda control; it is included deliberately and marked as such.


The Core Analogy

Four dragon-curve properties map onto memory-system patterns already validated in adjacent fields. The word is analogue, not identity.

Dragon curve property Memory analogue Strength of analogy
Paper-folding recursion: D_{n+1} = D_n · R · swap(reverse(D_n)) Hierarchical rollup / materialized summary with exponential fanout Loose. Shares exponential batch structure, not compaction semantics.
Turn derivable from bits of n (regular paperfolding sequence, OEIS A014577) Deterministic page addresses as organizational convention (MVP is a creation-order counter, not a true content hash) Loose. Deterministic addressing is useful independent of the dragon.
Tiling / no self-intersection Canonical-home coverage: one concept, one page Medium. Dedup lint enforces this mechanically.
Boundary dim ≈ 1.523627 vs interior dim 2 Agent attention weighted toward frontier pages Aesthetic. The fractal dimension number does no load-bearing work.

The curve is useful for deciding which knobs to tighten and why, not as a math proof that any given mechanism is optimal.


Mechanism 1 — Fold Operator

After a batch of ingests, run a fold: produce a meta-page summarizing the batch, link children back, update the index. Folds stack: after enough level-k folds accumulate, a level-k+1 fold produces a super-summary.

This is a hierarchical rollup, loosely similar to LSM-tree compaction but with important differences.

What it shares with LSM compaction:

  • Exponential batch fanout across levels (like LevelDB's fixed level-size ratio, typically 10× per level in leveled mode)
  • Periodic consolidation rather than per-write work

What it does NOT inherit from LSM:

  • No sorted-key semantics (pages have semantic, not key-ordered, identity)
  • No SSTable/memtable distinction, no tombstones, no Bloom filters
  • No write-amplification arithmetic; no read-path acceleration
  • Folds are additive: children remain in place. LSM compaction rewrites and deletes. A DragonScale fold is closer to a materialized view than a compaction.

Trigger options:

  • 2^k entry count (k=4 ⇒ every 16 log entries). Simple to implement; straightforward level math; ignores page size and novelty.
  • Adaptive trigger (preferred for production): token budget (e.g., fold when unfolded batch exceeds N tokens), novelty score (average embedding distance from existing summaries), or staleness age (last fold > T days). Phase 1 will implement entry-count for MVP; adaptive triggers are a follow-up.

Invariants:

  • Idempotent on the same range (re-running is a no-op).
  • Reversible (children stay; a fold is additive).
  • Level-bounded: with entry-count trigger 2^k, fold depth is at most ⌈log₂(N)⌉ above leaf pages. Derived, not empirical.

Mechanism 2 — Deterministic Page Addresses

Every new page gets a stable address field in frontmatter. The Phase 2 MVP uses a simple creation-order counter:

address: c-000042

Format: c-<6-digit-counter>. c- means "creation-order counter." Zero-padded.

Future extension (documented, not shipped in Phase 2):

  • Fold-relative path: f1.2/c-000042 once folds exist, where f1.2 encodes the fold-tree lineage.
  • Content hash suffix: c-000042:h7f3c2 once the hash-rotation policy is decided.

What Phase 2 MVP gives:

  • Uniqueness: counter is monotonically increasing; deleted pages' addresses are retired, never reused.
  • Stability: never changes across content edits.
  • Determinism: derivable from the counter state at .vault-meta/address-counter.txt.
  • Ordering: preserves creation sequence.

What this does NOT give (renamed "content-addressable paths" was misleading in v0.1):

  • No content-addressability in the MVP. The Phase 2 address is a sequence counter, not a content hash. Renaming this mechanism from "content-addressable paths" to "deterministic page addresses" is more honest about what actually ships.
  • No prompt cache benefit (already corrected in v0.1 → v0.2). Per Anthropic docs, cache hits require byte-identical prefixes; an address field in frontmatter only helps if the frontmatter itself is inside a cached block AND stays byte-identical. Stable prefixes, not addresses, drive cache hits.

Phase 2 exclusions (all deferred):

  • Backfill of legacy pre-Phase-2 pages (will use l- prefix with its own counter).
  • Fold-ancestry bit prefix (requires committed folds from a future fold-of-folds skill).
  • Content hash suffix (rotation policy unresolved; see limitations).

Implementation (Phase 2, shipped):

  • scripts/allocate-address.sh: flock-guarded atomic allocator. All counter reads/writes go through this script; direct Write/Edit on .vault-meta/address-counter.txt is prohibited (would fire PostToolUse hook).
  • skills/wiki-ingest/SKILL.md → Address Assignment section: opt-in feature detection; delegates allocation to the helper; records path-to-address mapping in .raw/.manifest.json address_map for re-ingest stability.
  • skills/wiki-lint/SKILL.md → Address Validation section: format check, uniqueness check, counter-drift check, address-map consistency check.

Lint severity model (matches skills/wiki-lint/SKILL.md Address Validation behavior):

  • Post-rollout pages (frontmatter created: >= 2026-04-23, or any page newly created after DragonScale adoption) that lack an address are errors. This is the silent-regression guard.
  • Legacy pages (created: < 2026-04-23) without addresses are informational. The optional .vault-meta/legacy-pages.txt manifest can grandfather pages whose created: metadata is wrong or missing.
  • Meta pages (_index.md, index.md, log.md, hot.md, etc.) and fold pages are excluded entirely.

Mechanism 3 — Semantic Tiling Lint

The tiling property says the same concept should live in one canonical page. Enforce it with an embedding-based dedup check in wiki-lint.

Procedure (calibrated, not a guess):

  1. Compute embeddings for every page. Default model: local nomic-embed-text via ollama on http://127.0.0.1:11434. Cost: local hardware time only (no API fees). The script supports a remote override under --allow-remote-ollama; remote endpoints may incur provider API fees.
  2. Compute pairwise cosine similarities for all page pairs.
  3. Calibration (one-time, before first use): label 50-100 in-vault page pairs as duplicate/near/distinct; find the thresholds that optimize target precision for each band.
  4. Default bands (used before calibration, then refined):
    • ≥ 0.90 — near-duplicate, lint error
    • 0.80 0.90 — review bucket, lint warning
    • < 0.80 — distinct, no flag
  5. Never auto-merge. Output a review list.

Why not a fixed 0.85? v0.1 used 0.85 with no justification. Published thresholds in the embeddings literature span a wide range (Sentence Transformers' community_detection defaults to 0.75; Quora-duplicate calibrations land around 0.770.83; sparse-model defaults differ again). Thresholds are model-, corpus-, and objective-dependent, so calibration is required.


Mechanism 4 — Boundary-First Autoresearch

Status: shipped (Phase 4, opt-in) as of 2026-04-24. Implementation: scripts/boundary-score.py. Integration: skills/autoresearch/SKILL.md Topic Selection section B. Tests: tests/test_boundary_score.py.

Boundary pages (high out-degree relative to in-degree, recency-weighted) are the vault's frontier. /autoresearch invoked without a topic reads the top-5 boundary pages and offers them as research candidates; the user selects one (or types a free-text topic, or declines all and falls back to the original ask-user mode).

Formula (exact):

out_degree(p) = count of distinct filename-stem wikilinks in body of p that resolve to scoreable pages
in_degree(p)  = count of distinct scoreable pages whose body contains a wikilink to p
recency_weight(p) = exp(-days_since_updated / 30)      # no floor; old pages approach 0
boundary_score(p) = (out_degree - in_degree) * recency_weight

Link resolution: filename-stem only. [[Foo]] resolves to Foo.md anywhere in the vault. Aliases declared via frontmatter aliases: are NOT parsed. Folder-qualified links (e.g. [[notes/Foo]]) are resolved by stem alone. This matches Obsidian's default behavior for unique filenames but does not implement full alias resolution.

Scoreable = any page NOT excluded by any of:

  • frontmatter type: meta or type: fold
  • filename in {_index.md, index.md, log.md, hot.md, overview.md, dashboard.md, Wiki Map.md, getting-started.md}
  • path prefix in wiki/folds/ or wiki/meta/
  • symlinks or paths whose resolved target escapes the vault root (rejected at scan time)

Code-block filtering: triple-backtick AND triple-tilde fenced code blocks are skipped, with CommonMark-like length tracking so a longer opening fence is not closed by a shorter inner fence. Indented code blocks (4+ spaces) are NOT filtered because Obsidian bullet lists commonly use 4-space indentation and contain real wikilinks. See scripts/boundary-score.py:RECENCY_HALFLIFE_DAYS for the sole tunable constant.

Honest labeling: this mechanism is agenda control, not pure memory. It shapes what the agent researches next. It is included in DragonScale because it is a direct consequence of the dragon-curve boundary analogy, and because it pairs naturally with folds (freshly folded pages have low out-degree; frontier pages are pre-fold). But the "memory only, not reasoning" framing does not cover it. Users who want a strict memory-layer subset should omit this mechanism (simply do not invoke /autoresearch without a topic, or do not set up scripts/boundary-score.py).

What is NOT included:

  • No auto-triggering. /autoresearch is still user-invoked.
  • No persistent boundary-score cache. Scoring is O(N * avg_links) and runs on every invocation from fresh wiki/ state.
  • No integration with folds or addresses. Pure graph analysis on the wikilink graph.
  • No automatic topic selection without user confirmation. The helper presents choices; the user picks.

Operational Policies (required before implementation)

Adversarial review flagged these gaps in v0.1. Each must be decided before the corresponding phase ships.

Policy Phase 0 position Decision point
Retention / GC No automatic deletion. Pages are permanent. Revisit if vault exceeds ~5000 pages.
Tombstones None. Deleted pages are removed via git revert. Revisit if delete events become common.
Versioning Relied on git history, not in-vault versioning. Address-hash rotation policy doubles as a coarse version signal.
Conflict resolution for contradictory folds Meta-page must quote both sources with explicit "conflict" callout. No automatic resolution. Phase 1 spec required.
Concurrency / atomicity Single-writer assumption (one Claude session at a time). PostToolUse auto-commit serializes. Multi-writer case deferred.
Provenance for meta-pages Every fold page must include frontmatter listing children and fold level. Phase 1 must enforce.
Access control Out of scope. This is a single-user vault. Revisit only if shared.

Mapping to Claude-Obsidian

Mechanism Status New Extends
Fold operator shipped (Phase 1, dry-run verified) skills/wiki-fold/ reads log.md, writes wiki/folds/, updates index.md on commit
Address anchors shipped (Phase 2, opt-in) scripts/allocate-address.sh, new frontmatter field wiki-ingest (assignment), wiki-lint (validation)
Semantic tiling shipped (Phase 2/3, opt-in) scripts/tiling-check.py, .vault-meta/tiling-thresholds.json wiki-lint with banded thresholds, calibration procedure documented
Boundary-first shipped (Phase 4, opt-in) scripts/boundary-score.py, tests/test_boundary_score.py skills/autoresearch/SKILL.md Topic Selection section B; commands/autoresearch.md no-topic path

The existing hot → index → domain → page hierarchy already implements self-similarity across scales. That's the one dragon-curve property this vault had before DragonScale.


Why This Over Alternatives

Pattern What it gives What DragonScale adds
MemGPT virtual context (two-tier paging) Main context ↔ external context swap More than two levels; explicit fold triggers; dedup lint
Pure LSM compaction Exponential write-path throughput Semantic-layer mechanisms (tiling, boundary); additive rollups over destructive merges
Ad-hoc /save Human-triggered filing Rule-based fold cadence
Vector-only RAG Retrieval Canonical-home structure; lineage addresses

DragonScale composes patterns validated in adjacent systems: LSM batching (databases), MemGPT paging (agents), Anthropic cache ordering (prompt engineering), and embedding dedup (knowledge graphs).


Known Limitations (v0.3)

  • Unvalidated at scale. All four mechanisms are theoretical; none tested on a multi-thousand-page vault.
  • Fold cadence is a knob, not a theorem. k=4 is a starting guess. Adaptive triggers are likely better.
  • Address stability is unsolved. Hash rotation on edits is a known issue; deferred.
  • Boundary-first crosses scope. Included with a warning, not quietly.
  • Calibration load. Tiling requires a one-time labeling pass; without it, only defaults apply.

Primary Sources

Verified against primary sources on 2026-04-23. Scope of tagging: the specific numeric values, formulas, and named patterns below are tagged [sourced] when directly citable, [derived] when derivable from sourced material, or [conjecture] when based on reasoning without a specific source. Not tagged (and readers should treat as interpretive synthesis): framing sentences in the body such as "composes patterns validated," "self-similarity already exists," and the design rationale tying the four mechanisms together. These are editorial, not source-backed.

Dragon curve math [sourced]

LSM trees [sourced]

LLM memory architectures [sourced]

Prompt caching [sourced]

Embedding thresholds [sourced]

Reasoning search (out of scope, cited only to justify the scope boundary) [sourced]

Items marked [conjecture] in this doc:

  • k=4/k=5 starting value for fold cadence (needs empirical tuning)
  • ~30s full-vault embedding-pass time (needs measurement)
  • boundary_score formula exact weighting (a plausible starting form; not validated against retrieval metrics)

Items marked [derived]:

  • ⌈log₂(N)⌉ fold-depth bound (trivially derivable from the entry-count trigger)
  • Default tiling bands {≥0.90, 0.80-0.90, <0.80} before calibration (interpolated from cited ranges in Sentence Transformers examples; not optimal by construction)

Review History

v0.1 (2026-04-23, initial draft) — written after a verification pass against Wikipedia, arXiv, and Anthropic docs. Four mechanisms proposed.

v0.4 (2026-04-24, Phase 4 shipped) — Mechanism 4 (boundary-first autoresearch) implemented as scripts/boundary-score.py with tests/test_boundary_score.py covering parsing, recency weight, wikilink extraction (with fence-length + tilde + indented-block tests), graph construction (self-loop/unresolved/meta-target exclusion), symlink rejection, and CLI surface (--top, --page, --json). Integrated into skills/autoresearch/SKILL.md as an opt-in Topic Selection mode with explicit helper-failure fallback. Spec's "NOT IMPLEMENTED" marker removed; exact scoring formula (no recency floor), filename-stem-only resolution disclosure, scope, and "what is NOT included" section added. Phase 3.6 pre-Phase-4 hardening shipped concurrently (5 fixes: --report path confinement, rollout baseline, AGENTS.md consistency, wiki-ingest .raw contradiction, install-guide version).

v0.3 (2026-04-23, Phase 2 alignment) — Mechanism 2 rewritten to match the actual Phase 2 MVP shipped in wiki-ingest and wiki-lint. Renamed from "Content-Addressable Paths" to "Deterministic Page Addresses" (the MVP is a creation-order counter, not a content hash). Documented the extension path for fold-ancestry bits and content-hash suffix, both explicitly deferred.

v0.2 (2026-04-23, post-adversarial review) — after codex exec adversarial review. All 7 critiques accepted:

  1. LSM "structurally identical" → weakened to "loosely analogous to hierarchical rollup"; non-inherited properties listed explicitly.
  2. Prompt cache address benefit → removed strong claim; narrowed to organizational convention.
  3. 0.85 threshold → replaced with calibration procedure and banded defaults.
  4. 2^k cadence → justified as implementation convenience; adaptive trigger flagged as preferred for production.
  5. Scope boundary contradiction → acknowledged; boundary-first explicitly labeled as agenda control.
  6. Missing production mechanisms → added Operational Policies section (retention, versioning, conflict resolution, concurrency, provenance).
  7. Unverified claims → tagged specific numeric values, formulas, and named patterns as [sourced], [derived], or [conjecture]. Editorial synthesis in the body explicitly flagged as not tagged (see scope note under Primary Sources).

Connections

See LLM Wiki Pattern for the broader pattern this extends. See Compounding Knowledge for why persistent state is the precondition for DragonScale. See Hot Cache for the existing 500-word session context, which is a level-0 manual fold. See Andrej Karpathy for the intellectual lineage.