baram2584/MultiPhysicsVault

김경종 72dad72703

add claude-obsidian

2026-05-28 10:57:16 +09:00

DragonScale Memory

A memory-layer design for LLM wiki vaults, inspired by the Heighway dragon curve. Four mechanisms (fold operator, deterministic page addresses, semantic tiling, boundary-first autoresearch) give an LLM-maintained wiki a principled way to grow, compact, and stay coherent. The dragon curve is a design-justification device, not a reasoning architecture.

Status: v0.4 2026-04-24. All four mechanisms shipped as opt-in features. Phase 0 (spec) + Phase 1 (wiki-fold skill, dry-run verified) + Phase 2 (address MVP) + Phase 3 (semantic tiling) + Phase 3.5/3.6 (hardening) + Phase 4 (boundary-first autoresearch). See Review History for the progression.

Scope

DragonScale is a memory architecture: it governs how a wiki grows, compacts, addresses its pages, and checks for duplicates. It is not a search, planning, or reasoning algorithm. Agent reasoning uses existing patterns (Tree of Thoughts with BFS/DFS/beam search; Yao et al. 2023).

Honest disclaimer: memory-layer choices are never neutral with respect to reasoning. What the vault surfaces, and in what order, shapes what the model sees. Long-context performance is position-sensitive (Liu et al. 2023, Lost in the Middle), and MemGPT's premise is that paging policy affects task success (Packer et al. 2023). One of the four mechanisms below (boundary-first autoresearch) explicitly crosses into agenda control; it is included deliberately and marked as such.

The Core Analogy

Four dragon-curve properties map onto memory-system patterns already validated in adjacent fields. The word is analogue, not identity.

Dragon curve property	Memory analogue	Strength of analogy
Paper-folding recursion: `D_{n+1} = D_n · R · swap(reverse(D_n))`	Hierarchical rollup / materialized summary with exponential fanout	Loose. Shares exponential batch structure, not compaction semantics.
Turn derivable from bits of `n` (regular paperfolding sequence, OEIS A014577)	Deterministic page addresses as organizational convention (MVP is a creation-order counter, not a true content hash)	Loose. Deterministic addressing is useful independent of the dragon.
Tiling / no self-intersection	Canonical-home coverage: one concept, one page	Medium. Dedup lint enforces this mechanically.
Boundary dim ≈ 1.523627 vs interior dim 2	Agent attention weighted toward frontier pages	Aesthetic. The fractal dimension number does no load-bearing work.

The curve is useful for deciding which knobs to tighten and why, not as a math proof that any given mechanism is optimal.
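The first two rows of the table can be checked against each other with a short sketch. It assumes the turn `R` is encoded as `1` and its opposite as `0` (the encoding OEIS A014577 uses), and it starts the recursion from an empty `D_0`. The function names are illustrative only.

```python
def turn(n: int) -> int:
    """Turn n (1-indexed) of the regular paperfolding sequence, read from the bits of n."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Strip trailing zero bits so that n = m * 2**j with m odd.
    while n % 2 == 0:
        n //= 2
    # The bit just above the lowest set bit decides the turn.
    return 1 if n % 4 == 1 else 0


def fold(level: int) -> list[int]:
    """Turn sequence D_level, built by repeated folding from an empty D_0."""
    seq: list[int] = []
    for _ in range(level):
        mirrored = [1 - t for t in reversed(seq)]  # swap(reverse(D_n))
        seq = seq + [1] + mirrored                 # D_n . R . swap(reverse(D_n))
    return seq


if __name__ == "__main__":
    print(fold(3))                         # [1, 1, 0, 1, 1, 0, 0]
    print([turn(n) for n in range(1, 8)])  # [1, 1, 0, 1, 1, 0, 0]
```

Both constructions yield the same prefix, which is the sense in which a turn is "derivable from bits of `n`" without replaying the folds.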

Mechanism 1 — Fold Operator

After a batch of ingests, run a fold: produce a meta-page summarizing the batch, link children back, update the index. Folds stack: after enough level-k folds accumulate, a level-(k+1) fold produces a super-summary.

This is a hierarchical rollup, loosely similar to LSM-tree compaction but with important differences.

What it shares with LSM compaction:

Exponential batch fanout across levels (like LevelDB's fixed level-size ratio, typically 10× per level in leveled mode)
Periodic consolidation rather than per-write work

What it does NOT inherit from LSM:

No sorted-key semantics (pages have semantic, not key-ordered, identity)
No SSTable/memtable distinction, no tombstones, no Bloom filters
No write-amplification arithmetic; no read-path acceleration
Folds are additive: children remain in place. LSM compaction rewrites and deletes. A DragonScale fold is closer to a materialized view than a compaction.

Trigger options:

2^k entry count (k=4 ⇒ every 16 log entries). Simple to implement; straightforward level math; ignores page size and novelty.
Adaptive trigger (preferred for production): token budget (e.g., fold when unfolded batch exceeds N tokens), novelty score (average embedding distance from existing summaries), or staleness age (last fold > T days). Phase 1 implements the entry-count trigger for the MVP; adaptive triggers are a follow-up.
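The trigger options above could be combined roughly as follows. This is a minimal sketch, not the wiki-fold skill's logic: `UnfoldedBatch`, `should_fold`, and the parameter names are hypothetical, and the novelty-score trigger is left out because it depends on an embedding model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnfoldedBatch:
    """Hypothetical summary of the log entries not yet covered by a fold."""
    entry_count: int
    token_count: int
    days_since_last_fold: float


def should_fold(
    batch: UnfoldedBatch,
    k: int = 4,
    token_budget: Optional[int] = None,
    max_age_days: Optional[float] = None,
) -> bool:
    """Return True when any enabled trigger fires for a non-empty batch."""
    if batch.entry_count == 0:
        return False  # nothing to summarize
    if batch.entry_count >= 2 ** k:
        return True   # entry-count trigger: k=4 means every 16 entries
    if token_budget is not None and batch.token_count > token_budget:
        return True   # adaptive trigger: token budget exceeded
    if max_age_days is not None and batch.days_since_last_fold > max_age_days:
        return True   # adaptive trigger: last fold is stale
    return False
```

With the defaults only the entry-count trigger is active, which mirrors the MVP behavior described above; passing `token_budget` or `max_age_days` turns on the adaptive checks.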

Invariants:

Idempotent on the same range (re-running is a no-op).
Reversible (children stay; a fold is additive).
Level-bounded: with entry-count trigger 2^k, fold depth is at most ⌈log₂(N)⌉ above leaf pages. Derived, not empirical.
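The level bound can be illustrated with a few lines of arithmetic. The sketch assumes that every level uses the same fanout of 2^k (the text above fixes the fanout only for the leaf level), and `fold_counts` is a hypothetical helper name.

```python
import math


def fold_counts(n_entries: int, k: int = 4) -> list[int]:
    """Number of complete folds at each level, assuming a fanout of 2**k at every level."""
    fanout = 2 ** k
    counts: list[int] = []
    remaining = n_entries
    while remaining >= fanout:
        remaining //= fanout      # complete batches available at this level
        counts.append(remaining)
    return counts


if __name__ == "__main__":
    for n in (16, 256, 1000, 5000):
        levels = fold_counts(n, k=4)
        bound = math.ceil(math.log2(n))
        print(n, levels, f"depth={len(levels)}", f"bound={bound}")
```

Under this assumption 256 entries give 16 level-1 folds and one level-2 fold, so the depth of 2 sits well inside the ⌈log₂(256)⌉ = 8 bound. The stated bound is loose for any k > 1.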

Mechanism 2 — Deterministic Page Addresses

Every new page gets a stable address field in frontmatter. The Phase 2 MVP uses a simple creation-order counter:

address: c-000042

Format: c-<6-digit-counter>. c- means "creation-order counter." Zero-padded.
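As an illustration of the format, the following sketch formats and parses addresses. The helper names are hypothetical, and rejecting counters above 999999 is an assumption: the spec does not say what happens when six digits run out.

```python
import re

# "c-" followed by exactly six digits, e.g. c-000042.
ADDRESS_RE = re.compile(r"c-(\d{6})")


def format_address(counter: int) -> str:
    """Render a creation-order counter as a zero-padded address."""
    if not 0 <= counter <= 999_999:
        raise ValueError(f"counter out of range for 6 digits: {counter}")
    return f"c-{counter:06d}"


def parse_address(address: str) -> int:
    """Return the counter value, or raise ValueError on a malformed address."""
    match = ADDRESS_RE.fullmatch(address)
    if match is None:
        raise ValueError(f"not a valid address: {address!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(format_address(42))         # c-000042
    print(parse_address("c-000042"))  # 42
```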

Future extension (documented, not shipped in Phase 2):

Fold-relative path: f1.2/c-000042 once folds exist, where f1.2 encodes the fold-tree lineage.
Content hash suffix: c-000042:h7f3c2 once the hash-rotation policy is decided.

What Phase 2 MVP gives:

Uniqueness: counter is monotonically increasing; deleted pages' addresses are retired, never reused.
Stability: never changes across content edits.
Determinism: derivable from the counter state at .vault-meta/address-counter.txt.
Ordering: preserves creation sequence.

What this does NOT give (the v0.1 name "content-addressable paths" was misleading):

No content-addressability in the MVP. The Phase 2 address is a sequence counter, not a content hash. Renaming this mechanism from "content-addressable paths" to "deterministic page addresses" is more honest about what actually ships.
No prompt cache benefit (already corrected in v0.1 → v0.2). Per Anthropic docs, cache hits require byte-identical prefixes; an address field in frontmatter only helps if the frontmatter itself is inside a cached block AND stays byte-identical. Stable prefixes, not addresses, drive cache hits.

Phase 2 exclusions (all deferred):

Backfill of legacy pre-Phase-2 pages (will use l- prefix with its own counter).
Fold-ancestry bit prefix (requires committed folds from a future fold-of-folds skill).
Content hash suffix (rotation policy unresolved; see limitations).

Implementation (Phase 2, shipped):

scripts/allocate-address.sh: flock-guarded atomic allocator. All counter reads/writes go through this script; direct Write/Edit on .vault-meta/address-counter.txt is prohibited (doing so would fire the PostToolUse hook).
skills/wiki-ingest/SKILL.md → Address Assignment section: opt-in feature detection; delegates allocation to the helper; records path-to-address mapping in .raw/.manifest.json address_map for re-ingest stability.
skills/wiki-lint/SKILL.md → Address Validation section: format check, uniqueness check, counter-drift check, address-map consistency check.
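The shipped allocator is a shell script. For readers who want the locking pattern in isolation, here is a Python sketch of the same idea. It is not a port of `scripts/allocate-address.sh`: the function name is hypothetical, it is Unix-only (`fcntl`), and it assumes the counter file stores the next unallocated value, which the spec does not confirm.

```python
import fcntl
import os


def allocate_address(counter_path: str) -> str:
    """Read, increment, and persist the counter under an exclusive lock."""
    fd = os.open(counter_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            text = f.read().strip()
            next_value = int(text) if text else 1  # assumption: empty file starts at 1
            f.seek(0)
            f.truncate()
            f.write(f"{next_value + 1}\n")
            f.flush()
            os.fsync(f.fileno())       # persist before the lock is released
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return f"c-{next_value:06d}"
```

Because the read and the write happen while the lock is held, two concurrent callers cannot receive the same address. Retired addresses are never reused simply because the counter only moves forward.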

Lint severity model (matches skills/wiki-lint/SKILL.md Address Validation behavior):

Post-rollout pages (frontmatter created: >= 2026-04-23, or any page newly created after DragonScale adoption) that lack an address are errors. This is the silent-regression guard.
Legacy pages (created: < 2026-04-23) without addresses are informational. The optional .vault-meta/legacy-pages.txt manifest can grandfather pages whose created: metadata is wrong or missing.
Meta pages (_index.md, index.md, log.md, hot.md, etc.) and fold pages are excluded entirely.
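The severity rules above can be read as a small decision function. This is a sketch with hypothetical names, not the wiki-lint implementation. It lists only the four meta filenames named above (the real exclusion set is longer), and it treats a missing `created:` value as post-rollout unless the page is grandfathered, which is an assumption.

```python
from datetime import date
from typing import Optional

ROLLOUT_DATE = date(2026, 4, 23)
META_FILENAMES = {"_index.md", "index.md", "log.md", "hot.md"}  # partial list


def address_lint_severity(
    filename: str,
    page_type: Optional[str],
    created: Optional[date],
    address: Optional[str],
    grandfathered: bool = False,
) -> Optional[str]:
    """Return "error", "info", or None for the missing-address check."""
    if filename in META_FILENAMES or page_type in {"meta", "fold"}:
        return None      # excluded from address validation entirely
    if address:
        return None      # has an address; format checks happen elsewhere
    if grandfathered:
        return "info"    # listed in the legacy manifest
    if created is None or created >= ROLLOUT_DATE:
        return "error"   # silent-regression guard
    return "info"        # legacy page without an address
```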

Mechanism 3 — Semantic Tiling Lint

The tiling property says the same concept should live in one canonical page. Enforce it with an embedding-based dedup check in wiki-lint.

Procedure (calibrated, not a guess):

1. Compute embeddings for every page. Default model: local nomic-embed-text via ollama on http://127.0.0.1:11434. Cost: local hardware time only (no API fees). The script supports a remote override under --allow-remote-ollama; remote endpoints may incur provider API fees.
2. Compute pairwise cosine similarities for all page pairs.
3. Calibration (one-time, before first use): label 50–100 in-vault page pairs as duplicate/near/distinct; find the thresholds that optimize target precision for each band.
4. Default bands (used before calibration, then refined):
   - ≥ 0.90: near-duplicate, lint error
   - 0.80 – 0.90: review bucket, lint warning
   - < 0.80: distinct, no flag
5. Never auto-merge. Output a review list.
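The pairwise comparison and banding steps can be sketched in plain Python. This is illustrative only: it assumes embeddings are already available as lists of floats (the ollama call is omitted), and the function names are hypothetical rather than taken from `scripts/tiling-check.py`.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def band(similarity: float, error_at: float = 0.90, warn_at: float = 0.80) -> str:
    """Map a similarity onto the default bands."""
    if similarity >= error_at:
        return "error"     # near-duplicate
    if similarity >= warn_at:
        return "warning"   # review bucket
    return "distinct"


def review_list(embeddings: dict[str, list[float]]) -> list[tuple[str, str, float, str]]:
    """All flagged page pairs, most similar first. Nothing is merged."""
    flagged = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        sim = cosine(vec_a, vec_b)
        label = band(sim)
        if label != "distinct":
            flagged.append((name_a, name_b, sim, label))
    return sorted(flagged, key=lambda row: row[2], reverse=True)
```

The output is a list for human review, in line with the never-auto-merge rule. The all-pairs loop is quadratic in the number of pages.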

Why not a fixed 0.85? v0.1 used 0.85 with no justification. Published thresholds in the embeddings literature span a wide range (Sentence Transformers' community_detection defaults to 0.75; Quora-duplicate calibrations land around 0.77–0.83; sparse-model defaults differ again). Thresholds are model-, corpus-, and objective-dependent, so calibration is required.
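One way to turn labeled pairs into a threshold is to pick the lowest cutoff that still meets a target precision. The sketch below simplifies the three-way labels to a binary duplicate/not-duplicate flag and handles a single band. The function name and the precision-target framing are assumptions, not the documented calibration procedure.

```python
from typing import Optional


def lowest_threshold_for_precision(
    labeled_pairs: list[tuple[float, bool]],
    target_precision: float,
) -> Optional[float]:
    """Lowest observed similarity t such that pairs with sim >= t meet the target.

    labeled_pairs holds (cosine similarity, is_duplicate) for hand-labeled pairs.
    Returns None if no cutoff reaches the target precision.
    """
    best = None
    for t in sorted({sim for sim, _ in labeled_pairs}, reverse=True):
        flagged = [is_dup for sim, is_dup in labeled_pairs if sim >= t]
        precision = sum(flagged) / len(flagged)  # flagged always includes the pair at t
        if precision >= target_precision:
            best = t
    return best


if __name__ == "__main__":
    pairs = [(0.95, True), (0.92, True), (0.88, True),
             (0.85, False), (0.82, True), (0.70, False)]
    print(lowest_threshold_for_precision(pairs, 0.90))  # 0.88
```

With only 50 to 100 labels the estimate is coarse, so the result is better treated as a refinement of the default bands than as a precise cutoff.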

Mechanism 4 — Boundary-First Autoresearch

Status: shipped (Phase 4, opt-in) as of 2026-04-24. Implementation: scripts/boundary-score.py. Integration: skills/autoresearch/SKILL.md Topic Selection section B. Tests: tests/test_boundary_score.py.

Boundary pages (high out-degree relative to in-degree, recency-weighted) are the vault's frontier. /autoresearch invoked without a topic reads the top-5 boundary pages and offers them as research candidates; the user selects one (or types a free-text topic, or declines all and falls back to the original ask-user mode).

Formula (exact):

out_degree(p) = count of distinct filename-stem wikilinks in body of p that resolve to scoreable pages
in_degree(p)  = count of distinct scoreable pages whose body contains a wikilink to p
recency_weight(p) = exp(-days_since_updated / 30)      # no floor; old pages approach 0
boundary_score(p) = (out_degree - in_degree) * recency_weight
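A minimal sketch of this formula over an in-memory link graph follows. It is not the code in `scripts/boundary-score.py`: the function and constant names are hypothetical, and the inputs are assumed to be already filtered to scoreable pages and resolved to filename stems. Self-links are dropped, matching the self-loop exclusion covered by the tests. With `exp(-days / 30)`, 30 days is the point where the weight falls to 1/e; the weight halves after about 20.8 days (30 · ln 2).

```python
import math

RECENCY_TIME_CONSTANT_DAYS = 30.0  # hypothetical name for the 30 in the formula


def boundary_scores(
    links: dict[str, set[str]],
    days_since_updated: dict[str, float],
) -> dict[str, float]:
    """Score every page in days_since_updated; links maps page -> linked stems."""
    pages = set(days_since_updated)
    out_degree = {}
    in_degree = {p: 0 for p in pages}
    for p in pages:
        # Keep only distinct links that resolve to another scoreable page.
        targets = {t for t in links.get(p, set()) if t in pages and t != p}
        out_degree[p] = len(targets)
        for t in targets:
            in_degree[t] += 1
    return {
        p: (out_degree[p] - in_degree[p])
        * math.exp(-days_since_updated[p] / RECENCY_TIME_CONSTANT_DAYS)
        for p in pages
    }


if __name__ == "__main__":
    graph = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
    ages = {"A": 0.0, "B": 30.0, "C": 0.0}
    scores = boundary_scores(graph, ages)
    top = sorted(scores, key=scores.get, reverse=True)[:5]
    print(top[0], scores[top[0]])  # A 2.0
```

Pages that are heavily linked to but link out little get negative scores, so they sink below the top-5 cut regardless of recency.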

Link resolution: filename-stem only. [[Foo]] resolves to Foo.md anywhere in the vault. Aliases declared via frontmatter aliases: are NOT parsed. Folder-qualified links (e.g. [[notes/Foo]]) are resolved by stem alone. This matches Obsidian's default behavior for unique filenames but does not implement full alias resolution.

Scoreable = any page NOT excluded by any of:

frontmatter type: meta or type: fold
filename in {_index.md, index.md, log.md, hot.md, overview.md, dashboard.md, Wiki Map.md, getting-started.md}
path prefix in wiki/folds/ or wiki/meta/
symlinks or paths whose resolved target escapes the vault root (rejected at scan time)
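The first three exclusion rules can be expressed as a pure predicate. This is a sketch under stated assumptions: `is_scoreable` is a hypothetical name, paths are taken to be POSIX-style and relative to the vault root, and the symlink/escape rule is omitted because it needs filesystem access.

```python
from pathlib import PurePosixPath
from typing import Optional

EXCLUDED_TYPES = {"meta", "fold"}
EXCLUDED_FILENAMES = {
    "_index.md", "index.md", "log.md", "hot.md",
    "overview.md", "dashboard.md", "Wiki Map.md", "getting-started.md",
}
EXCLUDED_PREFIXES = ("wiki/folds/", "wiki/meta/")


def is_scoreable(rel_path: str, page_type: Optional[str]) -> bool:
    """True if the page passes the type, filename, and path-prefix rules."""
    if page_type in EXCLUDED_TYPES:
        return False
    if PurePosixPath(rel_path).name in EXCLUDED_FILENAMES:
        return False
    if rel_path.startswith(EXCLUDED_PREFIXES):
        return False
    return True
```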

Code-block filtering: triple-backtick AND triple-tilde fenced code blocks are skipped, with CommonMark-like length tracking so a longer opening fence is not closed by a shorter inner fence. Indented code blocks (4+ spaces) are NOT filtered because Obsidian bullet lists commonly use 4-space indentation and contain real wikilinks. See scripts/boundary-score.py:RECENCY_HALFLIFE_DAYS for the sole tunable constant.
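The fence-tracking rule described above can be sketched as a line scanner. This is an approximation written for illustration, not the parser in `scripts/boundary-score.py`. The regexes and the function name are assumptions, and alias text and heading anchors are simply dropped from the link target.

```python
import re

# An opening or closing fence: up to 3 spaces, then 3+ backticks or 3+ tildes.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
# Link target up to the first "|", "#", or closing bracket.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)")


def extract_link_stems(body: str) -> set[str]:
    """Distinct filename stems of wikilinks found outside fenced code blocks."""
    stems: set[str] = set()
    open_fence = None  # (fence character, fence length) while inside a block
    for line in body.splitlines():
        match = FENCE_RE.match(line)
        if open_fence is None:
            if match:
                run = match.group(1)
                open_fence = (run[0], len(run))
                continue
            for target in WIKILINK_RE.findall(line):
                stem = target.strip().split("/")[-1]  # folder-qualified: keep stem only
                if stem.endswith(".md"):
                    stem = stem[:-3]
                if stem:
                    stems.add(stem)
        elif match:
            run = match.group(1)
            same_char = run[0] == open_fence[0]
            long_enough = len(run) >= open_fence[1]
            if same_char and long_enough and not line[match.end():].strip():
                open_fence = None  # a shorter inner fence never closes the block
    return stems
```

Lines indented by four or more spaces never match `FENCE_RE`, so wikilinks inside indented bullet lists are still collected, consistent with the decision not to filter indented code blocks.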

Honest labeling: this mechanism is agenda control, not pure memory. It shapes what the agent researches next. It is included in DragonScale because it is a direct consequence of the dragon-curve boundary analogy, and because it pairs naturally with folds (freshly folded pages have low out-degree; frontier pages are pre-fold). But the "memory only, not reasoning" framing does not cover it. Users who want a strict memory-layer subset should omit this mechanism (simply do not invoke /autoresearch without a topic, or do not set up scripts/boundary-score.py).

What is NOT included:

No auto-triggering. /autoresearch is still user-invoked.
No persistent boundary-score cache. Scoring is O(N * avg_links) and runs on every invocation from fresh wiki/ state.
No integration with folds or addresses. Pure graph analysis on the wikilink graph.
No automatic topic selection without user confirmation. The helper presents choices; the user picks.

Operational Policies (required before implementation)

Adversarial review flagged these gaps in v0.1. Each must be decided before the corresponding phase ships.

Policy	Phase 0 position	Decision point
Retention / GC	No automatic deletion. Pages are permanent.	Revisit if vault exceeds ~5000 pages.
Tombstones	None. Deleted pages are removed via git revert.	Revisit if delete events become common.
Versioning	Relies on git history, not in-vault versioning.	Address-hash rotation policy doubles as a coarse version signal.
Conflict resolution for contradictory folds	Meta-page must quote both sources with explicit "conflict" callout. No automatic resolution.	Phase 1 spec required.
Concurrency / atomicity	Single-writer assumption (one Claude session at a time). PostToolUse auto-commit serializes.	Multi-writer case deferred.
Provenance for meta-pages	Every fold page must include frontmatter listing children and fold level.	Phase 1 must enforce.
Access control	Out of scope. This is a single-user vault.	Revisit only if shared.

Mapping to Claude-Obsidian

Mechanism	Status	New	Extends
Fold operator	shipped (Phase 1, dry-run verified)	`skills/wiki-fold/`	reads `log.md`, writes `wiki/folds/`, updates `index.md` on commit
Deterministic page addresses	shipped (Phase 2, opt-in)	`scripts/allocate-address.sh`, new frontmatter field	`wiki-ingest` (assignment), `wiki-lint` (validation)
Semantic tiling	shipped (Phase 2/3, opt-in)	`scripts/tiling-check.py`, `.vault-meta/tiling-thresholds.json`	`wiki-lint` with banded thresholds, calibration procedure documented
Boundary-first	shipped (Phase 4, opt-in)	`scripts/boundary-score.py`, `tests/test_boundary_score.py`	`skills/autoresearch/SKILL.md` Topic Selection section B; `commands/autoresearch.md` no-topic path

The existing hot → index → domain → page hierarchy already implements self-similarity across scales. That's the one dragon-curve property this vault had before DragonScale.

Why This Over Alternatives

Pattern	What it gives	What DragonScale adds
MemGPT virtual context (two-tier paging)	Main context ↔ external context swap	More than two levels; explicit fold triggers; dedup lint
Pure LSM compaction	Exponential write-path throughput	Semantic-layer mechanisms (tiling, boundary); additive rollups over destructive merges
Ad-hoc `/save`	Human-triggered filing	Rule-based fold cadence
Vector-only RAG	Retrieval	Canonical-home structure; lineage addresses

DragonScale composes patterns validated in adjacent systems: LSM batching (databases), MemGPT paging (agents), Anthropic cache ordering (prompt engineering), and embedding dedup (knowledge graphs).

Known Limitations (v0.3)

Unvalidated at scale. All four mechanisms have shipped as opt-in features, but none has been tested on a multi-thousand-page vault.
Fold cadence is a knob, not a theorem. k=4 is a starting guess. Adaptive triggers are likely better.
Content-hash stability is unsolved. Hash rotation on edits is a known issue, so the content-hash suffix is deferred; the shipped counter address does not change across edits.
Boundary-first crosses scope. Included with a warning, not quietly.
Calibration load. Tiling requires a one-time labeling pass; without it, only defaults apply.

Primary Sources

Verified against primary sources on 2026-04-23. Scope of tagging: the specific numeric values, formulas, and named patterns below are tagged [sourced] when directly citable, [derived] when derivable from sourced material, or [conjecture] when based on reasoning without a specific source. Not tagged (and readers should treat as interpretive synthesis): framing sentences in the body such as "composes patterns validated," "self-similarity already exists," and the design rationale tying the four mechanisms together. These are editorial, not source-backed.

Dragon curve math [sourced]

Boundary dimension 2·log₂(λ) where λ³ − λ² − 2 = 0, giving 1.523627086: Dragon curve, Wikipedia
Paper-folding construction and OEIS A014577: Regular paperfolding sequence, Wikipedia; OEIS A014577
Tiling and rep-tiles: Wolfram Demonstrations: Tiling Dragons and Rep-tiles of Order Two

LSM trees [sourced]

Level size ratios and compaction semantics: RocksDB Compaction wiki, RocksDB Tuning Guide, How to Grow an LSM-tree? (2025)
LevelDB 10× level ratio: referenced in the arXiv paper above. Treat as typical, not required.

LLM memory architectures [sourced]

OS-inspired paging: MemGPT: Towards LLMs as Operating Systems (Packer et al. 2023)
Position sensitivity: Lost in the Middle (Liu et al. 2023)
Note-based agentic memory: A-Mem (2025)

Prompt caching [sourced]

Byte-identical prefix requirement, breakpoint mechanics, TTL options: Anthropic Prompt Caching docs

Embedding thresholds [sourced]

Sentence Transformers defaults and calibration examples: Sentence Transformers util, SBERT evaluation docs

Reasoning search (out of scope, cited only to justify the scope boundary) [sourced]

Tree of Thoughts (Yao et al. 2023)

Items marked [conjecture] in this doc:

k=4/k=5 starting value for fold cadence (needs empirical tuning)
~30s full-vault embedding-pass time (needs measurement)
boundary_score formula exact weighting (a plausible starting form; not validated against retrieval metrics)

Items marked [derived]:

⌈log₂(N)⌉ fold-depth bound (trivially derivable from the entry-count trigger)
Default tiling bands {≥0.90, 0.80-0.90, <0.80} before calibration (interpolated from cited ranges in Sentence Transformers examples; not optimal by construction)

Review History

v0.1 (2026-04-23, initial draft) — written after a verification pass against Wikipedia, arXiv, and Anthropic docs. Four mechanisms proposed.

v0.4 (2026-04-24, Phase 4 shipped) — Mechanism 4 (boundary-first autoresearch) implemented as scripts/boundary-score.py with tests/test_boundary_score.py covering parsing, recency weight, wikilink extraction (with fence-length + tilde + indented-block tests), graph construction (self-loop/unresolved/meta-target exclusion), symlink rejection, and CLI surface (--top, --page, --json). Integrated into skills/autoresearch/SKILL.md as an opt-in Topic Selection mode with explicit helper-failure fallback. Spec's "NOT IMPLEMENTED" marker removed; exact scoring formula (no recency floor), filename-stem-only resolution disclosure, scope, and "what is NOT included" section added. Phase 3.6 pre-Phase-4 hardening shipped concurrently (5 fixes: --report path confinement, rollout baseline, AGENTS.md consistency, wiki-ingest .raw contradiction, install-guide version).

v0.3 (2026-04-23, Phase 2 alignment) — Mechanism 2 rewritten to match the actual Phase 2 MVP shipped in wiki-ingest and wiki-lint. Renamed from "Content-Addressable Paths" to "Deterministic Page Addresses" (the MVP is a creation-order counter, not a content hash). Documented the extension path for fold-ancestry bits and content-hash suffix, both explicitly deferred.

v0.2 (2026-04-23, post-adversarial review) — after codex exec adversarial review. All 7 critiques accepted:

LSM "structurally identical" → weakened to "loosely analogous to hierarchical rollup"; non-inherited properties listed explicitly.
Prompt cache address benefit → removed strong claim; narrowed to organizational convention.
0.85 threshold → replaced with calibration procedure and banded defaults.
2^k cadence → justified as implementation convenience; adaptive trigger flagged as preferred for production.
Scope boundary contradiction → acknowledged; boundary-first explicitly labeled as agenda control.
Missing production mechanisms → added Operational Policies section (retention, versioning, conflict resolution, concurrency, provenance).
Unverified claims → tagged specific numeric values, formulas, and named patterns as [sourced], [derived], or [conjecture]. Editorial synthesis in the body explicitly flagged as not tagged (see scope note under Primary Sources).

Connections

See LLM Wiki Pattern for the broader pattern this extends. See Compounding Knowledge for why persistent state is the precondition for DragonScale. See Hot Cache for the existing 500-word session context, which is a level-0 manual fold. See Andrej Karpathy for the intellectual lineage.
