김경종 72dad72703
add claude-obsidian
2026-05-28 10:57:16 +09:00


v1.7.2 + v1.8.0 "Best Ever Per Priority Research" Plan

Date: 2026-05-17
Branch: continue on v1.7.0-compound-vault (still local-only)
Goal: close every honest deduction (v1.7.2 polish) AND add methodology modes (v1.8.0, compass artifact priority gap 5) to land at 5/7 axes #1 per the original research
Estimated effort: 10-12 hours focused work (4-5h v1.7.2 + 6-7h v1.8.0)
Termination conditions:

  • v1.7.2 ship gate after Phase 6 (verifier + chair clean OR 2-round cap fired)
  • v1.8.0 ship gate after Phase 8 (verifier + chair clean OR 2-round cap fired)
  • 14h hard time cap; if v1.7.2 takes >6h, defer v1.8.0 to a separate session

0. Why this plan exists

Three rounds of verifier + chair scrutiny converged on 97/100:

  • Round 1 (initial v1.7.1 fixes): chair scored 96, verifier later found 5 polish items
  • Round 2 (polish commit): verifier said SHIP 0/0/0/0; chair found 2 items, then 3 more on harder probe
  • Round 3 (chair-probe fixes): verifier said SHIP 1 LOW; chair fixed inline

After Round 3 the remaining deductions are structural, not surface-level:

| Honest deduction | What it really is |
| --- | --- |
| Defect introduction: 100 | (now clean after Round 3) |
| Internal consistency: 100 | (now clean after Round 3) |
| /best-practices kernel: 88 | Two structural issues that polish cannot lift |
| Net session score | 97/100 |

The 88 on kernel has two specific causes:

  1. +5819 / -30 LOC across 41 files since main. The kernel says "delete more than you add"; this is the opposite.
  2. Three rounds were needed to converge. A kernel-disciplined slice would land in one pass.

Plus the broader repo still has:

  • 14 MEDIUM findings open from the v1.7.0 audit
  • 10 LOW findings open from the v1.7.0 audit

A genuine 100/100 requires all of these closed or explicitly deferred with rationale. This plan does that.


1. Acceptance criteria (defined BEFORE execution, per /best-practices "failure is the spec")

For this plan to count as "achieved 100/100":

  1. Final verifier dispatch returns 0 BLOCKER / 0 HIGH / 0 MEDIUM / 0 LOW on the entire main..HEAD diff
  2. Final chair adversarial probe (≥10 specific tests, listed in §7) returns 0 functional findings
  3. Net LOC delta main..HEAD shows non-trivial deletion: net additions ≤ +5000 OR deletions ≥ 200 LOC (whichever fires; both are honest measures of having pruned something)
  4. Every M1–M14 + L1–L10 is either CLOSED (with commit SHA in audit doc) or DEFERRED (with one-line milestone + rationale in audit doc). No silent omissions.
  5. make test stays green throughout
  6. Branch remains local until explicit user push authorization
  7. agents/verifier.md updated with the git-hygiene cut + any other self-improvement that emerged from the three-round retrospective

If any of these 7 cannot be met, the plan SHIPS at the achieved score with the gap explicitly documented. No silent shortfalls.


2. Phase 0 — Audit refresh (15 min)

Goal: know exactly what's open before touching code.

Steps:

  1. Re-read docs/audits/v1.7.0-audit-2026-05-17.md §8.1–§8.4 (BLOCKER/HIGH/MEDIUM/LOW ledgers) in full
  2. For each finding M1–M14 + L1–L10, categorize:
    • SHIPPED-in-v1.7.1: already closed by an existing commit (mark in audit)
    • CLOSEABLE-this-session: small, focused, no scope creep (target for Phase 3)
    • DEFER-with-rationale: legitimately bigger or roadmap-tied (target for §6 audit update)
  3. Write the categorization to a working scratch file at docs/audits/v1.7.2-coverage-matrix.md (deleted at end of plan; intermediate artifact)

Output: categorization for all 24 open findings. No code changes.
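
The "no silent omissions" rule can be checked mechanically once the matrix exists. The following is a minimal sketch, assuming the scratch file is loaded into a plain dictionary keyed by finding ID; the `Status` enum and both function names are hypothetical and are not part of the repo.

```python
from enum import Enum


class Status(Enum):
    """The three Phase 0 categories, named as in the plan."""
    SHIPPED = "SHIPPED-in-v1.7.1"
    CLOSEABLE = "CLOSEABLE-this-session"
    DEFER = "DEFER-with-rationale"


def expected_ids() -> list[str]:
    """M1..M14 followed by L1..L10: the 24 open findings."""
    return [f"M{i}" for i in range(1, 15)] + [f"L{i}" for i in range(1, 11)]


def uncategorized(matrix: dict[str, Status]) -> list[str]:
    """Finding IDs that have no entry in the matrix, in ledger order."""
    return [fid for fid in expected_ids() if fid not in matrix]


def deferred_without_rationale(matrix: dict[str, Status],
                               rationale: dict[str, str]) -> list[str]:
    """DEFER entries whose rationale is missing or blank."""
    return [
        fid for fid, status in matrix.items()
        if status is Status.DEFER and not rationale.get(fid, "").strip()
    ]
```

Phase 0 would count as done when both functions return an empty list.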


3. Phase 1 — Verifier self-improvement (10 min)

Goal: close the loop the chair-probe revealed (verifier missed hook.log not in gitignore).

Steps:

  1. Add to agents/verifier.md "Specifically check for in EVERY workstream" section, after item 4:
    5. **Git hygiene** — any new file path written by code in this diff (open files,
       log writes, cache writes, temp files) that is NOT already in `.gitignore` →
       HIGH. The PostToolUse auto-commit hook stages everything under wiki/, .raw/,
       .vault-meta/; an unignored runtime artifact creates a self-pollution loop on
       the next hook fire.
    6. **Additive-without-pruning** — if `git diff --shortstat main..HEAD` shows
       net additions > +500 LOC and deletions < 50 LOC, flag as MEDIUM. Real
       feature work adds lines; pure additive cycles with no pruning suggest v_prev
       cruft is being retained reflexively.
    
  2. Verify YAML frontmatter still parses (python3 -c "import yaml; yaml.safe_load(open('agents/verifier.md').read().split('---')[1])")
  3. Commit: docs(v1.7.2): verifier-agent self-improvement from 3-round retrospective

Output: verifier.md has two new "always check" items; next dispatch catches what this session's verifier missed.


4. Phase 2 — Close the +5819 / -30 LOC ratio (60-90 min)

Goal: prune v1.6 code that v1.7 superseded but didn't remove.

Steps:

  1. Inventory candidates:
    # Comments referencing pre-v1.7 behavior in skills/, scripts/, bin/
    grep -rn "v1\.6\|legacy\|deprecated\|TODO\|FIXME" skills/ scripts/ bin/
    # Skill sections with "## v1.6 behavior" / "## Before v1.7" headers
    grep -rn "^## .*1\.6\|^### .*1\.6" skills/
    # Tool references in skills that v1.7 transport supersedes
    grep -rn "allowed-tools: .*Edit\|allowed-tools: .*Write" skills/
    
  2. For each candidate, decide:
    • PRUNE: code or doc that is dead post-v1.7 (e.g., a "v1.6 fallback" path that the v1.7 transport layer makes unreachable, a legacy comment block superseded by compound-vault-guide.md)
    • KEEP: legitimately current code or doc; add a one-line justification in the working scratch file
  3. Apply prunes in clusters (one commit per logical theme, e.g. "prune v1.6 transport assumptions", "prune superseded inline docs")
  4. After each prune commit, make test must stay green
  5. Acceptance gate: end-of-Phase-2 git diff --shortstat main..HEAD shows either net additions ≤ +5000 LOC or deletions ≥ 200 LOC

Failure mode: if v1.7 genuinely added only new features with zero v1.6 supersession, the +5819 stays as additive and the kernel deduction is irreducible. In that case, DOCUMENT it explicitly in audit §10.4 ("+5819 / -30 is the honest cost of building a substrate; v1.6 had no deprecation surface") and accept the score adjustment. Do not invent prunes to game the metric.

Output: N prune commits + scratch file justifying every retained piece of v1.6 code.
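
Both the Phase 2 acceptance gate and the new "additive-without-pruning" verifier item read the same `git diff --shortstat` line, so one parser can serve both. This is a sketch under two assumptions: "net additions" means insertions minus deletions, and the function names are hypothetical. Git omits the deletions clause when there are none, which the parser treats as zero.

```python
import re
import subprocess


def parse_shortstat(line: str) -> tuple[int, int]:
    """Extract (insertions, deletions) from a `git diff --shortstat` line."""
    ins = re.search(r"(\d+) insertions?\(\+\)", line)
    dels = re.search(r"(\d+) deletions?\(-\)", line)
    return (int(ins.group(1)) if ins else 0,
            int(dels.group(1)) if dels else 0)


def loc_gate_passes(insertions: int, deletions: int,
                    max_net: int = 5000, min_deleted: int = 200) -> bool:
    """Acceptance criterion #3: either threshold is enough."""
    return (insertions - deletions) <= max_net or deletions >= min_deleted


def additive_without_pruning(insertions: int, deletions: int) -> bool:
    """Verifier item 6: large net growth with almost no deletion."""
    return (insertions - deletions) > 500 and deletions < 50


def current_shortstat(rev_range: str = "main..HEAD") -> tuple[int, int]:
    """Run git in the current directory and parse its summary line."""
    out = subprocess.run(
        ["git", "diff", "--shortstat", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_shortstat(out)
```

With the figures quoted in §0 (5819 insertions, 30 deletions) the gate fails and the verifier item fires, which matches the plan's own reading of the current state.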


5. Phase 3 — Close the 14 MEDIUM findings (90-120 min)

Walk each finding from the v1.7.0 audit §8.3. Group related fixes; one commit per cluster.

| # | Finding | Plan | Effort | Commit grouping |
| --- | --- | --- | --- | --- |
| M1 | §3.2 +485/-0 LOC | Addressed in Phase 2 | | (in Phase 2) |
| M2 | bm25-index.py non-ASCII tokenization drops content | Extend regex [A-Za-z][A-Za-z0-9'\-]* to [\w'\-]+ with re.UNICODE; add hermetic test with emoji + CJK + Cyrillic + Spanish accented input; verify BM25 ranking changes are sensible | 20 min | C1 |
| M3 | rerank.py --allow-remote-ollama error blames user | Improve error: "OLLAMA_URL points off-localhost; either run ollama locally or pass --allow-remote-ollama through retrieve.py (which forwards it here)" | 5 min | C2 |
| M4 | wiki-lock.sh validate_path accepts newlines | Add case "$p" in *$'\n'*) die "newlines not allowed in lock path" 4 ;; esac; add test | 10 min | C2 |
| M5 | retrieve.py import_sibling no ImportError handling | Wrap in try/except (ImportError, SyntaxError); print friendly error pointing to bin/setup-retrieve.sh --check | 10 min | C2 |
| M6 | contextual-prefix.py empty body silent | Emit log(f"WARN: {page_path} has no body content; skipping") and return cleanly | 5 min | C2 |
| M7 | rerank.py save_cache() blocking fcntl on non-flock FS | Add LOCK_NB + retry loop (3 attempts, 100ms sleep); fall back to no-cache write with a WARN | 15 min | C2 |
| M8 | test_retrieve.py missing --explain and --no-rerank coverage | Add 2 test cases asserting the JSON shape changes | 15 min | C3 |
| M9 | Bounded-slices: 4 skills touched by both §3.2 and §3.4 | Process note, not a code fix; document in audit §10.3 as PROCESS-ACK | | (audit-only) |
| M10 | No verifier agents during v1.7 dev | Closed by H4 (3ea443f); mark in audit | | (audit-only) |
| M11 | Synonym category benchmark tied (60% both pipelines) | Investigate via benchmark-runner.py --limit 0 --json results.json then per-query analysis; either tune rerank threshold or document why parity is acceptable | 30 min | C4 |
| M12 | Negative-query precision tied at 40% | Investigate similarly; tune rerank to suppress sub-threshold top results | 20 min | C4 |
| M13 | NotebookLM derivative outputs gap | Defer to v2.0; document in audit §10.5 with explicit roadmap rationale | | (audit-only) |
| M14 | (verify what this is; read audit §8.3 line for M14) | TBD per content | TBD | TBD |
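
The M2 regex change can be tried in isolation before touching bm25-index.py. The sketch below assumes a lowercase-and-findall tokenizer; the real script's tokenizer is not shown in this plan, so `tokenize` and the two pattern names are hypothetical. Two facts are worth keeping in mind: in Python 3, `\w` is already Unicode-aware for `str` patterns, so `re.UNICODE` is redundant (harmless, but not required), and emoji are not word characters, so they are dropped rather than indexed.

```python
import re

# The pattern quoted for the current behaviour: ASCII letters only.
OLD_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9'\-]*")
# The proposed replacement: any Unicode word character, apostrophe, or hyphen.
NEW_TOKEN = re.compile(r"[\w'\-]+")


def tokenize(text: str, pattern: re.Pattern = NEW_TOKEN) -> list[str]:
    """Lowercased tokens in document order."""
    return [tok.lower() for tok in pattern.findall(text)]


if __name__ == "__main__":
    for sample in ["niño café", "Привет мир", "知識管理 wiki", "ship 🚀 now"]:
        print(sample, tokenize(sample, OLD_TOKEN), tokenize(sample, NEW_TOKEN))
```

Two side effects of the wider pattern may matter for the "ranking changes are sensible" check: unsegmented CJK text becomes one long token per run, and a token may now start with a digit, underscore, or hyphen (a bare `--` is a token).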

Commit clusters:

  • C1 — non-ASCII tokenization (M2)
  • C2 — defensive-input fixes bundle (M3, M4, M5, M6, M7)
  • C3 — test coverage extension (M8)
  • C4 — benchmark tunings (M11, M12)

After each cluster: make test + verifier dispatch on staged diff (eat own dogfood per the new agent).
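
For M5 in cluster C2, one plausible shape for the guarded loader is sketched here. It assumes `import_sibling` loads a script by file path (the hyphenated names such as bm25-index.py cannot be imported with a plain `import` statement); the signature, the exit code 2, and the message wording are assumptions, not the current retrieve.py code.

```python
import importlib.util
import sys
from pathlib import Path


def import_sibling(filename: str, base_dir: Path):
    """Load a sibling script as a module, or exit with a readable hint."""
    path = Path(base_dir) / filename
    module_name = path.stem.replace("-", "_")
    try:
        if not path.is_file():
            raise ImportError(f"{path} not found")
        spec = importlib.util.spec_from_file_location(module_name, path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot build an import spec for {path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # SyntaxError surfaces here
        return module
    except (ImportError, SyntaxError) as exc:
        print(
            f"ERROR: could not load {filename}: {exc}\n"
            "Run `bash bin/setup-retrieve.sh --check` to diagnose.",
            file=sys.stderr,
        )
        raise SystemExit(2)  # hypothetical exit code
```

The explicit `is_file()` check is there because a missing file would otherwise raise `FileNotFoundError`, which the `(ImportError, SyntaxError)` pair named in the plan does not catch.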

Acceptance gate: all 14 MEDIUM closed (with commit SHA in audit §8.3) or deferred (with rationale).
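
The M7 fix (non-blocking lock, three attempts, 100 ms apart) can be sketched as below. Two assumptions are baked in: the cache is a JSON file, and "fall back to no-cache write" is read as "skip the cache write and warn". The real `save_cache()` in rerank.py may differ on both points. `fcntl` is Unix-only, which is consistent with the plan's out-of-scope note on cross-platform testing.

```python
import fcntl
import json
import sys
import time


def save_cache(path: str, cache: dict, attempts: int = 3,
               delay: float = 0.1) -> bool:
    """Write the cache under an exclusive lock; return False if skipped."""
    # "a+" creates the file if needed without truncating it before the lock.
    with open(path, "a+", encoding="utf-8") as fh:
        for attempt in range(attempts):
            try:
                fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except OSError:
                # Covers both contention (BlockingIOError) and filesystems
                # that reject flock outright.
                if attempt < attempts - 1:
                    time.sleep(delay)
        else:
            print(f"WARN: could not lock {path}; skipping cache write",
                  file=sys.stderr)
            return False
        try:
            fh.seek(0)
            fh.truncate()
            json.dump(cache, fh)
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    return True
```

Returning a boolean lets the caller decide whether a skipped write is worth surfacing beyond the warning.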


6. Phase 4 — Close the 10 LOW findings (30-45 min)

L1L10 from audit §8.4. Bundle in single commit polish(v1.7.2): close 10 LOW findings from v1.7.0 audit.

Steps:

  1. Read audit §8.4 for the actual L1L10 list (don't list them speculatively here)
  2. For each: tiny edit + one-line CHANGELOG bullet
  3. Single commit covers all 10 + CHANGELOG update

Acceptance gate: all 10 LOW marked CLOSED in audit.


7. Phase 5 — Documentation refresh + final benchmark (30 min)

Steps:

  1. Run python3 scripts/benchmark-runner.py --json /tmp/v172-bench.json on full 50-query corpus (no --limit)
  2. Compare to v1.7.0 audit's numbers (54.0% v17 top-1, +39.5% error reduction). Re-tunings in Phase 3 C4 may have shifted these
  3. Update audit §6.2 with current numbers + delta-from-baseline
  4. Cross-check every commit SHA referenced in audit + CHANGELOG against git log. Correct any drift
  5. Refresh wiki/hot.md with v1.7.2 state (will auto-commit by hook design)
  6. Bump .claude-plugin/plugin.json + .claude-plugin/marketplace.json from 1.7.1 to 1.7.2 if any of Phases 2–4 landed code changes; don't bump if only docs + audit changes
  7. Add CHANGELOG [1.7.2] entry referencing this plan as the source

Acceptance gate: every published number is the result of a fresh measurement, not a copy from earlier.
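
When the fresh numbers come back, the delta-from-baseline is easier to audit if it is recomputed rather than transcribed. The sketch below assumes "error reduction" means the relative drop in top-1 error; the audit's exact definition is not quoted in this plan. Under that assumption, the quoted pair (54.0% top-1, 39.5% error reduction) implies a baseline top-1 of about 24%, which is an inference here and not a figure taken from the audit.

```python
def relative_error_reduction(baseline_top1: float, new_top1: float) -> float:
    """Fraction of the baseline top-1 error removed by the new pipeline."""
    baseline_err = 1.0 - baseline_top1
    new_err = 1.0 - new_top1
    if baseline_err <= 0:
        raise ValueError("baseline has no error to reduce")
    return (baseline_err - new_err) / baseline_err


def delta_points(baseline_top1: float, new_top1: float) -> float:
    """Absolute change in top-1 accuracy, in percentage points."""
    return (new_top1 - baseline_top1) * 100


if __name__ == "__main__":
    # 12/50 -> 27/50 correct on a 50-query corpus (inferred, see lead-in).
    print(round(relative_error_reduction(0.24, 0.54) * 100, 1))
    print(round(delta_points(0.24, 0.54), 1))
```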


8. Phase 6 — Final verification (30 min) + ship gate

The ship gate is binary: pass or accept the achieved score, no third try.

Steps:

  1. Dispatch verifier agent against entire main..HEAD diff (will be ~50 files at this point)
  2. Run the chair adversarial probe — exactly 10 specific tests:
    1. git check-ignore on every file the codebase might write to
    2. bash -u on every shell script that uses ${VAR} references
    3. python3 -c "import json; json.load(open(f))" on every JSON file
    4. yaml.safe_load on every markdown frontmatter
    5. make test 7-suite re-run
    6. python3 scripts/benchmark-runner.py --limit 5 to verify benchmark harness still runs
    7. bash bin/setup-retrieve.sh --check to verify diagnostic path
    8. git diff --shortstat main..HEAD — confirm acceptance criterion #3
    9. grep -c "TODO\|FIXME\|XXX" on every file changed in main..HEAD — must be 0 net additions
    10. Open every doc file changed, verify each commit-SHA reference resolves via git rev-parse
  3. Compute final score on the 7 dimensions used throughout this session
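
Probe test 3 (every JSON file must load) is simple to run over the whole tree in one pass. A minimal sketch follows; the function name is hypothetical, and skipping `.git` is an assumption about what "every JSON file" should cover.

```python
import json
from pathlib import Path


def invalid_json_files(root: str) -> list[tuple[str, str]]:
    """Return (path, error) for every *.json under root that fails to parse."""
    failures = []
    for path in sorted(Path(root).rglob("*.json")):
        if ".git" in path.parts:
            continue  # repository internals are not part of the probe
        try:
            with open(path, encoding="utf-8") as fh:
                json.load(fh)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            failures.append((str(path), str(exc)))
    return failures
```

An empty list means the test passes; anything else is a functional finding for the ship gate.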

Ship gate decision:

| Outcome | Action |
| --- | --- |
| Verifier 0/0/0/0 + chair 0 functional findings + acceptance criteria 1–7 all met | SHIP at 100/100. Surface to user for push authorization. |
| Either pass finds <5 items, all closeable in <30 min | One MORE iteration allowed. Close, re-verify, ship. |
| Either pass finds ≥5 items OR any item requires >30 min | Document remaining. Ship at honest achieved score. Add a v1.7.x backlog entry. |

Hard rule: maximum 2 verify-fix rounds after Phase 6. The 3-round recursion of the v1.7.1 cycle taught us that adversarial scrutiny is asymptotic. After 2 more rounds, accept the score.
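
The decision table and the round cap combine into a small pure function, which makes the edge cases explicit. This is a sketch, not the chair's actual procedure: the names are hypothetical, each open item is represented only by its estimated minutes to close, and two cases the table leaves open are resolved by assumption (an item estimated at exactly 30 minutes counts as not quickly closeable, and unmet acceptance criteria with zero findings are treated as worth one more iteration).

```python
from enum import Enum


class Gate(Enum):
    SHIP = "ship at 100/100"
    ITERATE = "one more verify-fix round"
    SHIP_WITH_GAPS = "document remaining, ship at achieved score"


def ship_gate(item_minutes: list[int], criteria_met: bool,
              rounds_used: int, max_rounds: int = 2) -> Gate:
    """Map open findings and round count to a ship-gate outcome."""
    if not item_minutes and criteria_met:
        return Gate.SHIP
    if rounds_used >= max_rounds:
        return Gate.SHIP_WITH_GAPS  # the hard cap wins over everything else
    if len(item_minutes) < 5 and all(m < 30 for m in item_minutes):
        return Gate.ITERATE
    return Gate.SHIP_WITH_GAPS
```

For example, three ten-minute items on the first round yield `Gate.ITERATE`, while the same three items after two rounds yield `Gate.SHIP_WITH_GAPS`.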


8b. Phase 7 — v1.8.0 methodology modes (6-7h)

After Phase 6 lands v1.7.2 at honest 100/100, build methodology modes — the compass artifact's priority gap 5. Closes axis "methodology support" in audit §9 from TIE to YES (5/7 axes #1).

Deliverables:

  1. New skill skills/wiki-mode/SKILL.md (~45 min)

    • Triggers: "set vault mode", "switch to PARA", "use LYT", "what's my vault mode", "zettelkasten setup"
    • Reads .vault-meta/mode.json; falls back to mode=generic (v1.6/v1.7 default) when absent
    • allowed-tools: Read, Write, Bash
  2. Mode config schema .vault-meta/mode.json (~30 min — schema + write path)

    {
      "schema_version": 1,
      "mode": "lyt|para|zettelkasten|generic",
      "configured_at": "ISO-8601",
      "config": {
        "lyt": {"moc_folder": "wiki/mocs/"},
        "para": {"projects_folder": "wiki/projects/", "areas_folder": "wiki/areas/",
                 "resources_folder": "wiki/resources/", "archives_folder": "wiki/archives/"},
        "zettelkasten": {"id_format": "YYYYMMDDHHMMSS", "no_folders": true}
      }
    }
    
  3. Per-mode templates skills/wiki-mode/templates/ (~60 min)

    • lyt/moc-template.md (Map of Content scaffolding with wikilink-cluster sections)
    • lyt/atomic-template.md (atomic note that links into MOCs)
    • para/project-template.md (active project with status, deadline, next-action)
    • para/area-template.md (ongoing responsibility, no deadline)
    • para/resource-template.md (reference material, topic-organized)
    • zettel/atomic-template.md (atomic claim + supporting sources + parent/child IDs)
    • zettel/_id-format.md (timestamp-based ID generation recipe)
  4. Skill mode-awareness modifications (~90 min)

    • skills/wiki-ingest/SKILL.md — consult .vault-meta/mode.json; route source/entity/concept pages to mode-specific folders when mode != generic
    • skills/save/SKILL.md — same; session notes route to PARA/projects or LYT/MOCs based on mode
    • skills/autoresearch/SKILL.md — same; research artifacts route appropriately
    • All changes preserve v1.7 fallback behavior when mode = generic
  5. Hermetic tests tests/test_wiki_mode.sh + tests/test_wiki_mode.py (~60 min)

    • Mode config writes correctly under each of 4 modes
    • Mode loader returns correct config for each mode
    • Routing logic produces correct path for each (mode, content-type) pair
    • mode=generic preserves v1.7 routing
    • Invalid mode in mode.json triggers explicit error, not silent fallback
    • All hermetic; no network, no LLM, no ollama
  6. Opt-in setup script bin/setup-mode.sh (~30 min)

    • Interactive: prompts user to pick mode
    • Writes .vault-meta/mode.json
    • Optionally seeds template folders (LYT mocs/, PARA projects+areas+resources+archives/)
    • Idempotent; safe to re-run
  7. Documentation (~45 min)

    • docs/methodology-modes-guide.md — explains each mode, when to use, migration paths
    • CLAUDE.md "How to Use" section + new "Methodology Modes (v1.8+)" subsection
    • wiki/references/methodology-modes.md — short decision tree (which mode for which user)
  8. Cross-cutting (~30 min)

    • Makefile: test-mode, setup-mode targets; extend test to include test-mode
    • .claude-plugin/{plugin,marketplace}.json — version 1.7.2 → 1.8.0, description updated
    • .gitignore: .vault-meta/mode.json is host-specific runtime config, MUST be ignored
    • CHANGELOG.md — new [1.8.0] entry
    • agents/wiki-ingest.md — note mode-awareness in sub-agent protocol
    • wiki/hot.md — refresh state
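
Two of the behaviours listed above are easy to get subtly wrong: an absent mode.json must fall back to generic, while a present file with an unknown mode must fail loudly. A minimal loader sketch that separates the two cases is shown here; `load_mode`, `ModeConfigError`, and the shape of the fallback dictionary are hypothetical, and only the path and the four mode names come from the plan.

```python
import json
from pathlib import Path

VALID_MODES = ("lyt", "para", "zettelkasten", "generic")


class ModeConfigError(ValueError):
    """Raised when mode.json exists but cannot be trusted."""


def load_mode(vault_root: str) -> dict:
    """Return the mode config, or a generic default when no file exists."""
    path = Path(vault_root) / ".vault-meta" / "mode.json"
    if not path.exists():
        # Absent file: the v1.6/v1.7 default behaviour.
        return {"schema_version": 1, "mode": "generic", "config": {}}
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ModeConfigError(f"{path} is not valid JSON: {exc}") from exc
    mode = data.get("mode") if isinstance(data, dict) else None
    if mode not in VALID_MODES:
        raise ModeConfigError(
            f"{path}: mode {mode!r} is not one of {', '.join(VALID_MODES)}"
        )
    return data
```

Keeping the fallback on the "file absent" branch only is what lets the hermetic test for "invalid mode triggers explicit error, not silent fallback" pass.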

Commit ladder (estimated):

  • feat(v1.8.0): wiki-mode skill + 4 mode templates
  • feat(v1.8.0): mode-aware routing in wiki-ingest
  • feat(v1.8.0): mode-aware routing in save + autoresearch
  • test(v1.8.0): hermetic wiki-mode test suite
  • feat(v1.8.0): bin/setup-mode.sh opt-in bootstrap
  • docs(v1.8.0): methodology modes guide + CLAUDE.md update
  • chore(v1.8.0): version bump 1.7.2 → 1.8.0, CHANGELOG, gitignore

Per-commit gates:

  • make test green (now 8 suites including test-mode)
  • Verifier dispatch on staged diff returns ≤1 LOW (eat own dogfood per agents/verifier.md)
  • mode=generic path preserves v1.7 behavior exactly (regression test)

8c. Phase 8 — v1.8.0 ship gate (30 min)

Mirror Phase 6 structure for the v1.8.0 slice. Verifier on entire diff main..HEAD. Chair adversarial probe extended with mode-specific tests:

  • Each mode (LYT, PARA, Zettel) can be set + read back
  • mode=generic routing matches v1.7 routing byte-for-byte on a sample ingest
  • .vault-meta/mode.json is gitignored (test by creating + check-ignore)
  • bin/setup-mode.sh is idempotent (run twice; the second run is a no-op)

Same 2-round cap. If 0/0/0/0 + chair clean: 100/100 SHIP. Else: honest achieved score + v1.8.x backlog.
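
The "set + read back" and "second run is a no-op" probes both depend on how mode.json is written. The sketch below renders that behaviour in Python for illustration; bin/setup-mode.sh itself is a shell script whose contents are not shown here, and `write_mode` with its return convention is hypothetical. The key design choice is to compare the requested mode against the file before writing, so that a re-run does not refresh `configured_at` and dirty the file.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

VALID_MODES = ("lyt", "para", "zettelkasten", "generic")


def write_mode(vault_root: str, mode: str) -> bool:
    """Write .vault-meta/mode.json; return False when nothing changed."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    path = Path(vault_root) / ".vault-meta" / "mode.json"
    if path.exists():
        try:
            current = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            current = {}  # unreadable file: overwrite it below
        if isinstance(current, dict) and current.get("mode") == mode:
            return False  # already configured; leave the file untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "schema_version": 1,
        "mode": mode,
        "configured_at": datetime.now(timezone.utc).isoformat(),
        "config": {},
    }
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return True
```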

9. What this plan deliberately does NOT do (scope guard)

These are NOT in scope because they expand into a different release line:

  • v1.9 multimodal ingest (YouTube / PDF / EPUB / image OCR)
  • v2.0 derive (audio / quiz / flashcards / study guide — NotebookLM-class outputs)
  • v2.5+ GUI onramp (Community Plugin fork)
  • Cross-platform (macOS / Windows) testing — explicit out-of-scope per v1.7.0 audit §3
  • Performance benchmarking beyond retrieval accuracy
  • Security audit of dependencies (Python stdlib only; no third-party packages introduced)
  • Marketing / positioning work

A 100/100 on the v1.7 line does NOT mean #1 in the market. Per v1.7.0 audit §9: market-#1 across all 7 axes requires v1.8 + v2.0 + v2.5 work, not patch work. This plan brings the v1.7 line to honest code-quality 100/100. That's the prerequisite for the next release lines, not a substitute for them.


10. Undo plan (per /best-practices "failure is the spec")

If anything in Phases 2–4 causes a regression that isn't caught by the per-commit make test gate:

  • Revert the specific commit with git revert <sha>; do NOT rebase
  • Re-run verifier on the revert
  • Document the regression in audit §8 as a "FOUND-AND-REVERTED" finding so the lesson sticks

If the v1.7.2 portion of the plan (Phases 0–6) cannot reach the acceptance criteria within 6 hours (1h over its 4-5h budget):

  • Stop
  • Document the gap explicitly
  • Ship at the achieved honest score
  • Add a v1.7.3 backlog entry for the remaining items

The v1.7.2 portion of the plan does not change the v1.7 features themselves; it only adds prunings (Phase 2) and bug-class fixes (Phase 3). The v1.7.1 functional surface is preserved.


11. Per-phase ship gates (mini-acceptance criteria)

| Phase | Acceptance gate |
| --- | --- |
| 0 | All 24 findings categorized in scratch file |
| 1 | agents/verifier.md parses; 2 new "always check" items added |
| 2 | Net LOC delta meets §1 criterion #3 OR documented as irreducible |
| 3 | All 14 MEDIUM closed-or-deferred per §1 criterion #4 |
| 4 | All 10 LOW closed |
| 5 | Fresh benchmark numbers in audit; all SHAs verified |
| 6 | Verifier + chair both clean (or rounds budget exhausted) |

If a phase fails its gate, the plan does NOT proceed to the next phase. The chair stops, documents what's incomplete, and surfaces to the user for a go/no-go decision on continuing.


12. Cost-of-failure honest framing

Worst case: 6 hours spent, achieve only 98/100 (some MEDIUMs prove harder than estimated, +5819 stays additive, etc.).

Best case: 4 hours spent, genuinely achieve 100/100 on the v1.7 line, branch ready to push as v1.7.2.

Median case: 5 hours spent, 99/100, all M closed, 1-2 L deferred with rationale, push ready.

The recursion is the risk. Three rounds were needed to land at 97. Phase 6's hard 2-round cap protects against that recursion eating the entire weekend. If the cap fires, the gap is documented and we ship at honest <100 with a v1.7.3 backlog.


13. Confirmation before execution

Per /best-practices "acceptance criteria written before execution" + the user's repeated "no lies" + "honest score" framing, this plan needs explicit user buy-in on:

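  1. Scope: §9 explicitly excludes v1.9 / v2.0 / v2.5+ work; v1.8.0 methodology modes are in scope via Phases 7–8. Confirm.
  2. Budget: 4-5h estimated with a 6h cap for v1.7.2; 10-12h estimated with a 14h hard cap for the whole plan (v1.7.2 + v1.8.0). Confirm or adjust.
  3. Ship gate posture: 2-round cap on adversarial scrutiny after Phase 6, and the same cap after Phase 8. Confirm or adjust.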
  4. No push: branch stays local until user authorizes push, even if 100/100 is achieved. Confirm.

If any of these need adjustment, surface that. Otherwise: execute top to bottom.