v1.7.2 + v1.8.0 "Best Ever Per Priority Research" Plan
Date: 2026-05-17
Branch: continue on v1.7.0-compound-vault (still local-only)
Goal: close every honest deduction (v1.7.2 polish) AND add methodology modes (v1.8.0 — compass artifact priority gap 5) to land at 5/7 axes #1 per the original research
Estimated effort: 10-12 hours focused work (4-5h v1.7.2 + 6-7h v1.8.0)
Termination conditions:
- v1.7.2 ship gate after Phase 6 (verifier + chair clean OR 2-round cap fired)
- v1.8.0 ship gate after Phase 8 (verifier + chair clean OR 2-round cap fired)
- 14h hard time cap; if v1.7.2 takes >6h, defer v1.8.0 to a separate session
0. Why this plan exists
Three rounds of verifier + chair scrutiny converged on 97/100:
- Round 1 (initial v1.7.1 fixes): chair scored 96, verifier later found 5 polish items
- Round 2 (polish commit): verifier said SHIP 0/0/0/0; chair found 2 items, then 3 more on harder probe
- Round 3 (chair-probe fixes): verifier said SHIP 1 LOW; chair fixed inline
After Round 3 the remaining deductions are structural, not surface-level:
| Honest deduction | What it really is |
|---|---|
| Defect introduction: 100 | (now clean after Round 3) |
| Internal consistency: 100 | (now clean after Round 3) |
| /best-practices kernel: 88 | Two structural issues that polish cannot lift |
| Net session score | 97/100 |
The 88 on kernel has two specific causes:
- +5819 / -30 LOC across 41 files since `main`. The kernel says "delete more than you add"; this is the opposite.
- Three rounds were needed to converge. A kernel-disciplined slice would land in one pass.
Plus the broader repo still has:
- 14 MEDIUM findings open from the v1.7.0 audit
- 10 LOW findings open from the v1.7.0 audit
A genuine 100/100 requires all of these closed or explicitly deferred with rationale. This plan does that.
1. Acceptance criteria (defined BEFORE execution, per /best-practices "failure is the spec")
For this plan to count as "achieved 100/100":
- Final verifier dispatch returns 0 BLOCKER / 0 HIGH / 0 MEDIUM / 0 LOW on the entire `main..HEAD` diff
- Final chair adversarial probe (≥10 specific tests, listed in §8, Phase 6) returns 0 functional findings
- Net LOC delta `main..HEAD` shows non-trivial deletion: net additions ≤ +5000 OR deletions ≥ 200 LOC (whichever fires; both are honest measures of having pruned something)
- Every M1–M14 + L1–L10 is either CLOSED (with commit SHA in audit doc) or DEFERRED (with one-line milestone + rationale in audit doc). No silent omissions.
- `make test` stays green throughout
- Branch remains local until explicit user push authorization
- `agents/verifier.md` updated with the git-hygiene cut + any other self-improvement that emerged from the three-round retrospective
If any of these 7 cannot be met, the plan SHIPS at the achieved score with the gap explicitly documented. No silent shortfalls.
2. Phase 0 — Audit refresh (15 min)
Goal: know exactly what's open before touching code.
Steps:
- Re-read docs/audits/v1.7.0-audit-2026-05-17.md §8.1–§8.4 (BLOCKER/HIGH/MEDIUM/LOW ledgers) in full
- For each finding M1–M14 + L1–L10, categorize:
  - SHIPPED-in-v1.7.1: already closed by an existing commit (mark in audit)
  - CLOSEABLE-this-session: small, focused, no scope creep (target for Phase 3)
  - DEFER-with-rationale: legitimately bigger or roadmap-tied (target for §6 audit update)
- Write the categorization to a working scratch file at `docs/audits/v1.7.2-coverage-matrix.md` (deleted at end of plan; intermediate artifact)
Output: categorization for all 24 open findings. No code changes.
3. Phase 1 — Verifier self-improvement (10 min)
Goal: close the loop the chair probe revealed (the verifier missed that `hook.log` was not in `.gitignore`).
Steps:
- Add to `agents/verifier.md`, in the "Specifically check for in EVERY workstream" section, after item 4:

      5. **Git hygiene**: any new file path written by code in this diff (open files, log writes, cache writes, temp files) that is NOT already in `.gitignore` → HIGH. The PostToolUse auto-commit hook stages everything under wiki/, .raw/, .vault-meta/; an unignored runtime artifact creates a self-pollution loop on the next hook fire.
      6. **Additive-without-pruning**: if `git diff --shortstat main..HEAD` shows net additions > +500 LOC and deletions < 50 LOC, flag as MEDIUM. Real feature work adds lines; pure additive cycles with no pruning suggest v_prev cruft is being retained reflexively.

- Verify YAML frontmatter still parses (`python3 -c "import yaml; yaml.safe_load(open('agents/verifier.md').read().split('---')[1])"`)
- Commit: `docs(v1.7.2): verifier-agent self-improvement from 3-round retrospective`
Output: verifier.md has two new "always check" items; next dispatch catches what this session's verifier missed.
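The additive-without-pruning item reduces to a threshold check on the summary line printed by `git diff --shortstat`. A minimal Python sketch follows, assuming the usual `N files changed, X insertions(+), Y deletions(-)` format. The function names `parse_shortstat` and `is_additive_without_pruning` are hypothetical and are not part of the verifier agent.

```python
import re


def parse_shortstat(line: str) -> tuple[int, int]:
    """Return (insertions, deletions) from one `git diff --shortstat` line.

    git omits a clause entirely when its count is zero, so both are optional.
    """
    ins = re.search(r"(\d+) insertions?\(\+\)", line)
    dels = re.search(r"(\d+) deletions?\(-\)", line)
    return (int(ins.group(1)) if ins else 0, int(dels.group(1)) if dels else 0)


def is_additive_without_pruning(insertions: int, deletions: int) -> bool:
    """Thresholds from the plan: net additions > 500 LOC and deletions < 50 LOC."""
    return (insertions - deletions) > 500 and deletions < 50


# The figures quoted in section 0 of this plan.
stat = " 41 files changed, 5819 insertions(+), 30 deletions(-)"
ins, dels = parse_shortstat(stat)
print(ins, dels, is_additive_without_pruning(ins, dels))  # 5819 30 True
```

On the current branch figures the check fires, which matches the kernel deduction described in section 0.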
4. Phase 2 — Close the +5819 / -30 LOC ratio (60-90 min)
Goal: prune v1.6 code that v1.7 superseded but didn't remove.
Steps:
- Inventory candidates:

      # Comments referencing pre-v1.7 behavior in skills/
      grep -rn "v1\.6\|legacy\|deprecated\|TODO\|FIXME" skills/ scripts/ bin/
      # Skill sections with "## v1.6 behavior" / "## Before v1.7" headers
      grep -rn "^## .*1\.6\|^### .*1\.6" skills/
      # Tool references in skills that v1.7 transport supersedes
      grep -rn "allowed-tools: .*Edit\|allowed-tools: .*Write" skills/

- For each candidate, decide:
  - PRUNE: code or doc that is dead post-v1.7 (e.g., a "v1.6 fallback" path that the v1.7 transport layer makes unreachable, a legacy comment block superseded by `compound-vault-guide.md`)
  - KEEP: legitimately current code or doc; add a one-line justification in the working scratch file
- Apply prunes in clusters (one commit per logical theme, e.g. "prune v1.6 transport assumptions", "prune superseded inline docs")
- After each prune commit, `make test` must stay green
- Acceptance gate: end-of-Phase-2 `git diff --shortstat main..HEAD` shows either net additions ≤ +5000 LOC or deletions ≥ 200 LOC
Failure mode: if v1.7 genuinely added only new features with zero v1.6 supersession, the +5819 stays as additive and the kernel deduction is irreducible. In that case, DOCUMENT it explicitly in audit §10.4 ("+5819 / -30 is the honest cost of building a substrate; v1.6 had no deprecation surface") and accept the score adjustment. Do not invent prunes to game the metric.
Output: N prune commits + scratch file justifying every retained piece of v1.6 code.
5. Phase 3 — Close the 14 MEDIUM findings (90-120 min)
Walk each finding from the v1.7.0 audit §8.3. Group related fixes; one commit per cluster.
| # | Finding | Plan | Effort | Commit grouping |
|---|---|---|---|---|
| M1 | §3.2 +485/-0 LOC | Addressed in Phase 2 | — | (in Phase 2) |
| M2 | `bm25-index.py` non-ASCII tokenization drops content | Extend regex `[A-Za-z][A-Za-z0-9'\-]*` to `[\w'\-]+` with `re.UNICODE`; add hermetic test with emoji + CJK + Cyrillic + Spanish accented input; verify BM25 ranking changes are sensible | 20 min | C1 |
| M3 | `rerank.py` `--allow-remote-ollama` error blames user | Improve error: "OLLAMA_URL points off-localhost; either run ollama locally or pass --allow-remote-ollama through retrieve.py (which forwards it here)" | 5 min | C2 |
| M4 | `wiki-lock.sh` `validate_path` accepts newlines | Add `case "$p" in *$'\n'*) die "newlines not allowed in lock path" 4 ;; esac`; add test | 10 min | C2 |
| M5 | `retrieve.py` `import_sibling` no ImportError handling | Wrap in `try/except (ImportError, SyntaxError)`; print friendly error pointing to `bin/setup-retrieve.sh --check` | 10 min | C2 |
| M6 | `contextual-prefix.py` empty body silent | Emit `log(f"WARN: {page_path} has no body content; skipping")` and return cleanly | 5 min | C2 |
| M7 | `rerank.py` `save_cache()` blocking fcntl on non-flock FS | Add `LOCK_NB` + retry loop (3 attempts, 100ms sleep); fall back to no-cache write with a WARN | 15 min | C2 |
| M8 | `test_retrieve.py` missing `--explain` and `--no-rerank` coverage | Add 2 test cases asserting the JSON shape changes | 15 min | C3 |
| M9 | Bounded-slices: 4 skills touched by both §3.2 and §3.4 | Process note, not a code fix; document in audit §10.3 as PROCESS-ACK | — | (audit-only) |
| M10 | No verifier agents during v1.7 dev | Closed by H4 (3ea443f); mark in audit | — | (audit-only) |
| M11 | Synonym category benchmark tied (60% both pipelines) | Investigate via `benchmark-runner.py --limit 0 --json results.json` then per-query analysis; either tune rerank threshold or document why parity is acceptable | 30 min | C4 |
| M12 | Negative-query precision tied at 40% | Investigate similarly; tune rerank to suppress sub-threshold top results | 20 min | C4 |
| M13 | NotebookLM derivative outputs gap | Defer to v2.0; document in audit §10.5 with explicit roadmap rationale | — | (audit-only) |
| M14 | (verify what this is — read audit §8.3 line for M14) | TBD per content | TBD | TBD |
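Before writing the hermetic test for M2, it may help to see what the proposed pattern does and does not capture. The sketch below compares the two regexes quoted in the M2 row on a made-up sample string. It is an illustration only, not the code in `bm25-index.py`, and the sample text is an assumption.

```python
import re

OLD = re.compile(r"[A-Za-z][A-Za-z0-9'\-]*")  # current pattern, as quoted in M2
NEW = re.compile(r"[\w'\-]+", re.UNICODE)      # proposed pattern, as quoted in M2

# Hypothetical sample: Spanish, Cyrillic, CJK, and a word with a trailing emoji.
# Escapes are used so the input does not depend on editor normalization.
text = (
    "caf\u00e9 ni\u00f1o "
    "\u041c\u043e\u0441\u043a\u0432\u0430 "
    "\u6771\u4eac "
    "rocket\U0001F680"
)

old_tokens = OLD.findall(text)
new_tokens = NEW.findall(text)
print(old_tokens)
print(new_tokens)
```

The old pattern splits the accented words into fragments (`caf`, `ni`, `o`) and drops the Cyrillic and CJK words entirely. The new one keeps all four whole. Three behaviours are worth covering in the test: `\w` does not match emoji, so the emoji is still dropped; a run of CJK characters comes back as one unsegmented token; and in Python 3 `\w` is already Unicode-aware for `str` patterns, so `re.UNICODE` is redundant but harmless.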
Commit clusters:
- C1 — non-ASCII tokenization (M2)
- C2 — defensive-input fixes bundle (M3, M4, M5, M6, M7)
- C3 — test coverage extension (M8)
- C4 — benchmark tunings (M11, M12)
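Within the C2 bundle, the M7 fix (non-blocking lock plus bounded retry) could look roughly like the sketch below. It is not the actual `save_cache()` in `rerank.py`: the signature, the JSON cache format, and the reading of "fall back to no-cache write" as "skip the cache write and warn" are all assumptions.

```python
import fcntl
import json
import os
import sys
import time


def save_cache(path: str, cache: dict, attempts: int = 3, delay: float = 0.1) -> bool:
    """Write `cache` as JSON under an exclusive flock without blocking forever.

    Returns True if the cache was written, False if the lock was never acquired.
    """
    # Open without truncating: truncation must wait until the lock is held.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as fh:
        for attempt in range(attempts):
            try:
                fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                # Lock is held elsewhere, or the filesystem does not support flock.
                if attempt < attempts - 1:
                    time.sleep(delay)
                continue
            try:
                fh.seek(0)
                fh.truncate()
                json.dump(cache, fh)
                fh.flush()
            finally:
                fcntl.flock(fh, fcntl.LOCK_UN)
            return True
    print(f"WARN: could not lock {path}; skipping cache write", file=sys.stderr)
    return False
```

With the defaults from the M7 row (3 attempts, 100 ms sleep) the worst-case delay is about 200 ms before the warning. `fcntl` is POSIX-only, which is in line with the cross-platform exclusion in the scope guard.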
After each cluster: make test + verifier dispatch on staged diff (eat own dogfood per the new agent).
Acceptance gate: all 14 MEDIUM closed (with commit SHA in audit §8.3) or deferred (with rationale).
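For M5, the guarded import might be sketched as follows. The real `import_sibling` signature in `retrieve.py` is not shown in this plan, so the `(stem, directory)` parameters, the use of `importlib.util`, and the exit-via-`SystemExit` behaviour are assumptions for illustration.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def import_sibling(stem: str, directory: Path) -> ModuleType:
    """Load `<directory>/<stem>.py` by path, or exit with a pointer to the diagnostic."""
    path = directory / f"{stem}.py"
    try:
        if not path.is_file():
            raise ImportError(f"{path} not found")
        # Hyphenated file names are not valid module names, so normalize them.
        spec = importlib.util.spec_from_file_location(stem.replace("-", "_"), path)
        if spec is None or spec.loader is None:
            raise ImportError(f"no import spec for {path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    except (ImportError, SyntaxError) as exc:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(
            f"retrieve: cannot load {path.name} ({type(exc).__name__}: {exc}); "
            "run `bash bin/setup-retrieve.sh --check` to diagnose"
        ) from exc
```

Catching only `ImportError` and `SyntaxError`, as the M5 row specifies, keeps genuine runtime bugs in a sibling module visible as ordinary tracebacks.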
6. Phase 4 — Close the 10 LOW findings (30-45 min)
L1–L10 from audit §8.4. Bundle in a single commit: `polish(v1.7.2): close 10 LOW findings from v1.7.0 audit`.
Steps:
- Read audit §8.4 for the actual L1–L10 list (don't list them speculatively here)
- For each: tiny edit + one-line CHANGELOG bullet
- Single commit covers all 10 + CHANGELOG update
Acceptance gate: all 10 LOW marked CLOSED in audit.
7. Phase 5 — Documentation refresh + final benchmark (30 min)
Steps:
- Run `python3 scripts/benchmark-runner.py --json /tmp/v172-bench.json` on full 50-query corpus (no `--limit`)
- Compare to v1.7.0 audit's numbers (54.0% v17 top-1, +39.5% error reduction). Re-tunings in Phase 3 C4 may have shifted these
- Update audit §6.2 with current numbers + delta-from-baseline
- Cross-check every commit SHA referenced in audit + CHANGELOG against `git log`. Correct any drift
- Refresh `wiki/hot.md` with v1.7.2 state (will auto-commit by hook design)
- Bump `.claude-plugin/plugin.json` + `.claude-plugin/marketplace.json` from `1.7.1` to `1.7.2` if any of Phases 2–4 landed code changes; don't bump if only docs + audit changes
- Add CHANGELOG `[1.7.2]` entry referencing this plan as the source
Acceptance gate: every published number is the result of a fresh measurement, not a copy from earlier.
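When refreshing the numbers, the delta-from-baseline arithmetic can be made explicit. This plan does not state how the audit defines "error reduction" or what the baseline top-1 accuracy was, so the sketch below assumes relative reduction in top-1 error (error = 1 - accuracy). The 12/50 baseline is a hypothetical input, chosen only because it is consistent with the quoted figures under that definition; it is not taken from the audit.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Percent reduction in top-1 error, where error = 1 - accuracy."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    if baseline_err <= 0:
        raise ValueError("baseline has no error to reduce")
    return 100.0 * (baseline_err - new_err) / baseline_err


# Hypothetical inputs on a 50-query corpus: 12 correct before, 27 correct after.
reduction = relative_error_reduction(12 / 50, 27 / 50)
print(round(reduction, 1))  # 39.5
```

If the audit uses a different definition (for example absolute percentage points), the function should be replaced rather than the published number adjusted to fit.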
8. Phase 6 — Final verification (30 min) + ship gate
The ship gate is binary: pass or accept the achieved score, no third try.
Steps:
- Dispatch verifier agent against entire `main..HEAD` diff (will be ~50 files at this point)
- Run the chair adversarial probe (exactly 10 specific tests):
  - `git check-ignore` on every file the codebase might write to
  - `bash -u` on every shell script that uses `${VAR}` references
  - `python3 -c "import json; json.load(open(f))"` on every JSON file
  - `yaml.safe_load` on every markdown frontmatter
  - `make test` 7-suite re-run
  - `python3 scripts/benchmark-runner.py --limit 5` to verify benchmark harness still runs
  - `bash bin/setup-retrieve.sh --check` to verify diagnostic path
  - `git diff --shortstat main..HEAD` to confirm acceptance criterion #3
  - `grep -c "TODO\|FIXME\|XXX"` on every file changed in `main..HEAD`; must be 0 net additions
  - Open every doc file changed, verify each commit-SHA reference resolves via `git rev-parse`
- Compute final score on the 7 dimensions used throughout this session
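The JSON probe in the steps above can be run over the whole tree in one pass instead of file by file. A small sketch, assuming every `*.json` file under the repo root should parse; the helper name `invalid_json_files` is hypothetical.

```python
import json
from pathlib import Path


def invalid_json_files(root: Path) -> list[tuple[Path, str]]:
    """Return (path, error message) for every *.json under `root` that fails to parse."""
    failures = []
    for path in sorted(root.rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            failures.append((path, str(exc)))
    return failures
```

An empty return value corresponds to the probe passing; any entry is a functional finding for the ship gate.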
Ship gate decision:
| Outcome | Action |
|---|---|
| Verifier 0/0/0/0 + chair 0 functional findings + acceptance criteria 1–7 all met | SHIP at 100/100. Surface to user for push authorization. |
| Either pass finds <5 items, all closeable in <30 min | One MORE iteration allowed. Close, re-verify, ship. |
| Either pass finds ≥5 items OR any item requires >30 min | Document remaining. Ship at honest achieved score. Add a v1.7.x backlog entry. |
Hard rule: maximum 2 verify-fix rounds after Phase 6. The 3-round recursion of the v1.7.1 cycle taught us that adversarial scrutiny is asymptotic. After 2 more rounds, accept the score.
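The decision table and the hard rule can be written down as one function, which makes the edge cases explicit. This is a sketch under stated assumptions: each finding is represented by its estimated fix time in minutes, a fix estimated at exactly 30 minutes is treated as too slow (the table leaves that case undefined), and clean passes with unmet acceptance criteria fall through to "document", following the last sentence of §1. The names `Gate` and `ship_gate` are hypothetical.

```python
from enum import Enum


class Gate(Enum):
    SHIP = "ship at 100/100"
    ITERATE = "one more verify-fix round"
    DOCUMENT = "document the gap and ship at the achieved score"


def ship_gate(
    verifier: list[int],
    chair: list[int],
    criteria_met: bool,
    rounds_used: int,
    max_rounds: int = 2,
) -> Gate:
    """`verifier` and `chair` hold the estimated fix minutes for each open finding."""
    findings = verifier + chair
    if not findings:
        return Gate.SHIP if criteria_met else Gate.DOCUMENT
    too_many = len(verifier) >= 5 or len(chair) >= 5
    too_slow = any(minutes >= 30 for minutes in findings)
    if too_many or too_slow or rounds_used >= max_rounds:
        return Gate.DOCUMENT
    return Gate.ITERATE


print(ship_gate(verifier=[10, 5], chair=[], criteria_met=True, rounds_used=0).name)  # ITERATE
```

Counting findings per pass rather than in total follows the table's "either pass" wording; a stricter reading would sum the two lists.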
8b. Phase 7 — v1.8.0 methodology modes (6-7h)
After Phase 6 lands v1.7.2 at honest 100/100, build methodology modes — the compass artifact's priority gap 5. Closes axis "methodology support" in audit §9 from TIE to YES (5/7 axes #1).
Deliverables:
- New skill `skills/wiki-mode/SKILL.md` (~45 min)
  - Triggers: "set vault mode", "switch to PARA", "use LYT", "what's my vault mode", "zettelkasten setup"
  - Reads `.vault-meta/mode.json`; falls back to `mode=generic` (v1.6/v1.7 default) when absent
  - allowed-tools: Read, Write, Bash
- Mode config schema `.vault-meta/mode.json` (~30 min: schema + write path)

      {
        "schema_version": 1,
        "mode": "lyt|para|zettelkasten|generic",
        "configured_at": "ISO-8601",
        "config": {
          "lyt": {"moc_folder": "wiki/mocs/"},
          "para": {
            "projects_folder": "wiki/projects/",
            "areas_folder": "wiki/areas/",
            "resources_folder": "wiki/resources/",
            "archives_folder": "wiki/archives/"
          },
          "zettelkasten": {"id_format": "YYYYMMDDHHMMSS", "no_folders": true}
        }
      }

Per-mode templates
skills/wiki-mode/templates/(~60 min)lyt/moc-template.md(Map of Content scaffolding with wikilink-cluster sections)lyt/atomic-template.md(atomic note that links into MOCs)para/project-template.md(active project with status, deadline, next-action)para/area-template.md(ongoing responsibility, no deadline)para/resource-template.md(reference material, topic-organized)zettel/atomic-template.md(atomic claim + supporting sources + parent/child IDs)zettel/_id-format.md(timestamp-based ID generation recipe)
- Skill mode-awareness modifications (~90 min)
  - `skills/wiki-ingest/SKILL.md`: consult `.vault-meta/mode.json`; route source/entity/concept pages to mode-specific folders when mode != generic
  - `skills/save/SKILL.md`: same; session notes route to PARA/projects or LYT/MOCs based on mode
  - `skills/autoresearch/SKILL.md`: same; research artifacts route appropriately
  - All changes preserve v1.7 fallback behavior when mode = generic
- Hermetic tests `tests/test_wiki_mode.sh` + `tests/test_wiki_mode.py` (~60 min)
  - Mode config writes correctly under each of 4 modes
  - Mode loader returns correct config for each mode
  - Routing logic produces correct path for each (mode, content-type) pair
  - mode=generic preserves v1.7 routing
  - Invalid mode in mode.json triggers explicit error, not silent fallback
  - All hermetic; no network, no LLM, no ollama
- Opt-in setup script `bin/setup-mode.sh` (~30 min)
  - Interactive: prompts user to pick mode
  - Writes `.vault-meta/mode.json`
  - Optionally seeds template folders (LYT mocs/, PARA projects+areas+resources+archives/)
  - Idempotent; safe to re-run
- Documentation (~45 min)
  - `docs/methodology-modes-guide.md`: explains each mode, when to use, migration paths
  - `CLAUDE.md`: "How to Use" section + new "Methodology Modes (v1.8+)" subsection
  - `wiki/references/methodology-modes.md`: short decision tree (which mode for which user)
- Cross-cutting (~30 min)
  - `Makefile`: `test-mode`, `setup-mode` targets; extend `test` to include `test-mode`
  - `.claude-plugin/{plugin,marketplace}.json`: version 1.7.2 → 1.8.0, description updated
  - `.gitignore`: `.vault-meta/mode.json` is host-specific runtime config, MUST be ignored
  - `CHANGELOG.md`: new [1.8.0] entry
  - `agents/wiki-ingest.md`: note mode-awareness in sub-agent protocol
  - `wiki/hot.md`: refresh state
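The loader behaviour the deliverables call for (absent file means generic, invalid mode is an explicit error) can be sketched in a few lines of Python. The function name `load_mode`, the shape of the default it returns, and the use of `ValueError` are assumptions for illustration; only the file path and the four mode names come from the schema in this plan.

```python
import json
from pathlib import Path

VALID_MODES = {"lyt", "para", "zettelkasten", "generic"}


def load_mode(vault_root: Path) -> dict:
    """Read .vault-meta/mode.json under `vault_root`.

    An absent file falls back to generic; an unknown mode raises instead of
    silently falling back.
    """
    path = vault_root / ".vault-meta" / "mode.json"
    if not path.exists():
        return {"schema_version": 1, "mode": "generic", "config": {}}
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    mode = data.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(
            f"{path}: unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}"
        )
    return data
```

Keeping the fallback limited to the missing-file case is what lets the "invalid mode triggers explicit error" test distinguish a typo from an unconfigured vault.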
Commit ladder (estimated):
- `feat(v1.8.0): wiki-mode skill + 4 mode templates`
- `feat(v1.8.0): mode-aware routing in wiki-ingest`
- `feat(v1.8.0): mode-aware routing in save + autoresearch`
- `test(v1.8.0): hermetic wiki-mode test suite`
- `feat(v1.8.0): bin/setup-mode.sh opt-in bootstrap`
- `docs(v1.8.0): methodology modes guide + CLAUDE.md update`
- `chore(v1.8.0): version bump 1.7.2 → 1.8.0, CHANGELOG, gitignore`
Per-commit gates:
- `make test` green (now 8 suites including `test-mode`)
- mode=generic path preserves v1.7 behavior exactly (regression test)
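For the Zettelkasten mode, the `YYYYMMDDHHMMSS` value of `id_format` in the mode schema maps directly onto a `strftime` pattern. A minimal sketch of what the `zettel/_id-format.md` recipe might describe; the function name `zettel_id` and the choice of UTC are assumptions, since the plan does not say which clock the IDs use.

```python
from datetime import datetime, timezone
from typing import Optional


def zettel_id(now: Optional[datetime] = None) -> str:
    """Format a timestamp as a 14-digit YYYYMMDDHHMMSS note ID."""
    moment = now if now is not None else datetime.now(timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


print(zettel_id(datetime(2026, 5, 17, 9, 30, 5)))  # 20260517093005
```

IDs at one-second resolution collide if two notes are created within the same second, so a hermetic test for the write path may want a case for that.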
8c. Phase 8 — v1.8.0 ship gate (30 min)
Mirror Phase 6 structure for the v1.8.0 slice. Verifier on entire diff main..HEAD. Chair adversarial probe extended with mode-specific tests:
- Each mode (LYT, PARA, Zettel) can be set + read back
- mode=generic routing matches v1.7 routing byte-for-byte on a sample ingest
- `.vault-meta/mode.json` is gitignored (test by creating + check-ignore)
- `bin/setup-mode.sh` is idempotent (run twice, second run no-op)
Same 2-round cap. If 0/0/0/0 + chair clean: 100/100 SHIP. Else: honest achieved score + v1.8.x backlog.
9. What this plan deliberately does NOT do (scope guard)
These are NOT in scope because they expand into a different release line:
- v1.9 multimodal ingest (YouTube / PDF / EPUB / image OCR)
- v2.0 derive (audio / quiz / flashcards / study guide — NotebookLM-class outputs)
- v2.5+ GUI onramp (Community Plugin fork)
- Cross-platform (macOS / Windows) testing — explicit out-of-scope per v1.7.0 audit §3
- Performance benchmarking beyond retrieval accuracy
- Security audit of dependencies (Python stdlib only; no third-party packages introduced)
- Marketing / positioning work
A 100/100 on the v1.7 line does NOT mean #1 in the market. Per v1.7.0 audit §9: market-#1 across all 7 axes requires v1.8 + v2.0 + v2.5 work, not patch work. This plan brings the v1.7 line to honest code-quality 100/100. That's the prerequisite for the next release lines, not a substitute for them.
10. Undo plan (per /best-practices "failure is the spec")
If anything in Phases 2–4 causes a regression that isn't caught by the per-commit make test gate:
- Revert the specific commit with `git revert <sha>`; do NOT rebase
- Re-run verifier on the revert
- Document the regression in audit §8 as a "FOUND-AND-REVERTED" finding so the lesson sticks
If the entire plan cannot reach the acceptance criteria within 6 hours (1h over budget):
- Stop
- Document the gap explicitly
- Ship at the achieved honest score
- Add a v1.7.3 backlog entry for the remaining items
The plan does not mutate the v1.7 features themselves; it only adds prunings (Phase 2) and bug-class fixes (Phase 3). The v1.7.1 functional surface is preserved.
11. Per-phase ship gates (mini-acceptance criteria)
| Phase | Acceptance gate |
|---|---|
| 0 | All 24 findings categorized in scratch file |
| 1 | agents/verifier.md parses; 2 new "always check" items added |
| 2 | Net LOC delta meets §1 criterion #3 OR documented as irreducible |
| 3 | All 14 MEDIUM closed-or-deferred per §1 criterion #4 |
| 4 | All 10 LOW closed |
| 5 | Fresh benchmark numbers in audit; all SHAs verified |
| 6 | Verifier + chair both clean (or rounds budget exhausted) |
If a phase fails its gate, the plan does NOT proceed to the next phase. The chair stops, documents what's incomplete, and surfaces to the user for a go/no-go decision on continuing.
12. Cost-of-failure honest framing
Worst case: 6 hours spent, achieve only 98/100 (some MEDIUMs prove harder than estimated, +5819 stays additive, etc.).
Best case: 4 hours spent, genuinely achieve 100/100 on the v1.7 line, branch ready to push as v1.7.2.
Median case: 5 hours spent, 99/100, all M closed, 1-2 L deferred with rationale, push ready.
The recursion is the risk. Three rounds were needed to land at 97. Phase 6's hard 2-round cap protects against that recursion eating the entire weekend. If the cap fires, the gap is documented and we ship at honest <100 with a v1.7.3 backlog.
13. Confirmation before execution
Per /best-practices "acceptance criteria written before execution" + the user's repeated "no lies" + "honest score" framing, this plan needs explicit user buy-in on:
- Scope: §9 explicitly excludes v1.9 / v2.0 / v2.5+ work. Confirm.
- Budget: 4-5h estimated with a 6h cap for v1.7.2; 10-12h estimated with a 14h hard cap if v1.8.0 follows in the same session. Confirm or adjust.
- Ship gate posture: 2-round cap on adversarial scrutiny after Phase 6. Confirm or adjust.
- No push: branch stays local until user authorizes push, even if 100/100 is achieved. Confirm.
If any of these need adjustment, surface that. Otherwise: execute top to bottom.