23 KiB
v1.7.1 Fixes Plan — Post-Audit Punch List
Status: READY TO EXECUTE
Date: 2026-05-17
Source: v1.7.0 audit §10.1 + §10.2
Branch: continue on v1.7.0-compound-vault (local-only; no push)
Goal: close the 1 BLOCKER + 6 HIGH findings; tag/push v1.7.1 (not v1.7.0) when done
Estimated effort: ~2.5 hours of focused work, 7 small commits
Pre-flight (read once before starting)
- Working tree state on resume:
- Untracked:
docs/audits/v1.7.0-audit-2026-05-17.md,docs/audits/v1.7.1-fixes-plan.md(this file),scripts/baseline-v16.py,scripts/benchmark-runner.py - Auto-committed during audit:
96a5505 wiki: auto-commit 2026-05-17 03:17(the retrieval benchmark corpus atwiki/meta/retrieval-benchmark-v1.7.md) - Branch head:
4a362ed(last v1.7 feat code commit) — wait, actually the auto-commit is on top, so head =96a5505 make testis green (7 suites, ~1162 assertions)bash bin/setup-retrieve.sh --no-llmis already provisioned (.vault-meta/chunks/,.vault-meta/bm25/,.vault-meta/embed-cache.jsonexist locally; gitignored)
- Untracked:
- The PostToolUse hook will auto-commit any change under
wiki/. Soft-reset after if you want the fix grouped with a chore commit. - After every fix: run
make test. If green, commit. If red, do not commit until green.
Fix 1 (BLOCKER) — contextual-prefix.py data-egress consent gap
Severity: BLOCKER (must fix before any public push)
Source: audit §3.2 commit 45a5bd3 cut #6; cross-confirmed by Agent 1 + Agent 3
File: scripts/contextual-prefix.py
Precedent to mirror: scripts/tiling-check.py:351-352 --allow-remote-ollama flag + localhost guard
What's broken
pick_prefix_tier() at line 252-258 reads:
def pick_prefix_tier(force_synthetic):
if force_synthetic:
return "synthetic"
if os.environ.get("ANTHROPIC_API_KEY"):
return "anthropic-api" # ← silently sends page bodies off-machine
if shutil.which("claude"):
return "claude-cli" # ← also off-machine, via subprocess
return "synthetic"
A user with ANTHROPIC_API_KEY in their env (very common — CC users) runs bash bin/setup-retrieve.sh and gets their wiki page bodies streamed to https://api.anthropic.com/v1/messages with no prompt, no log warning, no opt-in. The existing precedent (tiling-check.py's --allow-remote-ollama) shows this codebase already has a pattern for explicit consent on data egress — contextual-prefix.py just didn't follow it.
Changes
scripts/contextual-prefix.py— add--allow-egressCLI flag, defaultFalse. Without the flag,pick_prefix_tier()returns"synthetic"regardless of env vars orclaudebinary. Update the help text in the docstring (lines 11-19) to document the flag + the default.
# in main(), after the existing flags:
parser.add_argument("--allow-egress", action="store_true",
help="Allow tier-1 (Anthropic API) or tier-2 (claude CLI subprocess) "
"prefix generation. Without this flag, page bodies stay on-machine "
"and only the tier-3 synthetic prefix is used. Mirror of "
"tiling-check.py's --allow-remote-ollama guard.")
# in pick_prefix_tier(), add allow_egress parameter and guard early:
def pick_prefix_tier(force_synthetic, allow_egress=False):
if force_synthetic or not allow_egress:
return "synthetic"
if os.environ.get("ANTHROPIC_API_KEY"):
return "anthropic-api"
if shutil.which("claude"):
return "claude-cli"
return "synthetic"
# at the process_page call site, pass through:
tier = pick_prefix_tier(force_synthetic, allow_egress=args.allow_egress)
bin/setup-retrieve.sh— when invoked WITHOUT--no-llm, prompt the user. Add right after the prefix-tier-picker block (around line 100):
if ! $NO_LLM && ! $CHECK_ONLY; then
case "$PREFIX_TIER" in
*anthropic-api*|*claude-cli*)
say ""
say "⚠️ Stage 1 will send page bodies off-machine via the '$PREFIX_TIER' tier."
say " Estimated egress: ~\$0 (claude-cli, free) to ~\$12 per 1,000 pages (Anthropic API)."
say " Per-page bodies are POSTed to the LLM; check the Anthropic privacy policy."
printf " Continue? [y/N]: "
read -r reply
case "$reply" in
[yY]|[yY][eE][sS]) say "Proceeding with egress." ;;
*) say "Aborted. Re-run with --no-llm for the synthetic-only path." ; exit 0 ;;
esac
;;
esac
fi
# Also: when contextual-prefix.py is invoked, pass --allow-egress through:
ARGS=("--all")
$NO_LLM && ARGS+=("--no-llm")
! $NO_LLM && ARGS+=("--allow-egress") # ← new line
$REBUILD && ARGS+=("--rebuild")
skills/wiki-retrieve/SKILL.md— add a Data Privacy callout near the top, right after the substrate note (bundle with H6 fix below):
## Data privacy (v1.7.1+)
Tier 1 (Anthropic API) and tier 2 (claude CLI subprocess) of the contextual-prefix
generator send wiki page bodies off-machine. By default, both tiers are GUARDED behind:
- `scripts/contextual-prefix.py --allow-egress` flag (default off → falls through to tier 3)
- `bin/setup-retrieve.sh` consent prompt before any non-synthetic Stage 1 run
To run fully on-machine (tier 3 synthetic prefix + local ollama rerank), use:
`bash bin/setup-retrieve.sh --no-llm`. This is the default if you do not pass --allow-egress.
The egress guard mirrors `scripts/tiling-check.py:351-352`'s `--allow-remote-ollama` precedent.
Verification
# 1. Existing tests must still pass
make test # 7 suites green
# 2. New behavior: without --allow-egress, tier defaults to synthetic
python3 scripts/contextual-prefix.py wiki/concepts/Hot\ Cache.md --peek
# Expect log line: "tier=synthetic"
# 3. With --allow-egress AND key set, tier picks api/cli
ANTHROPIC_API_KEY=test python3 scripts/contextual-prefix.py wiki/concepts/Hot\ Cache.md --peek --allow-egress
# Expect log line: "tier=anthropic-api"
# 4. setup-retrieve.sh consent prompt fires when non-synthetic
echo "" | bash bin/setup-retrieve.sh # press enter at the prompt
# Expect: prompt appears, defaults to "no", script exits cleanly with "Aborted."
Commit message
fix(v1.7.1): contextual-prefix egress requires explicit consent
Close the v1.7.0 audit BLOCKER (docs/audits/v1.7.0-audit-2026-05-17.md §3.2,
§8.1 B1). pick_prefix_tier() was selecting tier-1 (Anthropic API) whenever
ANTHROPIC_API_KEY was set in env — no flag, no prompt, no warning. Wiki page
bodies streamed off-machine without user opt-in.
Mirror the existing precedent at scripts/tiling-check.py:351-352
(--allow-remote-ollama + localhost-only default):
- scripts/contextual-prefix.py: new --allow-egress flag (default off).
pick_prefix_tier() now requires allow_egress=True to pick non-synthetic tiers.
- bin/setup-retrieve.sh: prompts before any non-synthetic Stage 1 run; defaults
to abort on empty/no input; passes --allow-egress through to the helper.
- skills/wiki-retrieve/SKILL.md: Data Privacy callout at top of skill body.
v1.6 vaults that never ran setup-retrieve.sh see no behavior change. v1.7
adopters who DID run it will be prompted on their next refresh.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix 2 (HIGH H1) — bin/setup-retrieve.sh no rollback if Stage 1 fails partway
Severity: HIGH
File: bin/setup-retrieve.sh
What's broken
bin/setup-retrieve.sh:128-140 runs python3 scripts/contextual-prefix.py --all and continues to Stage 2 regardless of return code. If Stage 1 fails on chunk 31 of 47, partial .vault-meta/chunks/ is left, Stage 2 builds a stale index, the user has no documented recovery path.
Changes
Add explicit exit-code check + recovery hint:
# Replace the existing Stage 1 invocation block with:
say ""
say "═══ Stage 1/2: chunking + contextual-prefix generation ═══"
ARGS=("--all")
$NO_LLM && ARGS+=("--no-llm")
! $NO_LLM && ARGS+=("--allow-egress") # from Fix 1
$REBUILD && ARGS+=("--rebuild")
if ! python3 "$VAULT/scripts/contextual-prefix.py" "${ARGS[@]}"; then
STAGE1_RC=$?
warn "Stage 1 failed (rc=$STAGE1_RC). Partial chunks are at:"
warn " $META/chunks/"
warn "Recovery options:"
warn " 1. Re-run setup-retrieve.sh — incremental skip will resume from body_hash"
warn " 2. Wipe and start over: rm -rf $META/chunks/ && bash bin/setup-retrieve.sh"
warn " 3. Provision the failed page only: python3 scripts/contextual-prefix.py wiki/<failing-page>.md --rebuild"
exit 5
fi
Verification
# Force a failure: invalid path that the script will try to read
echo "" | python3 scripts/contextual-prefix.py /nonexistent.md
# Then run setup, expect the failure-mode warning
bash bin/setup-retrieve.sh --no-llm
# Hand-verify recovery message appears with the three options
Commit message
fix(v1.7.1): setup-retrieve rollback path on Stage 1 failure
Close audit H1. bin/setup-retrieve.sh ignored Stage 1's exit code and
proceeded to Stage 2 with partial chunks. Now exits 5 on Stage 1 failure
with a 3-option recovery hint (incremental resume, full wipe, single-page
re-process).
Fix 3 (HIGH H2) — make clean-test-state doesn't remove v1.7 artifacts
Severity: HIGH (hygiene; test artifacts persist)
File: Makefile
What's broken
Makefile:55-61 clean-test-state target removes v1.6 lockfiles + tiling cache + bm25 lock + embed-cache, but does NOT remove the v1.7 artifacts that bin/setup-retrieve.sh provisions: .vault-meta/chunks/, .vault-meta/bm25/, .vault-meta/locks/, .vault-meta/transport.json. The gitignore (commit 753fc8a) correctly lists these as runtime artifacts; the Makefile target should match.
Changes
clean-test-state:
@rm -f .vault-meta/.address.lock .vault-meta/.tiling.lock .vault-meta/.bm25.lock \
.vault-meta/.embed-cache.lock .vault-meta/.wiki-lock.meta \
.vault-meta/tiling-cache.json \
.vault-meta/tiling-cache.*.tmp .vault-meta/embed-cache.json \
.vault-meta/embed-cache.*.tmp .vault-meta/transport.json \
.vault-meta/transport.*.tmp
@rm -rf .vault-meta/chunks/ .vault-meta/bm25/ .vault-meta/locks/
@echo "Runtime lockfiles, caches, and v1.7 retrieval/lock artifacts removed."
Verification
bash bin/setup-retrieve.sh --no-llm # provision artifacts
test -d .vault-meta/chunks && echo "chunks exist" # should print
test -f .vault-meta/bm25/index.json && echo "bm25 exists" # should print
make clean-test-state # remove
test -d .vault-meta/chunks || echo "chunks gone" # should print
test -f .vault-meta/bm25/index.json || echo "bm25 gone" # should print
make test # still green
Commit message
fix(v1.7.1): clean-test-state removes v1.7 retrieval artifacts
Close audit H2. The gitignore (753fc8a) listed chunks/, bm25/, locks/,
transport.json as regenerable artifacts but make clean-test-state only
removed v1.6 caches. Extend to match the gitignore set.
Fix 4 (HIGH H3) — PostToolUse hook || true swallows lock-check errors
Severity: HIGH (safety property silently degrades on any check failure)
File: hooks/hooks.json
What's broken
hooks/hooks.json:34-37 PostToolUse command is one long pipeline ending in || true. If bash scripts/wiki-lock.sh list errors (permission denied on .vault-meta/.wiki-lock.meta, missing script, etc.), the ||true swallows it and git add/commit still fires. The intended "defer commit while locks held" property silently degrades to "always commit."
Changes
Restructure to test the lock-list exit code explicitly:
{
"matcher": "Write|Edit",
"hooks": [
{
"type": "command",
"command": "[ -d .git ] || exit 0; if [ -x scripts/wiki-lock.sh ]; then LOCK_COUNT=$(bash scripts/wiki-lock.sh list 2>/dev/null | wc -l | tr -d ' '); LOCK_RC=$?; if [ \"$LOCK_RC\" != \"0\" ]; then exit 0; fi; if [ \"$LOCK_COUNT\" != \"0\" ]; then exit 0; fi; fi; git add wiki/ .raw/ .vault-meta/ 2>/dev/null && (git diff --cached --quiet || git commit -m \"wiki: auto-commit $(date '+%Y-%m-%d %H:%M')\" 2>/dev/null) || true"
}
]
}
Key changes:
[ -d .git ] || exit 0— early bail if no gitLOCK_RC=$?captures the wiki-lock script's exit code separately- If the lock check errored, EXIT 0 (defer commit; do not auto-commit on unknown state)
- The final
|| truenow only covers the git commands themselves
Verification
# Manual: temporarily chmod -x scripts/wiki-lock.sh; trigger a Write tool;
# verify the hook does NOT auto-commit (the `[ -x ... ]` test fails harmlessly)
# Restore: chmod +x scripts/wiki-lock.sh
# Manual: hold a lock, trigger a Write, verify NO commit fires
bash scripts/wiki-lock.sh acquire wiki/concepts/Test.md
# (then trigger a Write tool via Claude — verify no "wiki: auto-commit" in git log)
bash scripts/wiki-lock.sh release wiki/concepts/Test.md
Commit message
fix(v1.7.1): PostToolUse hook tests lock-check exit code explicitly
Close audit H3. The hook's || true terminator swallowed errors from
`wiki-lock list`. If the lock check failed for any reason (permission,
missing script, deleted meta-lock), the safety property silently degraded
to "always commit." Restructure: capture LOCK_RC separately; on non-zero,
exit 0 (defer commit on unknown state) instead of falling through.
Fix 5 (HIGH H5) — detect-transport.sh JSON escaping
Severity: HIGH (theoretical; obsidian-cli unlikely to emit malicious version strings, but defense in depth)
File: scripts/detect-transport.sh
What's broken
scripts/detect-transport.sh:79,86 substitutes obsidian-cli --version output directly into the JSON via shell variable expansion. Only tr -d '"' is applied. Newlines, backslashes, control chars would break the JSON.
Changes
Wrap the version capture in a Python json-escape pass. Add this helper at the top of the script (after the var defs):
json_escape() {
python3 -c 'import json,sys; print(json.dumps(sys.stdin.read().strip()), end="")'
}
Then change lines 79 and 86 from:
CLI_VERSION="$(obsidian-cli --version 2>/dev/null | head -1 | tr -d '"' || echo unknown)"
to:
CLI_VERSION="$(obsidian-cli --version 2>/dev/null | head -1 | json_escape || echo '"unknown"')"
Note: json_escape outputs an already-quoted JSON string, so the snapshot() heredoc needs to drop the quotes around ${CLI_VERSION}:
# in snapshot(), change:
"version_string": "${CLI_VERSION}",
# to:
"version_string": ${CLI_VERSION},
Verification
bash scripts/detect-transport.sh --peek --quiet | python3 -c 'import json,sys; json.load(sys.stdin); print("OK")'
# Expect "OK"
# Synthetic test: rig the function to receive a JSON-breaking value
echo 'Obsidian "1.12"' | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read().strip()))'
# Expect: "Obsidian \"1.12\""
Commit message
fix(v1.7.1): detect-transport.sh escapes version_string as JSON
Close audit H5. CLI_VERSION came from `obsidian-cli --version` and was
embedded in JSON via shell variable expansion with only `tr -d '"'`
applied. Backslashes, newlines, control chars would break the output.
Pipe through `python3 json.dumps` (json_escape() helper).
Fix 6 (HIGH H6) — skills/wiki-retrieve/SKILL.md no Data Privacy callout
Severity: HIGH (transparency; documents the egress posture)
File: skills/wiki-retrieve/SKILL.md
What's broken
The skill frontmatter description doesn't mention that tier-1/tier-2 contextual prefix sends page bodies off-machine. The architecture section implies it; the user-facing description doesn't surface it.
Changes
This fix is BUNDLED with Fix 1 — the Data Privacy callout in §"Data privacy (v1.7.1+)" added by Fix 1 closes this. No separate commit needed; the Fix 1 commit message references "closes H6" already.
Alternatively, if the bundled callout from Fix 1 wasn't sufficient, expand the frontmatter description to add a privacy phrase:
description: "Hybrid retrieval primitive ... [existing text] ... Triggers on: retrieve, hybrid retrieval, BM25, rerank, contextual retrieval, search the chunks, ... DATA PRIVACY: tier-1/tier-2 contextual prefix sends page bodies to LLM (Anthropic API / claude CLI) and requires --allow-egress flag opt-in; tier-3 (default) keeps all data on-machine."
Fix 7 (HIGH H4) — Process gap: no verifier-agent pass
Severity: HIGH (process; produced the v1.7 BLOCKER) File: new doc + agents/ folder update
What's broken
During v1.7 development, code went straight from worker to commit without a separate verifier-agent pass. This audit IS the missing verifier — but doing it post-commit means findings become patches instead of pre-merge fixes. The BLOCKER (B1) would have been caught by any security-focused review before §3.3 was committed.
Changes
Create agents/verifier.md as a dispatched-on-demand specialist that workstream owners can call before commit. Mirrors the superpowers:verification-before-completion skill's checklist.
# verifier — Pre-Commit Audit Specialist
You are a verifier agent. Your job is to find issues a worker just missed, BEFORE
they commit. Apply the /best-practices six-cut kernel to the staged diff.
## When invoked
After a worker has staged changes for a workstream but BEFORE the commit.
Workflow:
1. Worker: `git add <files>; git status` shows staged changes
2. Dispatch this agent with the workstream context
3. Agent reads every staged file + the precedent files it touches
4. Agent applies six-cut + agent-kernel checks
5. Agent returns findings in 4 tiers (BLOCKER / HIGH / MEDIUM / LOW)
6. Worker addresses BLOCKER + HIGH before commit; MEDIUM/LOW become follow-ups
## Six-cut checklist (verify EACH cut)
[content matches the audit's §3 six-cut framework — copy from docs/audits/v1.7.0-audit-2026-05-17.md §3]
## Specifically check for in EVERY workstream
- **Data egress**: any new outbound network call, subprocess, or file write outside the vault root. If yes: is there a user opt-in checkpoint? Compare to existing `--allow-remote-ollama` / `--allow-egress` precedents.
- **Atomic operations**: any file write that could be interrupted mid-stream. If yes: is there a temp + rename, or other atomicity guarantee?
- **Failure-mode rollback**: any multi-step operation. If yes: is there a documented recovery path for partial completion?
- **Hermetic test coverage**: any new code path. If yes: is there a test that exercises it without network/LLM?
## Output
Report under 800 words. Findings in tiered list with file:line citations + recommended fix.
Verification
This is a process change, not a code change. Verify by:
- Confirm
agents/verifier.mdexists - Reference it from
CLAUDE.md"How to Use" section as the recommended pre-commit step - Document in next release notes (v1.7.1 changelog entry) so future workstreams know to use it
Commit message
docs(v1.7.1): add verifier-agent specialist for pre-commit audits
Close audit H4. The v1.7 development cycle had no verifier-agent pass at
workstream gates — code went straight from worker to commit. The audit
itself filled that role post-hoc, which is why the BLOCKER (B1) became a
v1.7.1 patch instead of a pre-merge fix.
New agents/verifier.md documents the on-demand specialist + the six-cut
checklist a workstream owner should dispatch BEFORE commit. CLAUDE.md
references it as the recommended pre-commit step.
No code changes; process change only.
Sequencing recommendation
Order the fixes by dependency + grouping:
- Fix 1 (BLOCKER) + Fix 6 (HIGH H6) — bundle as ONE commit since the Data Privacy callout is part of Fix 1's changes anyway
- Fix 2 (HIGH H1) — independent; small
- Fix 3 (HIGH H2) — independent; tiny
- Fix 4 (HIGH H3) — independent
- Fix 5 (HIGH H5) — independent
- Fix 7 (HIGH H4) — doc-only, last
Total: 6 commits (one per fix, Fix 1+6 combined). Each followed by make test. If anything red, stop and diagnose.
Post-fix steps
After all 6 commits land:
- Run
make testone final time. 7 suites green is the gate. - Run
bash bin/setup-retrieve.sh --no-llmend-to-end to verify the retrieval pipeline still provisions cleanly post-fixes. - Update
docs/audits/v1.7.0-audit-2026-05-17.md§10.2 to mark each H1-H6 as "FIXED in v1.7.1 commit<sha>" with the actual SHAs. - Update
wiki/hot.md"Last Updated" section to reflect v1.7.1. - Update
CHANGELOG.md— add a v1.7.1 entry that references the audit + lists the 7 fixes. - Update
.claude-plugin/plugin.json+.claude-plugin/marketplace.json— version bump 1.7.0 → 1.7.1. - Then ask the user whether to push + tag
v1.7.1. Do not push without explicit go.
What's NOT in this plan
- MEDIUM findings (M1-M14) — file as v1.7.x issues; address opportunistically
- LOW findings (L1-L10) — bundle into a "polish PR" before v1.8
- v1.8 (methodology modes), v1.9 (multimodal ingest), v2.0 (derive), v2.5+ (GUI shell) — separate work; see audit §9 verdict + §10 punch list
Files this plan modifies (summary)
| File | Reason | Fix # |
|---|---|---|
scripts/contextual-prefix.py |
--allow-egress flag, pick_prefix_tier guard | 1 |
bin/setup-retrieve.sh |
egress consent prompt + flag pass-through + rollback on Stage 1 failure | 1, 2 |
skills/wiki-retrieve/SKILL.md |
Data Privacy callout | 1 (H6 bundled) |
Makefile |
clean-test-state extension | 3 |
hooks/hooks.json |
PostToolUse explicit lock-check exit handling | 4 |
scripts/detect-transport.sh |
json_escape helper + apply to CLI_VERSION | 5 |
agents/verifier.md |
NEW — pre-commit specialist | 7 |
CLAUDE.md |
reference verifier-agent in recommended workflow | 7 |
docs/audits/v1.7.0-audit-2026-05-17.md |
mark fixes as FIXED with SHAs | post-fix step 3 |
wiki/hot.md |
v1.7.1 state update | post-fix step 4 |
CHANGELOG.md |
v1.7.1 entry | post-fix step 5 |
.claude-plugin/{plugin,marketplace}.json |
1.7.0 → 1.7.1 | post-fix step 6 |
Resumption hint (for post-compact me)
Quick state recovery on next session:
cd ~/Desktop/claude-obsidian
git log --oneline main..HEAD | head -10 # see all v1.7 commits + auto-commits
git status --short # confirm working tree
make test # 7 suites should be green
cat docs/audits/v1.7.1-fixes-plan.md # this file — your roadmap
cat docs/audits/v1.7.0-audit-2026-05-17.md | head -40 # exec verdict reminder
Then execute Fix 1 first. The plan is sequenced; just walk it top to bottom.