add claude-obsidian
@@ -0,0 +1,394 @@
---
name: wiki-lint
description: >
  Health check the Obsidian wiki vault. Finds orphan pages, dead wikilinks, stale claims,
  missing cross-references, frontmatter gaps, and empty sections. Creates or updates
  Dataview dashboards. Generates canvas maps. Triggers on: "lint", "health check",
  "clean up wiki", "check the wiki", "wiki maintenance", "find orphans", "wiki audit".
---

# wiki-lint: Wiki Health Check

Run lint after every 10-15 ingests, or weekly. Ask before auto-fixing anything. Output a lint report to `wiki/meta/lint-report-YYYY-MM-DD.md`.

---

## Transport (v1.7+)

Lint primarily reads, then writes a single report file. Both follow the standard transport policy. Read `.vault-meta/transport.json` (auto-created by `bash scripts/detect-transport.sh`) to select the transport:

- **cli**: `obsidian-cli read "$VAULT" "$NOTE"` for individual reads; `obsidian-cli backlinks "$VAULT" "$NOTE"` natively handles the backlink graph (avoids re-rolling it via Grep); see [`skills/wiki-cli/SKILL.md`](../wiki-cli/SKILL.md)
- **mcp-obsidian** / **mcpvault**: `mcp__obsidian-vault__read_multiple_notes`, `list_all_tags`
- **filesystem**: Claude's `Read`/`Glob`/`Grep` (final floor; current v1.6 behavior)

Full decision tree: [`wiki/references/transport-fallback.md`](../../wiki/references/transport-fallback.md). DragonScale Mechanism 3 tiling lint is a separate code path (Python script) and bypasses transport selection.

---

## Lint Checks

Work through these in order:

1. **Orphan pages**. Wiki pages with no inbound wikilinks. They exist but nothing points to them.
2. **Dead links**. Wikilinks that reference a page that does not exist.
3. **Stale claims**. Assertions on older pages that newer sources have contradicted or updated.
4. **Missing pages**. Concepts or entities mentioned in multiple pages but lacking their own page.
5. **Missing cross-references**. Entities mentioned in a page but not linked.
6. **Frontmatter gaps**. Pages missing required fields (type, status, created, updated, tags).
7. **Empty sections**. Headings with no content underneath.
8. **Stale index entries**. Items in `wiki/index.md` pointing to renamed or deleted pages.
9. **Address validity** (DragonScale Mechanism 2). For every page that has an `address:` frontmatter field, validate the format. See the **Address Validation** section below.
10. **Semantic tiling** (DragonScale Mechanism 3, opt-in). Flag candidate duplicate pages (across all scanned types, not just concepts) via embedding cosine similarity. See the **Semantic Tiling** section below.
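
Checks 1 and 2 both fall out of the same wikilink graph. The following is a minimal Python sketch of that idea, assuming the pages have already been loaded into a dict of page name to body text. The function name, the `pages` shape, and the regex are illustrative assumptions, not part of the skill; on the **cli** transport, `obsidian-cli backlinks` would stand in for this. Exempting hub pages such as `index.md` from orphan status is deliberately left out.

```python
import re
from typing import Dict, List, Set, Tuple

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]]; captures the target only.
# Same-page heading links such as [[#Heading]] do not match and are ignored.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def find_orphans_and_dead_links(
    pages: Dict[str, str],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """pages maps a page name (filename without .md) to its body text.

    Returns (orphans, dead_links), where dead_links holds (source, target) pairs.
    """
    inbound: Dict[str, Set[str]] = {name: set() for name in pages}
    dead: Set[Tuple[str, str]] = set()
    for source, body in pages.items():
        for match in WIKILINK.finditer(body):
            target = match.group(1).strip()
            if target not in pages:
                dead.add((source, target))
            elif target != source:  # a self-link does not rescue an orphan
                inbound[target].add(source)
    orphans = sorted(name for name, sources in inbound.items() if not sources)
    return orphans, sorted(dead)
```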

---

## Lint Report Format

Create at `wiki/meta/lint-report-YYYY-MM-DD.md`:

```markdown
---
type: meta
title: "Lint Report YYYY-MM-DD"
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: [meta, lint]
status: developing
---

# Lint Report: YYYY-MM-DD

## Summary
- Pages scanned: N
- Issues found: N
- Auto-fixed: N
- Needs review: N

## Orphan Pages
- [[Page Name]]: no inbound links. Suggest: link from [[Related Page]] or delete.

## Dead Links
- [[Missing Page]]: referenced in [[Source Page]] but does not exist. Suggest: create stub or remove link.

## Missing Pages
- "concept name": mentioned in [[Page A]], [[Page B]], [[Page C]]. Suggest: create a concept page.

## Frontmatter Gaps
- [[Page Name]]: missing fields: status, tags

## Stale Claims
- [[Page Name]]: claim "X" may conflict with newer source [[Newer Source]].

## Cross-Reference Gaps
- [[Entity Name]] mentioned in [[Page A]] without a wikilink.
```

---

## Naming Conventions

Enforce these during lint:

| Element | Convention | Example |
|---------|-----------|---------|
| Filenames | Title Case with spaces | `Machine Learning.md` |
| Folders | lowercase with dashes | `wiki/data-models/` |
| Tags | lowercase, hierarchical | `#domain/architecture` |
| Wikilinks | match filename exactly | `[[Machine Learning]]` |

Filenames must be unique across the vault. Wikilinks work without paths only if filenames are unique.
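
The folder, tag, and uniqueness rules are mechanical enough to sketch in Python. The regexes below are one plausible reading of "lowercase with dashes" and "lowercase, hierarchical" (allowing digits is an assumption), and comparing filenames case-insensitively is likewise an assumption made for case-insensitive filesystems. The Title Case rule is left to human review because acronyms and proper nouns make it fuzzy. All function names are illustrative.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Assumed patterns: lowercase alphanumerics joined by single dashes.
FOLDER_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
TAG_RE = re.compile(r"#[a-z0-9-]+(?:/[a-z0-9-]+)*")


def folder_violations(rel_path: str) -> List[str]:
    """Return the folder components of a vault-relative path that break the convention."""
    folders = PurePosixPath(rel_path).parent.parts
    return [part for part in folders if not FOLDER_RE.fullmatch(part)]


def tag_is_valid(tag: str) -> bool:
    """True for lowercase, slash-separated tags such as #domain/architecture."""
    return TAG_RE.fullmatch(tag) is not None


def duplicate_filenames(rel_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group paths that share a filename (compared case-insensitively)."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for rel_path in rel_paths:
        by_name[PurePosixPath(rel_path).name.lower()].append(rel_path)
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}
```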

---

## Writing Style Check

During lint, flag pages that violate the style guide:

- Not declarative present tense ("X basically does Y" instead of "X does Y")
- Missing source citations where claims are made
- Uncertainty not flagged with `> [!gap]`
- Contradictions not flagged with `> [!contradiction]`

---

## Dataview Dashboard

Create or update `wiki/meta/dashboard.md` with these queries:

````markdown
---
type: meta
title: "Dashboard"
updated: YYYY-MM-DD
---
# Wiki Dashboard

## Recent Activity
```dataview
TABLE type, status, updated FROM "wiki" SORT updated DESC LIMIT 15
```

## Seed Pages (Need Development)
```dataview
LIST FROM "wiki" WHERE status = "seed" SORT updated ASC
```

## Entities Missing Sources
```dataview
LIST FROM "wiki/entities" WHERE !sources OR length(sources) = 0
```

## Open Questions
```dataview
LIST FROM "wiki/questions" WHERE answer_quality = "draft" SORT created DESC
```
````

---

## Canvas Map

Create or update `wiki/meta/overview.canvas` for a visual domain map:

```json
{
  "nodes": [
    {
      "id": "1",
      "type": "file",
      "file": "wiki/overview.md",
      "x": 0, "y": 0,
      "width": 300, "height": 140,
      "color": "1"
    }
  ],
  "edges": []
}
```

Add one node per domain page. Connect domains that have significant cross-references. Colors map to the CSS scheme: 1=blue, 2=purple, 3=yellow, 4=orange, 5=green, 6=red.
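
As an illustration of "one node per domain page, edges for significant cross-references", here is a hedged Python sketch that builds the canvas structure. The grid layout, spacing, fixed color, and function name are assumptions for the example; `fromNode` and `toNode` are the edge fields defined by the JSON Canvas format. Deciding which cross-references count as "significant" is left to the caller, who passes the resulting pairs as `links`.

```python
import json
from typing import Dict, List, Tuple


def build_canvas(
    domain_pages: List[str],
    links: List[Tuple[str, str]],
    columns: int = 3,
) -> Dict:
    """Lay out one file node per domain page on a grid and connect linked domains."""
    nodes = []
    ids: Dict[str, str] = {}
    for i, path in enumerate(domain_pages):
        node_id = str(i + 1)
        ids[path] = node_id
        nodes.append({
            "id": node_id,
            "type": "file",
            "file": path,
            "x": (i % columns) * 360,   # assumed horizontal spacing
            "y": (i // columns) * 200,  # assumed vertical spacing
            "width": 300,
            "height": 140,
            "color": "1",
        })
    # Links that mention a page without a node are skipped rather than invented.
    edges = [
        {"id": f"e{n}", "fromNode": ids[src], "toNode": ids[dst]}
        for n, (src, dst) in enumerate(links, start=1)
        if src in ids and dst in ids
    ]
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    canvas = build_canvas(["wiki/overview.md"], [])
    print(json.dumps(canvas, indent=2))
```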

---

## Address Validation (DragonScale Mechanism 2 MVP)

**Opt-in feature.** Address Validation runs only if the vault is using DragonScale, detected by:

```bash
if [ -x ./scripts/allocate-address.sh ] && [ -f ./.vault-meta/address-counter.txt ]; then
  DRAGONSCALE_ADDRESSES=1
else
  DRAGONSCALE_ADDRESSES=0
fi
```

When `DRAGONSCALE_ADDRESSES=0`, skip this entire section. Missing `address:` fields are not flagged, not even informationally. Pages that happen to have an `address:` field are passed through unvalidated (treat as user-managed metadata).

When `DRAGONSCALE_ADDRESSES=1`, proceed with the rollout baseline and checks below.

Rollout baseline: **2026-04-23** (Phase 2 ship date in vaults that adopted DragonScale on that day). Vaults that adopted DragonScale later should override this baseline by using the earliest `created:` date of any addressed page as their own rollout date. Record the chosen baseline at the top of `.vault-meta/legacy-pages.txt` as a commented line: `# rollout: YYYY-MM-DD`.

### Classification rule (applied per page)

Before validating anything, classify the page:

| Classification | Criteria |
|---|---|
| **Meta / fold / excluded** | File is in `wiki/folds/` OR filename in `{_index.md, index.md, log.md, hot.md, overview.md, dashboard.md, dashboard.base, Wiki Map.md, getting-started.md}`. Address not required. |
| **Post-rollout (must have address)** | `type` is not meta/fold AND frontmatter `created:` date is >= 2026-04-23 AND file path is NOT in the legacy baseline manifest. |
| **Legacy (backfill-eligible)** | `type` is not meta/fold AND (frontmatter `created:` date is < 2026-04-23 OR file path IS in the legacy baseline manifest). Address not required until backfill. |

**Legacy baseline manifest**: optional file at `.vault-meta/legacy-pages.txt`, one relative path per line. Pages listed there are treated as legacy regardless of `created:` date. Use this to grandfather pages whose `created:` metadata is wrong or missing.
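
The classification table can be read as a small decision function. The Python sketch below is one interpretation, not the skill's implementation: the function name and return labels are hypothetical, treating `type: meta` or `type: fold` as excluded is inferred from the "`type` is not meta/fold" clauses, and the `unclassified` result for a page with no `created:` date and no manifest entry is an assumption, since the table does not cover that case.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional, Set

ROLLOUT = date(2026, 4, 23)  # default baseline; vaults may override it
EXCLUDED_NAMES = {
    "_index.md", "index.md", "log.md", "hot.md", "overview.md",
    "dashboard.md", "dashboard.base", "Wiki Map.md", "getting-started.md",
}


def classify(
    rel_path: str,
    page_type: Optional[str],
    created: Optional[date],
    legacy_manifest: Set[str],
    rollout: date = ROLLOUT,
) -> str:
    """Return 'excluded', 'legacy', 'post-rollout', or 'unclassified'."""
    name = PurePosixPath(rel_path).name
    if (
        rel_path.startswith("wiki/folds/")
        or name in EXCLUDED_NAMES
        or page_type in ("meta", "fold")
    ):
        return "excluded"
    if rel_path in legacy_manifest:
        return "legacy"  # the manifest wins regardless of created:
    if created is None:
        return "unclassified"  # assumption: surface it instead of guessing
    return "post-rollout" if created >= rollout else "legacy"
```

Checking the manifest before the date keeps the two non-excluded rows mutually exclusive, so a page never lands in both "post-rollout" and "legacy".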

### Validation checks (run in order)

1. **Format check**: any page with `address:` set must match one of:
   - `^c-[0-9]{6}$`: post-rollout creation address.
   - `^l-[0-9]{6}$`: legacy-backfill address.
   - Pages under `wiki/folds/` use `fold_id`, not `address`; do not apply the `c-`/`l-` regex there.

2. **Uniqueness check**: no two pages share the same address value. Report both paths.

3. **Counter consistency**: `./scripts/allocate-address.sh --peek` returns the next counter value. Every observed `c-NNNNNN` must satisfy `NNNNNN < peek_value`. Violation = counter drift.

4. **Post-rollout enforcement**: every page classified as "post-rollout (must have address)" that LACKS the `address:` field is a lint **error**, not informational. This prevents the silent-regression path where a new page skips address assignment.

5. **Legacy identification**: every page classified as "legacy" that LACKS an address is informational. The lint report lists them under "Pending backfill" with total count.

6. **Address-map consistency** (`.raw/.manifest.json`): for every page path in `address_map`, the page must exist and its frontmatter `address` must match the mapping. Mismatches are errors (either a rename dropped the map update, or a manual edit diverged).
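
Checks 1 to 3 operate on nothing more than the observed address values and the peek value, so they can be sketched together. This is an illustrative Python sketch: the function name, the `addresses` mapping, and the error strings are assumptions, and `peek_value` is assumed to be the integer already parsed from `./scripts/allocate-address.sh --peek`.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Union of the two documented patterns, c-NNNNNN and l-NNNNNN.
ADDRESS_RE = re.compile(r"[cl]-[0-9]{6}")


def validate_addresses(addresses: Dict[str, str], peek_value: int) -> List[str]:
    """addresses maps page path -> address value; returns error strings."""
    errors: List[str] = []
    by_address: Dict[str, List[str]] = defaultdict(list)
    for path, addr in sorted(addresses.items()):
        # Check 1: format. fullmatch also rejects values with a trailing newline.
        if not ADDRESS_RE.fullmatch(addr):
            errors.append(f"{path}: invalid address format {addr!r}")
            continue
        by_address[addr].append(path)
        # Check 3: every observed c- number must be below the next counter value.
        if addr.startswith("c-") and int(addr[2:]) >= peek_value:
            errors.append(f"{path}: counter drift, {addr} >= peek {peek_value}")
    # Check 2: uniqueness, reporting every path that shares the value.
    for addr, paths in sorted(by_address.items()):
        if len(paths) > 1:
            errors.append(f"{addr}: shared by {', '.join(paths)}")
    return errors
```

Consistent with the lint posture, the sketch only reports; it never assigns or rewrites an address.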

### Lint posture summary

- Pages that HAVE an address with bad format: **error**.
- Pages that HAVE colliding addresses: **error**.
- Pages classified **post-rollout** WITHOUT an address: **error**.
- Pages classified **legacy** WITHOUT an address: **informational** (expected).
- Meta and fold pages without `address`: **ignored** (not applicable).
- Counter drift (observed counter >= peek): **error**.
- Address-map mismatch: **error**.

Lint only observes. Do NOT auto-assign missing addresses during lint. Assignment is `wiki-ingest`'s responsibility only.

### Output section in the lint report

```markdown
## Address Validation

- Counter state: `$(./scripts/allocate-address.sh --peek)`
- Highest c- address observed: c-XXXXXX
- Post-rollout pages checked: N (X passing, Y errors)
- Legacy pages pending backfill: M

### Errors
- [[Page Name]]: invalid address format `{value}`. Expected `c-NNNNNN` or `l-NNNNNN`.
- [[Page A]] and [[Page B]] share address `c-000042`.
- [[Post-Rollout Page]]: missing address. Page created 2026-04-25 (post-rollout); address required. Run wiki-ingest or manually run `./scripts/allocate-address.sh` and add to frontmatter.
- [[Page Name]] has address `c-000100` but counter peek is `50`. Counter drift; run `./scripts/allocate-address.sh --rebuild`.
- `.raw/.manifest.json` maps `wiki/foo.md` -> `c-000010` but page frontmatter has `c-000012`. Resolve mismatch.

### Pending backfill (informational)
- M legacy pages without addresses. See `.vault-meta/legacy-pages.txt` for the canonical legacy set, or filter by `created:` < 2026-04-23.
```

---

## Semantic Tiling (DragonScale Mechanism 3 MVP, opt-in)

**Opt-in feature.** Semantic tiling flags candidate duplicate *pages* (not just concept pages; see Scope below) using embedding cosine similarity. Local ollama only by default; remote endpoints require an explicit override flag.

### Detection and delegation

```bash
if [ -x ./scripts/tiling-check.py ] && command -v python3 >/dev/null 2>&1; then
  ./scripts/tiling-check.py --peek > /tmp/tiling-peek.json 2>/dev/null
  PEEK_EXIT=$?
  case $PEEK_EXIT in
    0)  TILING_READY=1 ;;  # ready
    2)  TILING_READY=0 ; echo "tiling ERROR: usage error (exit 2); inspect /tmp/tiling-peek.json" ;;
    3)  TILING_READY=0 ; echo "tiling ERROR: cache corrupt (exit 3); inspect .vault-meta/tiling-cache.json" ;;
    4)  TILING_READY=0 ; echo "tiling ERROR: vault exceeds scale hard-fail (exit 4); batching required" ;;
    10) TILING_READY=0 ; echo "tiling skipped: ollama not reachable (exit 10)" ;;
    11) TILING_READY=0 ; echo "tiling skipped: run 'ollama pull nomic-embed-text' to enable (exit 11)" ;;
    *)  TILING_READY=0 ; echo "tiling ERROR: unexpected exit code $PEEK_EXIT from tiling-check.py --peek" ;;
  esac
else
  TILING_READY=0
  echo "tiling skipped: scripts/tiling-check.py or python3 not available"
fi
```

Inspect `/tmp/tiling-peek.json` (structured diagnostics: script path, python interpreter, ollama URL, cache state, thresholds state) whenever the status is ambiguous. Never collapse unknown exits into "unknown status" silently.

When `TILING_READY=1`:

```bash
./scripts/tiling-check.py --report wiki/meta/tiling-report-YYYY-MM-DD.md
REPORT_EXIT=$?
case $REPORT_EXIT in
  0)  echo "tiling report written" ;;
  2)  echo "tiling ERROR: usage error during --report" ;;
  3)  echo "tiling ERROR: cache corrupt during --report" ;;
  4)  echo "tiling ERROR: scale hard-fail during --report" ;;
  10) echo "tiling ERROR: ollama became unreachable between --peek and --report" ;;
  11) echo "tiling ERROR: model became unavailable between --peek and --report" ;;
  *)  echo "tiling ERROR: unexpected exit code $REPORT_EXIT from tiling-check.py --report" ;;
esac
```

### Scope (what the helper scans)

- Includes: every `.md` under `wiki/` **except** the exclusion set below. The scope is "candidate tileable pages," not just `type: concept`.
- Excludes (path): anything under `wiki/folds/` or `wiki/meta/`.
- Excludes (filename): `_index.md`, `index.md`, `log.md`, `hot.md`, `overview.md`, `dashboard.md`, `Wiki Map.md`, `getting-started.md`.
- Excludes (frontmatter): `type: meta` or `type: fold`.
- Excludes (security): symlinks. Any page file that is a symlink, or whose resolved path escapes the vault root, is skipped.

If you place a real concept under `wiki/meta/` it will be excluded by path regardless of content. Keep concepts in their canonical folders.
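
The path, filename, and frontmatter exclusions above amount to a pure predicate over a vault-relative path and a `type` value. The Python sketch below mirrors that reading; the function name is hypothetical, it is not the code in `tiling-check.py`, and the symlink and vault-escape checks are omitted because they need filesystem access.

```python
from pathlib import PurePosixPath
from typing import Optional

EXCLUDED_DIRS = ("wiki/folds/", "wiki/meta/")
EXCLUDED_NAMES = {
    "_index.md", "index.md", "log.md", "hot.md", "overview.md",
    "dashboard.md", "Wiki Map.md", "getting-started.md",
}


def in_tiling_scope(rel_path: str, page_type: Optional[str]) -> bool:
    """True when a vault-relative path is a candidate tileable page."""
    path = PurePosixPath(rel_path)
    if path.suffix != ".md" or not rel_path.startswith("wiki/"):
        return False
    if rel_path.startswith(EXCLUDED_DIRS):  # path exclusion wins over content
        return False
    if path.name in EXCLUDED_NAMES:
        return False
    return page_type not in ("meta", "fold")
```

The `wiki/meta/` case shows why path exclusion is checked before the type: `in_tiling_scope("wiki/meta/Real Concept.md", "concept")` is false even though the page is a concept.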

### How the helper works

- Computes one embedding per included page via the ollama `nomic-embed-text` model by default.
- Caches embeddings at `.vault-meta/tiling-cache.json`, keyed on `sha256(model + body)` so model drift auto-invalidates. Frontmatter is not part of the hash or the embedding input, so pure frontmatter edits (tag changes, status bumps) do not trigger recomputation.
- Orphans are GC'd: when a cached page path no longer exists on disk, its entry is dropped on save.
- Concurrent-safe: exclusive flock on `.vault-meta/.tiling.lock` around cache I/O; per-PID temp file for atomic writes.
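
The cache-key rule explains why frontmatter edits are free and model changes are not. A minimal Python sketch, assuming the key is the SHA-256 of the model name concatenated directly with the page body in UTF-8; the exact concatenation and encoding used by the helper are not stated here, and both function names are illustrative.

```python
import hashlib


def strip_frontmatter(text: str) -> str:
    """Return the page body with a leading YAML frontmatter block removed."""
    lines = text.splitlines(keepends=True)
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "".join(lines[i + 1:])
    return text  # no frontmatter, or an unterminated block: keep everything


def cache_key(model: str, body: str) -> str:
    """Key on model + body so a model change invalidates every entry."""
    return hashlib.sha256((model + body).encode("utf-8")).hexdigest()
```

Two versions of a page that differ only in tags or status produce the same key, while the same body under a different model name produces a different one.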

### Security posture

- Defaults to `http://127.0.0.1:11434`. `OLLAMA_URL` env override is accepted **only** with `--allow-remote-ollama` because page bodies are POSTed as embedding input.
- Symlinks and vault-root escapes are rejected.

### Default bands (conservative seeds, NOT calibrated)

| Band | Similarity | Report section |
|---|---|---|
| Error | `>= 0.90` | **Errors**: strong near-duplicate, likely the same concept |
| Review | `>= 0.80` and `< 0.90` | **Review**: possible tile overlap; human judgement needed |
| Pass | `< 0.80` | not emitted |

**These values are conservative seeds, not literature-backed interpolation.** Published reference points: Sentence Transformers `community_detection` defaults to 0.75; Quora-duplicate calibrations land around 0.7715-0.8352 depending on objective. The 0.80 review floor is already stricter than at least one cited Quora optimum, so expect **false negatives** against those baselines. Reduce the review floor during calibration if you want more sensitivity.
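
To make the band boundaries concrete, here is a small pure-Python sketch of cosine similarity and band assignment using the default seeds. The function names are illustrative, and returning `0.0` for a zero vector is an assumption made to avoid division by zero.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat an empty embedding as dissimilar
    return dot / (norm_a * norm_b)


def band(similarity: float, error: float = 0.90, review: float = 0.80) -> Optional[str]:
    """Map a similarity to 'error', 'review', or None (pass, not emitted)."""
    if similarity >= error:
        return "error"
    if similarity >= review:
        return "review"
    return None
```

A pair at exactly 0.90 falls in the error band and a pair at exactly 0.80 falls in the review band, so the two bands never overlap.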

### Calibration procedure (manual, one-time per vault)

1. Run the helper with defaults. Capture the **Review** band pairs.
2. Temporarily lower `bands.review` to `0.70` in `.vault-meta/tiling-thresholds.json` to surface a wider sample. Aim for >=50 pairs spanning 0.70-0.95.
3. Label each pair: `duplicate`, `similar`, `distinct`.
4. Pick bands such that: (a) the `error` band contains >= 95% true duplicates; (b) the `review` band captures `similar` pairs without swamping the report with `distinct` ones.
5. Edit `.vault-meta/tiling-thresholds.json`: set new `bands.error` and `bands.review`, set `calibrated: true`, set `calibration_pairs_labeled` to the label count.
6. Re-run lint. Report footer now says `calibrated: true`.
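
Step 4(a) can be checked numerically once the pairs are labeled. The sketch below is a hypothetical Python aid, not part of the helper: it assumes the labels are available as `(similarity, label)` tuples and picks the lowest candidate threshold whose error band holds at least 95% true duplicates. Choosing the review floor in 4(b) remains a judgement call.

```python
from typing import Iterable, List, Optional, Tuple

LabeledPair = Tuple[float, str]  # (similarity, "duplicate" | "similar" | "distinct")


def error_band_precision(labeled: Iterable[LabeledPair], threshold: float) -> Optional[float]:
    """Fraction of pairs at or above threshold that are labeled 'duplicate'."""
    in_band = [label for sim, label in labeled if sim >= threshold]
    if not in_band:
        return None  # nothing in the band, so precision is undefined
    return in_band.count("duplicate") / len(in_band)


def pick_error_threshold(
    labeled: List[LabeledPair],
    candidates: Iterable[float],
    target: float = 0.95,
) -> Optional[float]:
    """Lowest candidate whose error band meets the target precision."""
    for threshold in sorted(candidates):
        precision = error_band_precision(labeled, threshold)
        if precision is not None and precision >= target:
            return threshold
    return None
```

With only a handful of labeled pairs the precision estimate is noisy, which is one reason the procedure asks for at least 50 pairs.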

### Scale

- Cold-cache cost is O(N) POSTs to ollama. Warm-cache cost is O(N^2) cosines in pure Python.
- Helper prints a warning at > 500 pages and hard-fails (exit 4) at > 5000. Revisit the implementation (batching, vectorized cosine, or external tooling) before exceeding either limit.

### Lint report embed

```markdown
## Semantic Tiling
See [[tiling-report-YYYY-MM-DD]] for the full pair listing.
- Errors (>=0.90): N pairs
- Review (0.80-0.90): M pairs
- Calibrated: true|false
```

### Invariants

- Read-only. `tiling-check.py` never modifies wiki pages.
- No auto-merge. Duplicates are listed, never resolved.
- Cache is incremental and model-scoped. Unchanged pages are not re-embedded.
- Exit codes: `0` ok, `2` usage error, `3` cache corrupt, `4` scale hard-fail, `10` ollama unreachable, `11` model missing. Surface all of them; do not collapse into a single "unknown" bucket.

---

## Before Auto-Fixing

Always show the lint report first. Ask: "Should I fix these automatically, or do you want to review each one?"

Safe to auto-fix:
- Adding missing frontmatter fields with placeholder values
- Creating stub pages for missing entities
- Adding wikilinks for unlinked mentions

Needs review before fixing:
- Deleting orphan pages (they might be intentionally isolated)
- Resolving contradictions (requires human judgment)
- Merging duplicate pages

---

## How to think (10-principle mapping)

When working on this skill, apply the 10-principle loop. See [`skills/think/SKILL.md`](../think/SKILL.md) for the canonical framework.

| # | Principle | Application here |
|---|-----------|-------------------|
| 1 | OBSERVE (ext) | Scan every page, every wikilink, every frontmatter block. No skipping for size or apparent obviousness. |
| 2 | OBSERVE (int) | Am I biased toward "looks fine"? Pretend you're a hostile reader looking for what's actually wrong. |
| 3 | LISTEN | Did the user mention specific concerns this session? Prioritize those over generic checks. |
| 4 | THINK | Which checks matter? Tier findings by severity (BLOCKER / HIGH / MEDIUM / LOW), not by ease of fix. |
| 5 | CONNECT (lat) | Orphan + dead-link + frontmatter-gap patterns often co-occur. Cluster findings to expose root causes. |
| 6 | CONNECT (sys) | Tiling-check + Dataview dashboards + canvas overview: multiple lint surfaces to integrate. |
| 7 | FEEL | A lint report should empower, not shame. Actionable items beat an exhaustive catalog. |
| 8 | ACCEPT | Some lint findings are deliberate (orphan-by-design, intentional stub). Flag, don't force. |
| 9 | CREATE | Lint report at `wiki/meta/lint-report-YYYY-MM-DD.md` with tiered findings. |
| 10 | GROW | Recurring lint findings → process improvement targets, not just one-time fixes. |