Files
MultiPhysicsVault/skills/wiki-lint/SKILL.md
T
김경종 b13258af9f
Tests / Hermetic test suite (push) Has been cancelled
Tests / Skill frontmatter validation (push) Has been cancelled
add documents and wiki
2026-06-02 16:33:07 +09:00

411 lines
19 KiB
Markdown

---
name: wiki-lint
description: >
Health check the Obsidian wiki vault. Finds orphan pages, dead wikilinks, stale claims,
missing cross-references, frontmatter gaps, and empty sections. Creates or updates
Dataview dashboards. Generates canvas maps. Triggers on: "lint", "health check",
"clean up wiki", "check the wiki", "wiki maintenance", "find orphans", "wiki audit".
---
# wiki-lint: Wiki Health Check
Run lint after every 10-15 ingests, or weekly. Ask before auto-fixing anything. Output a lint report to `wiki/meta/lint-report-YYYY-MM-DD.md`.
---
## Transport (v1.7+)
Lint primarily reads, then writes a single report file. Both follow the standard transport policy. Read `.vault-meta/transport.json` (auto-created by `bash scripts/detect-transport.sh`):
- **cli** — `obsidian-cli read "$VAULT" "$NOTE"` for individual reads; `obsidian-cli backlinks "$VAULT" "$NOTE"` natively handles backlink graph (avoids re-rolling it via Grep); see [`skills/wiki-cli/SKILL.md`](../wiki-cli/SKILL.md)
- **mcp-obsidian** / **mcpvault**`mcp__obsidian-vault__read_multiple_notes`, `list_all_tags`
- **filesystem** — Claude's `Read`/`Glob`/`Grep` (final floor; current v1.6 behavior)
Full decision tree: [`wiki/references/transport-fallback.md`](../../wiki/references/transport-fallback.md). DragonScale Mechanism 3 tiling lint is a separate code path (Python script) and bypasses transport selection.
---
## Lint Checks
Work through these in order:
1. **Orphan pages**. Wiki pages with no inbound wikilinks. They exist but nothing points to them.
2. **Dead links**. Wikilinks that reference a page that does not exist.
3. **Stale claims**. Assertions on older pages that newer sources have contradicted or updated.
4. **Missing pages**. Concepts or entities mentioned in multiple pages but lacking their own page.
5. **Missing cross-references**. Entities mentioned in a page but not linked.
6. **Frontmatter gaps**. Pages missing required fields (type, status, created, updated, tags).
7. **Empty sections**. Headings with no content underneath.
8. **Stale index entries**. Items in `wiki/index.md` pointing to renamed or deleted pages.
9. **Address validity** (DragonScale Mechanism 2). For every page that has an `address:` frontmatter field, validate the format. See the **Address Validation** section below.
10. **Semantic tiling** (DragonScale Mechanism 3, opt-in). Flag candidate duplicate pages (across all scanned types, not just concepts) via embedding cosine similarity. See the **Semantic Tiling** section below.
### Support-reference exclusions
Files under `wiki/references/` are support documentation for skills and vault mechanics, not first-class wiki notes. Exclude `wiki/references/` from orphan-page checks, missing cross-reference checks, frontmatter-gap checks, empty-section checks, address enforcement, and semantic tiling. Still resolve any `[[wikilink]]` that appears inside those files so broken links are not hidden.
---
## Lint Report Format
Create at `wiki/meta/lint-report-YYYY-MM-DD.md`:
```markdown
---
type: meta
title: "Lint Report YYYY-MM-DD"
created: YYYY-MM-DD
updated: YYYY-MM-DD
tags: [meta, lint]
status: developing
---
# Lint Report: YYYY-MM-DD
## Summary
- Pages scanned: N
- Issues found: N
- Auto-fixed: N
- Needs review: N
## Orphan Pages
- [[Page Name]]: no inbound links. Suggest: link from [[Related Page]] or delete.
## Dead Links
- [[Missing Page]]: referenced in [[Source Page]] but does not exist. Suggest: create stub or remove link.
## Missing Pages
- "concept name": mentioned in [[Page A]], [[Page B]], [[Page C]]. Suggest: create a concept page.
## Frontmatter Gaps
- [[Page Name]]: missing fields: status, tags
## Stale Claims
- [[Page Name]]: claim "X" may conflict with newer source [[Newer Source]].
## Cross-Reference Gaps
- [[Entity Name]] mentioned in [[Page A]] without a wikilink.
```
---
## Naming Conventions
Enforce these during lint:
| Element | Convention | Example |
|---------|-----------|---------|
| Filenames | Title Case with spaces | `Machine Learning.md` |
| Folders | lowercase with dashes | `wiki/data-models/` |
| Tags | lowercase, hierarchical | `#domain/architecture` |
| Wikilinks | match filename exactly | `[[Machine Learning]]` |
Filenames must be unique across the vault. Wikilinks work without paths only if filenames are unique.
---
## Cross-Reference Heuristic
Missing cross-reference detection is heuristic. To avoid noisy reports in product/manual-heavy vaults:
- Scan prose body only. Ignore YAML frontmatter, aliases, title fields, headings, code blocks, inline code, and existing wikilinks.
- Treat an entity as already linked if it appears in `related:` or `sources:` frontmatter as a wikilink.
- Do not flag a page when the entity name is used as the page namespace or title prefix, such as `Abaqus ...`, `Midas FEA ...`, or `Midas Civil ...` concept pages.
- Do not flag pages under `wiki/references/`, `wiki/meta/`, or `wiki/folds/`.
- Report candidates as "needs review", not errors, unless a human has confirmed the missing link policy for that entity type.
---
## Writing Style Check
During lint, flag pages that violate the style guide:
- Not declarative present tense ("X basically does Y" instead of "X does Y")
- Missing source citations where claims are made
- Uncertainty not flagged with `> [!gap]`
- Contradictions not flagged with `> [!contradiction]`
---
## Dataview Dashboard
Create or update `wiki/meta/dashboard.md` with these queries:
````markdown
---
type: meta
title: "Dashboard"
updated: YYYY-MM-DD
---
# Wiki Dashboard
## Recent Activity
```dataview
TABLE type, status, updated FROM "wiki" SORT updated DESC LIMIT 15
```
## Seed Pages (Need Development)
```dataview
LIST FROM "wiki" WHERE status = "seed" SORT updated ASC
```
## Entities Missing Sources
```dataview
LIST FROM "wiki/entities" WHERE !sources OR length(sources) = 0
```
## Open Questions
```dataview
LIST FROM "wiki/questions" WHERE answer_quality = "draft" SORT created DESC
```
````
---
## Canvas Map
Create or update `wiki/meta/overview.canvas` for a visual domain map:
```json
{
"nodes": [
{
"id": "1",
"type": "file",
"file": "wiki/overview.md",
"x": 0, "y": 0,
"width": 300, "height": 140,
"color": "1"
}
],
"edges": []
}
```
Add one node per domain page. Connect domains that have significant cross-references. Colors map to the CSS scheme: 1=blue, 2=purple, 3=yellow, 4=orange, 5=green, 6=red.
---
## Address Validation (DragonScale Mechanism 2 MVP)
**Opt-in feature.** Address Validation runs only if the vault is using DragonScale, detected by:
```bash
if [ -x ./scripts/allocate-address.sh ] && [ -f ./.vault-meta/address-counter.txt ]; then
DRAGONSCALE_ADDRESSES=1
else
DRAGONSCALE_ADDRESSES=0
fi
```
When `DRAGONSCALE_ADDRESSES=0`, skip this entire section. Missing `address:` fields are not flagged, not even informationally. Pages that happen to have an `address:` field are passed through unvalidated (treat as user-managed metadata).
When `DRAGONSCALE_ADDRESSES=1`, proceed with the rollout baseline and checks below.
Rollout baseline: **2026-04-23** (Phase 2 ship date in vaults that adopted DragonScale on that day). Vaults that adopted DragonScale later should override this baseline by setting the earliest `created:` date of any addressed page as their personal rollout date. Record the chosen baseline at the top of `.vault-meta/legacy-pages.txt` as a commented line: `# rollout: YYYY-MM-DD`.
### Classification rule (applied per page)
Before validating anything, classify the page:
| Classification | Criteria |
|---|---|
| **Meta / fold / support / excluded** | File is in `wiki/folds/` OR `wiki/references/` OR filename in `{_index.md, index.md, log.md, hot.md, overview.md, dashboard.md, dashboard.base, Wiki Map.md, getting-started.md}`. Address not required. |
| **Post-rollout (must have address)** | `type` is not meta/fold AND frontmatter `created:` date is >= 2026-04-23 AND file path is NOT in the legacy baseline manifest. |
| **Legacy (backfill-eligible)** | `type` is not meta/fold AND frontmatter `created:` date is < 2026-04-23 OR file path IS in the legacy baseline manifest. Address not required until backfill. |
**Legacy baseline manifest**: optional file at `.vault-meta/legacy-pages.txt`, one relative path per line. Pages listed there are treated as legacy regardless of `created:` date. Use this to grandfather pages whose `created:` metadata is wrong or missing.
### Validation checks (run in order)
1. **Format check**: any page with `address:` set must match one of:
- `^c-[0-9]{6}$` — post-rollout creation address.
- `^l-[0-9]{6}$` — legacy-backfill address.
- Pages under `wiki/folds/` use `fold_id`, not `address`; do not apply the `c-`/`l-` regex there.
2. **Uniqueness check**: no two pages share the same address value. Report both paths.
3. **Counter consistency**: `./scripts/allocate-address.sh --peek` returns the next counter value. Every observed `c-NNNNNN` must satisfy `NNNNNN < peek_value`. Violation = counter drift.
4. **Post-rollout enforcement**: every page classified as "post-rollout (must have address)" that LACKS the `address:` field is a lint **error**, not informational. This prevents the silent-regression path where a new page skips address assignment.
5. **Legacy identification**: every page classified as "legacy" that LACKS an address is informational. The lint report lists them under "Pending backfill" with total count.
6. **Address-map consistency** (`.raw/.manifest.json`): for every page path in `address_map`, the page must exist and its frontmatter `address` must match the mapping. Mismatches are errors (either a rename dropped the map update, or a manual edit diverged).
### Lint posture summary
- Pages that HAVE an address with bad format: **error**.
- Pages that HAVE colliding addresses: **error**.
- Pages classified **post-rollout** WITHOUT an address: **error**.
- Pages classified **legacy** WITHOUT an address: **informational** (expected).
- Meta, fold, and support-reference pages without `address`: **ignored** (not applicable).
- Counter drift (observed counter >= peek): **error**.
- Address-map mismatch: **error**.
Lint only observes. Do NOT auto-assign missing addresses during lint. Assignment is `wiki-ingest`'s responsibility only.
### Output section in the lint report
```markdown
## Address Validation
- Counter state: `$(./scripts/allocate-address.sh --peek)`
- Highest c- address observed: c-XXXXXX
- Post-rollout pages checked: N (X passing, Y errors)
- Legacy pages pending backfill: M
### Errors
- [[Page Name]]: invalid address format `{value}`. Expected `c-NNNNNN` or `l-NNNNNN`.
- [[Page A]] and [[Page B]] share address `c-000042`.
- [[Post-Rollout Page]]: missing address. Page created 2026-04-25 (post-rollout); address required. Run wiki-ingest or manually run `./scripts/allocate-address.sh` and add to frontmatter.
- [[Page Name]] has address `c-000100` but counter peek is `50`. Counter drift; run `./scripts/allocate-address.sh --rebuild`.
- `.raw/.manifest.json` maps `wiki/foo.md` -> `c-000010` but page frontmatter has `c-000012`. Resolve mismatch.
### Pending backfill (informational)
- M legacy pages without addresses. See `.vault-meta/legacy-pages.txt` for the canonical legacy set, or filter by `created:` < 2026-04-23.
```
---
## Semantic Tiling (DragonScale Mechanism 3 MVP, opt-in)
**Opt-in feature.** Semantic tiling flags candidate duplicate *pages* (not just concept pages — see Scope below) using embedding cosine similarity. Local ollama only by default; remote endpoints require an explicit override flag.
### Detection and delegation
```bash
if [ -x ./scripts/tiling-check.py ] && command -v python3 >/dev/null 2>&1; then
./scripts/tiling-check.py --peek > /tmp/tiling-peek.json 2>/dev/null
PEEK_EXIT=$?
case $PEEK_EXIT in
0) TILING_READY=1 ;; # ready
2) TILING_READY=0 ; echo "tiling ERROR: usage error (exit 2); inspect /tmp/tiling-peek.json" ;;
3) TILING_READY=0 ; echo "tiling ERROR: cache corrupt (exit 3); inspect .vault-meta/tiling-cache.json" ;;
4) TILING_READY=0 ; echo "tiling ERROR: vault exceeds scale hard-fail (exit 4); batching required" ;;
10) TILING_READY=0 ; echo "tiling skipped: ollama not reachable (exit 10)" ;;
11) TILING_READY=0 ; echo "tiling skipped: run 'ollama pull nomic-embed-text' to enable (exit 11)" ;;
*) TILING_READY=0 ; echo "tiling ERROR: unexpected exit code $PEEK_EXIT from tiling-check.py --peek" ;;
esac
else
TILING_READY=0
echo "tiling skipped: scripts/tiling-check.py or python3 not available"
fi
```
Inspect `/tmp/tiling-peek.json` (structured diagnostics: script path, python interpreter, ollama URL, cache state, thresholds state) whenever the status is ambiguous. Never collapse unknown exits into "unknown status" silently.
When `TILING_READY=1`:
```bash
./scripts/tiling-check.py --report wiki/meta/tiling-report-YYYY-MM-DD.md
REPORT_EXIT=$?
case $REPORT_EXIT in
0) echo "tiling report written" ;;
2) echo "tiling ERROR: usage error during --report" ;;
3) echo "tiling ERROR: cache corrupt during --report" ;;
4) echo "tiling ERROR: scale hard-fail during --report" ;;
10) echo "tiling ERROR: ollama became unreachable between --peek and --report" ;;
11) echo "tiling ERROR: model became unavailable between --peek and --report" ;;
*) echo "tiling ERROR: unexpected exit code $REPORT_EXIT from tiling-check.py --report" ;;
esac
```
### Scope (what the helper scans)
- Includes: every `.md` under `wiki/` **except** the exclusion set below. The scope is "candidate tileable pages," not just `type: concept`.
- Excludes (path): anything under `wiki/folds/`, `wiki/meta/`, or `wiki/references/`.
- Excludes (filename): `_index.md`, `index.md`, `log.md`, `hot.md`, `overview.md`, `dashboard.md`, `Wiki Map.md`, `getting-started.md`.
- Excludes (frontmatter): `type: meta` or `type: fold`.
- Excludes (security): symlinks. Any page file that is a symlink, or whose resolved path escapes the vault root, is skipped.
If you place a real concept under `wiki/meta/` it will be excluded by path regardless of content. Keep concepts in their canonical folders.
### How the helper works
- Computes one embedding per included page via the ollama `nomic-embed-text` model by default.
- Caches embeddings at `.vault-meta/tiling-cache.json`, keyed on `sha256(model + body)` so model drift auto-invalidates. Frontmatter is not part of the hash or the embedding input — pure frontmatter edits (tag changes, status bumps) do not trigger recomputation.
- Orphans are GC'd: when a cached page path no longer exists on disk, its entry is dropped on save.
- Concurrent-safe: exclusive flock on `.vault-meta/.tiling.lock` around cache I/O; per-PID temp file for atomic writes.
### Security posture
- Defaults to `http://127.0.0.1:11434`. `OLLAMA_URL` env override is accepted **only** with `--allow-remote-ollama` because page bodies are POSTed as embedding input.
- Symlinks and vault-root escapes are rejected.
### Default bands (conservative seeds, NOT calibrated)
| Band | Similarity | Report section |
|---|---|---|
| Error | `>= 0.90` | **Errors** — strong near-duplicate, likely the same concept |
| Review | `0.80 - 0.90` | **Review** — possible tile overlap; human judgement needed |
| Pass | `< 0.80` | not emitted |
**These values are conservative seeds, not literature-backed interpolation.** Published reference points: Sentence Transformers `community_detection` defaults to 0.75; Quora-duplicate calibrations land around 0.7715-0.8352 depending on objective. The 0.80 review floor is already stricter than at least one cited Quora optimum, so expect **false negatives** against those baselines. Reduce the review floor during calibration if you want more sensitivity.
### Calibration procedure (manual, one-time per vault)
1. Run the helper with defaults. Capture the **Review** band pairs.
2. Temporarily lower `bands.review` to `0.70` in `.vault-meta/tiling-thresholds.json` to surface a wider sample. Aim for >=50 pairs spanning 0.70-0.95.
3. Label each pair: `duplicate`, `similar`, `distinct`.
4. Pick bands such that: (a) the `error` band contains >= 95% true duplicates; (b) the `review` band captures `similar` pairs without swamping the report with `distinct` ones.
5. Edit `.vault-meta/tiling-thresholds.json`: set new `bands.error` and `bands.review`, set `calibrated: true`, set `calibration_pairs_labeled` to the label count.
6. Re-run lint. Report footer now says `calibrated: true`.
### Scale
- Cold-cache cost is O(N) POSTs to ollama. Warm-cache cost is O(N^2) cosines in pure Python.
- Helper prints a warning at > 500 pages and hard-fails (exit 4) at > 5000. Revisit the implementation (batching, vectorized cosine, or external tooling) before exceeding either limit.
### Lint report embed
```markdown
## Semantic Tiling
See [[tiling-report-YYYY-MM-DD]] for the full pair listing.
- Errors (>=0.90): N pairs
- Review (0.80-0.90): M pairs
- Calibrated: true|false
```
### Invariants
- Read-only. `tiling-check.py` never modifies wiki pages.
- No auto-merge. Duplicates are listed, never resolved.
- Cache is incremental and model-scoped. Unchanged pages are not re-embedded.
- Exit codes: `0` ok, `2` usage error, `3` cache corrupt, `4` scale hard-fail, `10` ollama unreachable, `11` model missing. Surface all of them; do not collapse into a single "unknown" bucket.
---
## Before Auto-Fixing
Always show the lint report first. Ask: "Should I fix these automatically, or do you want to review each one?"
Safe to auto-fix:
- Adding missing frontmatter fields with placeholder values
- Creating stub pages for missing entities
- Adding wikilinks for unlinked mentions
Needs review before fixing:
- Deleting orphan pages (they might be intentionally isolated)
- Resolving contradictions (requires human judgment)
- Merging duplicate pages
---
## How to think (10-principle mapping)
When working on this skill, apply the 10-principle loop. See [`skills/think/SKILL.md`](../think/SKILL.md) for the canonical framework.
| # | Principle | Application here |
|---|-----------|-------------------|
| 1 | OBSERVE (ext) | Scan every page, every wikilink, every frontmatter block. No skipping for size or apparent obviousness. |
| 2 | OBSERVE (int) | Am I biased toward "looks fine"? Pretend you're a hostile reader looking for what's actually wrong. |
| 3 | LISTEN | Did the user mention specific concerns this session? Prioritize those over generic checks. |
| 4 | THINK | Which checks matter? Tier findings by severity (BLOCKER / HIGH / MEDIUM / LOW), not by ease of fix. |
| 5 | CONNECT (lat) | Orphan + dead-link + frontmatter-gap patterns often co-occur. Cluster findings to expose root causes. |
| 6 | CONNECT (sys) | Tiling-check + Dataview dashboards + canvas overview — multiple lint surfaces to integrate. |
| 7 | FEEL | A lint report should empower, not shame. Actionable items beat exhaustive catalog. |
| 8 | ACCEPT | Some lint findings are deliberate (orphan-by-design, intentional stub). Flag, don't force. |
| 9 | CREATE | Lint report at `wiki/meta/lint-report-YYYY-MM-DD.md` with tiered findings. |
| 10 | GROW | Recurring lint findings → process improvement targets, not just one-time fixes. |