diff --git a/skills/wiki-ingest/SKILL.md b/skills/wiki-ingest/SKILL.md
index b988f0b6..29650abb 100644
--- a/skills/wiki-ingest/SKILL.md
+++ b/skills/wiki-ingest/SKILL.md
@@ -170,7 +170,7 @@ Steps:
 2. **Discuss** key takeaways with the user. Ask: "What should I emphasize? How granular?" Skip this if the user says "just ingest it."
 3. **Create** source summary in `wiki/sources/`. Use the source frontmatter schema from `references/frontmatter.md`. Assign an address per the **Address Assignment** section below.
 4. **Create or update** entity pages for every person, org, product, and repo mentioned. One page per entity. Assign addresses to new entity pages.
-5. **Create or update** concept pages for significant ideas and frameworks. Assign addresses to new concept pages.
+5. **Create or update** concept pages for significant ideas and frameworks. Assign addresses to new concept pages. Every concept page MUST include `source_refs` frontmatter as described in **Concept Source Provenance** below.
 6. **Update** relevant domain page(s) and their `_index.md` sub-indexes.
 7. **Update** `wiki/overview.md` if the big picture changed.
 8. **Update** `wiki/index.md`. Add entries for all new pages.
@@ -188,6 +188,33 @@ Steps:
 
 ---
 
+## Concept Source Provenance
+
+Every `wiki/concepts/*.md` page must show which raw Markdown chunks support its `sources` list. Preserve `sources:` wikilinks for human-readable source pages, and add `source_refs:` immediately after `sources:`. The `source_refs` list must have one entry per `sources` entry, in the same order.
+
+```yaml
+sources:
+  - "[[Midas-NFX-Analysis-Manual|Midas NFX Analysis Manual]]"
+source_refs:
+  - source: "[[Midas-NFX-Analysis-Manual|Midas NFX Analysis Manual]]"
+    raw_path: ".raw/MidasNFXAnalysisManual/"
+    raw_files:
+      - "MidasNFXAnalysisManual_024.md"
+    md_indices:
+      - 24
+    match: "direct-heading"
+    confidence: high
+```
+
+Rules:
+- `raw_path` is the `.raw` source folder or file backing the source page.
+- `raw_files` are exact `.md` filenames from that raw source; `md_indices` are the corresponding chunk numbers from suffixes like `_024.md`. For unsuffixed single-file sources, use `1`.
+- Prefer exact chunks read during ingest. If exact evidence was not captured, backfill by heading/keyword search over `.raw/**/*.md`, set `match: "heuristic-heading-keyword"`, and mark `confidence: high|medium|low` honestly.
+- Do not cite only table-of-contents, title, legal, or index chunks unless the concept actually comes from those chunks.
+- When updating an existing concept, keep existing reliable `source_refs` and add or revise entries only for changed `sources`.
+
+---
+
 ## Batch Ingest
 
 Trigger: user drops multiple files or says "ingest all of these."
@@ -244,6 +271,7 @@ Do not silently overwrite old claims. Flag and let the user decide.
 ## What Not to Do
 
 - **Source files under `.raw/` are immutable.** Do not modify the files that users drop there (articles, transcripts, images). The `.raw/.manifest.json` delta tracker and its `address_map` (DragonScale Mechanism 2) are the only files under `.raw/` that `wiki-ingest` itself maintains. Treat every other file under `.raw/` as read-only source content.
+- Do not create or update a concept page without matching `source_refs` for every `sources` entry.
 - Do not create duplicate pages. Always check the index and search before creating.
 - Do not skip the log entry. Every ingest must be recorded.
 - Do not skip the hot cache update. It is what keeps future sessions fast.
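
Review note: the `source_refs` rules added in this diff are mechanical enough to check with a small script. The sketch below is not part of the patch. It assumes the frontmatter has already been parsed into a plain dict, that `md_indices` lines up positionally with `raw_files`, and that a trailing `_<digits>.md` always marks a chunk suffix; the names `md_index` and `check_source_refs` are hypothetical.

```python
import re

# Hypothetical helpers; only the rules they check come from the
# "Concept Source Provenance" section added in the diff.
_CHUNK_SUFFIX = re.compile(r"_(\d+)\.md$")
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def md_index(filename: str) -> int:
    """Chunk number from a suffix like `_024.md`; unsuffixed files map to 1.

    Assumption: any trailing `_<digits>.md` is treated as a chunk suffix.
    """
    match = _CHUNK_SUFFIX.search(filename)
    return int(match.group(1)) if match else 1


def check_source_refs(frontmatter: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    sources = frontmatter.get("sources") or []
    refs = frontmatter.get("source_refs") or []

    # One `source_refs` entry per `sources` entry.
    if len(sources) != len(refs):
        problems.append(f"{len(sources)} sources but {len(refs)} source_refs")

    for i, (src, ref) in enumerate(zip(sources, refs)):
        # Same order: entry i must name the same wikilink as sources[i].
        if ref.get("source") != src:
            problems.append(f"entry {i}: source does not match sources[{i}]")

        files = ref.get("raw_files") or []
        if any(not name.endswith(".md") for name in files):
            problems.append(f"entry {i}: raw_files must be .md filenames")

        # Assumed positional correspondence between raw_files and md_indices.
        expected = [md_index(name) for name in files]
        if ref.get("md_indices") != expected:
            problems.append(
                f"entry {i}: md_indices {ref.get('md_indices')} != {expected}"
            )

        if ref.get("confidence") not in ALLOWED_CONFIDENCE:
            problems.append(f"entry {i}: confidence must be high, medium, or low")

    return problems


# The example frontmatter from the diff, as a YAML loader would return it.
LINK = "[[Midas-NFX-Analysis-Manual|Midas NFX Analysis Manual]]"
example = {
    "sources": [LINK],
    "source_refs": [
        {
            "source": LINK,
            "raw_path": ".raw/MidasNFXAnalysisManual/",
            "raw_files": ["MidasNFXAnalysisManual_024.md"],
            "md_indices": [24],
            "match": "direct-heading",
            "confidence": "high",
        }
    ],
}
```

Calling `check_source_refs(example)` returns an empty list for the frontmatter shown in the diff. The checks cover only structure; whether a cited chunk genuinely supports the concept, or is merely a table-of-contents hit, still needs a human or the ingesting agent to judge.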