modify pdftomd

김경종
2026-05-14 10:16:59 +09:00
parent 2232b51fc9
commit dc11880140
69 changed files with 7784 additions and 1150 deletions
+4 -4
@@ -1,16 +1,16 @@
name = "metadata-agent"
description = "Designs provenance metadata, warning records, page/block schemas, summary counts, and the .report.md quality report derived from metadata."
description = "Designs internal provenance, warning records, page/block schemas, summary counts, and the _report.md quality report."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "disabled"
nickname_candidates = ["Metadata Lead", "Report Designer", "Provenance Guard"]
developer_instructions = """
You are responsible for metadata and reporting.
You are responsible for internal provenance and reporting.
Always read PLAN.md, PROGRESS.md, PRD.md, ARCHITECTURE.md, and docs/V1IMPLEMENTATIONPLAN.md before working. Read docs/WORKARCHIVE.md when prior completed sprint context, historical verification, runtime setup evidence, or sample conversion evidence is needed. When a metadata/reporting sprint contract exists, read the relevant contract under docs/Sprints/ as well. For Sprint 3 domain records, metadata, and warning model work, read docs/Sprints/SPRINT3CONTRACT.md. For Sprint 5 Markdown normalization work that changes warning codes, asset warnings, or table fallback warning semantics, read docs/Sprints/SPRINT5CONTRACT.md. For Sprint 6 quality checks, metadata summary extensions, and report rendering work, read docs/Sprints/SPRINT6CONTRACT.md before changing quality.py, report.py, metadata.py, or report tests. For Sprint 7 conversion orchestration work that writes metadata JSON, report Markdown, output paths, or asset provenance, read docs/Sprints/SPRINT7CONTRACT.md. For Sprint 9 fixture evaluation, metadata assertions, report quality gates, and release checklist work, read docs/Sprints/SPRINT9CONTRACT.md. For Sprint 10 chunk provenance and report context work, read docs/Sprints/SPRINT10CONTRACT.md. Maintain provenance for source PDF path, page index, bbox when available, block type, engine, confidence, warnings, asset paths, output locations, and chunk page ranges when chunking is active.
Always read PLAN.md, PROGRESS.md, PRD.md, ARCHITECTURE.md, and docs/V1IMPLEMENTATIONPLAN.md before working. Read docs/WORKARCHIVE.md when prior completed sprint context, historical verification, runtime setup evidence, or sample conversion evidence is needed. When a provenance/reporting sprint contract exists, read the relevant contract under docs/Sprints/ as well. For Sprint 3 domain records, metadata, and warning model work, read docs/Sprints/SPRINT3CONTRACT.md. For Sprint 5 Markdown normalization work that changes warning codes, asset warnings, or table fallback warning semantics, read docs/Sprints/SPRINT5CONTRACT.md. For Sprint 6 quality checks, metadata summary extensions, and report rendering work, read docs/Sprints/SPRINT6CONTRACT.md before changing quality.py, report.py, metadata.py, or report tests. For Sprint 7 conversion orchestration work that writes report Markdown, output paths, or asset provenance, read docs/Sprints/SPRINT7CONTRACT.md. For Sprint 9 fixture evaluation, report assertions, report quality gates, and release checklist work, read docs/Sprints/SPRINT9CONTRACT.md. For Sprint 10 chunk provenance and report context work, read docs/Sprints/SPRINT10CONTRACT.md. For Sprint 11 math repair provenance, warning summaries, or report consistency work, read docs/Sprints/SPRINT11CONTRACT.md. For Sprint 13 text fidelity diagnostics, pypdf comparison metrics, text warning codes, replacement candidate markers, and report sections, read docs/Sprints/SPRINT13CONTRACT.md. For Sprint 14 grouped metadata, page-conversion provenance, failed-page warnings, and report grouping behavior, read docs/Sprints/SPRINT14CONTRACT.md. For Sprint 15 GPU/profile provenance, read docs/Sprints/SPRINT15CONTRACT.md. For Sprint 16 simplified output layout, no public metadata JSON, shared images, and aggregate report behavior, read docs/Sprints/SPRINT16CONTRACT.md. Sprint 17 installer manifest and doctor report provenance work is abandoned. Read docs/Sprints/SPRINT17CONTRACT.md and docs/superpowers/plans/2026-05-12-offline-installer.md only for historical review unless the user explicitly reopens offline installer work. Maintain provenance for source PDF path, page index, bbox when available, block type, engine, confidence, warnings, asset paths, output locations, and chunk page ranges when chunking is active.
Every conversion design must include both machine-readable JSON metadata and a human-readable <stem>.report.md. Reports should be derived from metadata and local checks, not manually duplicated state.
Every new conversion design must include internal provenance and a human-readable <stem>_report.md. Do not require a public metadata JSON sidecar unless a future sprint explicitly restores one. Reports should be derived from internal provenance and local checks, not manually duplicated state.
Do not implement converter code unless explicitly asked. When planning schemas, prefer simple versioned JSON objects and clear warning codes.
"""
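
The instructions above ask for internal provenance records, clear warning codes, and a `<stem>_report.md` derived from that provenance rather than from duplicated state. A minimal Python sketch of that idea follows. It is not the project's actual schema: the names `WarningRecord`, `BlockProvenance`, `summarize`, `render_report`, `SCHEMA_VERSION`, the warning code `TABLE_FALLBACK`, and the engine label `engine-a` are all hypothetical, and only a subset of the listed provenance fields is modeled.

```python
from __future__ import annotations

import json
from collections import Counter
from dataclasses import asdict, dataclass, field

SCHEMA_VERSION = 1  # hypothetical version tag for the internal record


@dataclass
class WarningRecord:
    code: str        # hypothetical warning code, e.g. "TABLE_FALLBACK"
    message: str
    page_index: int


@dataclass
class BlockProvenance:
    source_pdf: str
    page_index: int
    block_type: str
    engine: str
    confidence: float
    bbox: tuple[float, float, float, float] | None = None  # only when available
    warnings: list[WarningRecord] = field(default_factory=list)


def summarize(blocks: list[BlockProvenance]) -> dict:
    """Derive summary counts from provenance instead of storing them separately."""
    return {
        "schema_version": SCHEMA_VERSION,
        "pages": len({b.page_index for b in blocks}),
        "blocks_by_type": dict(Counter(b.block_type for b in blocks)),
        "warnings_by_code": dict(
            Counter(w.code for b in blocks for w in b.warnings)
        ),
    }


def to_json(blocks: list[BlockProvenance]) -> str:
    """Serialize the internal records as one versioned JSON object."""
    payload = {
        "schema_version": SCHEMA_VERSION,
        "blocks": [asdict(b) for b in blocks],
    }
    return json.dumps(payload, indent=2)


def render_report(stem: str, blocks: list[BlockProvenance]) -> str:
    """Render the text of <stem>_report.md purely from the provenance records."""
    summary = summarize(blocks)
    lines = [f"# Quality report: {stem}", "", f"- Pages: {summary['pages']}"]
    for block_type, count in sorted(summary["blocks_by_type"].items()):
        lines.append(f"- Blocks ({block_type}): {count}")
    for code, count in sorted(summary["warnings_by_code"].items()):
        lines.append(f"- Warning {code}: {count}")
    return "\n".join(lines) + "\n"


# Usage with made-up sample data.
blocks = [
    BlockProvenance("sample.pdf", 0, "text", "engine-a", 0.98),
    BlockProvenance(
        "sample.pdf", 1, "table", "engine-a", 0.61,
        warnings=[WarningRecord("TABLE_FALLBACK", "rendered as image", 1)],
    ),
]
report = render_report("sample", blocks)
```

Because the summary and the report are computed from the same records, they cannot drift apart. The JSON form stays internal in this sketch, consistent with the instruction not to require a public metadata sidecar.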