PDFToMD/.codex/skills/conversion-architecture/SKILL.md
김경종 7e985ae94a add files
2026-04-30 17:05:19 +09:00

name: conversion-architecture
description: Design PDFtoMD conversion architecture, parser boundaries, internal block models, chunk policy, renderer contracts, output structure, logging, and resume behavior. Use when planning or reviewing conversion engine design.

Conversion Architecture

Workflow

  1. Read AGENTS.md, PLAN.md, PROGRESS.md, docs/ARCHITECTURE.md, docs/CONVERSION_POLICY.md, and docs/ADR.md.
  2. Keep responsibilities stable:
    • Marker: layout, OCR, reading order, body, headings, tables, figures, captions
    • Nougat: formula-only LaTeX parsing
    • PyMuPDF: page pre-analysis, text-layer quality, page counts, chunk planning
  3. Define interfaces and invariants before implementation.
  4. Keep output deterministic and chunked under the documented output contract.
  5. Record architecture changes in docs/ADR.md when decisions change.
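One way to read steps 2 to 4 is as a small set of typed boundaries with invariants checked at construction time. The sketch below is illustrative only: `Block`, `BlockKind`, `FormulaParser`, and `resolve_blocks` are hypothetical names that do not come from the project, and no real Marker, Nougat, or PyMuPDF API is called. It assumes layout blocks arrive from a layout parser and that only formula blocks are routed to a formula parser.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Protocol


class BlockKind(Enum):
    HEADING = "heading"
    PARAGRAPH = "paragraph"
    TABLE = "table"
    FIGURE = "figure"
    CAPTION = "caption"
    FORMULA = "formula"


@dataclass(frozen=True)
class Block:
    """Hypothetical internal block model shared by parsers and renderers."""

    kind: BlockKind
    page: int   # 1-based page number
    order: int  # reading order within the page
    text: str

    def __post_init__(self) -> None:
        # Invariants are enforced once, at the boundary.
        if self.page < 1:
            raise ValueError("page must be >= 1")
        if self.order < 0:
            raise ValueError("order must be >= 0")


class FormulaParser(Protocol):
    """Assumed boundary for the formula-only parser."""

    def to_latex(self, block: Block) -> str: ...


def resolve_blocks(blocks: list[Block], formula_parser: FormulaParser) -> list[Block]:
    """Sort blocks deterministically and send only formula blocks to the formula parser."""
    ordered = sorted(blocks, key=lambda b: (b.page, b.order))
    resolved: list[Block] = []
    for block in ordered:
        if block.kind is BlockKind.FORMULA:
            resolved.append(replace(block, text=formula_parser.to_latex(block)))
        else:
            resolved.append(block)
    return resolved
```

Because the sort key is `(page, order)` and blocks are immutable, the same input set yields the same output sequence regardless of arrival order, which is one possible interpretation of "deterministic" here.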

Guardrails

  • Do not place conversion logic in a future PyQt UI.
  • Do not add document sidecars unless explicitly requested.
  • Do not let chunking split a paragraph, table, figure, or formula without a fallback plan.
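The last guardrail can be sketched as a planner that treats each block as indivisible. This is a minimal illustration, not the project's chunk policy: `Chunk`, `plan_chunks`, and the `max_size` budget are hypothetical, and the fallback chosen here (emit an oversized block alone and flag it) is only one possible fallback plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    block_indices: tuple[int, ...]  # indices into the input block list
    size: int                       # total size of the blocks in this chunk
    oversized: bool                 # True when a single block exceeds the budget


def plan_chunks(block_sizes: list[int], max_size: int) -> list[Chunk]:
    """Group consecutive blocks into chunks without ever splitting a block."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")

    chunks: list[Chunk] = []
    current: list[int] = []
    current_size = 0

    def flush() -> None:
        nonlocal current, current_size
        if current:
            chunks.append(Chunk(tuple(current), current_size, False))
            current, current_size = [], 0

    for index, size in enumerate(block_sizes):
        if size > max_size:
            # Fallback: keep the block whole in its own flagged chunk.
            flush()
            chunks.append(Chunk((index,), size, True))
            continue
        if current_size + size > max_size:
            flush()
        current.append(index)
        current_size += size

    flush()
    return chunks
```

For example, `plan_chunks([300, 400, 500, 1200, 100], max_size=1000)` groups blocks 0 and 1 together, puts block 2 on its own, flags block 3 as oversized, and ends with block 4. A caller could log the flagged chunks so the fallback is visible in the conversion log.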