PDFToMD/.codex/skills/conversion-architecture/SKILL.md
김경종 7e985ae94a add files
2026-04-30 17:05:19 +09:00

---
name: conversion-architecture
description: Design PDFtoMD conversion architecture, parser boundaries, internal block models, chunk policy, renderer contracts, output structure, logging, and resume behavior. Use when planning or reviewing conversion engine design.
---
# Conversion Architecture
## Workflow
1. Read `AGENTS.md`, `PLAN.md`, `PROGRESS.md`, `docs/ARCHITECTURE.md`, `docs/CONVERSION_POLICY.md`, and `docs/ADR.md`.
2. Keep responsibilities stable:
   - Marker: layout, OCR, reading order, body, headings, tables, figures, captions
   - Nougat: formula-only LaTeX parsing
   - PyMuPDF: page pre-analysis, text-layer quality, page counts, chunk planning
3. Define interfaces and invariants before implementation.
4. Keep output deterministic and chunked under the documented output contract.
5. Record architecture changes in `docs/ADR.md` when decisions change.
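
As a rough illustration of steps 2 and 3, the parser boundaries can be written down as interfaces before any engine code exists. The sketch below is only one possible shape: `Block`, `BlockKind`, `LayoutParser`, `FormulaParser`, and `convert_page` are hypothetical names, not part of Marker, Nougat, or PyMuPDF, and the real adapters would wrap those libraries behind these protocols.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Protocol


class BlockKind(Enum):
    HEADING = "heading"
    PARAGRAPH = "paragraph"
    TABLE = "table"
    FIGURE = "figure"
    CAPTION = "caption"
    FORMULA = "formula"


@dataclass(frozen=True)
class Block:
    """Hypothetical internal block model shared by all parsers."""
    page: int        # 0-based page index
    order: int       # reading order within the page
    kind: BlockKind
    text: str

    def __post_init__(self) -> None:
        # Invariant: positions are never negative.
        if self.page < 0 or self.order < 0:
            raise ValueError("page and order must be non-negative")


class LayoutParser(Protocol):
    """Boundary for the layout engine (the role assigned to Marker)."""
    def parse_page(self, page: int) -> list[Block]: ...


class FormulaParser(Protocol):
    """Boundary for formula-only LaTeX parsing (the role assigned to Nougat)."""
    def to_latex(self, block: Block) -> str: ...


def convert_page(page: int, layout: LayoutParser, formulas: FormulaParser) -> list[Block]:
    """Combine both parsers without letting either cross its boundary."""
    # Sorting by reading order keeps the output deterministic.
    blocks = sorted(layout.parse_page(page), key=lambda b: b.order)
    result = []
    for block in blocks:
        if block.kind is BlockKind.FORMULA:
            # Only formula blocks are ever handed to the formula parser.
            block = replace(block, text=formulas.to_latex(block))
        result.append(block)
    return result
```

Because the adapters are passed in as arguments, the same function can be exercised with stub parsers in tests, and a change of responsibility shows up as a change to a protocol rather than as scattered edits.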
## Guardrails
- Do not place conversion logic in a future PyQt UI.
- Do not add document sidecars unless explicitly requested.
- Do not let chunking split a paragraph, table, figure, or formula without a fallback plan.
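
The last guardrail can be made concrete with a small planner that treats each block as indivisible. This is a minimal sketch under stated assumptions: blocks are plain strings, the budget is a character count, and the fallback for a block larger than the budget is to emit it alone and flag it. `Chunk`, `plan_chunks`, and `max_chars` are hypothetical names, and the documented output contract may call for a different fallback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    blocks: tuple[str, ...]
    oversize: bool = False  # True when a single block exceeded the budget


def plan_chunks(blocks: list[str], max_chars: int) -> list[Chunk]:
    """Group blocks into chunks without ever splitting a block."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: list[Chunk] = []
    current: list[str] = []
    size = 0

    def flush() -> None:
        nonlocal current, size
        if current:
            chunks.append(Chunk(tuple(current)))
            current, size = [], 0

    for block in blocks:
        if len(block) > max_chars:
            # Fallback: keep the block whole, alone in a flagged chunk.
            flush()
            chunks.append(Chunk((block,), oversize=True))
            continue
        if size + len(block) > max_chars:
            flush()
        current.append(block)
        size += len(block)

    flush()
    return chunks
```

The same input always yields the same chunk boundaries, and a caller can log or specially handle any chunk whose `oversize` flag is set instead of silently cutting a table or formula in half.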