| name | description |
|---|---|
| conversion-architecture | Design PDFtoMD conversion architecture, parser boundaries, internal block models, chunk policy, renderer contracts, output structure, logging, and resume behavior. Use when planning or reviewing conversion engine design. |
# Conversion Architecture

## Workflow
- Read `AGENTS.md`, `PLAN.md`, `PROGRESS.md`, `docs/ARCHITECTURE.md`, `docs/CONVERSION_POLICY.md`, and `docs/ADR.md`.
- Keep responsibilities stable:
  - Marker: layout, OCR, reading order, body, headings, tables, figures, captions
  - Nougat: formula-only LaTeX parsing
  - PyMuPDF: page pre-analysis, text-layer quality, page counts, chunk planning
- Define interfaces and invariants before implementation.
- Keep output deterministic and chunked under the documented output contract.
- Record architecture changes in `docs/ADR.md` when decisions change.
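
One way to make the parser boundaries and invariants above concrete before any implementation exists is to write them down as typed interfaces. The sketch below is only an illustration: `Block`, `BlockKind`, `LayoutParser`, `FormulaParser`, `PageAnalyzer`, and `check_reading_order` are hypothetical names, not part of Marker, Nougat, or PyMuPDF, and the real contracts belong in `docs/ARCHITECTURE.md`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Protocol, Sequence


class BlockKind(Enum):
    PARAGRAPH = "paragraph"
    HEADING = "heading"
    TABLE = "table"
    FIGURE = "figure"
    CAPTION = "caption"
    FORMULA = "formula"


@dataclass(frozen=True)
class Block:
    """Hypothetical internal block; field names are illustrative only."""
    kind: BlockKind
    page: int   # zero-based page index
    order: int  # reading order within the page
    text: str


class LayoutParser(Protocol):
    """Boundary for the layout/OCR/reading-order role (Marker in this project)."""
    def parse_layout(self, pdf_path: str, pages: Sequence[int]) -> List[Block]: ...


class FormulaParser(Protocol):
    """Boundary for the formula-only LaTeX role (Nougat in this project)."""
    def formula_to_latex(self, block: Block) -> str: ...


class PageAnalyzer(Protocol):
    """Boundary for the pre-analysis role (PyMuPDF in this project)."""
    def page_count(self, pdf_path: str) -> int: ...
    def has_usable_text_layer(self, pdf_path: str, page: int) -> bool: ...


def check_reading_order(blocks: Sequence[Block]) -> None:
    """Assumed invariant: blocks are strictly increasing by (page, order)."""
    keys = [(b.page, b.order) for b in blocks]
    if any(a >= b for a, b in zip(keys, keys[1:])):
        raise ValueError("blocks are not in strict reading order")
```

Keeping each role behind its own protocol means an adapter can be swapped or mocked in tests without touching the renderer, and an invariant check like `check_reading_order` can run at the boundary rather than inside any one parser.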
## Guardrails
- Do not place conversion logic in a future PyQt UI.
- Do not add document sidecars unless explicitly requested.
- Do not let chunking split a paragraph, table, figure, or formula without a fallback plan.
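
The chunking guardrail can be sketched as a planner that treats each paragraph, table, figure, or formula as an atomic unit and never cuts inside one. This is a minimal sketch under assumptions: `Unit`, `Chunk`, `plan_chunks`, and the size-based `budget` are hypothetical, and the fallback shown here (emit an oversized unit alone and flag it) is just one possible plan, not the documented policy in `docs/CONVERSION_POLICY.md`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """Hypothetical atomic unit: one paragraph, table, figure, or formula."""
    kind: str
    size: int  # assumed measure, e.g. characters of rendered Markdown


@dataclass(frozen=True)
class Chunk:
    units: Tuple[Unit, ...]
    oversized: bool  # True when a single unit alone exceeds the budget


def plan_chunks(units: Sequence[Unit], budget: int) -> List[Chunk]:
    """Greedily pack units in order without ever splitting a unit."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    chunks: List[Chunk] = []
    current: List[Unit] = []
    used = 0
    for unit in units:
        if unit.size > budget:
            # Fallback: close the open chunk, then emit the unit alone, flagged.
            if current:
                chunks.append(Chunk(tuple(current), False))
                current, used = [], 0
            chunks.append(Chunk((unit,), True))
            continue
        if used + unit.size > budget:
            # The unit does not fit; start a new chunk at the unit boundary.
            chunks.append(Chunk(tuple(current), False))
            current, used = [], 0
        current.append(unit)
        used += unit.size
    if current:
        chunks.append(Chunk(tuple(current), False))
    return chunks
```

Because the planner depends only on the ordered input and the budget, the same document yields the same chunk boundaries on every run, which is what a deterministic output contract and resumable conversion would both need. The `oversized` flag gives logging or a later stage a place to apply whatever fallback the policy defines.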