---
name: conversion-architecture
description: Design PDFtoMD conversion architecture, parser boundaries, internal block models, chunk policy, renderer contracts, output structure, logging, and resume behavior. Use when planning or reviewing conversion engine design.
---

# Conversion Architecture

## Workflow

1. Read `AGENTS.md`, `PLAN.md`, `PROGRESS.md`, `docs/ARCHITECTURE.md`, `docs/CONVERSION_POLICY.md`, and `docs/ADR.md`.
2. Keep responsibilities stable:
   - Marker: layout, OCR, reading order, body, headings, tables, figures, captions
   - Nougat: formula-only LaTeX parsing
   - PyMuPDF: page pre-analysis, text-layer quality, page counts, chunk planning
3. Define interfaces and invariants before implementation.
4. Keep output deterministic and chunked under the documented output contract.
5. Record architecture changes in `docs/ADR.md` when decisions change.

## Guardrails

- Do not place conversion logic in a future PyQt UI.
- Do not add document sidecars unless explicitly requested.
- Do not let chunking split a paragraph, table, figure, or formula without a fallback plan.
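
The chunking guardrail can be illustrated with a minimal sketch. It assumes chunks are planned in whole pages and that a pre-analysis step has already produced the page span of every block; neither assumption comes from the project docs. `Block`, `Chunk`, `plan_chunks`, and `max_pages` are hypothetical names for illustration only, not the project's actual interfaces. The fallback shown here (growing a chunk past the limit and flagging it) is one possible plan, not the documented one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical block record: kind plus inclusive 0-based page span."""
    kind: str        # e.g. "paragraph", "table", "figure", "formula"
    start_page: int
    end_page: int


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record: inclusive 0-based page span."""
    start_page: int
    end_page: int
    oversized: bool = False  # True when the fallback exceeded max_pages


def plan_chunks(page_count: int, blocks: List[Block], max_pages: int) -> List[Chunk]:
    """Plan page chunks so that no chunk boundary falls inside a block."""
    if page_count <= 0 or max_pages <= 0:
        raise ValueError("page_count and max_pages must be positive")

    # A boundary placed after page p is unsafe when a block covers p and p + 1.
    unsafe = set()
    for block in blocks:
        last = min(block.end_page, page_count - 1)
        unsafe.update(range(block.start_page, last))

    chunks: List[Chunk] = []
    start = 0
    while start < page_count:
        ideal_end = min(start + max_pages, page_count) - 1

        # Prefer shrinking the chunk until its final boundary is safe.
        end = ideal_end
        while end >= start and end in unsafe:
            end -= 1

        oversized = False
        if end < start:
            # Fallback: one block spans more than max_pages, so grow the
            # chunk to the next safe boundary and flag it for the caller.
            end = ideal_end
            while end in unsafe:
                end += 1
            oversized = True

        chunks.append(Chunk(start, end, oversized))
        start = end + 1
    return chunks


if __name__ == "__main__":
    # A table spanning pages 3-4 pushes the first boundary back to page 2.
    for chunk in plan_chunks(10, [Block("table", 3, 4)], max_pages=4):
        print(chunk)
```

The planner is a pure function of its inputs, so the same page count, block spans, and limit always yield the same chunk list, which fits the requirement that output stay deterministic. Keeping it free of parser and UI imports also keeps it testable outside any future PyQt layer.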