# Step 1: ocr-plan-handoff

## Read First

- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/CONVERSION_POLICY.md
- /phases/0-harness-foundation/step2.md
- /phases/2-marker-adapter/step0.md

## Task

Connect PyMuPDF page pre-analysis results to the Marker adapter as an OCR/layout handoff plan. The goal is to preserve page-level OCR decisions without making the entire document scan-only or text-only.

## Sprint Contract

- Done means: the adapter accepts page-level OCR candidates and passes the relevant intent into Marker configuration or records an explicit unsupported-path fallback.
- Hard thresholds: OCR decisions stay page-aware; PyMuPDF remains pre-analysis only; no OCR logs are inserted into Markdown.
- Files owned: `src/pdftomd/marker_adapter.py`, `src/pdftomd/preanalysis.py` if needed, tests, `PROGRESS.md`, phase index.
- Dependencies: Phase 0 pre-analysis and Step 0 Marker adapter.

## Acceptance Criteria

```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification

1. Run the acceptance commands.
2. Confirm mixed text/scanned sample traits are represented in tests.
3. Update `PROGRESS.md` and this phase index.

## Do Not

- Do not force document-wide OCR when only selected pages need OCR.
- Do not implement reading-order fixes here.
- Do not add a second primary parser.
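The page-aware handoff described in the Task and Sprint Contract can be sketched as below. This is a minimal sketch under stated assumptions: `PageTraits`, `OcrHandoffPlan`, `build_handoff_plan`, and the `supports_page_ocr` flag are hypothetical names, not the actual shapes in `src/pdftomd/preanalysis.py` or `src/pdftomd/marker_adapter.py`. It calls neither Marker nor PyMuPDF and assumes no Marker configuration key; it only shows the split between OCR and text pages and the explicit fallback record when the intent cannot be passed on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical per-page pre-analysis record (field names are assumptions).
@dataclass(frozen=True)
class PageTraits:
    page_index: int   # 0-based page number
    needs_ocr: bool   # pre-analysis verdict for this page only


# Hypothetical handoff plan the adapter would consume.
@dataclass
class OcrHandoffPlan:
    ocr_pages: list[int] = field(default_factory=list)
    text_pages: list[int] = field(default_factory=list)
    # Explicit reason recorded when page-level OCR intent cannot be passed on.
    fallback: str | None = None


def build_handoff_plan(pages, supports_page_ocr: bool) -> OcrHandoffPlan:
    """Split pages by OCR need without escalating to document-wide OCR."""
    plan = OcrHandoffPlan()
    for page in sorted(pages, key=lambda p: p.page_index):
        target = plan.ocr_pages if page.needs_ocr else plan.text_pages
        target.append(page.page_index)
    if plan.ocr_pages and not supports_page_ocr:
        # Record the unsupported path instead of silently forcing OCR on every page.
        plan.fallback = "page-level OCR intent not supported by this Marker path"
    return plan


if __name__ == "__main__":
    # Mixed sample: one scanned page between two text pages.
    pages = [PageTraits(0, False), PageTraits(1, True), PageTraits(2, False)]
    plan = build_handoff_plan(pages, supports_page_ocr=False)
    print(plan.ocr_pages, plan.text_pages, plan.fallback is not None)
```

With the mixed sample above, page 1 lands in `ocr_pages`, pages 0 and 2 stay in `text_pages`, and because the (hypothetical) path reports no page-level support, the plan carries an explicit fallback reason rather than switching the whole document to OCR. The fallback lives on the plan object, not in the Markdown output, in line with the "no OCR logs in Markdown" threshold.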