# Step 3: markdown-quality-gates

## Read First

- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /docs/ADR.md
- /phases/0-harness-foundation/step0.md
- /phases/0-harness-foundation/step1.md
- /phases/0-harness-foundation/step2.md
- /phases/0-harness-foundation/index.json

## Task

Create focused Markdown quality gate functions that later renderer and conversion steps can call.

This step should validate generated Markdown-like strings and asset references without requiring a full PDF conversion. It should prefer structured checks over full snapshot comparison.

Quality gates should cover:

- math delimiter balance for `$...$` and `$$...$$`
- LaTeX `\begin{...}` / `\end{...}` pairs
- image link path existence or modeled asset reference existence
- table parseability for simple Markdown tables
- chunk frontmatter fields required by the output contract
- caption/reference anchor shape where confidence is sufficient

## Sprint Contract

- Done means: later renderer steps have reusable validation functions and focused pytest coverage for Markdown output risks.
- Hard thresholds:
  - Tests include passing and failing examples for math delimiter checks.
  - Tests include a complex table case where Markdown limitations are represented as an allowed HTML/fallback decision.
  - Tests do not rely on full Markdown snapshot equality.
  - Validation functions do not mutate generated Markdown silently unless an explicit repair function is named and tested.
  - No PDF parsing or renderer implementation is introduced here.
- Files owned:
  - `src/pdftomd/quality.py`
  - model additions in `src/pdftomd/models.py` only if required
  - `tests/test_quality.py`
  - `PROGRESS.md`
  - `phases/0-harness-foundation/index.json`
- Dependencies:
  - Step 1 package skeleton and models

## Acceptance Criteria

```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests\test_quality.py
```

## Verification

1. Run the acceptance commands.
2. Confirm quality gates are focused assertions, not whole-document snapshots.
3. Confirm failures return actionable messages for evaluator use.
4. Update `PROGRESS.md` with completed work, validation output, and next handoff.
5. Update this phase index step to `completed` with a one-line `summary`, or to `blocked`/`error` with a concrete reason.

## Do Not

- Do not implement Marker/Nougat adapters.
- Do not implement the full Markdown renderer.
- Do not introduce an LLM correction path.
- Do not write warning/error messages into generated Markdown content.
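
## Illustrative Sketch

The first two gates named in the Task list (math delimiter balance and LaTeX `\begin{...}` / `\end{...}` pairing) could look roughly like the following. This is a minimal sketch under assumptions, not the required design: the function names `check_math_delimiters` and `check_latex_environments` are hypothetical, returning a list of message strings (empty when the check passes) is one possible convention for actionable failures, and the sketch ignores dollar signs inside code spans or fenced blocks. Neither function modifies its input.

```python
import re
from typing import List

# Matches \begin{name} and \end{name}; hypothetical helper, not part of the spec.
_ENV_RE = re.compile(r"\\(begin|end)\{([^{}]+)\}")


def check_math_delimiters(text: str) -> List[str]:
    """Return issue messages for unbalanced `$` / `$$` delimiters (assumed API)."""
    issues: List[str] = []
    # Ignore escaped dollars such as "\$5" so prices are not counted as math.
    unescaped = text.replace("\\$", "")
    parts = unescaped.split("$$")
    # n occurrences of "$$" yield n + 1 parts, so an even part count means n is odd.
    if len(parts) % 2 == 0:
        issues.append("unbalanced display math: odd number of '$$' delimiters")
    # Even-indexed parts lie outside display math; count inline '$' only there.
    outside = "".join(parts[0::2])
    if outside.count("$") % 2 != 0:
        issues.append("unbalanced inline math: odd number of '$' delimiters")
    return issues


def check_latex_environments(text: str) -> List[str]:
    """Return issue messages for mismatched \\begin / \\end pairs (assumed API)."""
    issues: List[str] = []
    stack: List[str] = []
    for match in _ENV_RE.finditer(text):
        kind, name = match.group(1), match.group(2)
        if kind == "begin":
            stack.append(name)
        elif not stack:
            issues.append(
                f"\\end{{{name}}} at offset {match.start()} has no matching \\begin"
            )
        else:
            opened = stack.pop()
            if opened != name:
                issues.append(
                    f"\\end{{{name}}} at offset {match.start()} "
                    f"closes \\begin{{{opened}}}"
                )
    # Anything left on the stack was opened but never closed.
    for name in stack:
        issues.append(f"\\begin{{{name}}} is never closed")
    return issues
```

A focused pytest case can then assert on the returned list, for example that `check_math_delimiters("$$a + b")` is non-empty, rather than comparing a whole rendered document against a snapshot.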