Step 3: markdown-quality-gates
Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /docs/ADR.md
- /phases/0-harness-foundation/step0.md
- /phases/0-harness-foundation/step1.md
- /phases/0-harness-foundation/step2.md
- /phases/0-harness-foundation/index.json
Task
Create focused Markdown quality gate functions that later renderer and conversion steps can call.
This step should validate generated Markdown-like strings and asset references without requiring a full PDF conversion. It should prefer structured checks over full snapshot comparison.
Quality gates should cover:
- math delimiter balance for `$...$` and `$$...$$`
- LaTeX `\begin{...}`/`\end{...}` pairs
- image link path existence or modeled asset reference existence
- table parseability for simple Markdown tables
- chunk frontmatter fields required by the output contract
- caption/reference anchor shape where confidence is sufficient
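A minimal sketch of how the first two gates in the list above might look. The names `GateIssue`, `check_math_delimiters`, and `check_latex_environments` are hypothetical and are not taken from the existing models. The delimiter check counts naively and does not skip code spans, which a real gate would probably need to do.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GateIssue:
    """Hypothetical result record; not part of the existing models."""
    gate: str
    message: str


_ENV = re.compile(r"\\(begin|end)\{([^}]*)\}")


def check_math_delimiters(text: str) -> list[GateIssue]:
    """Report unbalanced $...$ and $$...$$ delimiters. Never modifies text."""
    issues = []
    # Drop escaped dollars (\$) so literal currency signs are not counted.
    stripped = re.sub(r"\\\$", "", text)
    display = stripped.count("$$")
    if display % 2:
        issues.append(GateIssue("math", f"odd number of $$ delimiters ({display})"))
    # Count the remaining single dollars once display delimiters are removed.
    inline = stripped.replace("$$", "").count("$")
    if inline % 2:
        issues.append(GateIssue("math", f"odd number of $ delimiters ({inline})"))
    return issues


def check_latex_environments(text: str) -> list[GateIssue]:
    """Report \\begin{...}/\\end{...} pairs that do not nest correctly."""
    issues = []
    stack = []  # names of currently open environments
    for match in _ENV.finditer(text):
        kind, name = match.group(1), match.group(2)
        if kind == "begin":
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            expected = stack[-1] if stack else "no open environment"
            issues.append(GateIssue(
                "latex-env",
                f"\\end{{{name}}} at offset {match.start()} does not match "
                f"(innermost open: {expected})",
            ))
    for name in stack:
        issues.append(GateIssue("latex-env", f"\\begin{{{name}}} is never closed"))
    return issues
```

Both functions return a list of issues and leave the input untouched, so a caller can decide separately whether any explicit repair step should run.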
Sprint Contract
- Done means: later renderer steps have reusable validation functions and focused pytest coverage for Markdown output risks.
- Hard thresholds:
- Tests include passing and failing examples for math delimiter checks.
- Tests include a complex table case where Markdown limitations are represented as an allowed HTML/fallback decision.
- Tests do not rely on full Markdown snapshot equality.
- Validation functions do not mutate generated Markdown silently unless an explicit repair function is named and tested.
- No PDF parsing or renderer implementation is introduced here.
- Files owned:
- `src/pdftomd/quality.py`
- model additions in `src/pdftomd/models.py` only if required
- `tests/test_quality.py`
- `PROGRESS.md`
- `phases/0-harness-foundation/index.json`
- Dependencies:
- Step 1 package skeleton and models
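One possible way to satisfy the complex-table threshold is to have the table gate return a decision rather than a pass/fail flag, so that an HTML fallback counts as an allowed outcome. The sketch below assumes a hypothetical `has_spans` flag supplied by whatever model describes the table; `TableDecision`, `TableCheck`, and `check_table` are illustrative names only, and escaped pipes inside cells are not handled.

```python
import re
from dataclasses import dataclass
from enum import Enum


class TableDecision(Enum):
    MARKDOWN = "markdown"            # parseable as a simple pipe table
    HTML_FALLBACK = "html_fallback"  # allowed: too complex for Markdown
    INVALID = "invalid"              # malformed pipe table


@dataclass(frozen=True)
class TableCheck:
    decision: TableDecision
    message: str


_DELIM_CELL = re.compile(r"^:?-+:?$")


def _cells(row: str) -> list[str]:
    # Split a pipe row into trimmed cells, ignoring the outer pipes.
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def check_table(markdown: str, has_spans: bool = False) -> TableCheck:
    """Classify a table string without modifying it."""
    if has_spans:
        # Assumption: merged cells are signalled by the caller's table model.
        return TableCheck(TableDecision.HTML_FALLBACK,
                          "merged cells cannot be expressed in a pipe table")
    rows = [line for line in markdown.splitlines() if line.strip()]
    if len(rows) < 2:
        return TableCheck(TableDecision.INVALID,
                          "table needs a header row and a delimiter row")
    header, delim = _cells(rows[0]), _cells(rows[1])
    if len(delim) != len(header) or not all(_DELIM_CELL.match(c) for c in delim):
        return TableCheck(TableDecision.INVALID,
                          f"row 2 is not a delimiter row for {len(header)} columns")
    for number, row in enumerate(rows[2:], start=3):
        count = len(_cells(row))
        if count != len(header):
            return TableCheck(TableDecision.INVALID,
                              f"row {number} has {count} cells, expected {len(header)}")
    return TableCheck(TableDecision.MARKDOWN, "ok")
```

A test for the complex case can then assert on `decision` alone, which keeps it independent of full Markdown snapshot equality.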
Acceptance Criteria
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests\test_quality.py
Verification
- Run the acceptance commands.
- Confirm quality gates are focused assertions, not whole-document snapshots.
- Confirm failures return actionable messages for evaluator use.
- Update `PROGRESS.md` with completed work, validation output, and next handoff.
- Update this phase index step to `completed` with a one-line `summary`, or to `blocked`/`error` with a concrete reason.
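As an illustration of what an actionable failure message could look like, the sketch below checks image targets against a modeled asset set and reports the line number and the offending target. `GateIssue`, `check_image_references`, and the `known_assets` parameter are hypothetical names chosen for this example, and the regular expression only covers inline `![alt](target)` images.

```python
import re
from dataclasses import dataclass

# Inline image syntax: ![alt](target "optional title")
_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


@dataclass(frozen=True)
class GateIssue:
    """Hypothetical result record; not part of the existing models."""
    gate: str
    line: int
    message: str


def check_image_references(markdown: str, known_assets: set[str]) -> list[GateIssue]:
    """Return one issue per image target missing from the modeled asset set."""
    issues = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for match in _IMAGE.finditer(line):
            target = match.group(1)
            if target not in known_assets:
                issues.append(GateIssue(
                    "image-ref",
                    lineno,
                    f"line {lineno}: image target {target!r} "
                    "is not in the modeled asset set",
                ))
    return issues
```

Because the issues are returned as data instead of being written into the Markdown, an evaluator can print or assert on them without the generated content changing.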
Do Not
- Do not implement Marker/Nougat adapters.
- Do not implement the full Markdown renderer.
- Do not introduce an LLM correction path.
- Do not write warning/error messages into generated Markdown content.