PDFToMD/phases/0-harness-foundation/step3.md
Last commit: 7e985ae94a "add files" by 김경종, 2026-04-30 17:05:19 +09:00

Step 3: markdown-quality-gates

Read First

  • /AGENTS.md
  • /PLAN.md
  • /PROGRESS.md
  • /docs/HARNESS.md
  • /docs/ARCHITECTURE.md
  • /docs/CONVERSION_POLICY.md
  • /docs/ADR.md
  • /phases/0-harness-foundation/step0.md
  • /phases/0-harness-foundation/step1.md
  • /phases/0-harness-foundation/step2.md
  • /phases/0-harness-foundation/index.json

Task

Create focused Markdown quality gate functions that later renderer and conversion steps can call.

This step should validate generated Markdown-like strings and asset references without requiring a full PDF conversion. It should prefer structured checks over full snapshot comparison.
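
A minimal sketch of one structured gate (math delimiter balance), assuming each gate is a pure function that returns a list of actionable messages, where an empty list means the gate passed and the input string is never modified. The name check_math_delimiters and the message format are hypothetical and are not an existing API in src/pdftomd/quality.py. The sketch also assumes that inline math stays on a single line and that literal dollar signs are escaped as \$.

```python
import re

# Sketch only: the function name and message format are hypothetical.
# Matches a `$` or `$$` run that is not escaped with a backslash.
_DOLLARS = re.compile(r"(?<!\\)\${1,2}")


def check_math_delimiters(markdown: str) -> list[str]:
    """Return actionable messages for unbalanced $...$ and $$...$$ delimiters.

    The input is never modified; an empty list means the gate passed.
    Assumes inline math does not span lines and literal dollars are written \\$.
    """
    messages: list[str] = []
    display_open_line = None  # line where a still-unclosed $$ block started
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        inline_open_col = None  # 1-based column of an unclosed inline $ on this line
        for match in _DOLLARS.finditer(line):
            if match.group() == "$$":
                # $$ toggles display math, which may span several lines.
                display_open_line = lineno if display_open_line is None else None
            elif display_open_line is None:
                # A single $ toggles inline math, but only outside display blocks.
                inline_open_col = match.start() + 1 if inline_open_col is None else None
        if inline_open_col is not None:
            messages.append(
                f"line {lineno}, column {inline_open_col}: inline math '$' is never closed"
            )
    if display_open_line is not None:
        messages.append(f"line {display_open_line}: display math '$$' is never closed")
    return messages
```

Because the gate reports positions instead of comparing against a stored snapshot, a test can assert on exactly one risk, for example that check_math_delimiters("Broken $x+1 here") yields a single message pointing at line 1, column 8.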

Quality gates should cover:

  • math delimiter balance for $...$ and $$...$$
  • LaTeX \begin{...} / \end{...} pairs
  • existence of image link targets, either as file paths or as modeled asset references
  • table parseability for simple Markdown tables
  • chunk frontmatter fields required by the output contract
  • caption/reference anchor shape, checked only where confidence is sufficient
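
The LaTeX environment, image reference, and frontmatter gates in the list above can each be written as a pure function that returns messages and leaves the Markdown untouched. A minimal sketch follows; every name in it is hypothetical. It assumes that modeled assets are passed in as a set of relative POSIX paths (known_assets), that required_fields stands in for whatever the output contract actually lists, and that chunk frontmatter is a flat key: value block between --- lines (no YAML parser is used).

```python
import re
from pathlib import PurePosixPath

# Sketch only: function names and parameters are hypothetical.
_ENV = re.compile(r"\\(begin|end)\{([^}]+)\}")
_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def check_latex_environments(markdown: str) -> list[str]:
    """Report unmatched or mis-nested \\begin{...}/\\end{...} pairs."""
    messages: list[str] = []
    stack: list[tuple[str, int]] = []  # (environment name, line it was opened on)
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for kind, name in _ENV.findall(line):
            if kind == "begin":
                stack.append((name, lineno))
            elif not stack:
                messages.append(f"line {lineno}: \\end{{{name}}} has no matching \\begin")
            else:
                open_name, open_line = stack.pop()
                if open_name != name:
                    messages.append(
                        f"line {lineno}: \\end{{{name}}} closes \\begin{{{open_name}}} from line {open_line}"
                    )
    for name, open_line in stack:
        messages.append(f"line {open_line}: \\begin{{{name}}} is never closed")
    return messages


def check_image_references(markdown: str, known_assets: set[str]) -> list[str]:
    """Report local image links whose target is not a modeled asset path."""
    messages: list[str] = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for target in _IMAGE.findall(line):
            if target.startswith(("http://", "https://")):
                continue  # remote images are out of scope for this sketch
            # PurePosixPath collapses "./assets/x.png" to "assets/x.png".
            if str(PurePosixPath(target)) not in known_assets:
                messages.append(f"line {lineno}: image target '{target}' is not a known asset")
    return messages


def check_frontmatter(markdown: str, required_fields: set[str]) -> list[str]:
    """Report missing required keys in a flat `key: value` frontmatter block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["line 1: chunk has no '---' frontmatter block"]
    closing = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if closing is None:
        return ["line 1: frontmatter block is never closed with '---'"]
    present = {line.split(":", 1)[0].strip() for line in lines[1:closing] if ":" in line}
    return [
        f"frontmatter is missing required field '{field}'"
        for field in sorted(required_fields - present)
    ]
```

Keeping each gate separate lets a failing test name the exact risk (an unclosed environment, a dangling image, a missing field) instead of reporting a whole-document diff.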

Sprint Contract

  • Done means: later renderer steps have reusable validation functions and focused pytest coverage for Markdown output risks.
  • Hard thresholds:
    • Tests include passing and failing examples for math delimiter checks.
    • Tests include a complex table case where Markdown's table limitations are handled by an allowed HTML/fallback decision rather than reported as a failure.
    • Tests do not rely on full Markdown snapshot equality.
    • Validation functions do not silently mutate generated Markdown; any change must go through an explicitly named repair function that has its own tests.
    • No PDF parsing or renderer implementation is introduced here.
  • Files owned:
    • src/pdftomd/quality.py
    • model additions in src/pdftomd/models.py only if required
    • tests/test_quality.py
    • PROGRESS.md
    • phases/0-harness-foundation/index.json
  • Dependencies:
    • Step 1 package skeleton and models
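
The hard thresholds call for a complex table case where an HTML fallback is an allowed decision, tested without snapshot equality. A minimal sketch of a table gate with pytest-style tests follows. It assumes that the fallback decision reaches the gate as a boolean allow_html_fallback; that parameter and the function names are hypothetical, and the real decision may instead be carried by a model in src/pdftomd/models.py. Only simple GFM-style pipe tables are checked: a header row, a delimiter row, and a consistent cell count.

```python
import re

# Sketch only: names and the allow_html_fallback flag are hypothetical.
_HTML_TABLE = re.compile(r"<table\b.*</table>", re.IGNORECASE | re.DOTALL)
_DELIMITER_CELL = re.compile(r":?-+:?")


def _cells(row: str) -> list[str]:
    """Split a pipe-table row into trimmed cells, ignoring escaped \\| pipes."""
    row = row.strip()
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|") and not row.endswith("\\|"):
        row = row[:-1]
    return [cell.strip() for cell in re.split(r"(?<!\\)\|", row)]


def check_table(block: str, allow_html_fallback: bool = False) -> list[str]:
    """Return actionable messages for one table block; [] means it passed."""
    text = block.strip()
    if _HTML_TABLE.fullmatch(text):
        # Merged cells and similar layouts cannot be expressed as a pipe table,
        # so HTML is accepted only when the caller recorded that decision.
        if allow_html_fallback:
            return []
        return ["line 1: HTML table found without an explicit fallback decision"]
    rows = text.splitlines()
    if len(rows) < 2:
        return ["line 1: table needs a header row followed by a delimiter row"]
    width = len(_cells(rows[0]))
    if not all(_DELIMITER_CELL.fullmatch(cell) for cell in _cells(rows[1])):
        return [f"line 2: not a delimiter row: {rows[1]!r}"]
    messages = []
    for lineno, row in enumerate(rows[1:], start=2):
        count = len(_cells(row))
        if count != width:
            messages.append(f"line {lineno}: row has {count} cells, expected {width}")
    return messages


# Focused pytest-style tests: each asserts one outcome, not a full snapshot.
def test_simple_table_passes():
    assert check_table("| a | b |\n|---|:-:|\n| 1 | 2 |") == []


def test_ragged_row_reports_line_and_counts():
    assert check_table("| a | b |\n|---|---|\n| 1 | 2 | 3 |") == [
        "line 3: row has 3 cells, expected 2"
    ]


def test_complex_table_uses_allowed_html_fallback():
    html = '<table><tr><td rowspan="2">merged</td><td>x</td></tr><tr><td>y</td></tr></table>'
    assert check_table(html, allow_html_fallback=True) == []
    assert check_table(html) == [
        "line 1: HTML table found without an explicit fallback decision"
    ]
```

Making the fallback an explicit argument keeps the gate from quietly accepting every HTML table: the same markup passes or fails depending on whether a fallback decision was actually recorded.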

Acceptance Criteria

python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests\test_quality.py

Verification

  1. Run the acceptance commands.
  2. Confirm that the quality gates are focused assertions, not whole-document snapshots.
  3. Confirm that failing checks return actionable messages the evaluator can use.
  4. Update PROGRESS.md with completed work, validation output, and next handoff.
  5. Update this step's entry in the phase index to completed with a one-line summary, or to blocked/error with a concrete reason.

Do Not

  • Do not implement Marker/Nougat adapters.
  • Do not implement the full Markdown renderer.
  • Do not introduce an LLM correction path.
  • Do not write warning/error messages into generated Markdown content.