PDFToMD/phases/3-formula-pipeline/step2.md
김경종 7e985ae94a add files
2026-04-30 17:05:19 +09:00


Step 2: latex-validation-repair

Read First

  • /AGENTS.md
  • /PLAN.md
  • /PROGRESS.md
  • /docs/HARNESS.md
  • /docs/IMPLEMENTATION_PLAN.md
  • /docs/CONVERSION_POLICY.md
  • /phases/0-harness-foundation/step3.md
  • /phases/3-formula-pipeline/step1.md

Task

Implement LaTeX and Markdown math validation for formula outputs, plus explicit repair helpers for safe cases.

Validation should check that inline and block math delimiters are balanced and that common \begin{...} / \end{...} environment pairs match.
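
A minimal sketch of the two checks, assuming `$...$` for inline math and `$$...$$` for block math. The names `Diagnostic`, `check_delimiters`, and `check_environments` are hypothetical and not part of the existing codebase. Both functions only report problems; neither modifies the input.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical diagnostic record; field names are assumptions."""
    code: str       # short machine-readable label
    message: str    # human-readable explanation
    position: int   # character offset into the checked text


ENV_RE = re.compile(r"\\(begin|end)\{([^{}]*)\}")


def check_delimiters(text: str) -> list[Diagnostic]:
    """Report unclosed or stray $ / $$ delimiters. Escaped \\$ is ignored."""
    diagnostics = []
    open_token = None   # "$" or "$$" while a math span is open
    open_pos = -1
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 2      # skip the escaped character, e.g. \$
            continue
        if text[i] == "$":
            # Inside an open inline span, the next "$" always closes it.
            is_double = text.startswith("$$", i) and open_token != "$"
            token = "$$" if is_double else "$"
            if open_token is None:
                open_token, open_pos = token, i
            elif open_token == token:
                open_token = None
            else:
                # A single "$" inside an open "$$" block.
                diagnostics.append(Diagnostic(
                    "stray-delimiter", "'$' found inside an open '$$' block", i))
            i += len(token)
            continue
        i += 1
    if open_token is not None:
        diagnostics.append(Diagnostic(
            "unclosed-delimiter", f"{open_token!r} is never closed", open_pos))
    return diagnostics


def check_environments(text: str) -> list[Diagnostic]:
    """Report \\begin{...} / \\end{...} pairs that do not nest correctly."""
    diagnostics = []
    stack = []  # (environment name, offset) for each open \begin
    for match in ENV_RE.finditer(text):
        kind, name = match.group(1), match.group(2)
        if kind == "begin":
            stack.append((name, match.start()))
        elif stack and stack[-1][0] == name:
            stack.pop()
        else:
            diagnostics.append(Diagnostic(
                "unmatched-end", f"\\end{{{name}}} has no matching \\begin",
                match.start()))
    for name, pos in stack:
        diagnostics.append(Diagnostic(
            "unclosed-environment", f"\\begin{{{name}}} is never closed", pos))
    return diagnostics
```

An empty list means the check passed. Keeping the scanner this small is deliberate: it covers delimiter balance and environment pairing without growing into a general LaTeX parser.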

Sprint Contract

  • Done means: formula output validation returns actionable diagnostics and tested repairs for narrow, deterministic cases.
  • Hard thresholds: validation does not silently mutate math; unrepairable failures fall back to Marker text; delimiter tests include both inline and block math.
  • Files owned: src/pdftomd/formulas.py, src/pdftomd/quality.py, tests, PROGRESS.md, phase index.
  • Dependencies: Phase 0 quality gates and Step 1 Nougat adapter.
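
The thresholds above could be met with a result type that always records where the returned text came from. This is a sketch under stated assumptions: `FormulaResult`, `resolve_formula`, and the single repair rule (append a missing `\end{X}` when there is exactly one `\begin{X}` and no `\end` at all) are hypothetical illustrations. Which cases actually count as safe is a decision for the conversion policy.

```python
import re
from dataclasses import dataclass

TOKEN_RE = re.compile(r"\\(begin|end)\{([^{}]+)\}")


@dataclass(frozen=True)
class FormulaResult:
    """Hypothetical result record; `source` makes any change explicit."""
    text: str
    source: str                      # "original", "repaired", "marker-fallback"
    diagnostics: tuple[str, ...] = ()


def environments_balanced(formula: str) -> bool:
    """Return True when every \\begin{X} is closed by a matching \\end{X}."""
    stack = []
    for kind, name in TOKEN_RE.findall(formula):
        if kind == "begin":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


def resolve_formula(formula: str, marker_text: str) -> FormulaResult:
    """Validate, apply one narrow repair, or fall back to the Marker text."""
    if environments_balanced(formula):
        return FormulaResult(formula, "original")

    tokens = TOKEN_RE.findall(formula)
    # Assumed-safe case: exactly one \begin{X} and no \end anywhere.
    if len(tokens) == 1 and tokens[0][0] == "begin":
        name = tokens[0][1]
        repaired = f"{formula.rstrip()}\n\\end{{{name}}}"
        return FormulaResult(
            repaired, "repaired", (f"appended missing \\end{{{name}}}",))

    # Anything else is treated as unrepairable and reported, not hidden.
    return FormulaResult(
        marker_text, "marker-fallback",
        ("unrepairable environment mismatch; using Marker text",))
```

Because the caller always receives `source` and `diagnostics`, a repair or fallback cannot happen silently, and each branch maps to one deterministic test case.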

Acceptance Criteria

python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests

Verification

  1. Run the acceptance commands.
  2. Confirm that tests cover broken delimiter and broken environment examples.
  3. Update PROGRESS.md and this phase index.

Do Not

  • Do not build a broad LaTeX parser from scratch.
  • Do not use LLM repair.
  • Do not hide validation failures.