PDFToMD/phases/3-formula-pipeline/step0.md
김경종 7e985ae94a add files
2026-04-30 17:05:19 +09:00

Step 0: formula-block-detection

Read First

  • /AGENTS.md
  • /PLAN.md
  • /PROGRESS.md
  • /docs/HARNESS.md
  • /docs/IMPLEMENTATION_PLAN.md
  • /docs/CONVERSION_POLICY.md
  • /phases/2-marker-adapter/step2.md

Task

Implement formula candidate detection from normalized Marker blocks.

Detect both Marker equation blocks and text-pattern candidates, and classify each candidate as an inline or block formula using its block role and layout hints.
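
One possible shape for this detection is sketched below. It is an illustration under stated assumptions, not the project's actual design: `NormalizedBlock`, its `role` and `standalone` fields, `FormulaCandidate`, and `detect_candidates` are hypothetical names, and the real Phase 2 block structure may differ. The sketch trusts the structured block role first and consults a text pattern only for blocks that have no equation role.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List


class FormulaKind(Enum):
    INLINE = "inline"
    BLOCK = "block"


@dataclass(frozen=True)
class NormalizedBlock:
    """Hypothetical stand-in for a Phase 2 normalized Marker block."""
    role: str                  # assumed values: "equation", "paragraph", ...
    text: str
    standalone: bool = False   # assumed layout hint: block sits on its own line(s)


@dataclass(frozen=True)
class FormulaCandidate:
    text: str
    kind: FormulaKind
    source: str                # "block_role" or "text_pattern"


# Assumed text pattern: \( ... \) or \[ ... \] delimiters left in the text.
_DELIMITED = re.compile(r"\\\((.+?)\\\)|\\\[(.+?)\\\]", re.DOTALL)


def detect_candidates(blocks: List[NormalizedBlock]) -> List[FormulaCandidate]:
    candidates: List[FormulaCandidate] = []
    for block in blocks:
        if block.role == "equation":
            # Structured role wins; the regex is not consulted for these blocks.
            kind = FormulaKind.BLOCK if block.standalone else FormulaKind.INLINE
            candidates.append(FormulaCandidate(block.text, kind, "block_role"))
            continue
        # Fallback: scan ordinary text for delimited formula spans.
        for match in _DELIMITED.finditer(block.text):
            inline_body, display_body = match.group(1), match.group(2)
            if inline_body is not None:
                candidates.append(
                    FormulaCandidate(inline_body.strip(), FormulaKind.INLINE, "text_pattern")
                )
            else:
                candidates.append(
                    FormulaCandidate(display_body.strip(), FormulaKind.BLOCK, "text_pattern")
                )
    return candidates
```

Recording `source` on each candidate keeps it visible downstream whether a formula came from a structured role or from a text pattern, which may help when choosing between Nougat and the Marker fallback in later steps.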

Sprint Contract

  • Done means: formula candidates are represented as internal objects ready for Nougat or Marker fallback.
  • Hard thresholds: ordinary currency-like dollar text is not blindly treated as math; inline/block distinction is tested; no Nougat invocation occurs yet.
  • Files owned: src/pdftomd/formulas.py, tests, PROGRESS.md, phases/3-formula-pipeline/index.json.
  • Dependencies: Phase 2 block normalization.
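
The currency threshold above could be approached with a guard like the following. This is a heuristic sketch, not a complete solution: the function name `dollar_math_spans` and both patterns are assumptions made for illustration, and real documents will contain cases it misjudges. A `$...$` span is accepted only when its body shows some sign of math (a LaTeX command, an operator, a subscript or superscript, or a single letter), so amounts such as `$5` or ranges such as `$5-$10` are skipped.

```python
import re
from typing import List

# Candidate span: $...$ on one line, with no whitespace just inside the
# delimiters and no digit right after the closing dollar sign.
_DOLLAR_SPAN = re.compile(r"(?<![\\$])\$(?!\s)([^$\n]+?)(?<!\s)\$(?!\d)")

# Assumed heuristic: the body looks like math if it contains a LaTeX
# command, one of a few operator/script characters, or is a single letter.
_MATH_HINT = re.compile(r"\\[A-Za-z]+|[\^_=+<>{}]|^[A-Za-z]$")


def dollar_math_spans(text: str) -> List[str]:
    """Return the bodies of $...$ spans that look like math, not currency."""
    spans: List[str] = []
    for match in _DOLLAR_SPAN.finditer(text):
        body = match.group(1)
        if _MATH_HINT.search(body):
            spans.append(body)
    return spans
```

A test for this threshold might pair a currency sentence such as "It costs $5 and $10 total." (expected to yield no candidates) with a sentence that contains a genuine inline formula.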

Acceptance Criteria

python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests

Verification

  1. Run the acceptance commands.
  2. Confirm tests include inline and block formula candidates.
  3. Update PROGRESS.md and this phase index.

Do Not

  • Do not call Nougat.
  • Do not render Markdown math.
  • Do not make regex the only source when structured block role exists.