PDFToMD/phases/3-formula-pipeline/step1.md
김경종 7e985ae94a add files
2026-04-30 17:05:19 +09:00

Step 1: nougat-command-adapter

Read First

  • /AGENTS.md
  • /PLAN.md
  • /PROGRESS.md
  • /docs/HARNESS.md
  • /docs/IMPLEMENTATION_PLAN.md
  • /docs/TOOLCHAIN.md
  • /docs/CONVERSION_POLICY.md
  • /phases/3-formula-pipeline/step0.md

Task

Implement the Nougat formula-only adapter boundary.

The adapter should accept formula candidates and return LaTeX candidates or structured failure results. It should support a configured Nougat command path and be mockable in unit tests.
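One way to picture this boundary is the minimal sketch below. It is an illustration, not the project's implementation: FormulaCandidate, FormulaResult, NougatCommandAdapter, the field names, and the 120-second timeout are all hypothetical, and the sketch assumes the configured command accepts one input path and prints LaTeX to stdout. The real argument list should come from the project's toolchain notes. The runner is injected so unit tests can replace it without starting a process.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class FormulaCandidate:
    """Hypothetical shape of a Step 0 formula candidate."""
    candidate_id: str
    input_path: Path   # cropped formula region handed to the command
    marker_text: str   # Marker's text for the same region, kept as fallback


@dataclass(frozen=True)
class FormulaResult:
    """Either a LaTeX candidate or a structured failure; fallback is always kept."""
    candidate_id: str
    latex: Optional[str]
    fallback_text: str
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.latex is not None


Runner = Callable[[Sequence[str]], "subprocess.CompletedProcess[str]"]


def run_subprocess(args: Sequence[str]) -> "subprocess.CompletedProcess[str]":
    # Argument list, no shell: Windows paths with spaces or backslashes stay intact.
    return subprocess.run(list(args), capture_output=True, text=True,
                          timeout=120, check=False)  # timeout is an assumption


class NougatCommandAdapter:
    def __init__(self, command_path: Path, runner: Runner = run_subprocess) -> None:
        self.command_path = Path(command_path)
        self.runner = runner

    def convert(self, candidate: FormulaCandidate) -> FormulaResult:
        # Assumed invocation: <command> <input path>, LaTeX on stdout.
        args = [str(self.command_path), str(candidate.input_path)]
        try:
            proc = self.runner(args)
        except (OSError, subprocess.TimeoutExpired) as exc:
            return self._failure(candidate, f"{type(exc).__name__}: {exc}")
        if proc.returncode != 0:
            detail = (proc.stderr or "").strip()
            return self._failure(candidate, f"exit code {proc.returncode}: {detail}")
        latex = (proc.stdout or "").strip()
        if not latex:
            return self._failure(candidate, "empty output")
        return FormulaResult(candidate.candidate_id, latex, candidate.marker_text)

    @staticmethod
    def _failure(candidate: FormulaCandidate, reason: str) -> FormulaResult:
        # Failures never drop the Marker text; callers can fall back to it.
        return FormulaResult(candidate.candidate_id, None,
                             candidate.marker_text, error=reason)
```

In a unit test, the runner can be a lambda that returns a prepared subprocess.CompletedProcess, so no Nougat install, GPU, or model download is involved.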

Sprint Contract

  • Done means: Nougat execution is isolated behind a testable command adapter and never becomes the primary document parser.
  • Hard thresholds: failures preserve Marker fallback text; tests do not require GPU/model execution by default; command path handling works on Windows.
  • Files owned: src/pdftomd/formulas.py, optionally src/pdftomd/nougat_adapter.py, tests, PROGRESS.md, and the phase index.
  • Dependencies: Step 0 formula candidates and Phase 1 options.

Acceptance Criteria

python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests

Verification

  1. Run the acceptance commands.
  2. Confirm .\venv\Scripts\nougat.exe --help remains documented as an environment check, not a unit-test requirement.
  3. Update PROGRESS.md and this phase index.
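To keep the nougat.exe --help check out of the default test run, one option is an opt-in test that is skipped unless explicitly enabled. The sketch below is illustrative only: the environment variable names PDFTOMD_RUN_NOUGAT and PDFTOMD_NOUGAT_EXE and the 60-second timeout are hypothetical. It uses unittest from the standard library, and pytest collects unittest.TestCase classes, so the existing pytest command would still pick it up.

```python
import os
import subprocess
import unittest

# Hypothetical opt-in switch; unset by default so normal runs skip the check.
RUN_NOUGAT = os.environ.get("PDFTOMD_RUN_NOUGAT") == "1"
# Hypothetical override; the default mirrors the path named in this step.
NOUGAT_EXE = os.environ.get("PDFTOMD_NOUGAT_EXE", r".\venv\Scripts\nougat.exe")


@unittest.skipUnless(RUN_NOUGAT, "environment check; set PDFTOMD_RUN_NOUGAT=1 to run")
class NougatEnvironmentCheck(unittest.TestCase):
    def test_help_exits_cleanly(self) -> None:
        # Only confirms the executable starts; no PDF is parsed here.
        proc = subprocess.run([NOUGAT_EXE, "--help"], capture_output=True,
                              text=True, timeout=60, check=False)
        self.assertEqual(proc.returncode, 0)
```

With the switch unset, the test reports as skipped rather than failed, so the acceptance commands stay independent of the Nougat install.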

Do Not

  • Do not parse whole PDFs with Nougat.
  • Do not require model downloads for normal unit tests.
  • Do not discard Marker source text on failure.