PDFToMD/phases/1-core-runtime-contracts/step0.md
김경종 7e985ae94a add files
2026-04-30 17:05:19 +09:00

Step 0: input-normalization-slug

Read First

  • /AGENTS.md
  • /PLAN.md
  • /PROGRESS.md
  • /docs/HARNESS.md
  • /docs/IMPLEMENTATION_PLAN.md
  • /docs/ARCHITECTURE.md
  • /docs/CONVERSION_POLICY.md
  • /phases/0-harness-foundation/index.json

Task

Implement deterministic input normalization and document slug generation for local PDF paths.

Handle paths through pathlib and cover Korean filenames, spaces, relative paths, absolute paths, and long Windows paths. The API must not import or invoke Marker, Nougat, PyMuPDF, or any other conversion logic.
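
One possible shape for this step, offered as a minimal sketch rather than the required design. The names normalize_pdf_path, make_slug, and NotAPdfError, the explicit base_dir argument, the NFC normalization, and the 8-character hash suffix are all assumptions for illustration; none of them come from the project docs. The sketch never touches the filesystem, so long Windows paths are treated as plain strings and do not hit path-length limits here.

```python
import hashlib
import re
import unicodedata
from pathlib import Path


class NotAPdfError(ValueError):
    """Raised when the input does not look like a PDF path (hypothetical name)."""


def normalize_pdf_path(raw: str, base_dir: Path) -> Path:
    """Return a normalized path for a local PDF without touching the filesystem.

    base_dir is passed explicitly (an assumption) so that relative inputs do
    not depend on the process working directory.
    """
    # NFC keeps composed and decomposed spellings of a Korean filename identical.
    text = unicodedata.normalize("NFC", raw.strip())
    if not text:
        raise NotAPdfError("empty input path")
    path = Path(text)
    if not path.is_absolute():
        path = base_dir / path
    if path.suffix.lower() != ".pdf":
        raise NotAPdfError(f"not a PDF path: {raw!r}")
    return path


def make_slug(path: Path, max_stem: int = 48) -> str:
    """Build a stable slug: readable stem plus a short hash of the full path."""
    # \w is Unicode-aware for str patterns, so Hangul syllables are preserved.
    stem = re.sub(r"[^\w]+", "-", path.stem.lower()).strip("-_") or "document"
    digest = hashlib.sha256(path.as_posix().encode("utf-8")).hexdigest()[:8]
    return f"{stem[:max_stem].rstrip('-_')}-{digest}"


if __name__ == "__main__":
    pdf = normalize_pdf_path("docs/보고서 최종본.pdf", Path("/work"))
    print(make_slug(pdf))
```

The hash suffix keeps two files with the same name in different directories from colliding, at the cost of the slug changing when the file moves or when a different base_dir is supplied. If slugs need to survive a move, hashing only the filename would be the alternative to weigh.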

Sprint Contract

  • Done means: the core package has a tested function or small module that normalizes input PDF paths and produces stable document slugs.
  • Hard thresholds: same input path and options produce the same slug; non-PDF paths fail clearly; Korean and spaced paths are tested; no parser import is introduced.
  • Files owned: src/pdftomd/, tests/, PROGRESS.md, phases/1-core-runtime-contracts/index.json.
  • Dependencies: Phase 0 package skeleton and model contracts.
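
The "no parser import is introduced" threshold can be checked mechanically. The following is a rough sketch, assuming the parsers would be imported under the top-level module names marker, nougat, fitz, or pymupdf; that list and the function name forbidden_imports are illustrative assumptions and should be adjusted to whatever the project actually installs.

```python
import ast

# Assumed top-level module names for Marker, Nougat, and PyMuPDF.
FORBIDDEN = frozenset({"marker", "nougat", "fitz", "pymupdf"})


def forbidden_imports(source: str, forbidden: frozenset = FORBIDDEN) -> list:
    """Return the sorted forbidden module names imported by a piece of source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x".
            names = [node.module]
        else:
            continue
        # Compare only the top-level package of each dotted name.
        found.update(n for n in names if n.split(".")[0] in forbidden)
    return sorted(found)


if __name__ == "__main__":
    print(forbidden_imports("import fitz\nfrom pathlib import Path\n"))
```

A test under tests/ could read each .py file in src/pdftomd/ and assert that forbidden_imports returns an empty list for all of them. Because the check walks the whole syntax tree, it also catches imports placed inside functions.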

Acceptance Criteria

python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests

Verification

  1. Run the acceptance commands.
  2. Confirm PROGRESS.md records the handoff and validation result.
  3. Update this step's status in phases/1-core-runtime-contracts/index.json to completed, blocked, or error.

Do Not

  • Do not implement PDF parsing.
  • Do not write conversion output.
  • Do not add UI code.