1.4 KiB
1.4 KiB
Step 0: input-normalization-slug
Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /phases/0-harness-foundation/index.json
Task
Implement deterministic input normalization and document slug generation for local PDF paths.
Cover pathlib handling for Korean filenames, spaces, relative paths, absolute paths, and long Windows paths. The API should not invoke Marker, Nougat, PyMuPDF, or any conversion logic.
Sprint Contract
- Done means: the core package has a tested function or small module that normalizes input PDF paths and produces stable document slugs.
- Hard thresholds: same input path and options produce the same slug; non-PDF paths fail clearly; Korean and spaced paths are tested; no parser import is introduced.
- Files owned:
src/pdftomd/,tests/,PROGRESS.md,phases/1-core-runtime-contracts/index.json. - Dependencies: Phase 0 package skeleton and model contracts.
Acceptance Criteria
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
Verification
- Run the acceptance commands.
- Confirm
PROGRESS.mdrecords the handoff and validation result. - Update this phase index step to
completed,blocked, orerror.
Do Not
- Do not implement PDF parsing.
- Do not write conversion output.
- Do not add UI code.