Step 1: core-package-skeleton
Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /docs/ADR.md
- /phases/0-harness-foundation/step0.md
- /phases/0-harness-foundation/index.json
Task
Create the minimal Python package skeleton and internal data contracts needed by later parser, pre-analysis, and renderer steps.
The skeleton should establish importable modules and typed models only. It should not call Marker, Nougat, PyMuPDF, OCR, CUDA, or the filesystem-heavy conversion path yet.
Suggested module boundary:
src/pdftomd/__init__.py
src/pdftomd/models.py
tests/test_models.py
The exact type names may differ if the local design suggests better names, but the contracts must represent document identity, page ranges, block roles, bounding boxes, assets, formulas, tables, figures, and chunk metadata.
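As a rough illustration of the contracts listed above, here is a minimal sketch that uses only the standard library (dataclasses, enum, re, unicodedata), so it cannot pull in any of the forbidden heavy dependencies. Every class name, field name, and the asset path layout are hypothetical choices for this example, not the project's design. The real output layout must come from docs/ARCHITECTURE.md. Frozen dataclasses keep instances immutable and hashable. Validation in __post_init__ is one way to enforce the page range and bounding box invariants at construction time.

```python
# Hypothetical sketch of src/pdftomd/models.py; every name here is an illustrative assumption.
from __future__ import annotations

import re
import unicodedata
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


def slugify(text: str) -> str:
    """Deterministic, path-safe ASCII slug (same input always gives the same output)."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug or "document"  # fallback when nothing survives normalization


@dataclass(frozen=True)
class DocumentRef:
    """Identity of one source PDF."""
    source_name: str  # file name only, e.g. "report.pdf"
    sha256: str       # content hash, computed by the caller (not here)

    @property
    def slug(self) -> str:
        return slugify(self.source_name.rsplit(".", 1)[0])


@dataclass(frozen=True)
class PageRange:
    """Inclusive, 1-based page range; construction enforces the invariants."""
    start: int
    end: int

    def __post_init__(self) -> None:
        if self.start < 1 or self.end < self.start:
            raise ValueError(f"invalid page range {self.start}-{self.end}")

    def __len__(self) -> int:
        return self.end - self.start + 1

    def __contains__(self, page: object) -> bool:
        return isinstance(page, int) and self.start <= page <= self.end


class BlockRole(str, Enum):
    TEXT = "text"
    HEADING = "heading"
    FORMULA = "formula"
    TABLE = "table"
    FIGURE = "figure"
    CAPTION = "caption"


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box in page coordinates (units assumed to be PDF points)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def __post_init__(self) -> None:
        if self.x1 < self.x0 or self.y1 < self.y0:
            raise ValueError("bbox corners are out of order")


@dataclass(frozen=True)
class Asset:
    """A file emitted alongside the Markdown, such as a figure image."""
    doc: DocumentRef
    page: int
    index: int
    extension: str = "png"

    @property
    def relative_path(self) -> str:
        # Placeholder layout; the real one should follow docs/ARCHITECTURE.md.
        return f"assets/{self.doc.slug}-p{self.page:04d}-{self.index:02d}.{self.extension}"


@dataclass(frozen=True)
class Block:
    role: BlockRole
    page: int
    bbox: BBox
    text: str = ""
    latex: Optional[str] = None    # formulas
    asset: Optional[Asset] = None  # figures, or tables kept as images


@dataclass(frozen=True)
class ChunkMetadata:
    doc: DocumentRef
    pages: PageRange
    chunk_index: int
    roles: Tuple[BlockRole, ...] = ()
```

For example, DocumentRef("Annual Report 2024.pdf", sha256="0" * 64).slug evaluates to "annual-report-2024", and PageRange(3, 2) raises ValueError. This gives tests/test_models.py concrete behavior to pin down.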
Sprint Contract
- Done means: future steps have stable importable types for page analysis, block modeling, chunk metadata, and output assets.
- Hard thresholds:
  - Tests cover model construction, deterministic slug/path-relevant fields, and page range invariants.
  - Models do not depend on Marker, Nougat, PyMuPDF, torch, pandas, or PyQt.
  - The package imports on Windows under .\venv\python.exe.
  - Public contracts are documented by tests or clear docstrings.
- Files owned:
  - src/pdftomd/
  - tests/test_models.py
  - PROGRESS.md
  - phases/0-harness-foundation/index.json
- Dependencies:
  - Step 0 metadata should be complete or explicitly blocked.
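One way to make the "Models do not depend on ..." threshold testable is to import the module in a fresh interpreter and then inspect sys.modules. This is a sketch under stated assumptions, not part of the specified harness. The helper name forbidden_imports is hypothetical. The mapping from each library to its top-level import name is also assumed: marker, nougat, fitz or pymupdf for PyMuPDF, and PyQt5 or PyQt6 for PyQt. Running the probe in a subprocess avoids false positives from packages that the test runner itself has already imported.

```python
# Sketch: report which forbidden heavy packages a module pulls in when imported.
# Assumed top-level import names: Marker -> "marker", Nougat -> "nougat",
# PyMuPDF -> "fitz" (newer releases also expose "pymupdf"), PyQt -> "PyQt5"/"PyQt6".
import subprocess
import sys
from typing import List, Sequence

FORBIDDEN = ("marker", "nougat", "fitz", "pymupdf", "torch", "pandas", "PyQt5", "PyQt6")


def forbidden_imports(module: str, forbidden: Sequence[str] = FORBIDDEN,
                      python: str = sys.executable) -> List[str]:
    """Import `module` in a fresh interpreter and list the forbidden modules it loaded."""
    # A subprocess keeps the check honest: the calling process (e.g. pytest)
    # may already have imported unrelated heavy packages.
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"print(','.join(n for n in {tuple(forbidden)!r} if n in sys.modules))\n"
    )
    result = subprocess.run([python, "-c", probe], capture_output=True, text=True, check=True)
    output = result.stdout.strip()
    return output.split(",") if output else []
```

Assuming pdftomd is importable from the venv (installed in editable mode, or with src on PYTHONPATH), a test could assert forbidden_imports("pdftomd.models") == []. Running pytest under .\venv\python.exe makes sys.executable point at that interpreter.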
Acceptance Criteria
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests\test_models.py
Verification
- Run the acceptance commands.
- Confirm the package imports:
  .\venv\python.exe -c "import pdftomd; print(pdftomd.__name__)"
- Confirm no heavy parser/model imports are introduced.
- Update PROGRESS.md with completed work, validation output, and next handoff.
- Update this step's entry in the phase index to completed with a one-line summary, or to blocked/error with a concrete reason.
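The check for heavy parser/model imports can also be run statically, before anything executes. This sketch walks each module's syntax tree with the standard ast module and flags absolute imports whose top-level package appears on a deny list. The deny list mirrors the dependencies named in the sprint contract, using assumed import names: fitz/pymupdf for PyMuPDF and PyQt5/PyQt6 for PyQt. The function names are hypothetical. A static scan cannot see dynamic imports such as importlib.import_module, so it complements actually importing the package rather than replacing it.

```python
# Sketch: statically list forbidden third-party imports in a source tree.
import ast
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Assumed top-level import names for the packages the sprint contract forbids.
FORBIDDEN = {"marker", "nougat", "fitz", "pymupdf", "torch", "pandas", "PyQt5", "PyQt6"}


def imported_roots(source: str) -> Set[str]:
    """Top-level package names of every absolute import in `source`."""
    roots: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and not node.level and node.module:
            roots.add(node.module.split(".")[0])  # relative imports (level > 0) are skipped
    return roots


def scan_package(root: Path, forbidden: Iterable[str] = FORBIDDEN) -> Dict[str, List[str]]:
    """Map each .py file under `root` to the forbidden packages it imports; {} means clean."""
    banned = set(forbidden)
    hits: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.py")):
        bad = imported_roots(path.read_text(encoding="utf-8")) & banned
        if bad:
            hits[str(path)] = sorted(bad)
    return hits
```

For this step, scan_package(Path("src/pdftomd")) would be expected to return {}.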
Do Not
- Do not implement actual PDF parsing.
- Do not run Marker or Nougat.
- Do not add CLI commands.
- Do not add PyQt UI code.
- Do not widen the output contract beyond docs/ARCHITECTURE.md.