add files
@@ -0,0 +1,30 @@
{
  "project": "PDFtoMD",
  "phase": "1-core-runtime-contracts",
  "steps": [
    {
      "step": 0,
      "name": "input-normalization-slug",
      "status": "completed",
      "summary": "Added deterministic PDF path normalization, document identity creation, anchors, and output bundle path contracts."
    },
    {
      "step": 1,
      "name": "conversion-options-config",
      "status": "completed",
      "summary": "Added typed conversion options with runtime mode and formula parser defaults matching project policy."
    },
    {
      "step": 2,
      "name": "output-bundle-contract",
      "status": "completed",
      "summary": "Added deterministic output bundle paths and separated runtime artifact paths from document output."
    },
    {
      "step": 3,
      "name": "runtime-cache-policy",
      "status": "completed",
      "summary": "Added model cache and runtime artifact path policies with explicit offline environment mappings."
    }
  ]
}
@@ -0,0 +1,38 @@
# Step 0: input-normalization-slug

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /phases/0-harness-foundation/index.json

## Task
Implement deterministic input normalization and document slug generation for local PDF paths.

Cover `pathlib` handling for Korean filenames, spaces, relative paths, absolute paths, and long Windows paths. The API should not invoke Marker, Nougat, PyMuPDF, or any conversion logic.

## Sprint Contract
- Done means: the core package has a tested function or small module that normalizes input PDF paths and produces stable document slugs.
- Hard thresholds: same input path and options produce the same slug; non-PDF paths fail clearly; Korean and spaced paths are tested; no parser import is introduced.
- Files owned: `src/pdftomd/`, `tests/`, `PROGRESS.md`, `phases/1-core-runtime-contracts/index.json`.
- Dependencies: Phase 0 package skeleton and model contracts.
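
As a rough illustration of the contract above, the sketch below normalizes a path and derives a slug using only the standard library. The names `normalize_pdf_path` and `make_slug`, the explicit `base_dir` argument, the NFC normalization, and the 8-character SHA-256 suffix are all assumptions made for this example, not the project's actual API. Conversion options and long Windows paths are left out for brevity.

```python
import hashlib
import re
import unicodedata
from pathlib import Path


def normalize_pdf_path(raw: str, base_dir: Path) -> Path:
    """Hypothetical helper: anchor a PDF path at base_dir without touching disk."""
    path = Path(raw)
    # Fail clearly on anything that is not a PDF path.
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF path: {raw!r}")
    # Relative paths are joined to an explicit base so the result does not
    # depend on the current working directory.
    if not path.is_absolute():
        path = base_dir / path
    return path


def make_slug(path: Path, max_stem: int = 40) -> str:
    """Hypothetical helper: readable stem plus a short hash of the full path."""
    # NFC keeps composed and decomposed Hangul spellings on the same slug.
    full = unicodedata.normalize("NFC", path.as_posix())
    stem = unicodedata.normalize("NFC", path.stem)
    # Collapse spaces and punctuation to single hyphens; \w keeps Unicode letters.
    stem = re.sub(r"[^\w]+", "-", stem).lower()
    stem = stem[:max_stem].strip("-") or "document"
    digest = hashlib.sha256(full.encode("utf-8")).hexdigest()[:8]
    return f"{stem}-{digest}"
```

Hashing the full normalized path is one way to keep two files with the same name in different directories apart; whether the project does this is not stated in the source.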

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm `PROGRESS.md` records the handoff and validation result.
3. Update this phase index step to `completed`, `blocked`, or `error`.

## Do Not
- Do not implement PDF parsing.
- Do not write conversion output.
- Do not add UI code.
@@ -0,0 +1,38 @@
# Step 1: conversion-options-config

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /docs/ADR.md
- /phases/1-core-runtime-contracts/step0.md

## Task
Define the typed conversion options and runtime configuration used by CLI, library, parser adapters, renderer, and UI.

Include runtime mode, device behavior, chunk target pages, formula parser mode, Nougat command path, output directory, model cache location, and resume/log options.

## Sprint Contract
- Done means: conversion options have defaults matching project policy and can be constructed by tests without CLI parsing.
- Hard thresholds: explicit `cuda` fail-fast semantics and `auto` fallback semantics are represented; Nougat remains formula-only; PyQt and hosted API options are not introduced.
- Files owned: `src/pdftomd/`, `tests/`, `PROGRESS.md`, `phases/1-core-runtime-contracts/index.json`.
- Dependencies: Step 0 normalized path/slug contract.
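
One possible shape for these options is a frozen dataclass plus a pure function that encodes the `cuda` and `auto` semantics. Everything named here (`RuntimeMode`, `FormulaParser`, `ConversionOptions`, `resolve_runtime_mode`, the `cpu` member, and every default value) is a placeholder assumed for illustration; the real defaults come from the project policy documents, which are not reproduced in this commit.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class RuntimeMode(str, Enum):
    AUTO = "auto"  # use CUDA when available, otherwise fall back
    CUDA = "cuda"  # explicit request: fail fast when unavailable
    CPU = "cpu"    # assumed member, not named in the source


class FormulaParser(str, Enum):
    NONE = "none"      # assumed member
    NOUGAT = "nougat"  # Nougat is used for formulas only


@dataclass(frozen=True)
class ConversionOptions:
    # All defaults below are placeholders, not the project's policy values.
    runtime_mode: RuntimeMode = RuntimeMode.AUTO
    chunk_target_pages: int = 20
    formula_parser: FormulaParser = FormulaParser.NOUGAT
    nougat_command: Optional[Path] = None
    output_dir: Path = Path("output")
    model_cache_dir: Optional[Path] = None
    resume: bool = False
    log_path: Optional[Path] = None

    def __post_init__(self) -> None:
        if self.chunk_target_pages < 1:
            raise ValueError("chunk_target_pages must be at least 1")


def resolve_runtime_mode(requested: RuntimeMode, cuda_available: bool) -> RuntimeMode:
    """Map the requested mode to a concrete one without initializing CUDA."""
    if requested is RuntimeMode.CUDA and not cuda_available:
        # Explicit cuda never silently degrades.
        raise RuntimeError("cuda was requested explicitly but is not available")
    if requested is RuntimeMode.AUTO:
        return RuntimeMode.CUDA if cuda_available else RuntimeMode.CPU
    return requested
```

Passing `cuda_available` in as a plain boolean keeps the function testable without importing any GPU library, which fits the rule that this step must not initialize CUDA, Marker, or Nougat.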

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm defaults align with `docs/ARCHITECTURE.md` and `docs/CONVERSION_POLICY.md`.
3. Update `PROGRESS.md` and this phase index.

## Do Not
- Do not add command-line parsing yet.
- Do not initialize CUDA, Marker, or Nougat.
- Do not add external API settings.
@@ -0,0 +1,39 @@
# Step 2: output-bundle-contract

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /phases/1-core-runtime-contracts/step0.md
- /phases/1-core-runtime-contracts/step1.md

## Task
Define deterministic output bundle path rules for chunk Markdown files, image assets, anchors, and runtime artifacts.

This is a contract step. It may include lightweight path helpers and tests, but it should not render Markdown or write parsed document content.

## Sprint Contract
- Done means: output directory, chunk file names, image asset names, and runtime log/state locations are modeled and tested.
- Hard thresholds: document output sidecars remain out of scope; runtime logs/state are separated from Markdown bundle output; asset naming is deterministic.
- Files owned: `src/pdftomd/`, `tests/`, `PROGRESS.md`, `phases/1-core-runtime-contracts/index.json`.
- Dependencies: Steps 0 and 1.
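
A path contract of this kind could be modeled as a small value object that only computes paths and never writes to disk. The class name `BundlePaths`, the `page_anchor` helper, the directory layout, and every file-name pattern below are invented for this sketch; the authoritative layout is whatever `docs/ARCHITECTURE.md` specifies.

```python
from dataclasses import dataclass
from pathlib import Path


def page_anchor(page: int) -> str:
    """Hypothetical anchor id for a page, zero-padded so ids sort correctly."""
    return f"page-{page:04d}"


@dataclass(frozen=True)
class BundlePaths:
    """Hypothetical contract: pure path arithmetic, no filesystem access."""

    output_root: Path   # where Markdown bundles live
    runtime_root: Path  # where logs and resume state live, kept separate
    slug: str           # document slug produced by the Step 0 contract

    @property
    def bundle_dir(self) -> Path:
        return self.output_root / self.slug

    def chunk_file(self, index: int) -> Path:
        # Zero padding keeps lexical order equal to chunk order.
        return self.bundle_dir / f"chunk-{index:04d}.md"

    def image_asset(self, page: int, index: int, ext: str = "png") -> Path:
        # The name depends only on page and index, so reruns produce the same path.
        return self.bundle_dir / "assets" / f"p{page:04d}-img{index:03d}.{ext}"

    @property
    def log_file(self) -> Path:
        return self.runtime_root / self.slug / "convert.log"

    @property
    def state_file(self) -> Path:
        return self.runtime_root / self.slug / "state.json"
```

Because the helpers return paths without creating anything, tests can assert on them directly and only need a temp directory once a later phase starts writing files.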

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm generated path contracts match `docs/ARCHITECTURE.md`.
3. Update `PROGRESS.md` and this phase index.

## Do Not
- Do not implement the renderer.
- Do not write files under `output/` in tests unless using a temp directory.
- Do not create sidecar metadata output.
@@ -0,0 +1,39 @@
# Step 3: runtime-cache-policy

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/TOOLCHAIN.md
- /docs/CONVERSION_POLICY.md
- /phases/1-core-runtime-contracts/step1.md
- /phases/1-core-runtime-contracts/step2.md

## Task
Establish model cache, log path, and resume state policy as typed contracts and documented path helpers.

The result should prepare later CLI/runtime phases to use local model cache paths and offline-preferred model loading.

## Sprint Contract
- Done means: model cache and runtime cache path contracts are tested and documented without downloading models.
- Hard thresholds: no network download is triggered; logs/state remain outside generated Markdown content; environment variable overrides are deterministic.
- Files owned: `src/pdftomd/`, `tests/`, `docs/TOOLCHAIN.md`, `PROGRESS.md`, `phases/1-core-runtime-contracts/index.json`.
- Dependencies: Steps 1 and 2.
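
To make "deterministic environment variable overrides" concrete, the sketch below resolves a cache policy from an explicit mapping rather than from `os.environ`, so the same input always yields the same result. `PDFTOMD_MODEL_CACHE` and `PDFTOMD_OFFLINE` are hypothetical variable names, and `CachePolicy` and `resolve_cache_policy` are invented for this example. `HF_HOME` and `HF_HUB_OFFLINE` are real Hugging Face Hub variables, but mapping onto them is an assumption here; the source does not say which variables the project uses.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Mapping


@dataclass(frozen=True)
class CachePolicy:
    model_cache_dir: Path
    offline: bool

    def child_env(self) -> Dict[str, str]:
        """Environment entries to pass on to model-loading code (assumed mapping)."""
        env = {"HF_HOME": str(self.model_cache_dir)}
        if self.offline:
            env["HF_HUB_OFFLINE"] = "1"
        return env


def resolve_cache_policy(env: Mapping[str, str], default_dir: Path) -> CachePolicy:
    """Resolve the policy from an explicit mapping; nothing is read globally."""
    # Hypothetical override variable; blank values count as unset.
    override = env.get("PDFTOMD_MODEL_CACHE", "").strip()
    cache_dir = Path(override) if override else default_dir
    # Offline is preferred: only an explicit falsy value turns it off.
    flag = env.get("PDFTOMD_OFFLINE", "1").strip().lower()
    offline = flag not in {"0", "false", "no"}
    return CachePolicy(model_cache_dir=cache_dir, offline=offline)
```

A caller would typically pass `os.environ` at the program boundary and a plain dict in tests, which keeps the tests free of network access and global state.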

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm `docs/TOOLCHAIN.md` stays consistent with any cache path decisions.
3. Update `PROGRESS.md` and this phase index.

## Do Not
- Do not download Marker or Nougat weights.
- Do not add hosted storage or cloud cache behavior.
- Do not write warnings into Markdown output.