add files
@@ -0,0 +1,31 @@
{
  "project": "PDFtoMD",
  "phase": "6-cli-runtime-resume",
  "steps": [
    {
      "step": 0,
      "name": "cli-entrypoint-options",
      "status": "pending"
    },
    {
      "step": 1,
      "name": "progress-logging",
      "status": "pending"
    },
    {
      "step": 2,
      "name": "resume-state",
      "status": "pending"
    },
    {
      "step": 3,
      "name": "device-oom-policy",
      "status": "pending"
    },
    {
      "step": 4,
      "name": "model-cache-offline",
      "status": "pending"
    }
  ]
}
@@ -0,0 +1,38 @@
# Step 0: cli-entrypoint-options

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /phases/1-core-runtime-contracts/index.json
- /phases/5-markdown-rendering-assets/index.json

## Task
Implement the `python -m pdftomd` CLI entrypoint and option parsing over the existing library API.

Expose input PDF, output directory, formula parser mode, Nougat command, runtime/device, chunk size, logging, and resume options.

## Sprint Contract
- Done means: CLI options map into typed conversion options and can run against a mocked pipeline in tests.
- Hard thresholds: CLI does not duplicate conversion logic; defaults match docs; explicit `cuda` and `auto` modes are represented.
- Files owned: `src/pdftomd/__main__.py`, CLI modules/tests, `README.md` if command docs change, `PROGRESS.md`, phase index.
- Dependencies: Core contracts and renderer pipeline.
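
A minimal sketch of the mapping this contract describes, assuming an `argparse` front end and a frozen dataclass for the typed options. `ConversionOptions`, `positive_int`, `parse_options`, `main`, every flag name, every default, and the `--formula-parser` choices are hypothetical placeholders, not the project's documented interface. The real defaults must come from the docs. The conversion call is injected as a callable so tests can pass a mocked pipeline.

```python
from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ConversionOptions:
    """Hypothetical typed options; real field names belong to the Phase 1 contracts."""

    input_pdf: Path
    output_dir: Path
    formula_parser: str
    nougat_command: Optional[str]
    device: str
    chunk_size: int
    log_file: Optional[Path]
    resume: bool


def positive_int(text: str) -> int:
    """argparse type that rejects zero and negative chunk sizes."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return value


def build_parser() -> argparse.ArgumentParser:
    # Flag names, choices, and defaults are placeholders; align them with the docs.
    parser = argparse.ArgumentParser(prog="python -m pdftomd", description="Convert a PDF to Markdown.")
    parser.add_argument("input_pdf", type=Path, help="PDF file to convert")
    parser.add_argument("--output-dir", type=Path, default=Path("output"), help="output directory")
    parser.add_argument("--formula-parser", choices=["marker", "nougat"], default="marker", help="formula parser mode")
    parser.add_argument("--nougat-command", default=None, help="command used to invoke Nougat")
    parser.add_argument("--device", choices=["auto", "cuda", "cpu"], default="auto", help="runtime device")
    parser.add_argument("--chunk-size", type=positive_int, default=10, help="chunk size")
    parser.add_argument("--log-file", type=Path, default=None, help="local runtime log path")
    parser.add_argument("--resume", action="store_true", help="reuse runtime resume state")
    return parser


def parse_options(argv: Optional[Sequence[str]] = None) -> ConversionOptions:
    """Translate raw arguments into the typed options object."""
    args = build_parser().parse_args(argv)
    return ConversionOptions(
        input_pdf=args.input_pdf,
        output_dir=args.output_dir,
        formula_parser=args.formula_parser,
        nougat_command=args.nougat_command,
        device=args.device,
        chunk_size=args.chunk_size,
        log_file=args.log_file,
        resume=args.resume,
    )


def main(argv: Optional[Sequence[str]], convert: Callable[[ConversionOptions], int]) -> int:
    """Parse arguments and delegate; no conversion logic lives in the CLI."""
    return convert(parse_options(argv))
```

With this shape, `__main__.py` could reduce to `raise SystemExit(main(sys.argv[1:], convert))`, where `convert` stands for the library's existing conversion entry point. Keeping `convert` injectable is what lets tests run the CLI against a mock and keeps conversion logic out of the CLI.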

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm CLI help text shows documented options.
3. Update `PROGRESS.md` and this phase index.
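
Every step ends by editing the phase index. A small helper built on the standard `json` module can make that edit mechanical. The function names are hypothetical, and `"completed"` is an assumed status value, since the index itself only shows `"pending"`.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Dict, Optional


def next_pending_step(index: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the lowest-numbered step still marked "pending", or None."""
    for step in sorted(index["steps"], key=lambda s: s["step"]):
        if step["status"] == "pending":
            return step
    return None


def with_status(index: Dict[str, Any], step_number: int, status: str) -> Dict[str, Any]:
    """Return a copy of the index with one step's status replaced."""
    steps = [dict(s, status=status) if s["step"] == step_number else dict(s) for s in index["steps"]]
    if not any(s["step"] == step_number for s in steps):
        raise KeyError(f"step {step_number} is not in the index")
    return {**index, "steps": steps}


def update_index_file(path: Path, step_number: int, status: str) -> None:
    """Rewrite the index file on disk with one step's status changed."""
    index = json.loads(path.read_text(encoding="utf-8"))
    updated = with_status(index, step_number, status)
    # Two-space indentation and a trailing newline keep the diff minimal.
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

The path would be this phase's `index.json`, assuming the phase follows the same layout as `/phases/1-core-runtime-contracts/index.json`.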

## Do Not
- Do not put conversion or parser logic inside the CLI argument-parsing code.
- Do not implement the PyQt UI.
- Do not silently fall back to CPU when CUDA mode is explicitly requested.
@@ -0,0 +1,37 @@
# Step 1: progress-logging

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/CONVERSION_POLICY.md
- /phases/6-cli-runtime-resume/step0.md

## Task
Implement progress reporting and stderr/local log behavior for chunk-level conversion.

Progress should summarize chunk success/failure without writing warnings or errors into Markdown content.

## Sprint Contract
- Done means: CLI/runtime tests can observe progress events and log file output in temp locations.
- Hard thresholds: Markdown chunks remain free of warning/error logs; failure summaries include chunk ids; logs use deterministic local paths from Phase 1.
- Files owned: `src/pdftomd/runtime.py`, CLI integration/tests, `PROGRESS.md`, phase index.
- Dependencies: CLI entrypoint and output/cache contracts.
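
One way to meet this contract, sketched with the standard `logging` module. Chunk outcomes go to a reporter that logs to stderr and to a local file and keeps a summary naming the failed chunk ids. Nothing is written into Markdown. `ChunkEvent`, `ProgressReporter`, `make_logger`, and the logger name are hypothetical. The log path is passed in by the caller because it should come from the Phase 1 path contract.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List


@dataclass(frozen=True)
class ChunkEvent:
    chunk_id: str
    ok: bool
    message: str = ""


@dataclass
class ProgressReporter:
    """Collects chunk outcomes and forwards them to a logger, never to Markdown."""

    logger: logging.Logger
    listeners: List[Callable[[ChunkEvent], None]] = field(default_factory=list)
    events: List[ChunkEvent] = field(default_factory=list)

    def record(self, event: ChunkEvent) -> None:
        self.events.append(event)
        if event.ok:
            self.logger.info("chunk %s converted", event.chunk_id)
        else:
            self.logger.error("chunk %s failed: %s", event.chunk_id, event.message)
        for listener in self.listeners:
            listener(event)  # e.g. a CLI progress display or a test probe

    def summary(self) -> str:
        failed = [e.chunk_id for e in self.events if not e.ok]
        text = f"{len(self.events) - len(failed)}/{len(self.events)} chunks converted"
        if failed:
            text += "; failed: " + ", ".join(failed)
        return text


def make_logger(log_file: Path, name: str = "pdftomd.runtime") -> logging.Logger:
    """Log to stderr and to a local file chosen by the caller."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    for handler in list(logger.handlers):  # avoid duplicate handlers on re-creation
        logger.removeHandler(handler)
        handler.close()
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file, encoding="utf-8")):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

In tests, pytest's `tmp_path` fixture can supply the log directory and a list's `append` can serve as a listener. This covers observing progress events without a real PDF conversion.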

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm stderr/log behavior is tested separately from Markdown output.
3. Update `PROGRESS.md` and this phase index.

## Do Not
- Do not write runtime logs inside generated Markdown.
- Do not require a real PDF conversion for progress unit tests.
- Do not create persistent logs outside temp dirs in tests.
@@ -0,0 +1,37 @@
# Step 2: resume-state

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/CONVERSION_POLICY.md
- /phases/6-cli-runtime-resume/step1.md

## Task
Implement runtime resume state for successful and failed chunks.

Resume state is a runtime artifact, not a document output sidecar.

## Sprint Contract
- Done means: conversion can skip completed chunks and retry failed chunks using a local state file in tests.
- Hard thresholds: state format is deterministic; stale state is detected; resume does not skip chunks when the input or options have changed materially.
- Files owned: `src/pdftomd/resume.py`, runtime integration/tests, `PROGRESS.md`, phase index.
- Dependencies: Progress/logging and chunk renderer contracts.
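
A minimal sketch of a deterministic state file. It assumes the state is keyed by a SHA-256 fingerprint of the input bytes plus the output-relevant options. If either changes, the old state is treated as stale and no chunk is skipped. The JSON layout, the `"done"`/`"failed"` status strings, `STATE_VERSION`, and all function names are assumptions for illustration.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path
from typing import Dict, List, Mapping, Sequence

STATE_VERSION = 1  # hypothetical schema version; bump it when the layout changes


def fingerprint(pdf_bytes: bytes, options: Mapping[str, object]) -> str:
    """Hash the input PDF plus output-relevant options (parser, parser version, ...)."""
    digest = hashlib.sha256(pdf_bytes)
    # sort_keys makes the hash independent of dict insertion order.
    digest.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def load_state(path: Path, expected: str) -> Dict[str, str]:
    """Return {chunk_id: status}; missing, corrupt, or stale state yields {}."""
    if not path.exists():
        return {}
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return {}
    if data.get("version") != STATE_VERSION or data.get("fingerprint") != expected:
        return {}  # input or options changed: nothing may be skipped
    return dict(data.get("chunks", {}))


def save_state(path: Path, fp: str, chunks: Mapping[str, str]) -> None:
    """Write the state deterministically (sorted keys, fixed indentation)."""
    payload = {"version": STATE_VERSION, "fingerprint": fp, "chunks": dict(chunks)}
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    tmp.replace(path)  # swap in the complete file rather than writing in place


def chunks_to_run(all_chunks: Sequence[str], state: Mapping[str, str]) -> List[str]:
    """Skip chunks recorded as "done"; failed and unseen chunks are (re)run."""
    return [chunk for chunk in all_chunks if state.get(chunk) != "done"]
```

Sorted keys make the file byte-for-byte reproducible for the same inputs. Writing to a temporary file before `Path.replace` reduces the chance of leaving a truncated state file if the process is interrupted. Which options count as material is the caller's decision: whatever goes into the `options` mapping (including parser choice and parser version) invalidates the state when it changes.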

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm state files are written only under temp/runtime cache paths in tests.
3. Update `PROGRESS.md` and this phase index.

## Do Not
- Do not treat resume state as part of generated document output.
- Do not skip chunks after changes to parser- or version-relevant options.
- Do not create hidden global state.
@@ -0,0 +1,39 @@
# Step 3: device-oom-policy

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /docs/TOOLCHAIN.md
- /phases/1-core-runtime-contracts/step1.md

## Task
Implement runtime device selection, CUDA fail-fast behavior, auto CPU fallback behavior, and OOM retry policy hooks.

This step should be tested with mocks and small CUDA smoke checks only where safe.
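
The device rules can be isolated in a pure function and tested without a GPU by injecting the availability check. In production, that check would likely be `torch.cuda.is_available`. `resolve_device` and `DeviceUnavailableError` are hypothetical names. Using `warnings.warn` for the auto-fallback notice is one option among several; a logger warning would also keep it visible.

```python
from __future__ import annotations

import warnings
from typing import Callable


class DeviceUnavailableError(RuntimeError):
    """Raised when CUDA is requested explicitly but cannot be used."""


def resolve_device(requested: str, cuda_available: Callable[[], bool]) -> str:
    """Map a requested mode ("auto", "cuda", "cpu") to a concrete device name."""
    if requested == "cpu":
        return "cpu"
    if requested == "cuda":
        if not cuda_available():
            # Fail fast: an explicit CUDA request must never degrade to CPU.
            raise DeviceUnavailableError("CUDA was requested but is not available")
        return "cuda"
    if requested == "auto":
        if cuda_available():
            return "cuda"
        # Auto mode may fall back, but the fallback must stay visible.
        warnings.warn("CUDA is not available; auto mode is using CPU", RuntimeWarning, stacklevel=2)
        return "cpu"
    raise ValueError(f"unknown device mode: {requested!r}")
```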

## Sprint Contract
- Done means: runtime policy enforces fail-fast for explicit CUDA, a warning on auto fallback to CPU, and configurable OOM retry reductions.
- Hard thresholds: no silent CPU fallback for explicit CUDA; tests do not require exhausting VRAM; GTX 1070 Ti constraints remain documented.
- Files owned: `src/pdftomd/runtime.py`, tests, `docs/TOOLCHAIN.md` if behavior changes, `PROGRESS.md`, phase index.
- Dependencies: Runtime config options.
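
A sketch of an OOM retry hook that shrinks the work size after each failure. It assumes the reducible quantity is a batch or chunk size passed as an integer. The OOM exception types are injected so tests can raise a fake exception instead of exhausting VRAM. Recent PyTorch releases expose the real type as `torch.cuda.OutOfMemoryError`. `OomRetryPolicy`, `run_with_oom_retry`, and the default values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class OomRetryPolicy:
    max_retries: int = 2        # retries after the first attempt
    shrink_factor: float = 0.5  # multiply the size by this after each OOM
    min_size: int = 1           # never shrink below this


def run_with_oom_retry(
    work: Callable[[int], T],
    size: int,
    policy: OomRetryPolicy,
    oom_errors: Tuple[Type[BaseException], ...],
) -> T:
    """Call work(size); on OOM, shrink size and retry until the policy is exhausted."""
    attempt = 0
    while True:
        try:
            return work(size)
        except oom_errors:
            if attempt >= policy.max_retries or size <= policy.min_size:
                raise  # out of retries: surface the original error
            attempt += 1
            size = max(policy.min_size, int(size * policy.shrink_factor))
            # A real runtime might also call torch.cuda.empty_cache() here.
```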

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm CUDA smoke test instructions still work separately.
3. Update `PROGRESS.md` and this phase index.

## Do Not
- Do not intentionally trigger real GPU OOM in tests.
- Do not change PyTorch pins without updating `docs/TOOLCHAIN.md`.
- Do not hide runtime warnings.
@@ -0,0 +1,38 @@
# Step 4: model-cache-offline

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/TOOLCHAIN.md
- /docs/ARCHITECTURE.md
- /phases/6-cli-runtime-resume/step3.md

## Task
Document and wire model cache/offline behavior for Marker, Nougat, and Hugging Face cache paths.

Add CLI/runtime hooks for environment variables or explicit cache paths without downloading models during tests.

## Sprint Contract
- Done means: the docs show users how to pre-download models and run offline, and runtime cache paths are configurable.
- Hard thresholds: no test performs network download; docs include Windows commands; cache path policy matches Phase 1.
- Files owned: `src/pdftomd/runtime.py`, `README.md`, `docs/TOOLCHAIN.md`, tests, `PROGRESS.md`, phase index.
- Dependencies: Device/runtime policy and cache contracts.
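
A minimal sketch of the environment wiring for the Hugging Face side. It uses variables documented by the Hugging Face libraries (`HF_HOME`, `HF_HUB_OFFLINE`, `TRANSFORMERS_OFFLINE`) plus PyTorch's `TORCH_HOME`. Marker- or Nougat-specific cache settings are not shown and should be taken from those projects' own documentation. The function names and the `huggingface`/`torch` subdirectory layout under a single cache root are assumptions. Nothing here downloads a model.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Dict, Mapping, MutableMapping, Optional


def cache_environment(cache_root: Optional[Path], offline: bool) -> Dict[str, str]:
    """Build environment overrides for model caches; nothing here touches the network."""
    env: Dict[str, str] = {}
    if cache_root is not None:
        # Assumed layout: one user-chosen root with per-library subdirectories.
        env["HF_HOME"] = str(cache_root / "huggingface")
        env["TORCH_HOME"] = str(cache_root / "torch")
    if offline:
        # Tell Hugging Face libraries to use only what is already cached.
        env["HF_HUB_OFFLINE"] = "1"
        env["TRANSFORMERS_OFFLINE"] = "1"
    return env


def apply_environment(
    overrides: Mapping[str, str],
    target: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Apply overrides in place; explicit CLI values win over inherited ones."""
    env = os.environ if target is None else target
    env.update(overrides)
    return env
```

These variables are generally read when the libraries are imported, so they should be applied before importing Marker, Nougat, or `transformers`. For the Windows docs, the PowerShell equivalent for the current session is `$env:HF_HUB_OFFLINE = "1"`.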

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm offline instructions are clear and do not imply bundled weights.
3. Update `PROGRESS.md` and this phase index.

## Do Not
- Do not download model weights as part of tests.
- Do not commit model caches.
- Do not make online access mandatory for already-cached models.