add files

김경종
2026-04-30 17:05:19 +09:00
parent f3e01b5a8c
commit 7e985ae94a
135 changed files with 41205 additions and 0 deletions
@@ -0,0 +1,31 @@
{
  "project": "PDFtoMD",
  "phase": "6-cli-runtime-resume",
  "steps": [
    {
      "step": 0,
      "name": "cli-entrypoint-options",
      "status": "pending"
    },
    {
      "step": 1,
      "name": "progress-logging",
      "status": "pending"
    },
    {
      "step": 2,
      "name": "resume-state",
      "status": "pending"
    },
    {
      "step": 3,
      "name": "device-oom-policy",
      "status": "pending"
    },
    {
      "step": 4,
      "name": "model-cache-offline",
      "status": "pending"
    }
  ]
}
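
Every step below ends by updating this phase index. A minimal Python sketch of that update, assuming the `steps` layout shown above; `set_step_status` is a hypothetical helper, and `"completed"` is an assumed status value, since only `"pending"` appears in the index:

```python
import json
from pathlib import Path


def set_step_status(index_path: Path, step: int, status: str) -> None:
    """Set one step's status in a phase index.json (hypothetical helper)."""
    data = json.loads(index_path.read_text(encoding="utf-8"))
    for entry in data["steps"]:
        if entry["step"] == step:
            entry["status"] = status
            break
    else:
        # for/else: this branch runs only when no entry matched the step number.
        raise KeyError(f"step {step} not found in {index_path}")
    # A stable two-space layout keeps index diffs limited to the changed status.
    index_path.write_text(
        json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
```

For example, `set_step_status(Path("phases/6-cli-runtime-resume/index.json"), 0, "completed")` could run once Step 0 passes verification.
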
@@ -0,0 +1,38 @@
# Step 0: cli-entrypoint-options
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /phases/1-core-runtime-contracts/index.json
- /phases/5-markdown-rendering-assets/index.json
## Task
Implement the `python -m pdftomd` CLI entrypoint and option parsing over the existing library API.
Expose input PDF, output directory, formula parser mode, Nougat command, runtime/device, chunk size, logging, and resume options.
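
A minimal sketch of the option mapping using the standard-library `argparse`. Every flag name, choice list, and default below is an assumption for illustration (the real defaults must match the docs), as are the `ConversionOptions` dataclass and the injected `convert` callable that stands in for the existing library API:

```python
from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Sequence


@dataclass(frozen=True)
class ConversionOptions:
    """Typed options handed to the library API (field names are hypothetical)."""
    input_pdf: Path
    output_dir: Path
    formula_parser: str
    nougat_command: str
    device: str
    chunk_size: int
    log_level: str
    resume: bool


def _positive_int(text: str) -> int:
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError("must be >= 1")
    return value


def build_parser() -> argparse.ArgumentParser:
    # Placeholder flags, choices, and defaults; align them with the documented ones.
    p = argparse.ArgumentParser(prog="python -m pdftomd", description="Convert a PDF to Markdown.")
    p.add_argument("input_pdf", type=Path, help="PDF file to convert")
    p.add_argument("-o", "--output-dir", type=Path, default=Path("output"), help="directory for Markdown and assets")
    p.add_argument("--formula-parser", choices=["marker", "nougat"], default="marker", help="formula parser mode")
    p.add_argument("--nougat-command", default="nougat", help="command used to invoke Nougat")
    p.add_argument("--device", choices=["auto", "cuda", "cpu"], default="auto", help="'cuda' fails fast if CUDA is unavailable")
    p.add_argument("--chunk-size", type=_positive_int, default=10, help="pages per chunk")
    p.add_argument("--log-level", choices=["DEBUG", "INFO", "WARNING", "ERROR"], default="INFO")
    p.add_argument("--resume", action=argparse.BooleanOptionalAction, default=True, help="skip chunks completed in a previous run")
    return p


def parse_options(argv: Sequence[str] | None = None) -> ConversionOptions:
    namespace = build_parser().parse_args(argv)
    # Each argparse dest matches a ConversionOptions field, so the mapping is 1:1.
    return ConversionOptions(**vars(namespace))


def main(convert: Callable[[ConversionOptions], int], argv: Sequence[str] | None = None) -> int:
    # The CLI only parses and delegates; conversion logic stays in the library.
    return convert(parse_options(argv))
```

Because `main` receives the conversion callable instead of importing the pipeline itself, tests can pass a fake that records the `ConversionOptions` it gets. `argparse.BooleanOptionalAction` (Python 3.9+) provides the paired `--resume`/`--no-resume` flags.
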
## Sprint Contract
- Done means: CLI options map onto typed conversion options, and the CLI can run against a mocked pipeline in tests.
- Hard thresholds: the CLI does not duplicate conversion logic; defaults match the docs; explicit `cuda` and `auto` device modes are both represented.
- Files owned: `src/pdftomd/__main__.py`, CLI modules/tests, `README.md` if command docs change, `PROGRESS.md`, phase index.
- Dependencies: Core contracts and renderer pipeline.
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm the CLI help text lists the documented options.
3. Update `PROGRESS.md` and this phase index.
## Do Not
- Do not put formula-parser logic inside the CLI argument-parsing code.
- Do not implement PyQt UI.
- Do not silently fall back to CPU when CUDA is requested explicitly.
@@ -0,0 +1,37 @@
# Step 1: progress-logging
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/CONVERSION_POLICY.md
- /phases/6-cli-runtime-resume/step0.md
## Task
Implement progress reporting and stderr/local log behavior for chunk-level conversion.
Progress should summarize chunk success/failure without writing warnings or errors into Markdown content.
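
A sketch of keeping progress and logs out of the Markdown path, assuming chunk ids are plain strings; `ChunkEvent`, `ProgressReporter`, `make_logger`, and `close_logger` are hypothetical names. Events stay in memory so tests can observe them, and log records go only to stderr and a caller-chosen log file:

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class ChunkEvent:
    chunk_id: str
    ok: bool
    detail: str = ""


@dataclass
class ProgressReporter:
    """Records chunk outcomes as events and logs them; never touches Markdown."""
    logger: logging.Logger
    events: list[ChunkEvent] = field(default_factory=list)

    def chunk_done(self, chunk_id: str) -> None:
        self.events.append(ChunkEvent(chunk_id, True))
        self.logger.info("chunk %s converted", chunk_id)

    def chunk_failed(self, chunk_id: str, error: BaseException) -> None:
        self.events.append(ChunkEvent(chunk_id, False, str(error)))
        self.logger.error("chunk %s failed: %s", chunk_id, error)

    def summary(self) -> str:
        # Failure summaries always name the failing chunk ids.
        failed = [e.chunk_id for e in self.events if not e.ok]
        text = f"{len(self.events) - len(failed)}/{len(self.events)} chunks succeeded"
        return text + (f"; failed: {', '.join(failed)}" if failed else "")


def make_logger(log_file: Path, name: str = "pdftomd") -> logging.Logger:
    """Log to stderr (StreamHandler's default stream) and to a local file."""
    logger = logging.getLogger(name)
    close_logger(logger)  # drop handlers left over from an earlier run in this process
    logger.setLevel(logging.INFO)
    logger.propagate = False
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file, encoding="utf-8")):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


def close_logger(logger: logging.Logger) -> None:
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
```

Tests can point `make_logger` at a file inside `tempfile.TemporaryDirectory()` and call `close_logger` before the directory is removed, since an open `FileHandler` keeps the file locked on Windows.
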
## Sprint Contract
- Done means: CLI/runtime tests can observe progress events and log file output in temp locations.
- Hard thresholds: Markdown chunks remain free of warning/error logs; failure summaries include chunk ids; logs use deterministic local paths from Phase 1.
- Files owned: `src/pdftomd/runtime.py`, CLI integration/tests, `PROGRESS.md`, phase index.
- Dependencies: CLI entrypoint and output/cache contracts.
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm stderr/log behavior is tested separately from Markdown output.
3. Update `PROGRESS.md` and this phase index.
## Do Not
- Do not write runtime logs inside generated Markdown.
- Do not require a real PDF conversion for progress unit tests.
- Do not create persistent logs outside temp dirs in tests.
@@ -0,0 +1,37 @@
# Step 2: resume-state
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/CONVERSION_POLICY.md
- /phases/6-cli-runtime-resume/step1.md
## Task
Implement runtime resume state for successful and failed chunks.
Resume state is a runtime artifact, not a document output sidecar.
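
One way to make resume state deterministic and stale-aware, sketched under assumptions: chunk ids are strings, the material options are passed as a JSON-serializable dict, and the runtime chooses the state file path under its cache directory. All function names and the `STATE_VERSION` field are hypothetical:

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path
from typing import Iterable

STATE_VERSION = 1  # bump when the state layout changes


def fingerprint(pdf_bytes: bytes, options: dict, parser_version: str) -> str:
    """Hash everything that should invalidate resume state when it changes."""
    payload = {
        "pdf_sha256": hashlib.sha256(pdf_bytes).hexdigest(),
        "options": options,
        "parser_version": parser_version,
    }
    # sort_keys makes the hash independent of dict insertion order.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def save_state(path: Path, fp: str, completed: Iterable[str], failed: Iterable[str]) -> None:
    state = {"version": STATE_VERSION, "fingerprint": fp,
             "completed": sorted(set(completed)), "failed": sorted(set(failed))}
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, sort_keys=True, indent=2) + "\n", encoding="utf-8")
    # Swap in the fully written file so a crash mid-write keeps the previous state.
    tmp.replace(path)


def load_completed(path: Path, fp: str) -> set[str]:
    """Return completed chunk ids, or an empty set if state is missing or stale."""
    if not path.exists():
        return set()
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return set()
    if not isinstance(state, dict) or state.get("version") != STATE_VERSION:
        return set()
    if state.get("fingerprint") != fp:
        return set()  # input, options, or parser changed: redo every chunk
    return set(state.get("completed", []))


def chunks_to_run(all_chunks: Iterable[str], completed: set[str]) -> list[str]:
    # Failed chunks are never in `completed`, so they are retried automatically.
    return [c for c in all_chunks if c not in completed]
```

Hashing the PDF bytes, the material options, and the parser version together means any relevant change yields a new fingerprint, so `load_completed` returns an empty set and no chunk is skipped.
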
## Sprint Contract
- Done means: conversion can skip completed chunks and retry failed chunks using a local state file in tests.
- Hard thresholds: the state format is deterministic; stale state is detected; resume does not skip chunks when the input or options have changed materially.
- Files owned: `src/pdftomd/resume.py`, runtime integration/tests, `PROGRESS.md`, phase index.
- Dependencies: Progress/logging and chunk renderer contracts.
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm state files are written only under temp/runtime cache paths in tests.
3. Update `PROGRESS.md` and this phase index.
## Do Not
- Do not treat resume state as part of generated document output.
- Do not skip chunks after a change to the parser, its version, or any option that affects parsing.
- Do not create hidden global state.
@@ -0,0 +1,39 @@
# Step 3: device-oom-policy
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /docs/TOOLCHAIN.md
- /phases/1-core-runtime-contracts/step1.md
## Task
Implement runtime device selection, CUDA fail-fast behavior, auto CPU fallback behavior, and OOM retry policy hooks.
Test this step with mocks, adding small CUDA smoke checks only where they are safe.
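
A sketch of the device policy that can be unit-tested without a GPU, assuming three modes (`auto`, `cuda`, `cpu`); `select_device` and `CudaUnavailableError` are hypothetical names. In the real runtime, `cuda_available` would come from `torch.cuda.is_available()`; tests pass a plain boolean:

```python
import warnings


class CudaUnavailableError(RuntimeError):
    """Raised when CUDA is requested explicitly but is not available."""


def select_device(requested: str, cuda_available: bool) -> str:
    if requested == "cpu":
        return "cpu"
    if requested == "cuda":
        if not cuda_available:
            # Explicit CUDA fails fast: never silently fall back to CPU.
            raise CudaUnavailableError(
                "device 'cuda' was requested but CUDA is not available"
            )
        return "cuda"
    if requested == "auto":
        if cuda_available:
            return "cuda"
        # Auto mode may fall back, but the fallback must be visible.
        warnings.warn("CUDA not available; auto mode is using CPU", RuntimeWarning, stacklevel=2)
        return "cpu"
    raise ValueError(f"unknown device mode: {requested!r}")
```
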
## Sprint Contract
- Done means: the runtime policy enforces fail-fast behavior for explicit CUDA, a warning on auto-mode CPU fallback, and configurable OOM retry reductions.
- Hard thresholds: no silent CPU fallback for explicit CUDA; tests do not require exhausting VRAM; GTX 1070 Ti constraints remain documented.
- Files owned: `src/pdftomd/runtime.py`, tests, `docs/TOOLCHAIN.md` if behavior changes, `PROGRESS.md`, phase index.
- Dependencies: Runtime config options.
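
The configurable OOM retry reduction could look like the sketch below. Here `convert` is any callable that processes one chunk at a given size (pages or batch size; the unit is an assumption), and `run_with_oom_retry` and `OOMRetryExhausted` are hypothetical names. Tests can raise the built-in `MemoryError` from a fake `convert`, so no real VRAM is exhausted; the runtime could instead pass `oom_errors=(torch.cuda.OutOfMemoryError,)`, which recent PyTorch releases provide:

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


class OOMRetryExhausted(RuntimeError):
    """Raised when retries run out or the minimum size still runs out of memory."""


def run_with_oom_retry(
    convert: Callable[[int], T],
    start_size: int,
    *,
    reduction: float = 0.5,
    min_size: int = 1,
    max_retries: int = 3,
    oom_errors: tuple[type[BaseException], ...] = (MemoryError,),
) -> T:
    """Call convert(size), shrinking size after each out-of-memory error."""
    if not 0 < reduction < 1:
        raise ValueError("reduction must be between 0 and 1")
    size, retries = start_size, 0
    while True:
        try:
            return convert(size)
        except oom_errors as exc:
            if retries >= max_retries or size <= min_size:
                raise OOMRetryExhausted(
                    f"out of memory at size {size} after {retries} retries"
                ) from exc
            retries += 1
            size = max(min_size, int(size * reduction))
```

Halving is only one choice; exposing `reduction`, `min_size`, and `max_retries` as runtime options keeps the policy tunable for an 8 GB card such as the GTX 1070 Ti.
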
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm the standalone CUDA smoke-test instructions still work.
3. Update `PROGRESS.md` and this phase index.
## Do Not
- Do not intentionally trigger real GPU OOM in tests.
- Do not change PyTorch pins without updating `docs/TOOLCHAIN.md`.
- Do not hide runtime warnings.
@@ -0,0 +1,38 @@
# Step 4: model-cache-offline
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/TOOLCHAIN.md
- /docs/ARCHITECTURE.md
- /phases/6-cli-runtime-resume/step3.md
## Task
Document and wire model cache/offline behavior for Marker, Nougat, and Hugging Face cache paths.
Add CLI/runtime hooks for environment variables or explicit cache paths without downloading models during tests.
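
A sketch of the cache/offline hook, assuming the runtime hands a prepared environment to model-loading code or to the Nougat subprocess. `HF_HOME`, `HF_HUB_OFFLINE`, and `TRANSFORMERS_OFFLINE` are standard Hugging Face environment variables; whether Marker and Nougat read them for their own weights should be confirmed against each tool's documentation, and `cache_environment` is a hypothetical helper:

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping


def cache_environment(
    hf_home: Path | None = None,
    offline: bool = False,
    base_env: Mapping[str, str] | None = None,
) -> dict[str, str]:
    """Build an environment for model-loading code without mutating os.environ."""
    env = dict(os.environ if base_env is None else base_env)
    if hf_home is not None:
        # An explicit cache path wins over any HF_HOME already in the environment.
        env["HF_HOME"] = str(hf_home)
    if offline:
        # Ask Hugging Face libraries to use only the local cache, with no Hub requests.
        env["HF_HUB_OFFLINE"] = "1"
        env["TRANSFORMERS_OFFLINE"] = "1"
    return env
```

For example, `subprocess.run(cmd, env=cache_environment(Path(r"D:\models\hf"), offline=True), check=True)` would run a Nougat command against an already populated cache. Nothing here downloads weights, so it is safe to exercise in tests.
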
## Sprint Contract
- Done means: users can see how to pre-download models and run offline, and runtime cache paths are configurable.
- Hard thresholds: no test performs network download; docs include Windows commands; cache path policy matches Phase 1.
- Files owned: `src/pdftomd/runtime.py`, `README.md`, `docs/TOOLCHAIN.md`, tests, `PROGRESS.md`, phase index.
- Dependencies: Device/runtime policy and cache contracts.
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm offline instructions are clear and do not imply bundled weights.
3. Update `PROGRESS.md` and this phase index.
## Do Not
- Do not download model weights as part of tests.
- Do not commit model caches.
- Do not make online access mandatory for already-cached models.