add files

김경종
2026-04-30 17:05:19 +09:00
parent f3e01b5a8c
commit 7e985ae94a
135 changed files with 41205 additions and 0 deletions
@@ -0,0 +1,30 @@
{
"project": "PDFtoMD",
"phase": "1-core-runtime-contracts",
"steps": [
{
"step": 0,
"name": "input-normalization-slug",
"status": "completed",
"summary": "Added deterministic PDF path normalization, document identity creation, anchors, and output bundle path contracts."
},
{
"step": 1,
"name": "conversion-options-config",
"status": "completed",
"summary": "Added typed conversion options with runtime mode and formula parser defaults matching project policy."
},
{
"step": 2,
"name": "output-bundle-contract",
"status": "completed",
"summary": "Added deterministic output bundle paths and separated runtime artifact paths from document output."
},
{
"step": 3,
"name": "runtime-cache-policy",
"status": "completed",
"summary": "Added model cache and runtime artifact path policies with explicit offline environment mappings."
}
]
}
@@ -0,0 +1,38 @@
# Step 0: input-normalization-slug
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /phases/0-harness-foundation/index.json
## Task
Implement deterministic input normalization and document slug generation for local PDF paths.
Cover `pathlib` handling for Korean filenames, spaces, relative paths, absolute paths, and long Windows paths. The API should not invoke Marker, Nougat, PyMuPDF, or any conversion logic.
## Sprint Contract
- Done means: the core package has a tested function or small module that normalizes input PDF paths and produces stable document slugs.
- Hard thresholds: same input path and options produce the same slug; non-PDF paths fail clearly; Korean and spaced paths are tested; no parser import is introduced.
- Files owned: `src/pdftomd/`, `tests/`, `PROGRESS.md`, `phases/1-core-runtime-contracts/index.json`.
- Dependencies: Phase 0 package skeleton and model contracts.
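The contract above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `NotAPdfError`, `normalize_pdf_path`, `document_slug`, and the "readable stem plus short path hash" slug format are all hypothetical choices, and the sketch ignores conversion options and long Windows path prefixes.

```python
from __future__ import annotations

import hashlib
import os
import re
import unicodedata
from pathlib import Path


class NotAPdfError(ValueError):
    """Hypothetical error type for inputs that are not .pdf paths."""


def normalize_pdf_path(raw: str | Path, base_dir: Path | None = None) -> Path:
    """Return an absolute, lexically normalized PDF path without touching the disk."""
    path = Path(raw)
    if path.suffix.lower() != ".pdf":
        raise NotAPdfError(f"not a PDF path: {str(raw)!r}")
    if not path.is_absolute():
        path = (base_dir or Path.cwd()) / path
    # normpath collapses "." and ".." segments purely lexically.
    return Path(os.path.normpath(path))


def document_slug(path: Path) -> str:
    """Build a stable slug: readable stem plus a short hash of the full path."""
    stem = unicodedata.normalize("NFC", path.stem)
    # \w is Unicode-aware for str patterns, so Hangul syllables are kept.
    readable = re.sub(r"[^\w]+", "-", stem).strip("-").lower() or "document"
    full = unicodedata.normalize("NFC", path.as_posix())
    digest = hashlib.sha256(full.encode("utf-8")).hexdigest()[:8]
    return f"{readable}-{digest}"
```

Hashing the full normalized path is one way to keep slugs distinct for same-named files in different directories; whether the real contract does this is an assumption. Nothing here imports Marker, Nougat, or PyMuPDF.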
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm `PROGRESS.md` records the handoff and validation result.
3. Update this step's entry in the phase index (`phases/1-core-runtime-contracts/index.json`) to `completed`, `blocked`, or `error`.
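As an illustration of the status update in the last verification step, the sketch below edits a parsed phase index in memory. The helper name `set_step_status` is hypothetical, and the allowed set contains only the three statuses named in this step; the real harness may accept others.

```python
import copy

# Only the statuses named in this step; other values are not assumed.
ALLOWED_STATUSES = {"completed", "blocked", "error"}


def set_step_status(index: dict, step: int, status: str) -> dict:
    """Return a copy of the phase index with one step's status replaced."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unsupported status: {status!r}")
    updated = copy.deepcopy(index)
    for entry in updated["steps"]:
        if entry["step"] == step:
            entry["status"] = status
            return updated
    raise KeyError(f"step {step} not found in phase index")
```

Returning a copy keeps the function free of file I/O, so it can be tested without touching `index.json`.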
## Do Not
- Do not implement PDF parsing.
- Do not write conversion output.
- Do not add UI code.
@@ -0,0 +1,38 @@
# Step 1: conversion-options-config
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /docs/ADR.md
- /phases/1-core-runtime-contracts/step0.md
## Task
Define the typed conversion options and runtime configuration used by CLI, library, parser adapters, renderer, and UI.
Include runtime mode, device behavior, chunk target pages, formula parser mode, Nougat command path, output directory, model cache location, and resume/log options.
## Sprint Contract
- Done means: conversion options have defaults matching project policy and can be constructed by tests without CLI parsing.
- Hard thresholds: explicit `cuda` fail-fast semantics and `auto` fallback semantics are represented; Nougat remains formula-only; PyQt and hosted API options are not introduced.
- Files owned: `src/pdftomd/`, `tests/`, `PROGRESS.md`, `phases/1-core-runtime-contracts/index.json`.
- Dependencies: Step 0 normalized path/slug contract.
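One possible shape for these options is sketched below. Every field name and default value is a placeholder, since the project policy documents are not shown here. The `cpu` mode and the `resolve_device` helper are also assumptions. The helper takes CUDA availability as an argument so that nothing in the sketch initializes CUDA, Marker, or Nougat.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class RuntimeMode(str, Enum):
    AUTO = "auto"  # fall back to CPU when CUDA is unavailable
    CUDA = "cuda"  # fail fast when CUDA is unavailable
    CPU = "cpu"    # assumed extra mode, not named in the step text


class FormulaParserMode(str, Enum):
    OFF = "off"
    NOUGAT = "nougat"  # Nougat is used for formulas only


@dataclass(frozen=True)
class ConversionOptions:
    # All defaults below are placeholders, not the documented project policy.
    runtime_mode: RuntimeMode = RuntimeMode.AUTO
    chunk_target_pages: int = 20
    formula_parser: FormulaParserMode = FormulaParserMode.OFF
    nougat_command: Path | None = None
    output_dir: Path = Path("output")
    model_cache_dir: Path | None = None
    resume: bool = False
    log_to_file: bool = True

    def __post_init__(self) -> None:
        if self.chunk_target_pages < 1:
            raise ValueError("chunk_target_pages must be at least 1")


def resolve_device(mode: RuntimeMode, cuda_available: bool) -> str:
    """Map a runtime mode to a device name without probing any hardware."""
    if mode is RuntimeMode.CUDA and not cuda_available:
        raise RuntimeError("CUDA was requested explicitly but is unavailable")
    if mode is RuntimeMode.AUTO:
        return "cuda" if cuda_available else "cpu"
    return mode.value
```

Because the dataclass has defaults for every field, tests can call `ConversionOptions()` directly, with no CLI parsing involved.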
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm defaults align with `docs/ARCHITECTURE.md` and `docs/CONVERSION_POLICY.md`.
3. Update `PROGRESS.md` and this phase index.
## Do Not
- Do not add command-line parsing yet.
- Do not initialize CUDA, Marker, or Nougat.
- Do not add external API settings.
@@ -0,0 +1,39 @@
# Step 2: output-bundle-contract
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /phases/1-core-runtime-contracts/step0.md
- /phases/1-core-runtime-contracts/step1.md
## Task
Define deterministic output bundle path rules for chunk Markdown files, image assets, anchors, and runtime artifacts.
This is a contract step. It may include lightweight path helpers and tests, but it should not render Markdown or write parsed document content.
## Sprint Contract
- Done means: output directory, chunk file names, image asset names, and runtime log/state locations are modeled and tested.
- Hard thresholds: document output sidecars remain out of scope; runtime logs/state are separated from Markdown bundle output; asset naming is deterministic.
- Files owned: `src/pdftomd/`, `tests/`, `PROGRESS.md`, `phases/1-core-runtime-contracts/index.json`.
- Dependencies: Steps 0 and 1.
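A lightweight path helper in the spirit of this contract might look like the sketch below. The class name, directory layout, and file-name patterns are all hypothetical; the actual rules are defined in `docs/ARCHITECTURE.md`, which is not shown here. The sketch only computes paths and never writes to disk.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class OutputBundlePaths:
    """Hypothetical path contract; builds paths only, never touches the disk."""

    output_root: Path   # where Markdown bundles go
    runtime_root: Path  # where logs and resume state go, kept separate
    slug: str           # document slug from the Step 0 contract

    @property
    def bundle_dir(self) -> Path:
        return self.output_root / self.slug

    def chunk_file(self, index: int) -> Path:
        # Zero-padded so lexical order matches chunk order.
        return self.bundle_dir / f"{self.slug}-{index:03d}.md"

    def image_asset(self, page: int, index: int, ext: str = "png") -> Path:
        return self.bundle_dir / "images" / f"p{page:04d}-{index:02d}.{ext}"

    @property
    def log_file(self) -> Path:
        return self.runtime_root / self.slug / "convert.log"

    @property
    def state_file(self) -> Path:
        return self.runtime_root / self.slug / "state.json"
```

Keeping `runtime_root` as a separate field is one way to make the split between runtime artifacts and bundle output visible in the type itself.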
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm generated path contracts match `docs/ARCHITECTURE.md`.
3. Update `PROGRESS.md` and this phase index.
## Do Not
- Do not implement the renderer.
- Do not write files under `output/` in tests unless using a temp directory.
- Do not create sidecar metadata output.
@@ -0,0 +1,39 @@
# Step 3: runtime-cache-policy
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/TOOLCHAIN.md
- /docs/CONVERSION_POLICY.md
- /phases/1-core-runtime-contracts/step1.md
- /phases/1-core-runtime-contracts/step2.md
## Task
Establish model cache, log path, and resume state policy as typed contracts and documented path helpers.
The result should prepare later CLI/runtime phases to use local model cache paths and offline-preferred model loading.
## Sprint Contract
- Done means: model cache and runtime cache path contracts are tested and documented without downloading models.
- Hard thresholds: no network download is triggered; logs/state remain outside generated Markdown content; environment variable overrides are deterministic.
- Files owned: `src/pdftomd/`, `tests/`, `docs/TOOLCHAIN.md`, `PROGRESS.md`, `phases/1-core-runtime-contracts/index.json`.
- Dependencies: Steps 1 and 2.
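The sketch below shows one way such a policy could be expressed as a typed contract. The `PDFTOMD_CACHE_ROOT` and `PDFTOMD_OFFLINE` variable names and the directory layout are hypothetical. `HF_HOME`, `HF_HUB_OFFLINE`, `TRANSFORMERS_OFFLINE`, and `TORCH_HOME` are real variables read by Hugging Face and PyTorch tooling, but whether this project maps to them is an assumption. The code only builds a mapping; it downloads nothing and does not modify `os.environ`.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class RuntimeCachePolicy:
    model_cache_dir: Path
    log_dir: Path
    state_dir: Path
    offline: bool = True

    def environment(self) -> dict[str, str]:
        """Return the environment mapping a later runtime phase could apply."""
        env = {
            "HF_HOME": str(self.model_cache_dir / "huggingface"),
            "TORCH_HOME": str(self.model_cache_dir / "torch"),
        }
        if self.offline:
            env["HF_HUB_OFFLINE"] = "1"
            env["TRANSFORMERS_OFFLINE"] = "1"
        return env


def policy_from_env(environ: Mapping[str, str], default_root: Path) -> RuntimeCachePolicy:
    """Build the policy from an explicit mapping so overrides stay deterministic."""
    # PDFTOMD_* names are hypothetical override variables for this sketch.
    root = Path(environ.get("PDFTOMD_CACHE_ROOT") or default_root)
    return RuntimeCachePolicy(
        model_cache_dir=root / "models",
        log_dir=root / "logs",
        state_dir=root / "state",
        offline=environ.get("PDFTOMD_OFFLINE", "1") != "0",
    )
```

Passing the environment in as a plain mapping, rather than reading `os.environ` inside the function, lets tests cover overrides without changing process state.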
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm `docs/TOOLCHAIN.md` stays consistent with any cache path decisions.
3. Update `PROGRESS.md` and this phase index.
## Do Not
- Do not download Marker or Nougat weights.
- Do not add hosted storage or cloud cache behavior.
- Do not write warnings into Markdown output.