add files

This commit is contained in:
김경종
2026-04-30 17:05:19 +09:00
parent f3e01b5a8c
commit 7e985ae94a
135 changed files with 41205 additions and 0 deletions
+26
View File
@@ -0,0 +1,26 @@
{
"project": "PDFtoMD",
"phase": "7-mvp-quality-hardening",
"steps": [
{
"step": 0,
"name": "sample-smoke-conversions",
"status": "pending"
},
{
"step": 1,
"name": "quality-metrics-report",
"status": "pending"
},
{
"step": 2,
"name": "regression-thresholds",
"status": "pending"
},
{
"step": 3,
"name": "mvp-fix-sweep",
"status": "pending"
}
]
}
+38
View File
@@ -0,0 +1,38 @@
# Step 0: sample-smoke-conversions
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/PRD.md
- /docs/CONVERSION_POLICY.md
- /phases/6-cli-runtime-resume/index.json
## Task
Create controlled sample smoke conversion tests for the MVP corpus.
The tests should exercise the end-to-end pipeline on a small selected subset or page range first, then document which full documents are suitable for manual or slower regression runs.
## Sprint Contract
- Done means: at least one text-layer sample and one mixed/scanned-risk sample can be converted in a controlled test path.
- Hard thresholds: tests have runtime bounds; sample selection comes from `samples/metadata.json`; generated output is checked with quality gates.
- Files owned: `tests/`, sample metadata updates if needed, `PROGRESS.md`, `phases/7-mvp-quality-hardening/index.json`.
- Dependencies: CLI/runtime and renderer phases complete.
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Record sample coverage and any skipped slow tests in `PROGRESS.md`.
3. Update this phase index.
## Do Not
- Do not make every validation run process all long PDFs if runtime becomes impractical.
- Do not commit generated `output/` bundles.
- Do not weaken quality gates to pass broken output.
+37
View File
@@ -0,0 +1,37 @@
# Step 1: quality-metrics-report
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/PRD.md
- /phases/7-mvp-quality-hardening/step0.md
## Task
Add focused quality metrics for converted Markdown bundles.
Metrics should cover headings, math delimiter balance, LaTeX environment pairs, image links, captions, table parseability, chunk frontmatter, and no-exception conversion.
## Sprint Contract
- Done means: evaluator-friendly quality metrics can be run on sample outputs and produce actionable failure messages.
- Hard thresholds: metrics do not rely on full Markdown snapshots; failures identify file/chunk/block context; reports stay out of generated Markdown.
- Files owned: `src/pdftomd/quality.py`, `tests/`, optional scripts under `scripts/`, `PROGRESS.md`, phase index.
- Dependencies: Step 0 sample smoke conversions and quality gates.
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm metrics can be used by `harness-review`.
3. Update `PROGRESS.md` and this phase index.
## Do Not
- Do not create broad snapshot baselines as the primary quality gate.
- Do not write quality reports inside Markdown chunks.
- Do not hide per-chunk failures.
+37
View File
@@ -0,0 +1,37 @@
# Step 2: regression-thresholds
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/PRD.md
- /phases/7-mvp-quality-hardening/step1.md
## Task
Define MVP regression thresholds for the sample corpus.
Thresholds should distinguish mandatory fast validation from slower/manual quality checks.
## Sprint Contract
- Done means: MVP pass/fail criteria are encoded in tests or documented commands and tied to sample metadata traits.
- Hard thresholds: mandatory validation remains runnable on the local machine; slow tests are opt-in; failed quality areas are not masked.
- Files owned: `tests/`, `scripts/`, sample metadata updates if needed, `PROGRESS.md`, phase index.
- Dependencies: Quality metrics report.
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Confirm slow tests are documented separately if needed.
3. Update `PROGRESS.md` and this phase index.
## Do Not
- Do not make local validation unusably slow.
- Do not turn all failures into warnings.
- Do not remove sample coverage for Korean paths or formulas.
+37
View File
@@ -0,0 +1,37 @@
# Step 3: mvp-fix-sweep
## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/PRD.md
- /phases/7-mvp-quality-hardening/step2.md
## Task
Run a focused MVP stabilization pass based on failing quality metrics and sample smoke tests.
This step should fix only defects revealed by prior acceptance criteria and should avoid feature expansion.
## Sprint Contract
- Done means: MVP fast validation and selected sample smoke conversions pass with documented residual risks.
- Hard thresholds: fixes are test-backed; no new primary parser is introduced; out-of-scope UI/API/LLM features remain out of scope.
- Files owned: failing modules and tests identified by prior phase output, `PROGRESS.md`, phase index.
- Dependencies: Regression thresholds and quality reports.
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```
## Verification
1. Run the acceptance commands.
2. Record remaining quality risks in `PROGRESS.md`.
3. Update this phase index.
## Do Not
- Do not use this as a broad refactor step.
- Do not add new major features.
- Do not bypass failed quality gates without recording a blocker.