add files
@@ -0,0 +1,26 @@
{
  "project": "PDFtoMD",
  "phase": "7-mvp-quality-hardening",
  "steps": [
    {
      "step": 0,
      "name": "sample-smoke-conversions",
      "status": "pending"
    },
    {
      "step": 1,
      "name": "quality-metrics-report",
      "status": "pending"
    },
    {
      "step": 2,
      "name": "regression-thresholds",
      "status": "pending"
    },
    {
      "step": 3,
      "name": "mvp-fix-sweep",
      "status": "pending"
    }
  ]
}
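
A minimal sketch of how a harness might consume this index, assuming steps run in ascending `step` order and that `"pending"` marks work not yet started. The helper name `next_pending_step` is hypothetical, and the index is inlined here only so the example is self-contained.

```python
import json
from typing import Optional

# Inline copy of the phase index shown above; a real harness would read the
# file from disk instead.
INDEX_TEXT = """
{
  "project": "PDFtoMD",
  "phase": "7-mvp-quality-hardening",
  "steps": [
    {"step": 0, "name": "sample-smoke-conversions", "status": "pending"},
    {"step": 1, "name": "quality-metrics-report", "status": "pending"},
    {"step": 2, "name": "regression-thresholds", "status": "pending"},
    {"step": 3, "name": "mvp-fix-sweep", "status": "pending"}
  ]
}
"""


def next_pending_step(index: dict) -> Optional[dict]:
    """Return the lowest-numbered step whose status is "pending", if any."""
    pending = [s for s in index["steps"] if s["status"] == "pending"]
    return min(pending, key=lambda s: s["step"]) if pending else None


index = json.loads(INDEX_TEXT)
step = next_pending_step(index)
print(step["name"] if step else "no pending steps")
```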
@@ -0,0 +1,38 @@
# Step 0: sample-smoke-conversions

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/PRD.md
- /docs/CONVERSION_POLICY.md
- /phases/6-cli-runtime-resume/index.json

## Task
Create controlled sample smoke conversion tests for the MVP corpus.

The tests should exercise the end-to-end pipeline on a small selected subset or page range first, then document which full documents are suitable for manual or slower regression runs.

## Sprint Contract
- Done means: at least one text-layer sample and one mixed/scanned-risk sample can be converted in a controlled test path.
- Hard thresholds: tests have runtime bounds; sample selection comes from `samples/metadata.json`; generated output is checked with quality gates.
- Files owned: `tests/`, sample metadata updates if needed, `PROGRESS.md`, `phases/7-mvp-quality-hardening/index.json`.
- Dependencies: CLI/runtime and renderer phases complete.
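
The contract above could be sketched roughly as follows. The schema of `samples/metadata.json` is not shown in this step, so the entry fields (`path`, `traits`), the trait names, the sample file names, and both helper functions are assumptions made for illustration only. The runtime bound here is checked after the call returns; it does not interrupt a conversion that hangs.

```python
import time

# Hypothetical entries standing in for samples/metadata.json.
SAMPLES = [
    {"path": "samples/text_layer_example.pdf", "traits": ["text-layer"]},
    {"path": "samples/scanned_example.pdf", "traits": ["scanned-risk", "formulas"]},
    {"path": "samples/long_example.pdf", "traits": ["text-layer", "long"]},
]


def pick_smoke_samples(samples, required_traits):
    """Pick the first sample carrying each required trait."""
    chosen = {}
    for trait in required_traits:
        match = next((s for s in samples if trait in s["traits"]), None)
        if match is None:
            raise LookupError(f"no sample with trait {trait!r}")
        chosen[trait] = match
    return chosen


def run_bounded(convert, sample, max_seconds):
    """Run convert(sample) and fail if it took longer than the bound."""
    start = time.monotonic()
    result = convert(sample)
    elapsed = time.monotonic() - start
    if elapsed > max_seconds:
        raise TimeoutError(
            f"{sample['path']} took {elapsed:.1f}s, limit is {max_seconds}s"
        )
    return result


chosen = pick_smoke_samples(SAMPLES, ["text-layer", "scanned-risk"])
for trait, sample in chosen.items():
    # A stub stands in for the real end-to-end conversion call.
    run_bounded(lambda s: s["path"], sample, max_seconds=5.0)
    print(trait, "->", sample["path"])
```

Selecting by trait rather than by file name keeps the test tied to the metadata, so replacing a sample does not require editing the test.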

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Record sample coverage and any skipped slow tests in `PROGRESS.md`.
3. Update this phase index.

## Do Not
- Do not make every validation run process all long PDFs if runtime becomes impractical.
- Do not commit generated `output/` bundles.
- Do not weaken quality gates to pass broken output.
@@ -0,0 +1,37 @@
# Step 1: quality-metrics-report

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/PRD.md
- /phases/7-mvp-quality-hardening/step0.md

## Task
Add focused quality metrics for converted Markdown bundles.

Metrics should cover headings, math delimiter balance, LaTeX environment pairs, image links, captions, table parseability, chunk frontmatter, and no-exception conversion.
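
Two of the listed metrics, math delimiter balance and LaTeX environment pairs, might look like the sketch below. This is not the contents of `src/pdftomd/quality.py`; the function names are hypothetical, and the checks deliberately ignore fenced code and inline code spans, which a real implementation would probably need to exclude.

```python
import re
from typing import List

ENV_TOKEN = re.compile(r"\\(begin|end)\{([^}]+)\}")


def unbalanced_math_delimiters(text: str) -> bool:
    """Return True if unescaped `$` or `$$` delimiters do not pair up."""
    # Drop escaped dollars so a literal price is not counted as math.
    stripped = re.sub(r"\\\$", "", text)
    display = len(re.findall(r"\$\$", stripped))
    inline = stripped.replace("$$", "").count("$")
    return display % 2 != 0 or inline % 2 != 0


def unmatched_environments(text: str) -> List[str]:
    """Return one message per begin/end token that does not nest properly."""
    stack: List[str] = []
    problems: List[str] = []
    for match in ENV_TOKEN.finditer(text):
        kind, name = match.group(1), match.group(2)
        if kind == "begin":
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected \\end{{{name}}}")
    problems.extend(f"unclosed \\begin{{{name}}}" for name in stack)
    return problems


sample = "Inline $a+b$ and\n\\begin{align}\nx &= 1\n"
print(unbalanced_math_delimiters(sample))
print(unmatched_environments(sample))
```

Both checks operate on structure rather than exact text, so they stay valid when the converter's wording or whitespace changes.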

## Sprint Contract
- Done means: evaluator-friendly quality metrics can be run on sample outputs and produce actionable failure messages.
- Hard thresholds: metrics do not rely on full Markdown snapshots; failures identify file/chunk/block context; reports stay out of generated Markdown.
- Files owned: `src/pdftomd/quality.py`, `tests/`, optional scripts under `scripts/`, `PROGRESS.md`, phase index.
- Dependencies: Step 0 sample smoke conversions and quality gates.
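
One way to meet the "failures identify file/chunk/block context" threshold is to return structured failure records instead of plain strings. The sketch below assumes a hypothetical `QualityFailure` record and uses a line number within the chunk as a stand-in for block context; the image-link check and all field names are illustrative, not the project's actual API.

```python
import re
from dataclasses import dataclass
from typing import List, Set

# Captures the link target of a Markdown image, e.g. ![alt](images/p1.png).
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


@dataclass(frozen=True)
class QualityFailure:
    file: str     # Markdown file the chunk belongs to
    chunk: int    # chunk index within the bundle (assumed numbering)
    line: int     # 1-based line inside the chunk, standing in for the block
    metric: str   # name of the metric that failed
    message: str  # what went wrong, phrased so it can be acted on

    def __str__(self) -> str:
        return (
            f"{self.file} chunk {self.chunk} line {self.line}: "
            f"[{self.metric}] {self.message}"
        )


def check_image_links(
    file: str, chunk: int, text: str, known_assets: Set[str]
) -> List[QualityFailure]:
    """Report every image link whose target is not a known asset."""
    failures = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for target in IMAGE_LINK.findall(line):
            if target not in known_assets:
                failures.append(
                    QualityFailure(
                        file, chunk, lineno, "image-links",
                        f"missing asset {target}",
                    )
                )
    return failures


failures = check_image_links(
    "out/doc.md", 2, "Intro text\n![fig](images/p3.png)\n", {"images/p1.png"}
)
for failure in failures:
    print(failure)
```

Because the check returns records rather than writing anything, the report can be rendered wherever the caller chooses and never ends up inside a Markdown chunk.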

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm metrics can be used by `harness-review`.
3. Update `PROGRESS.md` and this phase index.

## Do Not
- Do not create broad snapshot baselines as the primary quality gate.
- Do not write quality reports inside Markdown chunks.
- Do not hide per-chunk failures.
@@ -0,0 +1,37 @@
# Step 2: regression-thresholds

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/PRD.md
- /phases/7-mvp-quality-hardening/step1.md

## Task
Define MVP regression thresholds for the sample corpus.

Thresholds should distinguish mandatory fast validation from slower/manual quality checks.

## Sprint Contract
- Done means: MVP pass/fail criteria are encoded in tests or documented commands and tied to sample metadata traits.
- Hard thresholds: mandatory validation remains runnable on the local machine; slow tests are opt-in; failed quality areas are not masked.
- Files owned: `tests/`, `scripts/`, sample metadata updates if needed, `PROGRESS.md`, phase index.
- Dependencies: Quality metrics report.
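
A rough sketch of thresholds tied to sample traits, with slow runs kept opt-in. The trait names, metric names, and the `PDFTOMD_RUN_SLOW` environment variable are all hypothetical; the project may encode these differently, for example through pytest markers.

```python
import os
from typing import Dict, Iterable, List, Mapping

# Hypothetical mapping from a sample trait to the metrics that must pass.
REQUIRED_METRICS: Dict[str, List[str]] = {
    "text-layer": ["headings", "chunk-frontmatter"],
    "formulas": ["math-delimiters", "latex-environments"],
    "korean-path": ["no-exception"],
}


def failed_metrics(traits: Iterable[str], results: Mapping[str, bool]) -> List[str]:
    """Return mandatory metrics that failed or were never computed."""
    failed: List[str] = []
    for trait in traits:
        for metric in REQUIRED_METRICS.get(trait, []):
            # A missing result counts as a failure, so a gap in coverage
            # cannot pass silently.
            if not results.get(metric, False) and metric not in failed:
                failed.append(metric)
    return failed


def slow_tests_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Slow regression runs happen only when explicitly requested."""
    return env.get("PDFTOMD_RUN_SLOW") == "1"


results = {"math-delimiters": True, "latex-environments": False}
print(failed_metrics(["formulas"], results))
print(slow_tests_enabled({}))
```

Treating an uncomputed metric as a failure is a design choice in this sketch: it keeps a removed or skipped check from being mistaken for a pass.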

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Confirm slow tests are documented separately if needed.
3. Update `PROGRESS.md` and this phase index.

## Do Not
- Do not make local validation unusably slow.
- Do not turn all failures into warnings.
- Do not remove sample coverage for Korean paths or formulas.
@@ -0,0 +1,37 @@
# Step 3: mvp-fix-sweep

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/PRD.md
- /phases/7-mvp-quality-hardening/step2.md

## Task
Run a focused MVP stabilization pass based on failing quality metrics and sample smoke tests.

This step should fix only defects revealed by prior acceptance criteria and should avoid feature expansion.

## Sprint Contract
- Done means: MVP fast validation and selected sample smoke conversions pass with documented residual risks.
- Hard thresholds: fixes are test-backed; no new primary parser is introduced; out-of-scope UI/API/LLM features remain out of scope.
- Files owned: failing modules and tests identified by prior phase output, `PROGRESS.md`, phase index.
- Dependencies: Regression thresholds and quality reports.

## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
```

## Verification
1. Run the acceptance commands.
2. Record remaining quality risks in `PROGRESS.md`.
3. Update this phase index.

## Do Not
- Do not use this as a broad refactor step.
- Do not add new major features.
- Do not bypass failed quality gates without recording a blocker.