# Step 3: markdown-quality-gates

## Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /docs/ADR.md
- /phases/0-harness-foundation/step0.md
- /phases/0-harness-foundation/step1.md
- /phases/0-harness-foundation/step2.md
- /phases/0-harness-foundation/index.json

## Task
Create focused Markdown quality gate functions that later renderer and conversion steps can call.

This step should validate generated Markdown-like strings and asset references without requiring a full PDF conversion. It should prefer structured checks over full snapshot comparison.

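As an illustration of a structured check in contrast to a snapshot comparison, the following is a minimal sketch of a frontmatter gate. The function name `missing_frontmatter_fields` and the field names in the usage note are hypothetical; the real required fields come from the output contract, which this step does not restate. The parser only reads flat `key: value` lines and is not a YAML implementation.

```python
def missing_frontmatter_fields(chunk: str, required: tuple) -> list:
    """Return the required keys that are absent from a leading `---` block.

    Assumption: frontmatter is a flat `key: value` block delimited by `---`
    lines at the very top of the chunk. Nested YAML is not interpreted.
    """
    lines = chunk.splitlines()
    # No opening delimiter means there is no frontmatter at all.
    if not lines or lines[0].strip() != "---":
        return list(required)

    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter found
        # Only top-level, non-comment lines count as keys.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    else:
        # The loop ended without a closing `---`: treat the block as invalid.
        return list(required)

    return [name for name in required if name not in keys]
```

A call such as `missing_frontmatter_fields(chunk, ("source", "page_range"))` returns only the missing names, so a test asserts on one property of the output instead of comparing the whole document.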
Quality gates should cover:
- math delimiter balance for `$...$` and `$$...$$`
- LaTeX `\begin{...}` / `\end{...}` pairs
- image link path existence or modeled asset reference existence
- table parseability for simple Markdown tables
- chunk frontmatter fields required by the output contract
- caption/reference anchor shape where confidence is sufficient

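The first two gates in the list above could be sketched as follows. This is one possible shape, not a prescribed API: `QualityIssue`, `check_math_delimiters`, `check_latex_environments`, and the issue codes are all hypothetical names. The delimiter check is a counting heuristic that skips `\$` but does not exclude code spans or fenced code, which a real gate would likely need to do.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityIssue:
    """One gate finding (hypothetical shape, not defined by this step)."""
    code: str
    message: str


def check_math_delimiters(text: str) -> list:
    """Report unbalanced `$` / `$$` delimiters. Never modifies `text`."""
    issues = []
    # Drop escaped dollars so `\$5` is not counted as a delimiter.
    stripped = re.sub(r"\\\$", "", text)
    # Display math: `$$` tokens must come in pairs.
    if len(re.findall(r"\$\$", stripped)) % 2:
        issues.append(QualityIssue(
            "math.display_unbalanced",
            "Odd number of `$$` delimiters; a display block is not closed."))
    # Inline math: after removing `$$`, single `$` must also pair up.
    if re.sub(r"\$\$", "", stripped).count("$") % 2:
        issues.append(QualityIssue(
            "math.inline_unbalanced",
            "Odd number of `$` delimiters; an inline span is not closed."))
    return issues


_ENV_TOKEN = re.compile(r"\\(begin|end)\{([^{}]+)\}")


def check_latex_environments(text: str) -> list:
    """Report `\\begin{...}` / `\\end{...}` tokens that do not nest properly."""
    issues = []
    stack = []  # names of currently open environments
    for match in _ENV_TOKEN.finditer(text):
        kind, name = match.group(1), match.group(2)
        if kind == "begin":
            stack.append(name)
        elif not stack:
            issues.append(QualityIssue(
                "latex.end_without_begin",
                f"\\end{{{name}}} at offset {match.start()} has no open \\begin."))
        else:
            opened = stack.pop()
            if opened != name:
                issues.append(QualityIssue(
                    "latex.env_mismatch",
                    f"\\end{{{name}}} at offset {match.start()} closes \\begin{{{opened}}}."))
    for name in stack:
        issues.append(QualityIssue(
            "latex.begin_without_end", f"\\begin{{{name}}} is never closed."))
    return issues
```

Both functions return a list of findings and leave the input untouched, so an empty list means the gate passed.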
## Sprint Contract
- Done means: later renderer steps have reusable validation functions and focused pytest coverage for Markdown output risks.
- Hard thresholds:
  - Tests include passing and failing examples for math delimiter checks.
  - Tests include a complex table case where Markdown limitations are represented as an allowed HTML/fallback decision.
  - Tests do not rely on full Markdown snapshot equality.
  - Validation functions do not mutate generated Markdown silently unless an explicit repair function is named and tested.
  - No PDF parsing or renderer implementation is introduced here.
- Files owned:
  - `src/pdftomd/quality.py`
  - model additions in `src/pdftomd/models.py` only if required
  - `tests/test_quality.py`
  - `PROGRESS.md`
  - `phases/0-harness-foundation/index.json`
- Dependencies:
  - Step 1 package skeleton and models

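One way to meet the complex-table threshold is to have the table gate return a decision instead of a boolean, so an HTML fallback counts as an allowed outcome and not as a failure. The sketch below assumes that a fallback table arrives as a `<table>...</table>` string; `TableDecision` and `classify_table` are hypothetical names, and escaped pipes (`\|`) inside cells are not handled.

```python
import re
from enum import Enum


class TableDecision(Enum):
    MARKDOWN = "markdown"            # parseable simple pipe table
    HTML_FALLBACK = "html_fallback"  # allowed fallback for complex tables
    INVALID = "invalid"              # neither form; the gate should fail


_DELIMITER_CELL = re.compile(r":?-+:?")


def _split_row(line: str) -> list:
    """Split a pipe-table row into stripped cells, dropping one outer pipe per side."""
    row = line.strip()
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|"):
        row = row[:-1]
    return [cell.strip() for cell in row.split("|")]


def classify_table(block: str) -> TableDecision:
    """Classify a rendered table block without modifying it."""
    text = block.strip()
    # Assumption: complex tables are emitted as a single HTML table element.
    if text.lower().startswith("<table") and text.lower().endswith("</table>"):
        return TableDecision.HTML_FALLBACK

    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return TableDecision.INVALID  # header and delimiter rows are required

    header = _split_row(lines[0])
    delimiter = _split_row(lines[1])
    if len(delimiter) != len(header):
        return TableDecision.INVALID
    if not all(_DELIMITER_CELL.fullmatch(cell) for cell in delimiter):
        return TableDecision.INVALID
    # Every body row must have the same number of cells as the header.
    if any(len(_split_row(ln)) != len(header) for ln in lines[2:]):
        return TableDecision.INVALID
    return TableDecision.MARKDOWN
```

A test in `tests/test_quality.py` could then assert `classify_table(html_block) is TableDecision.HTML_FALLBACK` for the complex case, which keeps the assertion focused and avoids snapshot equality.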
## Acceptance Criteria
```powershell
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests\test_quality.py
```

## Verification
1. Run the acceptance commands.
2. Confirm quality gates are focused assertions, not whole-document snapshots.
3. Confirm failures return actionable messages for evaluator use.
4. Update `PROGRESS.md` with completed work, validation output, and next handoff.
5. Update this phase index step to `completed` with a one-line `summary`, or to `blocked`/`error` with a concrete reason.

## Do Not
- Do not implement Marker/Nougat adapters.
- Do not implement the full Markdown renderer.
- Do not introduce an LLM correction path.
- Do not write warning/error messages into generated Markdown content.