# V1 Release Checklist

Use this checklist from the repository root when deciding whether v1 is ready for local use. It separates the default fast gates from optional MinerU/GPU/sample validation so the normal loop stays independent of real models, GPU, network access, Obsidian, LaTeX tooling, and `samples/`.

## Release Status Rules

- Default fast gates passing means the repository is healthy under mocked/local checks.
- Optional local MinerU fixture gates passing means real local sample conversion has also been validated.
- If `pdf2md doctor` reports a hard failure, v1 release status is blocked unless the user records an explicit waiver with the exact reason and scope.
- Do not mark v1 fully validated when optional MinerU/sample checks are skipped or blocked. Record the blocked reason and release recommendation in `PROGRESS.md`.
- Do not claim perfect LaTeX, table, or reading-order reconstruction. The v1 guarantee is best-effort local conversion with warnings, metadata, and a human-readable report.
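The rules above can be read as a small decision function. The following is a minimal sketch, not part of `pdf2md`: the names `DoctorResult` and `release_status`, the `optional_gates` values, and the returned labels are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class DoctorResult(Enum):
    """Hypothetical labels for the three doctor outcomes named in this checklist."""
    PASS = "pass"
    WARNING_ONLY = "warning-only"
    HARD_FAIL = "hard-fail"


def release_status(fast_gates_pass, doctor, optional_gates, waiver_recorded=False):
    """Return an illustrative release label from gate outcomes.

    optional_gates is assumed to be one of "passed", "skipped", or "blocked".
    """
    if not fast_gates_pass:
        return "blocked: default fast gates failing"
    if doctor is DoctorResult.HARD_FAIL:
        if not waiver_recorded:
            return "blocked: doctor hard failure"
        # A waiver never upgrades the release to fully validated.
        return "waived: doctor hard failure risk-accepted"
    if optional_gates == "passed":
        return "fully validated"
    # Skipped or blocked optional checks cap the claim at mocked/local health.
    return "healthy under mocked/local checks; optional validation " + optional_gates
```

Whatever label a helper like this produces, the blocked reason and release recommendation still belong in `PROGRESS.md`.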

## Default Fast Gates

These gates should be runnable without real MinerU execution, sample PDFs, model files, GPU acceleration, network access, Obsidian, or LaTeX tooling. They must not install packages or download models during imports, tests, `doctor`, or `convert`.

| Gate | Pass condition | Release handling |
| --- | --- | --- |
| `uv sync` | Exits 0. | Blocks if project dependencies cannot sync. |
| `uv run pytest` | Exits 0 using mocked/local tests only. | Blocks if default tests require real MinerU, GPU, CUDA, PyTorch, model files, network, Obsidian, LaTeX tooling, or `samples/`. |
| `uv run pdf2md --version` | Exits 0 and prints the installed CLI version. | Blocks if the CLI entry point is unavailable. |
| `uv run pdf2md doctor` | Runs to completion and reports pass, warning-only, or hard-fail status. | A hard failure blocks release readiness unless explicitly waived by the user and recorded. Warning-only status can continue, but warnings must be recorded. |
| `git diff --check` | Exits 0. | Blocks on whitespace or patch formatting errors. |
| `git status --short --untracked-files=all` | Shows no staged sample PDFs or generated sample outputs. | Blocks if any `samples/` file or sample-derived output is staged or committed unintentionally. |
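The gates whose pass condition is "Exits 0" can be scripted in one pass. This is a hedged sketch that assumes `uv` and `git` are on PATH; the helper name `run_exit_zero_gates` and the `run` parameter are hypothetical. It leaves out `pdf2md doctor` and `git status`, because those two need their output interpreted rather than just an exit code, and it does not inspect the version string printed by `--version`.

```python
import subprocess

# Commands copied from the table above; only their exit codes are checked here.
EXIT_ZERO_GATES = [
    ["uv", "sync"],
    ["uv", "run", "pytest"],
    ["uv", "run", "pdf2md", "--version"],
    ["git", "diff", "--check"],
]


def run_exit_zero_gates(gates=EXIT_ZERO_GATES, run=subprocess.run):
    """Run each gate in order and return (command, exit_code) for every failure."""
    failures = []
    for argv in gates:
        result = run(argv)
        if result.returncode != 0:
            failures.append((" ".join(argv), result.returncode))
    return failures
```

An empty list means all four exit-code gates passed. The `run` parameter exists only so the helper can be exercised with a stand-in instead of real subprocesses.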

## Strict-Local Gate

Before calling v1 ready, verify the release candidate still follows the strict-local policy:

- Allowed runtime path: direct local `mineru` CLI execution.
- Allowed MinerU internal behavior: MinerU 3.1.0 may start a temporary local `mineru-api` when the CLI runs without `--api-url`.
- Prohibited runtime paths: `--api-url`, remote APIs, router mode, HTTP client backends, remote OpenAI-compatible backends, hosted renderers, cloud OCR, remote LLM/VLM calls, remote document parsers, alternate engines, runtime engine selection, and silent fallback after MinerU failure.
- Setup downloads must be explicit user actions and remain separate from runtime conversion. Default tests, imports, `doctor`, `convert`, and helper checks must not download packages or models.
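One narrow slice of this policy, the `--api-url` prohibition, is easy to check mechanically before a `mineru` subprocess is launched. The sketch below is an assumption about how such a guard could look, not the project's actual implementation; `uses_api_url` is a hypothetical name, and the function covers only this one flag, not the other prohibited paths in the list.

```python
def uses_api_url(argv):
    """Return True if a planned mineru argument list passes --api-url.

    Both the separate-value and the `--api-url=value` spellings are rejected
    as a precaution.
    """
    return any(arg == "--api-url" or arg.startswith("--api-url=") for arg in argv)
```

A caller would refuse to run the command when this returns `True`, rather than falling back to another engine.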

## Doctor Hard-Failure Handling

Treat a non-zero `pdf2md doctor` result as a release blocker for real v1 readiness. Common hard failures include missing required local dependencies, missing `mineru` on PATH, or a strict-local policy failure. Warning-only doctor results can continue, but the warnings must be recorded.

When `doctor` hard-fails:

- Do not run optional sample conversion as a passing release gate.
- Do not mark optional MinerU/GPU/sample validation as skipped-pass. Mark it blocked.
- Do not use a cloud/API fallback or alternate converter to bypass the failure.
- Record the failing check, exit code, and next action in `PROGRESS.md`.
- If the user chooses to proceed anyway, record the waiver and report the release as waived or risk-accepted, not fully validated.
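The recording step above could be made repeatable with a small formatter. This is only a sketch: `PROGRESS.md` has no prescribed layout in this checklist, so the entry shape, the function name `doctor_failure_entry`, and its parameters are assumptions made for illustration.

```python
from datetime import date


def doctor_failure_entry(failing_check, exit_code, next_action,
                         waiver_reason=None, today=None):
    """Format a Markdown note for PROGRESS.md (the layout is illustrative)."""
    day = (today or date.today()).isoformat()
    # A recorded waiver changes the label but never yields "fully validated".
    status = "waived / risk-accepted" if waiver_reason else "blocked"
    lines = [
        f"### Doctor hard failure ({day})",
        f"- Failing check: {failing_check}",
        f"- Exit code: {exit_code}",
        f"- Next action: {next_action}",
        "- Optional MinerU/GPU/sample validation: blocked",
        f"- Release status: {status}",
    ]
    if waiver_reason:
        lines.append(f"- Waiver: {waiver_reason}")
    return "\n".join(lines) + "\n"
```

The optional validation line is always written as blocked, which mirrors the rule that a hard failure must not be reported as skipped-pass.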

## Optional Local MinerU, GPU, And Sample Gates

Run these only by explicit opt-in after the default gates. They are intended for a local workstation with MinerU 3.1.0, model/cache setup, and GPU/PyTorch expectations already checked by `doctor`.

Preconditions:

- `uv run pdf2md doctor` has no hard failures, or the user has recorded an explicit waiver.
- Source PDFs come from local `samples/` or another user-provided local directory.
- Generated outputs go to a temporary directory or another ignored local output root, never to tracked fixture paths.
- Sample PDFs, `samples/metadata.json`, and generated sample outputs remain unstaged and uncommitted.

Manual single-sample shape:

```powershell
$sample = "samples\FourNodeQuadrilateralShellElementMITC4.pdf"
$out = Join-Path $env:TEMP ("pdf2md-fixture-" + [guid]::NewGuid().ToString())
uv run pdf2md convert $sample --out $out --metadata --overwrite
```

Optional pytest fixture evaluation:

```powershell
$env:PDF2MD_RUN_MINERU_FIXTURES = "1"
uv run pytest tests/integration/test_optional_mineru_fixtures.py
Remove-Item Env:PDF2MD_RUN_MINERU_FIXTURES
```

This optional pytest path runs `pdf2md doctor` first. If doctor has a hard failure, the fixture test is skipped with a blocker message instead of being counted as a passing real-MinerU validation.

A sample conversion is successful only when all of these are true:

- The command exits 0.
- The planned Markdown part exists: `<output>\<stem>\<stem>_001.md`.
- The planned quality report exists: `<output>\<stem>\<stem>_report.md`.
- No public `.metadata.json` sidecar is written for new conversions.
- The report warning counts are consistent enough to explain math, table, reading-order, asset, MinerU, and checker-unavailable risks.
- Any Markdown image links resolve relative to the Markdown file, or missing/broken links are reported as warnings.

Missing Markdown part or `_report.md` means the sample failed or is blocked. Do not count it as a partial success for release gating.

For each attempted sample, record at least:

- Source filename.
- Command run.
- Exit code.
- Generated Markdown path.
- Generated `_report.md` path.
- Warning count and final status.
- Math renderability failures or checker-unavailable count.
- Table fallback or degradation count when available.
- Missing or broken asset link count.
- Page coverage when available.
- Doctor status and any GPU/PyTorch/model/cache warnings.
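The file-level success criteria and part of the per-sample record can be checked with a short script. The sketch below is not part of `pdf2md`: `SampleRecord` and `evaluate_sample` are hypothetical names, the record holds only the fields that can be derived from the filesystem, and the sidecar search location is an assumption. It does not read warning counts from `_report.md`, so those still need manual review.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path

# Matches the target of Markdown image links such as ![alt](images/fig1.png).
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


@dataclass
class SampleRecord:
    """Hypothetical per-sample record; field names are illustrative."""
    source: str
    command: str
    exit_code: int
    markdown_path: Path
    report_path: Path
    broken_image_links: list = field(default_factory=list)
    problems: list = field(default_factory=list)

    @property
    def succeeded(self):
        # Any recorded problem means failed or blocked, never partial success.
        return not self.problems


def evaluate_sample(source, command, exit_code, output_root):
    """Check one finished conversion against the file-level criteria."""
    stem = Path(source).stem
    out_dir = Path(output_root) / stem
    record = SampleRecord(
        source, command, exit_code,
        markdown_path=out_dir / f"{stem}_001.md",
        report_path=out_dir / f"{stem}_report.md",
    )
    if exit_code != 0:
        record.problems.append(f"exit code {exit_code}")
    for required in (record.markdown_path, record.report_path):
        if not required.is_file():
            record.problems.append(f"missing {required.name}")
    # Assumption: a stray sidecar would sit under the sample's output directory.
    if out_dir.is_dir():
        for sidecar in sorted(out_dir.rglob("*.metadata.json")):
            record.problems.append(f"unexpected sidecar {sidecar.name}")
    if record.markdown_path.is_file():
        text = record.markdown_path.read_text(encoding="utf-8")
        for target in IMAGE_LINK.findall(text):
            if "://" in target:
                continue  # remote URLs are not local assets
            if not (record.markdown_path.parent / target).exists():
                record.broken_image_links.append(target)
    return record
```

Broken image links are counted rather than treated as failures, because the criteria allow them when the report flags them as warnings. The link check is deliberately simple and does not decode percent-encoded targets.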

## Fixture Coverage Notes

Local fixture coverage should include these risk categories where samples are available:

- Simple digital PDF with a text layer.
- Math-heavy PDF.
- Multi-column or complex reading order.
- Table with formulas.
- Figure, caption, or extracted asset links.
- Korean or non-ASCII filename/path handling.

Observed local fixture map as of 2026-05-11:

| Local sample | Fixture risks covered | Notes |
| --- | --- | --- |
| `samples/FourNodeQuadrilateralShellElementMITC4.pdf` | simple digital PDF, math-heavy engineering content, table/formula risk | Small sample suitable for first optional MinerU smoke validation. |
| `samples/MITC공부.pdf` | Korean filename/path handling, math-heavy notes, table/formula risk | Useful for non-ASCII path and shell-element notation checks. |
| `samples/2007쉘구조물의유한요소해석에대하여.pdf` | Korean filename/path handling, longer academic layout, math-heavy content, reading-order risk | Larger sample; use after the small smoke sample passes. |
| `samples/유한요소해석법을이용한쉘구조물의동적좌굴해석.pdf` | Korean filename/path handling, math-heavy content, figures/assets, reading-order risk | Larger sample for report and warning quality review. |
| `samples/metadata.json` | fixture notes only | Local untracked context; do not treat as generated v1 metadata unless manually confirmed. |

Coverage gaps to keep visible:

- A deliberately simple one-page digital PDF fixture is still useful for release smoke checks.
- A table-dominant sample with known formula cells would make table degradation easier to judge.
- A figure-heavy sample with expected extracted assets would make asset link validation easier to judge.

Do not score fixture quality only by plain-text edit distance. Include math delimiter/renderability behavior, tables, reading order, assets, report provenance, warning usefulness, and `_report.md` usefulness.

## No-Sample-Commit Check

Before staging or committing any release-gate work:

```powershell
git status --short --untracked-files=all
```

Confirm:

- `samples/` files are not staged.
- Generated sample outputs are not staged.
- No sample PDF or sample-derived binary was copied into tracked tests or docs.
- Any temporary fixture output inside the repository was removed unless the user explicitly approved keeping it and it is ignored.

`samples/` may appear as untracked local context. That is expected; do not add it unless the user explicitly requests it.
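The first confirmation can be automated by parsing the short status output. This is a sketch under stated assumptions: `staged_sample_paths` is a hypothetical helper, it only looks at paths under `samples/`, and generated outputs stored elsewhere still need a manual look. Git quotes and escapes non-ASCII paths by default (`core.quotePath`), so the Korean sample names may appear wrapped in double quotes; the helper strips those quotes before comparing.

```python
def staged_sample_paths(status_output, prefix="samples/"):
    """Return staged paths under prefix from `git status --short` output."""
    staged = []
    for line in status_output.splitlines():
        if len(line) < 4:
            continue
        # Short format is "XY path": X is the index (staged) state.
        index_state, path = line[0], line[3:]
        if index_state in " ?!":
            continue  # not staged: worktree-only change, untracked, or ignored
        # Renames and copies are shown as "old -> new"; check the new path.
        path = path.split(" -> ")[-1].strip('"')
        if path.startswith(prefix):
            staged.append(path)
    return staged
```

An empty result is consistent with the expectation that `samples/` shows up only as untracked (`??`) local context.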