PDFToMD/docs/V1RELEASECHECKLIST.md
2026-05-14 10:16:59 +09:00

V1 Release Checklist

Use this checklist from the repository root when deciding whether v1 is ready for local use. It separates the default fast gates from optional MinerU/GPU/sample validation so the normal loop stays independent of real models, GPU, network access, Obsidian, LaTeX tooling, and samples/.

Release Status Rules

  • Default fast gates passing means the repository is healthy under mocked/local checks.
  • Optional local MinerU fixture gates passing means real local sample conversion has also been validated.
  • If pdf2md doctor reports a hard failure, v1 release status is blocked unless the user records an explicit waiver with the exact reason and scope.
  • Do not mark v1 fully validated when optional MinerU/sample checks are skipped or blocked. Record the blocked reason and release recommendation in PROGRESS.md.
  • Do not claim perfect LaTeX, table, or reading-order reconstruction. The v1 guarantee is best-effort local conversion with warnings, metadata, and a human-readable report.
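The rules above can be read as a small decision function. The following is a minimal sketch, not part of the repository: the `GateResults` type, the `release_status` function, and the status labels are all hypothetical names chosen for illustration, and `default_gates_pass` is assumed to cover the fast gates other than doctor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateResults:
    """Hypothetical summary of one release-gate run (names are illustrative)."""
    default_gates_pass: bool              # fast gates other than doctor
    doctor_hard_failure: bool             # pdf2md doctor reported a hard failure
    optional_gates: str                   # "passed", "skipped", or "blocked"
    waiver_reason: Optional[str] = None   # exact reason and scope, if waived


def release_status(r: GateResults) -> str:
    """Map gate results to a status label following the rules listed above."""
    if not r.default_gates_pass:
        return "blocked"
    if r.doctor_hard_failure:
        # A hard failure blocks release unless an explicit waiver is recorded;
        # a waived release is never reported as fully validated.
        return "waived" if r.waiver_reason else "blocked"
    if r.optional_gates == "passed":
        return "fully-validated"
    # Optional MinerU/sample checks skipped or blocked: healthy under mocks only.
    return "healthy-mocked-only"
```

For example, `release_status(GateResults(True, False, "skipped"))` returns `"healthy-mocked-only"`, which matches the rule that skipped optional checks never count as full validation.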

Default Fast Gates

These gates should be runnable without real MinerU execution, sample PDFs, model files, GPU acceleration, network access, Obsidian, or LaTeX tooling. They must not install packages or download models during imports, tests, doctor, or convert.

| Gate | Pass condition | Release handling |
| --- | --- | --- |
| uv sync | Exits 0. | Blocks if project dependencies cannot sync. |
| uv run pytest | Exits 0 using mocked/local tests only. | Blocks if default tests require real MinerU, GPU, CUDA, PyTorch, model files, network, Obsidian, LaTeX tooling, or samples/. |
| uv run pdf2md --version | Exits 0 and prints the installed CLI version. | Blocks if the CLI entry point is unavailable. |
| uv run pdf2md doctor | Runs to completion and reports pass, warning-only, or hard-fail status. | A hard failure blocks release readiness unless explicitly waived by the user and recorded. Warning-only status can continue, but warnings must be recorded. |
| git diff --check | Exits 0. | Blocks on whitespace or patch formatting errors. |
| git status --short --untracked-files=all | Shows no staged sample PDFs or generated sample outputs. | Blocks if any samples/ file or sample-derived output is staged or committed unintentionally. |
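One way to run the exit-code gates in order and collect their results is sketched below. This is an assumption-laden illustration, not a script shipped with the repository: `run_gates`, `failed_gates`, and the injectable `runner` parameter are hypothetical names. The git status gate is left out because it needs its output inspected, not only its exit code.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Gates from the table above whose pass condition is the exit code alone.
DEFAULT_GATES: List[List[str]] = [
    ["uv", "sync"],
    ["uv", "run", "pytest"],
    ["uv", "run", "pdf2md", "--version"],
    ["uv", "run", "pdf2md", "doctor"],
    ["git", "diff", "--check"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command from the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_gates(
    gates: Sequence[Sequence[str]] = DEFAULT_GATES,
    runner: Callable[[Sequence[str]], int] = _run,
) -> List[Tuple[str, int]]:
    """Run every gate in order and return (command, exit code) pairs."""
    return [(" ".join(cmd), runner(cmd)) for cmd in gates]


def failed_gates(results: Sequence[Tuple[str, int]]) -> List[str]:
    """Return the commands that exited non-zero."""
    return [name for name, code in results if code != 0]
```

Calling `failed_gates(run_gates())` from the repository root would list any blocking gate. The `runner` argument exists so the loop itself can be exercised without executing real commands.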

Strict-Local Gate

Before calling v1 ready, verify the release candidate still follows the strict-local policy:

  • Allowed runtime path: direct local mineru CLI execution.
  • Allowed MinerU internal behavior: MinerU 3.1.0 may start a temporary local mineru-api when the CLI runs without --api-url.
  • Prohibited runtime paths: --api-url, remote APIs, router mode, HTTP client backends, remote OpenAI-compatible backends, hosted renderers, cloud OCR, remote LLM/VLM calls, remote document parsers, alternate engines, runtime engine selection, and silent fallback after MinerU failure.
  • Setup downloads must be explicit user actions and remain separate from runtime conversion. Default tests, imports, doctor, convert, and helper checks must not download packages or models.
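As a rough illustration of the policy, a pre-flight check could reject a MinerU argument list that carries a prohibited flag before anything is executed. The sketch below is hypothetical (`strict_local_violations` is not a known function in this project) and it checks only `--api-url`, the one flag this checklist names; the other prohibited paths would need their own checks.

```python
from typing import List, Sequence

# Only --api-url is named explicitly in the checklist; anything else added
# here would be an assumption about the MinerU CLI.
PROHIBITED_FLAGS = ("--api-url",)


def strict_local_violations(argv: Sequence[str]) -> List[str]:
    """Return the arguments that select a prohibited runtime path."""
    violations = []
    for arg in argv:
        # Treat both "--api-url URL" and "--api-url=URL" as violations.
        flag = arg.split("=", 1)[0]
        if flag in PROHIBITED_FLAGS:
            violations.append(arg)
    return violations
```

An empty result means no prohibited flag was found in the argument list. It does not prove the run is strict-local.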

Doctor Hard-Failure Handling

Treat a non-zero pdf2md doctor result as a release blocker for real v1 readiness. Common hard failures include missing required local dependencies, missing mineru on PATH, or a strict-local policy failure. Warning-only doctor results can continue, but the warnings must be recorded.

When doctor hard-fails:

  • Do not run optional sample conversion as a passing release gate.
  • Do not mark optional MinerU/GPU/sample validation as skipped-pass. Mark it blocked.
  • Do not use a cloud/API fallback or alternate converter to bypass the failure.
  • Record the failing check, exit code, and next action in PROGRESS.md.
  • If the user chooses to proceed anyway, record the waiver and report the release as waived or risk-accepted, not fully validated.
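The record for PROGRESS.md could be produced by a small formatter such as the sketch below. The entry layout and the `doctor_blocker_entry` name are assumptions made for illustration; the checklist requires the failing check, exit code, next action, and any waiver to be recorded, but it does not prescribe a format.

```python
from typing import Optional


def doctor_blocker_entry(
    failing_check: str,
    exit_code: int,
    next_action: str,
    waiver: Optional[str] = None,
) -> str:
    """Format a hypothetical PROGRESS.md entry for a doctor hard failure."""
    # A waived release is reported as risk-accepted, never as fully validated.
    status = "waived (risk-accepted)" if waiver else "blocked"
    lines = [
        f"- Doctor hard failure: {failing_check}",
        f"  - Exit code: {exit_code}",
        f"  - Release status: {status}",
        "  - Optional MinerU/GPU/sample validation: blocked",
        f"  - Next action: {next_action}",
    ]
    if waiver:
        lines.append(f"  - Waiver: {waiver}")
    return "\n".join(lines)
```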

Optional Local MinerU, GPU, And Sample Gates

Run these only by explicit opt-in after the default gates. They are intended for a local workstation with MinerU 3.1.0, model/cache setup, and GPU/PyTorch expectations already checked by doctor.

Preconditions:

  • uv run pdf2md doctor has no hard failures, or the user has recorded an explicit waiver.
  • Source PDFs come from local samples/ or another user-provided local directory.
  • Generated outputs go to a temporary directory or another ignored local output root, never to tracked fixture paths.
  • Sample PDFs, samples/metadata.json, and generated sample outputs remain unstaged and uncommitted.

Manual single-sample shape:

$sample = "samples\FourNodeQuadrilateralShellElementMITC4.pdf"
$out = Join-Path $env:TEMP ("pdf2md-fixture-" + [guid]::NewGuid().ToString())
uv run pdf2md convert $sample --out $out --metadata --overwrite

Optional pytest fixture evaluation:

$env:PDF2MD_RUN_MINERU_FIXTURES = "1"
uv run pytest tests/integration/test_optional_mineru_fixtures.py
Remove-Item Env:PDF2MD_RUN_MINERU_FIXTURES

This optional pytest path runs pdf2md doctor first. If doctor has a hard failure, the fixture test is skipped with a blocker message instead of being counted as a passing real-MinerU validation.
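The gating described here can be sketched as a pure function that a test could consult before doing any real work. This is an illustrative assumption about how the logic might look, not the actual contents of tests/integration/test_optional_mineru_fixtures.py; `fixture_gate` and its reason strings are hypothetical.

```python
from typing import Mapping, Tuple

OPT_IN_VAR = "PDF2MD_RUN_MINERU_FIXTURES"


def fixture_gate(env: Mapping[str, str], doctor_exit_code: int) -> Tuple[bool, str]:
    """Decide whether the optional MinerU fixture test should run.

    Returns (should_run, reason). A False result is never a pass.
    """
    if env.get(OPT_IN_VAR) != "1":
        return False, "skipped: opt-in variable is not set"
    if doctor_exit_code != 0:
        return False, "blocked: pdf2md doctor reported a hard failure"
    return True, "run"
```

In a pytest test, a False result would typically lead to `pytest.skip(reason)`, so the blocker message appears in the test report.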

A sample conversion is successful only when all of these are true:

  • The command exits 0.
  • The planned Markdown part exists: <output>\<stem>\<stem>_001.md.
  • The planned quality report exists: <output>\<stem>\<stem>_report.md.
  • No public .metadata.json sidecar is written for new conversions.
  • The report's warning counts are consistent with the generated output and sufficient to explain math, table, reading-order, asset, MinerU, and checker-unavailable risks.
  • Any Markdown image links resolve relative to the Markdown file, or missing/broken links are reported as warnings.

A missing Markdown part or _report.md means the sample failed or is blocked. Do not count it as a partial success for release gating.
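The file-level criteria can be checked mechanically. The sketch below assumes the planned layout given above (`<output>/<stem>/<stem>_001.md` and `<stem>_report.md`), and it assumes a sidecar is any file in the sample directory whose name ends in `.metadata.json`. Both helper names are hypothetical, and the image-link pattern handles only simple inline Markdown images.

```python
import re
from pathlib import Path
from typing import List
from urllib.parse import unquote

# Matches the target of a simple inline image: ![alt](target)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def check_sample_outputs(output_root: Path, stem: str) -> List[str]:
    """Return a list of problems; an empty list means the files look right."""
    sample_dir = output_root / stem
    problems = []
    for name in (f"{stem}_001.md", f"{stem}_report.md"):
        if not (sample_dir / name).is_file():
            problems.append(f"missing planned output: {name}")
    if sample_dir.is_dir():
        for path in sorted(sample_dir.iterdir()):
            # Assumption: a public sidecar is named *.metadata.json.
            if path.name.endswith(".metadata.json"):
                problems.append(f"unexpected metadata sidecar: {path.name}")
    return problems


def broken_image_links(markdown_file: Path) -> List[str]:
    """Return local image targets that do not resolve relative to the file."""
    text = markdown_file.read_text(encoding="utf-8")
    broken = []
    for target in IMAGE_LINK.findall(text):
        if target.startswith(("http://", "https://", "data:")):
            continue  # not a local asset link
        if not (markdown_file.parent / unquote(target)).is_file():
            broken.append(target)
    return broken
```

Any entry returned by `broken_image_links` should either be fixed or appear as a warning in the report, in line with the last criterion above.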

For each attempted sample, record at least:

  • Source filename.
  • Command run.
  • Exit code.
  • Generated Markdown path.
  • Generated _report.md path.
  • Warning count and final status.
  • Math renderability failures or checker-unavailable count.
  • Table fallback or degradation count when available.
  • Missing or broken asset link count.
  • Page coverage when available.
  • Doctor status and any GPU/PyTorch/model/cache warnings.
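A structured record keeps these fields consistent across samples. The dataclass below is only a sketch: the field names are hypothetical, and the items marked "when available" in the list above are modelled as optional fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SampleRecord:
    """Hypothetical per-sample record mirroring the list above."""
    source_filename: str
    command: str
    exit_code: int
    markdown_path: Optional[str]          # None if the Markdown part is missing
    report_path: Optional[str]            # None if _report.md is missing
    warning_count: int
    final_status: str
    math_failures_or_checker_unavailable: int
    broken_asset_link_count: int
    doctor_status: str
    table_degradation_count: Optional[int] = None   # when available
    page_coverage: Optional[str] = None              # when available
    doctor_warnings: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # ensure_ascii=False keeps Korean and other non-ASCII filenames readable.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)
```

Serializing with `ensure_ascii=False` is a deliberate choice here, because several local samples have Korean filenames that would otherwise be written as escape sequences.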

Fixture Coverage Notes

Local fixture coverage should include these risk categories where samples are available:

  • Simple digital PDF with a text layer.
  • Math-heavy PDF.
  • Multi-column or complex reading order.
  • Table with formulas.
  • Figure, caption, or extracted asset links.
  • Korean or non-ASCII filename/path handling.

Observed local fixture map as of 2026-05-11:

| Local sample | Fixture risks covered | Notes |
| --- | --- | --- |
| samples/FourNodeQuadrilateralShellElementMITC4.pdf | simple digital PDF, math-heavy engineering content, table/formula risk | Small sample suitable for first optional MinerU smoke validation. |
| samples/MITC공부.pdf | Korean filename/path handling, math-heavy notes, table/formula risk | Useful for non-ASCII path and shell-element notation checks. |
| samples/2007쉘구조물의유한요소해석에대하여.pdf | Korean filename/path handling, longer academic layout, math-heavy content, reading-order risk | Larger sample; use after the small smoke sample passes. |
| samples/유한요소해석법을이용한쉘구조물의동적좌굴해석.pdf | Korean filename/path handling, math-heavy content, figures/assets, reading-order risk | Larger sample for report and warning quality review. |
| samples/metadata.json | fixture notes only | Local untracked context; do not treat as generated v1 metadata unless manually confirmed. |

Coverage gaps to keep visible:

  • A deliberately simple one-page digital PDF fixture is still useful for release smoke checks.
  • A table-dominant sample with known formula cells would make table degradation easier to judge.
  • A figure-heavy sample with expected extracted assets would make asset link validation easier to judge.

Do not score fixture quality only by plain-text edit distance. Include math delimiter/renderability behavior, tables, reading order, assets, report provenance, warning usefulness, and _report.md usefulness.

No-Sample-Commit Check

Before staging or committing any release-gate work:

git status --short --untracked-files=all

Confirm:

  • samples/ files are not staged.
  • Generated sample outputs are not staged.
  • No sample PDF or sample-derived binary was copied into tracked tests or docs.
  • Any temporary fixture output inside the repository was removed unless the user explicitly approved keeping it and it is ignored.

samples/ may appear as untracked local context. That is expected; do not add it unless the user explicitly requests it.
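The first confirmation can be automated by parsing the short-format status output. In that format the first column is the index (staged) state and `??` marks untracked files, so a staged entry under samples/ has any first-column character other than a space, `?`, or `!`. The helper below is a sketch with a hypothetical name. It does not detect generated outputs or copied binaries elsewhere in the tree, which still need manual review.

```python
from typing import List


def staged_sample_paths(status_output: str, prefix: str = "samples/") -> List[str]:
    """Return staged entries under `prefix` from `git status --short` output."""
    staged = []
    for line in status_output.splitlines():
        if len(line) < 4:
            continue
        index_status, path = line[0], line[3:]
        if index_status in " ?!":
            continue  # unstaged change, untracked, or ignored
        # Renames are shown as "old -> new"; git may quote unusual paths.
        candidates = [p.strip().strip('"') for p in path.split(" -> ")]
        if any(p.startswith(prefix) for p in candidates):
            staged.append(path)
    return staged
```

The output must be passed in unstripped, because a leading space in the first column is significant. An empty result is consistent with samples/ appearing only as untracked local context.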