Sprint 9 Contract: Local Fixture Evaluation And V1 Release Gate
Status: Implemented
Last updated: 2026-05-08
Objective
Validate the v1 converter against local fixture workflows without committing sample PDFs or making the default test loop depend on MinerU models, GPU, CUDA, network access, Obsidian, or LaTeX tooling.
Sprint 9 must establish:
- A fast mocked integration suite that exercises the public conversion path end to end.
- An optional, explicitly enabled local MinerU fixture evaluation path for samples/.
- A fixture coverage manifest or checklist that records which local PDFs cover math, tables, figures/assets, reading order, Korean filenames, and metadata/report risks.
- Release-gate documentation that distinguishes default automated checks from optional local MinerU/GPU checks.
- Clear PROGRESS.md notes for local fixture coverage, skipped/blocked optional checks, known quality risks, and the v1 go/no-go recommendation.
Sprint 9 is an evaluation and release-gate sprint. It may add tests, local-only evaluation helpers, fixture manifests, and narrow compatibility fixes only when needed to evaluate the current v1 behavior. It must not add alternate engines, cloud/API paths, runtime engine selection, or automatic model downloads.
Current Precondition
Sprint 8 is complete:
- pdf2md doctor exists and reports Python, uv, MinerU CLI/version, GPU, PyTorch, model/cache, and strict-local policy status.
- Local pdf2md doctor currently fails because the mineru CLI is not installed on PATH.
- pdf2md convert exists and writes Markdown, metadata JSON, and <stem>.report.md with fake-adapter test coverage.
- Default tests pass without real MinerU, CUDA, GPU, model files, network, Obsidian, LaTeX tooling, or samples/.
- samples/ exists locally and is untracked. Observed local fixture files include:
  - samples/FourNodeQuadrilateralShellElementMITC4.pdf
  - samples/MITC공부.pdf
  - samples/2007쉘구조물의유한요소해석에대하여.pdf
  - samples/유한요소해석법을이용한쉘구조물의동적좌굴해석.pdf
  - samples/metadata.json
Sprint 9 must preserve the untracked status of samples/ unless the user explicitly requests otherwise.
Touched Surfaces
Allowed:
- tests/integration/
- tests/test_conversion.py
- tests/test_cli.py
- tests/test_report.py
- tests/test_metadata.py
- tests/test_quality.py
- tests/conftest.py only for markers or opt-in fixture controls
- src/pdf2md/mineru_adapter.py only for narrow compatibility fixes backed by mocked or optional local MinerU output evidence
- src/pdf2md/conversion.py only for narrow release-gate defects found by integration tests
- src/pdf2md/quality.py only for local quality metric defects found by integration tests
- src/pdf2md/report.py only for report defects found by integration tests
- README.md
- docs/V1RELEASECHECKLIST.md
- docs/V1IMPLEMENTATIONPLAN.md
- docs/Sprints/SPRINT9CONTRACT.md
- PLAN.md
- PROGRESS.md
Not allowed:
- Committed files under samples/
- Committed generated conversion outputs from local sample PDFs
- Mandatory tests that require real MinerU, GPU, CUDA, PyTorch, model files, network, Obsidian, LaTeX tooling, or samples/
- Automatic package installs or model downloads from tests, import time, doctor, convert, or helpers
- Runtime engine selection or alternate conversion engines
- Cloud OCR, remote LLM/VLM, hosted renderer, remote document parser, remote asset fetching, --api-url, router mode, HTTP client backends, remote APIs, or remote OpenAI-compatible backends
- CLI/API options that disable strict-local policy
- Claims that v1 perfectly reconstructs LaTeX, tables, or reading order
Expected Outputs
- Fast mocked integration suite
  - Exercises convert_pdf and/or pdf2md convert with a fake MinerU adapter through the real orchestration path.
  - Verifies Markdown, metadata JSON, and <stem>.report.md are all written.
  - Verifies output paths, asset links, warning counts, and report status stay consistent.
  - Verifies failures produce metadata/report warnings when possible and do not silently fall back.
  - Runs as part of uv run pytest without real MinerU, models, GPU, network, Obsidian, LaTeX tooling, or samples/.
- Optional local MinerU fixture evaluation
  - Provides an explicit opt-in command or pytest marker/environment gate for real local MinerU sample evaluation.
  - Skips or reports a clear local blocker when pdf2md doctor fails because MinerU, model/cache paths, or GPU/PyTorch acceleration are unavailable.
  - Reads sample PDFs only from samples/ or a user-provided local sample directory.
  - Writes generated outputs to a temporary or ignored output directory, never to tracked fixture paths.
  - Produces or records, for each attempted sample:
    - source filename
    - command run
    - exit code
    - generated Markdown path
    - generated metadata JSON path
    - generated .report.md path
    - warning count
    - math renderability or checker-unavailable count
    - table fallback/degradation count when available
    - missing or broken asset link count
    - page coverage when available
  - Does not mark optional evaluation as passed when MinerU is missing; it records the blocker.
- Fixture coverage manifest or checklist
  - Maps local sample files to risk categories:
    - simple digital PDF
    - math-heavy PDF
    - multi-column or complex reading order
    - table with formulas
    - figure/caption/assets
    - Korean filename/path handling
  - May store only relative sample names, categories, and notes; it must not embed sample PDFs or generated outputs.
  - Records coverage gaps that need additional user-provided samples.
- V1 release checklist
  - Defines default release gates:
    - uv sync
    - uv run pytest
    - uv run pdf2md --version
    - uv run pdf2md doctor
    - git diff --check
    - git status --short --untracked-files=all
  - Defines optional local MinerU release gates separately from default gates.
  - Requires Markdown, metadata JSON, and .report.md to exist before any sample conversion is considered successful.
  - Requires warnings and residual risks to be recorded in PROGRESS.md.
  - Makes local-only and no-sample-commit checks explicit.
- Documentation
  - README or release checklist explains how to run default checks and optional local fixture checks.
  - Documentation states that optional fixture checks may be skipped or blocked until MinerU 3.1.0 and model/cache setup are available.
  - Documentation does not instruct users to use --api-url, router mode, HTTP client backends, remote APIs, or remote OpenAI-compatible backends.
- Handoff
  - PROGRESS.md records changed files, commands run, tests passed or blocked, local fixture status, generated output location if any, known failures, residual risks, and next action.
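The per-sample fields listed under the optional local MinerU fixture evaluation can be captured in a small record type. The following is a minimal sketch, not the project's actual schema: the class name, field names, and the `status` values are all assumptions made for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SampleEvaluationRecord:
    """One attempted sample. Field names are illustrative, not the project's schema."""

    source_filename: str
    command: str
    exit_code: Optional[int]            # None when the run was blocked before starting
    markdown_path: Optional[str]
    metadata_json_path: Optional[str]
    report_path: Optional[str]
    warning_count: Optional[int] = None
    math_renderable_count: Optional[int] = None
    math_checker_unavailable_count: Optional[int] = None
    table_fallback_count: Optional[int] = None       # recorded only when available
    broken_asset_link_count: Optional[int] = None
    page_coverage: Optional[float] = None            # recorded only when available
    blocker: Optional[str] = None                    # e.g. "mineru CLI not on PATH"

    @property
    def status(self) -> str:
        # A blocked run is never reported as passed; the blocker is recorded instead.
        if self.blocker is not None:
            return "blocked"
        outputs = (self.markdown_path, self.metadata_json_path, self.report_path)
        if self.exit_code == 0 and all(outputs):
            return "converted"
        return "failed"


def to_json_line(record: SampleEvaluationRecord) -> str:
    """Serialize one record; ensure_ascii=False keeps Korean filenames readable."""
    return json.dumps({**asdict(record), "status": record.status}, ensure_ascii=False)
```

Writing one JSON line per attempted sample into a temporary or ignored output directory would keep the evidence out of tracked paths while still giving PROGRESS.md something concrete to summarize.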
Non-Goals
- Do not install MinerU.
- Do not download MinerU models.
- Do not run model setup automatically.
- Do not require the local GTX 1070 Ti to pass CUDA/PyTorch checks in the default test loop.
- Do not improve OCR/model accuracy.
- Do not introduce a manual review UI, hosted web UI, or local desktop launcher in Sprint 9.
- Do not add alternate conversion engines or fallback engines.
- Do not benchmark against cloud OCR/API services.
- Do not commit sample PDFs, sample-derived outputs, or large binary fixtures.
- Do not make text edit distance the only quality criterion.
- Do not claim v1 is release-ready if metadata JSON or .report.md generation is missing.
Work Packages
WP9.1: Fast Mocked Integration Checks
Owner:
- feature-generator-agent
- evaluation-agent
Actions:
- Add integration-level tests that use fake adapter output but run the public conversion orchestration and CLI paths.
- Assert generated Markdown, metadata JSON, .report.md, assets, warnings, and summaries are mutually consistent.
- Keep tests deterministic and independent of real samples.
Output:
- uv run pytest covers v1 file-output behavior without model or GPU dependencies.
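One of the consistency checks in this work package, that asset links in the generated Markdown resolve to files that were actually written, can be sketched with the standard library alone. This is a hedged illustration: `broken_asset_links` is a hypothetical helper name, it assumes assets are referenced with standard Markdown image syntax relative to the Markdown file, and it treats any remote URL as broken because of the strict-local policy.

```python
from __future__ import annotations

import re
from pathlib import Path
from urllib.parse import unquote

# Captures the target of a Markdown image link: ![alt](target) or ![alt](target "title").
_MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def broken_asset_links(markdown_path: Path) -> list[str]:
    """Return image targets that do not resolve to a local file next to the Markdown."""
    text = markdown_path.read_text(encoding="utf-8")
    broken = []
    for target in _MD_IMAGE.findall(text):
        if "://" in target:
            # Assumption: under strict-local policy a remote asset counts as broken.
            broken.append(target)
            continue
        # Percent-encoded targets (e.g. %20) are decoded before hitting the filesystem.
        if not (markdown_path.parent / unquote(target)).is_file():
            broken.append(target)
    return broken
```

A mocked integration test could run the conversion into pytest's `tmp_path`, call this helper on the produced Markdown, and compare `len(broken_asset_links(...))` with the missing-asset count reported in the metadata and report.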
WP9.2: Optional MinerU Sample Evaluation Harness
Owner:
- mineru-integration-agent
- local-setup-agent
- evaluation-agent
Actions:
- Add an explicit opt-in local fixture command/test path.
- Gate real MinerU execution behind an environment variable, marker, or explicit command documented in README/checklist.
- Run pdf2md doctor or equivalent preflight before optional local MinerU evaluation.
- Use temporary or ignored output directories.
- Record blocked status clearly when MinerU/model/cache setup is missing.
Output:
- Local users can run real sample evaluation when setup is ready, while default tests stay fast and local.
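The opt-in gate described in this work package could look roughly like the sketch below. The environment variable name `PDF2MD_RUN_LOCAL_MINERU` and the function `local_mineru_gate` are assumptions, not names taken from the project, and the PATH lookup is only a stand-in for the fuller pdf2md doctor preflight.

```python
from __future__ import annotations

import os
import shutil
from typing import Callable, Mapping, Optional

OPT_IN_ENV = "PDF2MD_RUN_LOCAL_MINERU"  # hypothetical variable name


def local_mineru_gate(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> tuple[str, str]:
    """Return (state, reason), where state is 'skipped', 'blocked', or 'run'."""
    # Real MinerU evaluation never runs unless the user asked for it explicitly.
    if env.get(OPT_IN_ENV) != "1":
        return "skipped", f"{OPT_IN_ENV}=1 not set; real MinerU evaluation is opt-in"
    # Opted in but setup is missing: report a blocker rather than a pass.
    if which("mineru") is None:
        return "blocked", "mineru CLI not found on PATH"
    return "run", "opt-in set and mineru CLI found"
```

In a pytest fixture, any state other than `"run"` could be passed to `pytest.skip(reason)` so the default `uv run pytest` loop stays fast, while the returned reason is what gets recorded as the skipped or blocked status.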
WP9.3: Fixture Coverage And Metrics
Owner:
- evaluation-agent
- obsidian-markdown-agent
- metadata-agent
Actions:
- Define fixture categories and expected risk coverage.
- Track math delimiter/renderability, tables, reading order, assets, page coverage, metadata fields, warning counts, and report usefulness.
- Avoid scoring quality only by plain-text edit distance.
Output:
- Fixture coverage is explicit and gaps are visible.
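To make coverage gaps visible, a manifest can map relative sample names to risk categories and be compared with the full category list. The sketch below is illustrative only: the category identifiers are shortened forms of the categories in this contract, and `coverage_gaps` is a hypothetical helper name.

```python
from __future__ import annotations

# Shortened identifiers for the risk categories named in this contract (assumed spelling).
RISK_CATEGORIES = (
    "simple-digital",
    "math-heavy",
    "complex-reading-order",
    "table-with-formulas",
    "figure-caption-assets",
    "korean-filename",
)


def coverage_gaps(manifest: dict[str, set[str]]) -> list[str]:
    """Return categories that no sample in the manifest claims to cover.

    The manifest stores only relative sample names and category labels,
    never PDF bytes or generated outputs.
    """
    covered = set().union(*manifest.values())
    unknown = covered - set(RISK_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown risk categories: {sorted(unknown)}")
    return [category for category in RISK_CATEGORIES if category not in covered]
```

An empty result means every category has at least one sample; anything returned is a gap that needs an additional user-provided sample and belongs in PROGRESS.md.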
WP9.4: V1 Release Gate Documentation
Owner:
- requirements-guard-agent
- evaluation-agent
Actions:
- Add or update release checklist documentation.
- Separate default release gates from optional local MinerU/GPU gates.
- Keep strict-local wording consistent with ARCHITECTURE.md, PRD.md, and README.md.
- Update PLAN.md and PROGRESS.md with the next action and release readiness state.
Output:
- A future agent can determine whether v1 is blocked, partial, or ready without relying on conversation history.
WP9.5: Independent Evaluation
Owner:
- evaluation-agent
Actions:
- Review completed Sprint 9 work against this contract.
- Verify default tests do not require real MinerU, GPU, CUDA, PyTorch, model files, network, Obsidian, LaTeX tooling, or samples/.
- Verify optional local MinerU evaluation is clearly gated.
- Verify generated sample outputs and sample PDFs are not staged.
- Verify release checklist cannot pass without Markdown, metadata JSON, and .report.md.
Output:
- PASS/FAIL notes with actionable findings and residual risk.
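The rule that a conversion counts as successful only when Markdown, metadata JSON, and the report all exist can be checked mechanically. This is a minimal sketch under stated assumptions: only `<stem>.report.md` is named in this contract, while the `<stem>.md` and `<stem>.json` filenames and the helper name `missing_outputs` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def missing_outputs(output_dir: Path, stem: str, metadata_suffix: str = ".json") -> list[str]:
    """Return the expected output filenames that are absent from output_dir.

    Assumed naming: <stem>.md and <stem><metadata_suffix>; <stem>.report.md
    is the only name taken from the contract itself.
    """
    expected = [f"{stem}.md", f"{stem}{metadata_suffix}", f"{stem}.report.md"]
    return [name for name in expected if not (output_dir / name).is_file()]
```

An evaluator would treat any non-empty result as a failed conversion for that sample, regardless of the converter's exit code.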
Verification Checks
Required:
- git status --short --untracked-files=all before staging confirms samples/ remains untracked and unstaged.
- uv --version is run and its result is recorded.
- uv sync passes.
- uv run pytest passes.
- Targeted integration tests pass.
- uv run pdf2md --version passes.
- uv run pdf2md doctor is run and its result is recorded as pass, warn, or blocked/fail.
- git diff --check passes.
- Default tests do not require real MinerU, CUDA, GPU, PyTorch, model files, network, Obsidian, LaTeX tooling, or samples/.
- No model downloads occur.
- No setup downloads occur from tests, import time, doctor, convert, or helper scripts.
- No network calls are required in default tests.
- No candidate engine comparison is reintroduced.
- No alternate engine or runtime engine selection is added.
- No CLI/API option disables strict-local policy.
- No --api-url, router mode, HTTP client backend, remote API, or remote OpenAI-compatible backend support is added.
- Optional local MinerU checks are skipped or blocked clearly when setup is unavailable.
- Sample PDFs and generated sample outputs are not staged or committed.
- PROGRESS.md records local fixture coverage status and release readiness.
Recommended:
- Add a pytest marker or environment variable for optional local MinerU tests.
- Keep optional output under a temporary directory or an ignored local output root.
- Include at least one Korean filename/path check in fast mocked tests.
- Include one fake output with math, one with a table warning, and one with an asset link.
- Record source-to-output paths in release checklist examples.
- Treat local doctor failure as a release blocker for real MinerU validation but not for the default fast test loop.
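The first required check, that samples/ stays untracked and unstaged, can be automated by parsing the output of git status --short --untracked-files=all. The sketch below assumes the short format's two status characters followed by a space and the path; `staged_sample_paths` is a hypothetical helper, and rename lines (`old -> new`) are only matched on the old path.

```python
from __future__ import annotations


def staged_sample_paths(status_output: str, prefix: str = "samples/") -> list[str]:
    """Return paths under prefix whose index (first) status column shows a staged change."""
    staged = []
    for line in status_output.splitlines():
        if len(line) < 4:
            continue
        index_status, path = line[0], line[3:]
        # Git may wrap non-ASCII paths in double quotes with octal escapes;
        # stripping the quotes is enough for a prefix check (escapes are kept as-is).
        path = path.strip('"')
        # ' ' = unchanged in index, '?' = untracked, '!' = ignored: none of these are staged.
        if index_status not in (" ", "?", "!") and path.startswith(prefix):
            staged.append(path)
    return staged
```

Any non-empty result would correspond to the hard failure of sample PDFs or generated outputs being staged.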
Hard Failure Criteria
Sprint 9 fails and must stop for a user decision if any of these are true:
- Default tests require real MinerU, GPU, CUDA, PyTorch, model files, network, Obsidian, LaTeX tooling, or samples/.
- Sample PDFs or generated sample outputs are staged or committed.
- Optional real MinerU evaluation runs without an explicit opt-in gate.
- Optional real MinerU evaluation writes generated output into tracked fixture paths.
- V1 release checklist can pass without generated Markdown, metadata JSON, and .report.md.
- Release status is marked ready when pdf2md doctor has a hard failure and no explicit user waiver is recorded.
- The implementation adds runtime engine selection or alternate engines.
- The implementation adds or permits --api-url, remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends.
- The implementation uses cloud/API fallback for any fixture evaluation.
- The implementation hides MinerU failure or silently falls back to another engine.
- Quality criteria ignore math, tables, reading order, assets, metadata, or report quality.
Acceptance Criteria
Sprint 9 is complete when:
- docs/Sprints/SPRINT9CONTRACT.md exists and is referenced by relevant agents.
- Fast mocked integration tests exist and pass under uv run pytest.
- Optional local MinerU fixture evaluation is documented and explicitly gated.
- Local fixture coverage categories and gaps are recorded.
- Release checklist documentation exists or is updated.
- PROGRESS.md records optional local MinerU status, including skipped/blocked reasons when applicable.
- Default tests do not require real MinerU, GPU, CUDA, PyTorch, model files, network, Obsidian, LaTeX tooling, or samples/.
- No sample PDF or generated sample output is staged or committed.
- uv sync passes.
- uv run pytest passes.
- git diff --check passes.
- Independent evaluation is complete.
- The completed change is committed.
Handoff Fields
Use these fields when Sprint 9 completes:
- Files changed:
- Commands run:
- Tests passed:
- Tests blocked:
- Optional local MinerU status:
- Fixture coverage:
- Generated output locations:
- Known failures:
- Residual risks:
- User decisions needed:
- V1 release recommendation:
- Go/no-go recommendation for next sprint:
- Next action: