# Sprint 0 Contract: Source And Environment Verification Status: Completed Last updated: 2026-05-07 Result: PASS, go-with-risks ## Objective Verify the external facts and local environment assumptions needed before any converter implementation starts. Current amendment: the project target changed after the original Sprint 0 pass from MinerU 2.5 to MinerU 3.1.0. Read MinerU version constraints in this contract as applying to MinerU 3.1.0 for future work. Sprint 0 must answer whether the planned v1 implementation can proceed with: - MinerU 3.1.0 through direct local CLI execution only. - Python 3.12 and `uv`. - Windows PowerShell on the current workspace. - NVIDIA GTX 1070 Ti 8GB as the target GPU. - Local-only processing with no cloud OCR, remote LLM/VLM, hosted parser, `--api-url`, remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends. - The temporary local `mineru-api` process started internally by MinerU 3.1.0 CLI is allowed when CLI runs without `--api-url`. ## Touched Surfaces Allowed: - `docs/KNOWLEDGEBASE.md` - `docs/V1IMPLEMENTATIONPLAN.md` only if the sprint sequence or constraints need adjustment - `docs/Sprints/SPRINT0CONTRACT.md` - `PROGRESS.md` Not allowed: - `pyproject.toml` - `src/` - `tests/` - `scripts/` - Any converter implementation code - Any committed file under `samples/` ## Expected Outputs Sprint 0 should produce a concise source-backed handoff in `docs/KNOWLEDGEBASE.md` under the heading `## 9. Sprint 0 Verification (2026-05-07)`, plus a short status and handoff entry in `PROGRESS.md`. Evidence requirements: - Prefer official or primary sources. - Use non-official sources only when official documentation is incomplete and the source is directly relevant. - For volatile implementation claims, record source URL, access date, and whether the claim is a direct fact or a project inference. - Record failures from allowed local commands as environment facts. Do not fix them during Sprint 0. - Web research is allowed for documentation verification. Runtime converter design must remain local-only. The handoff must cover: 1. MinerU 3.1.0 local CLI facts - Install command or supported install path. - Version command or reliable version detection path. - Direct local CLI invocation shape for PDF conversion. - Supported output locations for Markdown, JSON/structured data, assets, logs, and raw diagnostics. - Whether local execution can be kept free of router/API/HTTP endpoint modes. 2. Python and package workflow facts - Python 3.12 compatibility status for the planned dependency stack. - `uv` setup implications. - Any packaging constraints that affect Sprint 1 scaffolding. 3. GPU and runtime facts - CUDA/PyTorch expectations relevant to GTX 1070 Ti 8GB. - Known GPU memory risks or CPU fallback/error-message implications. - Commands that future `pdf2md doctor` should check. 4. License and privacy facts - MinerU license status. - Model-weight license status when identifiable. - Transitive package/license risks that must be reviewed before redistribution. - Confirmation that v1 runtime design remains local-only. 5. Implementation go/no-go recommendation - `go`: proceed to Sprint 1 with listed assumptions. - `go-with-risks`: proceed but carry specific risks into later sprint contracts. Each risk must name the later sprint that must absorb it. - `blocked`: stop and ask the user for a requirement change. ## Non-Goals - Do not scaffold the Python project. - Do not install MinerU, CUDA, PyTorch, or model weights. - Do not run full PDF conversion. - Do not edit `samples/` or commit sample files. - Do not introduce candidate engine comparisons. - Do not decide a new conversion engine. - Do not add implementation abstractions, config systems, or CLI flags. ## Work Packages ### WP0.1: MinerU Source Review Owner: - `research-agent` - `mineru-research` skill Actions: - Review official MinerU documentation, MinerU GitHub, release notes/tags, and relevant license files. - Verify current install, CLI, version, output, model/cache, and local execution facts. - Record source URLs and access date for durable claims. Output: - Source-backed MinerU fact table in `docs/KNOWLEDGEBASE.md` under `## 9. Sprint 0 Verification (2026-05-07)`. - Any adapter-impacting uncertainty listed in `PROGRESS.md`. ### WP0.2: Local Environment Review Owner: - `local-setup-agent` Actions: - Check non-invasive local facts, such as Python version, `uv` availability, and GPU visibility when available. - Review official Python, `uv`, PyTorch/CUDA, and NVIDIA-relevant documentation for compatibility constraints. - Identify future `pdf2md doctor` checks. Output: - Environment compatibility notes and doctor-check requirements. - Explicit GTX 1070 Ti 8GB risks. Allowed local commands: ```powershell python --version uv --version nvidia-smi ``` Only run these commands if they are useful for the active research pass. Do not install or modify system packages. ### WP0.3: Output Layout Probe Plan Owner: - `mineru-integration-agent` Actions: - Define what must be observed from MinerU before implementing the adapter. - If MinerU is already installed locally, a later user-approved probe may inspect command help or run against a disposable file. Sprint 0 does not require conversion execution. Output: - Adapter-facing output layout assumptions. - List of fields that must remain optional until observed from real MinerU output. ### WP0.4: License And Privacy Review Owner: - `license-privacy-agent` Actions: - Review MinerU and model/package license sources. - Distinguish personal/research use from redistribution. - Check that no planned runtime path uploads PDFs, page images, extracted text, or model intermediates. Output: - License/privacy summary with unresolved obligations. - Blockers if redistribution assumptions are unsafe. ### WP0.5: Contract Evaluation Owner: - `evaluation-agent` Actions: - Review this contract before Sprint 0 research starts. - Require concrete evidence expectations and failure thresholds. - After Sprint 0 research, independently verify that each expected output is present and source-backed. Output: - Pass/fail evaluation notes. - Specific follow-up findings if the contract or results are incomplete. ## Verification Checks Required: - Every volatile implementation fact has an official or primary source URL. - Source-backed claims distinguish direct fact from project inference. - `docs/KNOWLEDGEBASE.md` remains consistent with `PRD.md` and `ARCHITECTURE.md`. - No candidate engine comparison is reintroduced. - No converter implementation code is created. - `samples/` remains untracked. - `git diff --check` passes for documentation changes. Recommended: - Check all newly added links manually or with a lightweight local link check if available. - Have `evaluation-agent` review the completed Sprint 0 outputs before proceeding to Sprint 1. ## Hard Failure Criteria Sprint 0 fails and must stop for a user decision if any of these are true: - MinerU 3.1.0 cannot be invoked through a direct local CLI path suitable for v1. - MinerU's required v1 path requires `--api-url`, router mode, HTTP client backends, remote APIs, or remote OpenAI-compatible backend behavior. - Python 3.12 is incompatible with the required implementation stack. - Local-only processing cannot be maintained. - License terms clearly block the intended personal/research use. - Required evidence cannot be verified from official or primary sources. ## Acceptance Criteria Sprint 0 is complete when: - `docs/KNOWLEDGEBASE.md` contains updated Sprint 0 facts with sources. - `PROGRESS.md` records checks performed, unresolved risks, and go/no-go recommendation. - Future Sprint 1 can proceed without guessing install, CLI, environment, or licensing assumptions. - The evaluator review is complete. - The completed documentation change is committed. ## Handoff Fields Use these fields when Sprint 0 completes: - Files changed: - Sources checked: - Local commands run: - Facts confirmed: - Inferences made: - Known failures: - Residual risks: - Go/no-go recommendation: - Next action: