baram2584/PDFToMD

Fork 0

Files

T

김경종 88d6b92283 add pdftomd

2026-05-08 16:42:19 +09:00

8.0 KiB

Raw Blame History

Sprint 0 Contract: Source And Environment Verification

Status: Completed Last updated: 2026-05-07 Result: PASS, go-with-risks

Objective

Verify the external facts and local environment assumptions needed before any converter implementation starts.

Current amendment: the project target changed after the original Sprint 0 pass from MinerU 2.5 to MinerU 3.1.0. Read MinerU version constraints in this contract as applying to MinerU 3.1.0 for future work.

Sprint 0 must answer whether the planned v1 implementation can proceed with:

MinerU 3.1.0 through direct local CLI execution only.
Python 3.12 and uv.
Windows PowerShell on the current workspace.
NVIDIA GTX 1070 Ti 8GB as the target GPU.
Local-only processing with no cloud OCR, remote LLM/VLM, hosted parser, --api-url, remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends.
The temporary local mineru-api process started internally by MinerU 3.1.0 CLI is allowed when CLI runs without --api-url.

Touched Surfaces

Allowed:

docs/KNOWLEDGEBASE.md
docs/V1IMPLEMENTATIONPLAN.md only if the sprint sequence or constraints need adjustment
docs/Sprints/SPRINT0CONTRACT.md
PROGRESS.md

Not allowed:

pyproject.toml
src/
tests/
scripts/
Any converter implementation code
Any committed file under samples/

Expected Outputs

Sprint 0 should produce a concise source-backed handoff in docs/KNOWLEDGEBASE.md under the heading ## 9. Sprint 0 Verification (2026-05-07), plus a short status and handoff entry in PROGRESS.md.

Evidence requirements:

Prefer official or primary sources.
Use non-official sources only when official documentation is incomplete and the source is directly relevant.
For volatile implementation claims, record source URL, access date, and whether the claim is a direct fact or a project inference.
Record failures from allowed local commands as environment facts. Do not fix them during Sprint 0.
Web research is allowed for documentation verification. Runtime converter design must remain local-only.

The handoff must cover:

MinerU 3.1.0 local CLI facts
- Install command or supported install path.
- Version command or reliable version detection path.
- Direct local CLI invocation shape for PDF conversion.
- Supported output locations for Markdown, JSON/structured data, assets, logs, and raw diagnostics.
- Whether local execution can be kept free of router/API/HTTP endpoint modes.
Python and package workflow facts
- Python 3.12 compatibility status for the planned dependency stack.
- uv setup implications.
- Any packaging constraints that affect Sprint 1 scaffolding.
GPU and runtime facts
- CUDA/PyTorch expectations relevant to GTX 1070 Ti 8GB.
- Known GPU memory risks or CPU fallback/error-message implications.
- Commands that future pdf2md doctor should check.
License and privacy facts
- MinerU license status.
- Model-weight license status when identifiable.
- Transitive package/license risks that must be reviewed before redistribution.
- Confirmation that v1 runtime design remains local-only.
Implementation go/no-go recommendation
- go: proceed to Sprint 1 with listed assumptions.
- go-with-risks: proceed but carry specific risks into later sprint contracts. Each risk must name the later sprint that must absorb it.
- blocked: stop and ask the user for a requirement change.

Non-Goals

Do not scaffold the Python project.
Do not install MinerU, CUDA, PyTorch, or model weights.
Do not run full PDF conversion.
Do not edit samples/ or commit sample files.
Do not introduce candidate engine comparisons.
Do not decide a new conversion engine.
Do not add implementation abstractions, config systems, or CLI flags.

Work Packages

WP0.1: MinerU Source Review

Owner:

research-agent
mineru-research skill

Actions:

Review official MinerU documentation, MinerU GitHub, release notes/tags, and relevant license files.
Verify current install, CLI, version, output, model/cache, and local execution facts.
Record source URLs and access date for durable claims.

Output:

Source-backed MinerU fact table in docs/KNOWLEDGEBASE.md under ## 9. Sprint 0 Verification (2026-05-07).
Any adapter-impacting uncertainty listed in PROGRESS.md.

WP0.2: Local Environment Review

Owner:

local-setup-agent

Actions:

Check non-invasive local facts, such as Python version, uv availability, and GPU visibility when available.
Review official Python, uv, PyTorch/CUDA, and NVIDIA-relevant documentation for compatibility constraints.
Identify future pdf2md doctor checks.

Output:

Environment compatibility notes and doctor-check requirements.
Explicit GTX 1070 Ti 8GB risks.

Allowed local commands:

python --version
uv --version
nvidia-smi

Only run these commands if they are useful for the active research pass. Do not install or modify system packages.

WP0.3: Output Layout Probe Plan

Owner:

mineru-integration-agent

Actions:

Define what must be observed from MinerU before implementing the adapter.
If MinerU is already installed locally, a later user-approved probe may inspect command help or run against a disposable file. Sprint 0 does not require conversion execution.

Output:

Adapter-facing output layout assumptions.
List of fields that must remain optional until observed from real MinerU output.

WP0.4: License And Privacy Review

Owner:

license-privacy-agent

Actions:

Review MinerU and model/package license sources.
Distinguish personal/research use from redistribution.
Check that no planned runtime path uploads PDFs, page images, extracted text, or model intermediates.

Output:

License/privacy summary with unresolved obligations.
Blockers if redistribution assumptions are unsafe.

WP0.5: Contract Evaluation

Owner:

evaluation-agent

Actions:

Review this contract before Sprint 0 research starts.
Require concrete evidence expectations and failure thresholds.
After Sprint 0 research, independently verify that each expected output is present and source-backed.

Output:

Pass/fail evaluation notes.
Specific follow-up findings if the contract or results are incomplete.

Verification Checks

Required:

Every volatile implementation fact has an official or primary source URL.
Source-backed claims distinguish direct fact from project inference.
docs/KNOWLEDGEBASE.md remains consistent with PRD.md and ARCHITECTURE.md.
No candidate engine comparison is reintroduced.
No converter implementation code is created.
samples/ remains untracked.
git diff --check passes for documentation changes.

Recommended:

Check all newly added links manually or with a lightweight local link check if available.
Have evaluation-agent review the completed Sprint 0 outputs before proceeding to Sprint 1.

Hard Failure Criteria

Sprint 0 fails and must stop for a user decision if any of these are true:

MinerU 3.1.0 cannot be invoked through a direct local CLI path suitable for v1.
MinerU's required v1 path requires --api-url, router mode, HTTP client backends, remote APIs, or remote OpenAI-compatible backend behavior.
Python 3.12 is incompatible with the required implementation stack.
Local-only processing cannot be maintained.
License terms clearly block the intended personal/research use.
Required evidence cannot be verified from official or primary sources.

Acceptance Criteria

Sprint 0 is complete when:

docs/KNOWLEDGEBASE.md contains updated Sprint 0 facts with sources.
PROGRESS.md records checks performed, unresolved risks, and go/no-go recommendation.
Future Sprint 1 can proceed without guessing install, CLI, environment, or licensing assumptions.
The evaluator review is complete.
The completed documentation change is committed.

Handoff Fields

Use these fields when Sprint 0 completes:

Files changed:
Sources checked:
Local commands run:
Facts confirmed:
Inferences made:
Known failures:
Residual risks:
Go/no-go recommendation:
Next action:

8.0 KiB Raw Blame History

Sprint 0 Contract: Source And Environment Verification

Objective

Touched Surfaces

Expected Outputs

Non-Goals

Work Packages

WP0.1: MinerU Source Review

WP0.2: Local Environment Review

WP0.3: Output Layout Probe Plan

WP0.4: License And Privacy Review

WP0.5: Contract Evaluation

Verification Checks

Hard Failure Criteria

Acceptance Criteria

Handoff Fields

8.0 KiB

Raw Blame History