Sprint 1 Contract: Project Scaffold And Fast Test Loop

Status: Completed

Last updated: 2026-05-07

Objective

Create the minimal Python project scaffold and fast local test loop for the PDF-to-Markdown converter.

Sprint 1 must establish:

  • A uv-managed Python 3.12 project.
  • A source package importable as pdf2md.
  • A reserved pdf2md CLI entry point that does not implement conversion yet.
  • A fast test command that runs without MinerU, model downloads, GPU access, sample PDFs, or network access.

Sprint 1 is scaffolding only. It must not implement PDF conversion, MinerU execution, Markdown normalization, metadata generation, or report generation.

Current Precondition

Sprint 0 found that uv was not available on PATH in the current local environment.

Sprint 1 resolved this by installing uv per-user at C:\Users\user\.local\bin.

Before Sprint 1 can be accepted, one of these conditions must hold:

  • uv is installed and uv --version succeeds.
  • The user explicitly approves including uv bootstrap documentation or setup handling as part of Sprint 1, and the contract result records that uv sync could not be run locally.

Do not silently replace uv with another package manager.
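A minimal sketch of the precondition check, assuming it is run as a throwaway helper and not committed (scripts/ is not an allowed surface). The name uv_status and its injectable which/run parameters are hypothetical; they exist only so the check can be exercised on a machine without uv.

```python
import shutil
import subprocess


def uv_status(which=shutil.which, run=subprocess.run) -> str:
    """Report whether `uv --version` succeeds, without falling back to another tool."""
    path = which("uv")
    if path is None:
        # Matches the Sprint 0 finding: uv is not on PATH.
        return "blocked: uv not found on PATH"
    result = run([path, "--version"], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return f"blocked: uv --version exited with {result.returncode}"
    return f"ok: {result.stdout.strip()}"


if __name__ == "__main__":
    print(uv_status())
```

A "blocked" result is meant to be recorded as-is in PROGRESS.md; the helper deliberately has no fallback to pip or any other package manager.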

Touched Surfaces

Allowed:

  • pyproject.toml
  • uv.lock
  • .gitignore
  • src/pdf2md/__init__.py
  • src/pdf2md/cli.py only for a minimal placeholder CLI if needed for entry point verification
  • tests/
  • README.md only for minimal setup/test instructions if needed
  • PLAN.md only for current-goal coordination updates required by the shared agent workflow
  • PROGRESS.md
  • docs/V1IMPLEMENTATIONPLAN.md only if sequencing or constraints need adjustment
  • docs/Sprints/SPRINT1CONTRACT.md

Not allowed:

  • src/pdf2md/conversion.py
  • src/pdf2md/mineru_adapter.py
  • src/pdf2md/paths.py
  • src/pdf2md/ir.py
  • src/pdf2md/markdown.py
  • src/pdf2md/metadata.py
  • src/pdf2md/quality.py
  • src/pdf2md/report.py
  • src/pdf2md/doctor.py
  • scripts/
  • Any real MinerU invocation
  • Any model download or install script
  • Any committed file under samples/
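The two lists above can be checked mechanically against a changed-file list. The sketch below is one possible reviewer aid, not part of the scaffold: classify and the constant names are hypothetical, and the conditional wording on some entries ("only if needed") is not encoded and still needs human review.

```python
from pathlib import PurePosixPath

# Exact files the contract allows Sprint 1 to touch.
ALLOWED_FILES = {
    "pyproject.toml",
    "uv.lock",
    ".gitignore",
    "src/pdf2md/__init__.py",
    "src/pdf2md/cli.py",
    "README.md",
    "PLAN.md",
    "PROGRESS.md",
    "docs/V1IMPLEMENTATIONPLAN.md",
    "docs/Sprints/SPRINT1CONTRACT.md",
}
ALLOWED_DIRS = ("tests/",)

# Modules reserved for later sprints, plus directories that must stay untouched.
FORBIDDEN_FILES = {
    f"src/pdf2md/{name}.py"
    for name in (
        "conversion", "mineru_adapter", "paths", "ir", "markdown",
        "metadata", "quality", "report", "doctor",
    )
}
FORBIDDEN_DIRS = ("scripts/", "samples/")


def classify(path: str) -> str:
    """Return 'forbidden', 'allowed', or 'unlisted' for a repo-relative path."""
    normalized = PurePosixPath(path.replace("\\", "/")).as_posix()
    if normalized in FORBIDDEN_FILES or normalized.startswith(FORBIDDEN_DIRS):
        return "forbidden"
    if normalized in ALLOWED_FILES or normalized.startswith(ALLOWED_DIRS):
        return "allowed"
    # Not named in either list: flag for a human decision instead of guessing.
    return "unlisted"
```

An "unlisted" result is treated as a question for the user, since the contract does not say what to do with paths it does not name.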

Expected Outputs

Sprint 1 should produce:

  1. Project package scaffold

    • pyproject.toml with project metadata.
    • Python requirement constrained to Python 3.12.
    • Build configuration suitable for a src/ layout.
    • uv.lock generated by uv sync.
    • .gitignore entries for local virtual environments, pytest cache, and Python bytecode.
    • Minimal test dependency configuration.
    • CLI entry point name reserved as pdf2md.
  2. Minimal source package

    • src/pdf2md/__init__.py.
    • A stable package import surface.
    • Optional minimal src/pdf2md/cli.py placeholder that exits clearly and does not imply conversion is implemented.
  3. Fast test loop

    • A minimal test suite that verifies the package imports.
    • If a CLI placeholder is added, a smoke test that verifies the CLI entry point is wired without invoking conversion.
    • Tests must not require MinerU, CUDA, GPU, model files, samples/, or network.
  4. Developer workflow

    • uv sync should work when uv is installed.
    • uv run pytest should work when uv is installed.
    • If uv is still missing locally, record the failure explicitly in PROGRESS.md and do not mark Sprint 1 complete.
  5. Handoff

    • PROGRESS.md records changed files, commands run, tests passed or blocked, known failures, residual risks, and next action.

Non-Goals

  • Do not implement PDF discovery.
  • Do not implement conversion orchestration.
  • Do not implement the MinerU adapter.
  • Do not run MinerU.
  • Do not install MinerU 3.1.0.
  • Do not download MinerU models.
  • Do not implement Markdown normalization.
  • Do not implement metadata JSON or .report.md output.
  • Do not implement pdf2md doctor; a CLI placeholder may mention future commands, but it must not create a doctor module.
  • Do not add runtime engine selection.
  • Do not add alternate conversion engines.
  • Do not add cloud, remote API, router, HTTP client backend, or remote OpenAI-compatible backend support.

Work Packages

WP1.1: Scaffold Metadata

Owner:

  • feature-generator-agent

Actions:

  • Create the minimal pyproject.toml.
  • Use Python 3.12 constraints.
  • Configure a src/ package layout.
  • Configure pytest as the fast local test runner.
  • Reserve the pdf2md console script.

Output:

  • A minimal, maintainable scaffold without speculative dependencies.

WP1.2: Package Import Surface

Owner:

  • feature-generator-agent

Actions:

  • Create src/pdf2md/__init__.py.
  • Expose only a minimal version/import surface.
  • Avoid public API promises beyond what Sprint 1 verifies.

Output:

  • import pdf2md succeeds.

WP1.3: CLI Placeholder

Owner:

  • feature-generator-agent

Actions:

  • If needed for console script verification, create src/pdf2md/cli.py.
  • The placeholder may expose a help message or a clear "not implemented yet" command.
  • It must not create conversion flags beyond the reserved command shape unless tests need them.

Output:

  • pdf2md entry point is wired without implying conversion works.

WP1.4: Fast Tests

Owner:

  • feature-generator-agent
  • evaluation-agent

Actions:

  • Add minimal tests for package import and optional CLI placeholder behavior.
  • Ensure tests are local, fast, and independent of MinerU/model/GPU/network state.

Output:

  • uv run pytest passes when uv is available.

WP1.5: Independent Evaluation

Owner:

  • evaluation-agent

Actions:

  • Review the completed scaffold against this contract.
  • Verify no converter implementation was added.
  • Verify samples/ remains untracked and unstaged.
  • Verify no runtime remote path or alternate engine was introduced.

Output:

  • PASS/FAIL notes with any missing acceptance criteria.
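The samples/ check can be scripted against captured `git status --short` output. The sketch below assumes the default short format (a two-character status code, one space, then the path); the function name is hypothetical.

```python
def staged_or_modified_samples(status_short: str) -> list[str]:
    """Return the status lines that show samples/ content as anything but untracked."""
    offenders = []
    for line in status_short.splitlines():
        if len(line) < 4:
            continue
        code, path = line[:2], line[3:]
        if code in ("??", "!!"):
            # Untracked or ignored entries are what the contract expects.
            continue
        # Rename entries are printed as "old -> new"; check both sides.
        candidates = [part.strip('"') for part in path.split(" -> ")]
        if any(part.startswith("samples/") for part in candidates):
            offenders.append(line)
    return offenders


example = "?? samples/\n M README.md\nA  samples/a.pdf\n"
print(staged_or_modified_samples(example))
```

An empty list is necessary but not sufficient for a PASS: files that are already tracked and unmodified do not appear in `git status --short`, so `git ls-files samples/` should also print nothing.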

Verification Checks

Required:

  • git status --short before staging confirms samples/ remains untracked.
  • uv --version is run and the result is recorded.
  • uv sync passes if uv is available.
  • uv run pytest passes if uv is available.
  • If uv is unavailable, Sprint 1 is marked blocked rather than complete.
  • Import test passes through the configured test command.
  • No real MinerU dependency is required for default tests.
  • No model downloads occur.
  • No network calls are required.
  • No candidate engine comparison is reintroduced.
  • No conversion behavior is implemented.
  • git diff --check passes.

Recommended:

  • Keep pyproject.toml dependency list minimal.
  • Avoid adding README content beyond setup/test instructions needed for the scaffold.
  • Use requirements-guard-agent to check document consistency if the scaffold reveals a sequencing issue.

Hard Failure Criteria

Sprint 1 fails and must stop for a user decision if any of these are true:

  • uv remains unavailable and the user has not approved bootstrap handling.
  • The project cannot be installed as a Python 3.12 package.
  • The package cannot be imported as pdf2md.
  • Default tests require MinerU, model downloads, GPU access, sample PDFs, or network access.
  • The scaffold introduces conversion logic outside Sprint 1 scope.
  • The scaffold introduces alternate engines or runtime engine selection.
  • The scaffold introduces --api-url, remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends.
  • samples/ is staged or committed.

Acceptance Criteria

Sprint 1 is complete when all of these are true:

  • pyproject.toml exists and defines a minimal Python 3.12 uv project.
  • src/pdf2md/__init__.py exists and import pdf2md works through the project environment.
  • uv sync passes.
  • uv run pytest passes.
  • The pdf2md CLI entry point is reserved and does not imply conversion is implemented.
  • No converter implementation code beyond the allowed placeholder exists.
  • No default test depends on MinerU, GPU, model files, network, or samples/.
  • PROGRESS.md records checks performed and residual risks.
  • Independent evaluation is complete.
  • The completed change is committed.

Handoff Fields

Use these fields when Sprint 1 completes:

  • Files changed:
  • Commands run:
  • Tests passed:
  • Tests blocked:
  • Known failures:
  • Residual risks:
  • User decisions needed:
  • Go/no-go recommendation for Sprint 2:
  • Next action:
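If the handoff is generated rather than typed, a small helper can refuse incomplete records. This is a sketch under assumptions: render_handoff is a hypothetical name, and the "- field: value" output format is a guess at what PROGRESS.md uses.

```python
# Field names copied from the list above, in the same order.
HANDOFF_FIELDS = (
    "Files changed",
    "Commands run",
    "Tests passed",
    "Tests blocked",
    "Known failures",
    "Residual risks",
    "User decisions needed",
    "Go/no-go recommendation for Sprint 2",
    "Next action",
)


def render_handoff(values: dict[str, str]) -> str:
    """Render every handoff field in contract order, rejecting gaps and extras."""
    missing = [name for name in HANDOFF_FIELDS if name not in values]
    unknown = [name for name in values if name not in HANDOFF_FIELDS]
    if missing or unknown:
        raise ValueError(f"missing fields: {missing}; unknown fields: {unknown}")
    return "\n".join(f"- {name}: {values[name]}" for name in HANDOFF_FIELDS)
```

Raising on a missing field, instead of filling in a default, keeps an empty "Tests blocked" or "Known failures" entry an explicit statement ("none") and not an omission.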