Sprint 7 Contract: Conversion Orchestrator, CLI, And Python API

Status: Implemented
Last updated: 2026-05-08

Objective

Connect the existing project-owned boundaries into a working conversion orchestration layer, public Python API, and pdf2md convert CLI path.

Sprint 7 must establish:

  • A public convert_pdf API for one local PDF.
  • A batch conversion API or helper for directory inputs.
  • A pdf2md convert INPUT --out OUTPUT_DIR command.
  • Product behavior that writes Markdown, optional metadata JSON, and <stem>.report.md.
  • Local asset materialization for adapter-provided asset files.
  • CLI summaries that surface success, failure, and warning counts.
  • Fast tests that use fake adapter outputs and do not require real MinerU, model files, GPU, sample PDFs, network, Obsidian, or LaTeX tooling.

Sprint 7 is an orchestration sprint. It may call the real MinerUAdapter in the normal production path, but the default test suite must use injected fake adapters and must not execute MinerU.
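
As a sketch of the injection pattern described above, a test double can stand in for the real adapter so that no MinerU process is started. The `Adapter` protocol, its `run` method, `FakeAdapter`, and `run_conversion` are hypothetical names used only for illustration; the actual adapter protocol lives in src/pdf2md/mineru_adapter.py and may look different.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List, Optional, Protocol


class Adapter(Protocol):
    """Hypothetical adapter protocol (assumption, not the project's real one)."""

    def run(self, pdf: Path, work_dir: Path) -> str:
        ...


class FakeAdapter:
    """Test double: returns canned Markdown and records calls; never runs MinerU."""

    def __init__(self, markdown: str) -> None:
        self.markdown = markdown
        self.calls: List[Path] = []

    def run(self, pdf: Path, work_dir: Path) -> str:
        self.calls.append(pdf)
        return self.markdown


def run_conversion(
    pdf: Path,
    work_dir: Path,
    adapter: Adapter,
    clock: Optional[Callable[[], datetime]] = None,
) -> dict:
    """Toy orchestration step showing where the adapter and clock are injected."""
    # Fall back to the real UTC clock only when the caller does not inject one.
    now = (clock or (lambda: datetime.now(timezone.utc)))()
    return {"markdown": adapter.run(pdf, work_dir), "converted_at": now.isoformat()}
```

A test would pass `FakeAdapter("# Title\n")` and a fixed clock such as `lambda: datetime(2026, 5, 8, tzinfo=timezone.utc)`, which makes both the Markdown and the timestamp deterministic.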

Current Precondition

Sprint 6 is complete:

  • src/pdf2md/paths.py owns input discovery and output path planning.
  • src/pdf2md/ir.py owns project records, block types, warning codes, and warning severities.
  • src/pdf2md/metadata.py builds JSON-serializable metadata and summary counts from project-owned records.
  • src/pdf2md/mineru_adapter.py owns the mocked direct local MinerU CLI adapter boundary.
  • src/pdf2md/markdown.py owns Obsidian Markdown normalization, asset link warnings, and table fallback warnings.
  • src/pdf2md/quality.py owns local quality checks over normalized Markdown and asset context.
  • src/pdf2md/report.py owns report content rendering and final status calculation.
  • uv run pytest passed 103 tests.

Sprint 7 may compute source SHA-256, create conversion output directories, copy local adapter-provided asset files, write final Markdown, write metadata JSON when requested, and write <stem>.report.md. It must keep public return types project-owned and must not require raw MinerU-specific Python objects from callers.

Touched Surfaces

Allowed:

  • src/pdf2md/conversion.py
  • src/pdf2md/cli.py
  • src/pdf2md/__init__.py
  • src/pdf2md/paths.py only for narrowly required path/output helper compatibility
  • src/pdf2md/mineru_adapter.py only for narrowly required adapter protocol or result compatibility
  • src/pdf2md/metadata.py only for narrowly required output-location or summary compatibility
  • src/pdf2md/markdown.py only for narrowly required orchestration compatibility
  • src/pdf2md/quality.py only for narrowly required orchestration compatibility
  • src/pdf2md/report.py only for narrowly required orchestration compatibility
  • tests/test_conversion.py
  • tests/test_cli.py
  • Existing focused unit tests only if a touched module requires compatibility updates
  • README.md only if a short usage note is needed for pdf2md convert
  • PLAN.md
  • PROGRESS.md
  • docs/V1IMPLEMENTATIONPLAN.md
  • docs/Sprints/SPRINT7CONTRACT.md

Not allowed:

  • src/pdf2md/doctor.py
  • Working pdf2md doctor behavior
  • scripts/
  • MinerU/model installation or download scripts
  • Real MinerU invocation in default tests
  • Real GPU/CUDA checks
  • Real PDF content parsing outside adapter output handling
  • Runtime engine selection or alternate engine support
  • Cloud OCR, remote LLM/VLM, hosted renderer, remote document parser, remote asset fetching, HTTP client backend, router mode, --api-url, remote APIs, or remote OpenAI-compatible backend support
  • A CLI flag that disables strict-local policy
  • Committed files under samples/

Expected Outputs

Sprint 7 should produce:

  1. Public conversion records and API

    • A project-owned conversion result type containing at least:
      • source PDF path
      • Markdown output path
      • metadata JSON path when written
      • report path
      • assets directory
      • raw output directory when kept
      • engine name and version
      • final status
      • warning count
      • warnings
    • convert_pdf(input_path, output_dir, metadata=True, keep_raw=False, overwrite=False, gpu=None, strict_local=True, adapter=None, clock=None) or an equivalently small API.
    • The default API path uses the direct local MinerU adapter.
    • Tests can inject a fake adapter and deterministic clock.
    • The public return type must not expose raw MinerU-specific Python objects as required fields.
  2. Single-PDF orchestration

    • Discover and plan the single PDF using existing path helpers.
    • Create required output directories only after preflight path checks pass.
    • Run the adapter into a planned temporary or raw work directory.
    • Stop the individual conversion on adapter hard failure and return explicit warnings/status.
    • Normalize adapter Markdown into Obsidian-friendly Markdown.
    • Copy local adapter-provided asset files into the planned assets directory when needed.
    • Compute source SHA-256 with local file reads.
    • Build metadata from project-owned records.
    • Run local quality checks over normalized Markdown and asset context.
    • Render report Markdown from metadata and quality results.
    • Write final Markdown, optional metadata JSON, and report Markdown.
  3. Output writing and overwrite behavior

    • Never write outside the planned output root.
    • Respect existing-output conflicts unless overwrite=True.
    • Keep writes deterministic and UTF-8 encoded for text outputs.
    • Preserve --metadata behavior: metadata JSON is written when enabled and omitted when disabled.
    • Always write <stem>.report.md.
    • --keep-raw preserves raw MinerU output in the planned raw directory.
    • Without --keep-raw, temporary raw work must be cleaned up when the conversion completes, while preserving enough failure context in metadata/report/warnings.
  4. Asset materialization

    • Copy only local files returned by the adapter.
    • Do not fetch remote assets.
    • Do not follow asset paths that escape the adapter work directory or the planned output root.
    • Handle missing or invalid adapter asset paths with project-owned warnings.
    • Normalize final Markdown asset links to stable relative paths.
  5. CLI convert command

    • pdf2md convert INPUT --out OUTPUT_DIR.
    • Options:
      • --metadata
      • --keep-raw
      • --recursive
      • --overwrite
      • --gpu GPU_DEVICE
      • --strict-local
    • Strict-local must remain enabled in v1; the CLI must not add a supported way to disable it.
    • Single-PDF conversion returns exit code 0 on success or partial success, and non-zero when a hard error prevents conversion.
    • Directory conversion handles multiple PDFs deterministically and prints a summary.
    • Batch conversion should continue to the next file when one PDF fails after planning, then return a non-zero exit code if any PDF failed.
    • CLI output must include converted count, failed count, and warning count.
  6. Batch conversion

    • Directory inputs use existing non-recursive discovery by default.
    • Recursive discovery occurs only with --recursive.
    • Output paths preserve relative subdirectories from the input root.
    • Duplicate planned outputs and overwrite conflicts fail before conversion starts.
    • Results are deterministic and ordered by discovery/path planning order.
  7. Failure and warning behavior

    • MinerU failure must be clear and must not trigger fallback to any other engine.
    • Strict-local violations must be hard failures.
    • Per-file failures must include project-owned warnings.
    • CLI summaries must not suppress warning counts.
    • Metadata/report content must reflect warnings emitted during adapter, normalization, asset, and quality steps.
  8. Tests

    • API test for one successful conversion with a fake adapter.
    • API test for adapter failure with no fallback.
    • API test for output conflict and overwrite behavior.
    • API test for metadata disabled behavior.
    • API test for local asset copying and relative Markdown links.
    • API test for keep_raw behavior.
    • CLI test for single PDF conversion with a fake adapter.
    • CLI test for directory conversion with deterministic summary output.
    • CLI test for recursive behavior.
    • CLI test for failure summary and non-zero exit code.
    • Tests proving default checks do not require real MinerU, GPU, models, network, samples/, Obsidian, or LaTeX tooling.
  9. Handoff

    • PROGRESS.md records changed files, commands run, tests passed or blocked, known failures, residual risks, and next action.

Non-Goals

  • Do not implement pdf2md doctor.
  • Do not implement environment diagnostics.
  • Do not install MinerU 3.1.0.
  • Do not download MinerU models.
  • Do not probe real MinerU output with local sample PDFs.
  • Do not add setup scripts.
  • Do not implement runtime engine selection.
  • Do not add alternate engines.
  • Do not add cloud, remote API, router, HTTP client backend, remote OpenAI-compatible backend, hosted renderer, or remote asset-fetching support.
  • Do not add a CLI flag or API option that disables strict-local policy.
  • Do not require real MinerU, CUDA, GPU, model files, network, Obsidian, LaTeX tooling, or samples/ in default tests.
  • Do not implement real math rendering; use the Sprint 6 local checker boundary.
  • Do not commit generated conversion outputs or sample PDFs.

Work Packages

WP7.1: Public API And Result Records

Owner:

  • feature-generator-agent
  • requirements-guard-agent

Actions:

  • Add conversion.py.
  • Define project-owned conversion result records.
  • Expose convert_pdf from the library surface.
  • Support fake adapter and deterministic clock injection for tests.

Output:

  • Callers can run one conversion without depending on raw MinerU objects.
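
One possible shape for the project-owned result record is sketched below. The contract fixes only which pieces of information the record must carry; the class name, field names, status labels, warning code, and example paths here are assumptions for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple


@dataclass(frozen=True)
class ConversionResult:
    """Project-owned result record; every field name here is an assumption."""

    source_path: Path
    markdown_path: Path
    report_path: Path
    assets_dir: Path
    engine_name: str
    engine_version: str
    status: str  # e.g. "success", "partial", or "failed" (labels assumed)
    metadata_path: Optional[Path] = None  # None when metadata JSON is not written
    raw_dir: Optional[Path] = None  # None when raw output is not kept
    warnings: Tuple[str, ...] = ()  # plain strings stand in for project warning records

    @property
    def warning_count(self) -> int:
        # Derived from warnings so the count cannot drift from the list.
        return len(self.warnings)


# Example instance with hypothetical paths and a hypothetical warning code.
result = ConversionResult(
    source_path=Path("in/paper.pdf"),
    markdown_path=Path("out/paper.md"),
    report_path=Path("out/paper.report.md"),
    assets_dir=Path("out/paper_assets"),
    engine_name="mineru",
    engine_version="3.1.0",
    status="partial",
    warnings=("table-fallback",),
)
```

Nothing in the record refers to a MinerU type, so callers depend only on standard-library values and project-owned data.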

WP7.2: Single-PDF Orchestration And Output Writing

Owner:

  • feature-generator-agent
  • mineru-integration-agent
  • metadata-agent
  • obsidian-markdown-agent

Actions:

  • Connect path planning, adapter execution, Markdown normalization, metadata building, quality checks, and report rendering.
  • Write final Markdown, optional metadata JSON, and report Markdown.
  • Compute source SHA-256.
  • Preserve strict-local behavior and no-fallback behavior.

Output:

  • One PDF can be converted through mocked adapter outputs in tests and through the real adapter in normal use.
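
The source SHA-256 step can be done with local file reads only. The following is a minimal sketch using the standard library; the function name and chunk size are illustrative choices, not taken from the project.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a local file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat for large PDFs, and the digest is the same regardless of chunk size.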

WP7.3: Asset And Raw Output Handling

Owner:

  • feature-generator-agent
  • obsidian-markdown-agent
  • metadata-agent

Actions:

  • Copy local adapter-provided assets into the planned assets directory.
  • Normalize Markdown links relative to the final Markdown file.
  • Preserve raw output only when requested.
  • Clean temporary work when raw output is not requested.

Output:

  • Markdown, assets, metadata, and report paths are stable and local-only.
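
The asset rules above (local files only, no paths that escape the adapter work directory, links relative to the final Markdown file) could be sketched roughly as follows. `copy_asset` and its parameters are hypothetical, `Path.is_relative_to` assumes Python 3.9 or later, and a real implementation would return a project-owned warning instead of `None`.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def copy_asset(
    asset: Path, work_dir: Path, assets_dir: Path, markdown_path: Path
) -> Optional[str]:
    """Copy one adapter-provided asset and return its Markdown-relative link.

    Returns None when the asset is missing or resolves outside work_dir.
    """
    # resolve() collapses ".." segments and follows symlinks before the check.
    source = asset.resolve()
    if not source.is_relative_to(work_dir.resolve()) or not source.is_file():
        return None
    assets_dir.mkdir(parents=True, exist_ok=True)
    # Flattening to the file name is a simplification; same-named assets from
    # different subdirectories would collide and need extra handling.
    target = assets_dir / source.name
    shutil.copy2(source, target)
    # Link is relative to the directory holding the final Markdown file.
    relative = os.path.relpath(target, start=markdown_path.parent)
    return Path(relative).as_posix()
```

For a Markdown file at `out/doc.md` and an assets directory at `out/doc_assets`, a copied `fig1.png` would be linked as `doc_assets/fig1.png`.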

WP7.4: CLI Convert And Batch Summary

Owner:

  • feature-generator-agent
  • requirements-guard-agent

Actions:

  • Replace the placeholder CLI with convert while keeping --version.
  • Add only the agreed v1 options.
  • Print deterministic summaries with converted, failed, and warning counts.
  • Return non-zero exit code when hard failures occur.

Output:

  • Users can run pdf2md convert for one PDF or a directory.
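
A rough sketch of the command surface follows, assuming the CLI is built on the standard-library `argparse` module (the contract does not say which CLI library is used). The version string, the summary format, the helper names, and treating `--metadata` as an opt-in flag are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for `pdf2md convert INPUT --out OUTPUT_DIR` (sketch only)."""
    parser = argparse.ArgumentParser(prog="pdf2md")
    # Placeholder version string; the real value would come from the package.
    parser.add_argument("--version", action="version", version="pdf2md 0.0.0")
    sub = parser.add_subparsers(dest="command", required=True)
    convert = sub.add_parser("convert", help="Convert one PDF or a directory of PDFs")
    convert.add_argument("input", metavar="INPUT")
    convert.add_argument("--out", required=True, metavar="OUTPUT_DIR")
    convert.add_argument("--metadata", action="store_true")
    convert.add_argument("--keep-raw", action="store_true")
    convert.add_argument("--recursive", action="store_true")
    convert.add_argument("--overwrite", action="store_true")
    convert.add_argument("--gpu", metavar="GPU_DEVICE", default=None)
    # Strict-local stays on in v1: the flag is accepted, but nothing turns it off.
    convert.add_argument("--strict-local", action="store_true", default=True)
    return parser


def summarize(converted: int, failed: int, warnings: int) -> str:
    """Format a deterministic one-line summary (format is an assumption)."""
    return f"converted={converted} failed={failed} warnings={warnings}"


def exit_code(failed: int) -> int:
    """Non-zero when any PDF hit a hard failure, zero otherwise."""
    return 1 if failed else 0
```

Keeping the summary in a small pure function makes it easy to assert on the exact text in CLI tests.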

WP7.5: Independent Evaluation

Owner:

  • evaluation-agent

Actions:

  • Review completed orchestration behavior against this contract.
  • Verify no default test executes real MinerU, uses GPU, downloads models, uses network, or requires samples/.
  • Verify no runtime remote/API path or alternate engine is introduced.
  • Verify samples/ remains untracked and unstaged.

Output:

  • PASS/FAIL notes with any missing acceptance criteria.

Verification Checks

Required:

  • git status --short --untracked-files=all before staging confirms samples/ remains untracked and unstaged.
  • uv --version is run and the result is recorded.
  • uv sync passes.
  • uv run pytest tests/test_conversion.py tests/test_cli.py passes.
  • uv run pytest passes.
  • git diff --check passes.
  • Default tests do not require real MinerU, CUDA, GPU, model files, network, Obsidian, LaTeX tooling, or samples/.
  • No model downloads occur.
  • No network calls are required.
  • No candidate engine comparison is reintroduced.
  • No alternate engine or runtime engine selection is added.
  • No CLI/API option disables strict-local policy.
  • No --api-url, router mode, HTTP client backend, remote API, or remote OpenAI-compatible backend support is added.
  • Adapter failures produce explicit failed results and no fallback conversion.
  • Output files are written only after path preflight succeeds.
  • Existing outputs are protected unless overwrite is enabled.
  • CLI summaries include warning counts.
  • Metadata/report paths and counts match the files written.
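
The preflight-related checks (no writes before preflight succeeds, no escape from the output root, existing outputs protected unless overwrite is enabled) could be approximated as below. This is only an illustration: the project's path planning lives in src/pdf2md/paths.py, and `preflight_conflicts` is a hypothetical helper, not a description of that module. `Path.is_relative_to` assumes Python 3.9 or later.

```python
from pathlib import Path
from typing import Iterable, List


def preflight_conflicts(
    planned: Iterable[Path], output_root: Path, overwrite: bool = False
) -> List[str]:
    """Return human-readable problems; an empty list means writing may start."""
    problems: List[str] = []
    seen = set()
    root = output_root.resolve()
    for path in planned:
        resolved = path.resolve()
        if not resolved.is_relative_to(root):
            problems.append(f"escapes output root: {path}")
        if resolved in seen:
            problems.append(f"duplicate planned output: {path}")
        seen.add(resolved)
        if resolved.exists() and not overwrite:
            problems.append(f"already exists (overwrite not enabled): {path}")
    return problems
```

Because the function only inspects paths and never creates anything, a non-empty result can abort the run before any output directory or file is touched.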

Recommended:

  • Keep conversion orchestration small and dependency-injected.
  • Prefer local temporary directories from the standard library for raw work when keep_raw is disabled.
  • Keep batch conversion a thin loop over single-file conversion.
  • Keep CLI formatting simple and stable enough for tests.
  • Use fake adapter records in tests rather than monkeypatching subprocess behavior at the CLI layer.
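
Two of the recommendations above, standard-library temporary directories for raw work and a thin batch loop, might look like this in outline. `raw_work_dir`, `convert_many`, the directory prefix, and the `"failed"` status label are assumptions made for the sketch.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Tuple


@contextmanager
def raw_work_dir(planned_raw_dir: Path, keep_raw: bool) -> Iterator[Path]:
    """Yield the directory that receives raw adapter output."""
    if keep_raw:
        # Raw output is preserved in the planned raw directory.
        planned_raw_dir.mkdir(parents=True, exist_ok=True)
        yield planned_raw_dir
    else:
        # Temporary work is removed on exit, including when an error is raised.
        with tempfile.TemporaryDirectory(prefix="pdf2md-raw-") as tmp:
            yield Path(tmp)


def convert_many(
    pdfs: Iterable[Path], convert_one: Callable[[Path], str]
) -> Tuple[List[Tuple[Path, str]], int]:
    """Run convert_one per PDF in the given order; return (results, exit_code)."""
    # One failed file does not stop the loop; it only affects the exit code.
    results = [(pdf, convert_one(pdf)) for pdf in pdfs]
    code = 1 if any(status == "failed" for _, status in results) else 0
    return results, code
```

Here `convert_one` is expected to report failure through its returned status rather than by raising, which keeps the loop free of per-file error handling.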

Hard Failure Criteria

Sprint 7 fails and must stop for a user decision if any of these are true:

  • The public API requires raw MinerU-specific Python objects from callers or exposes them as required return fields.
  • The implementation silently falls back to another engine after MinerU failure.
  • A CLI/API option disables strict-local policy.
  • The implementation adds or permits --api-url, remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends.
  • Default tests execute real MinerU, require GPU/CUDA, download models, use network, require Obsidian/LaTeX tooling, or require samples/.
  • Output writing can escape the planned output root.
  • Existing files are overwritten without explicit overwrite intent.
  • CLI writes final outputs after a preflight hard failure.
  • CLI summaries suppress warning counts or failed counts.
  • Metadata/report content omits warnings emitted during adapter, normalization, asset, or quality steps.
  • samples/ is staged or committed.

Acceptance Criteria

Sprint 7 is complete when:

  • src/pdf2md/conversion.py exists and owns conversion orchestration.
  • convert_pdf is available from the public Python package.
  • pdf2md convert INPUT --out OUTPUT_DIR exists.
  • Single-PDF conversion is tested with a fake adapter.
  • Directory and recursive conversion behavior is tested with fake adapters.
  • Output conflict and overwrite behavior is tested.
  • Adapter failure produces a clear failed result and no fallback.
  • Final Markdown, metadata JSON when enabled, and report Markdown are written by product behavior.
  • Local adapter-provided assets are copied or warned about deterministically.
  • --keep-raw behavior is tested.
  • CLI summaries include converted, failed, and warning counts.
  • Default tests do not require real MinerU, GPU, model files, network, Obsidian, LaTeX tooling, or samples/.
  • uv sync passes.
  • Targeted conversion/CLI tests pass.
  • uv run pytest passes.
  • PROGRESS.md records checks performed and residual risks.
  • Independent evaluation is complete.
  • The completed change is committed.

Handoff Fields

Use these fields when Sprint 7 completes:

  • Files changed:
  • Commands run:
  • Tests passed:
  • Tests blocked:
  • Known failures:
  • Residual risks:
  • User decisions needed:
  • Go/no-go recommendation for Sprint 8:
  • Next action: