15 KiB
Sprint 7 Contract: Conversion Orchestrator, CLI, And Python API
Status: Implemented Last updated: 2026-05-08
Objective
Connect the existing project-owned boundaries into a working conversion orchestration layer, public Python API, and pdf2md convert CLI path.
Sprint 7 must establish:
- A public
convert_pdfAPI for one local PDF. - A batch conversion API or helper for directory inputs.
- A
pdf2md convert INPUT --out OUTPUT_DIRcommand. - Product behavior that writes Markdown, optional metadata JSON, and
<stem>.report.md. - Local asset materialization for adapter-provided asset files.
- CLI summaries that surface success, failure, and warning counts.
- Fast tests that use fake adapter outputs and do not require real MinerU, model files, GPU, sample PDFs, network, Obsidian, or LaTeX tooling.
Sprint 7 is an orchestration sprint. It may call the real MinerUAdapter in the normal production path, but the default test suite must use injected fake adapters and must not execute MinerU.
Current Precondition
Sprint 6 is complete:
src/pdf2md/paths.pyowns input discovery and output path planning.src/pdf2md/ir.pyowns project records, block types, warning codes, and warning severities.src/pdf2md/metadata.pybuilds JSON-serializable metadata and summary counts from project-owned records.src/pdf2md/mineru_adapter.pyowns the mocked direct local MinerU CLI adapter boundary.src/pdf2md/markdown.pyowns Obsidian Markdown normalization, asset link warnings, and table fallback warnings.src/pdf2md/quality.pyowns local quality checks over normalized Markdown and asset context.src/pdf2md/report.pyowns report content rendering and final status calculation.uv run pytestpassed 103 tests.
Sprint 7 may compute source SHA-256, create conversion output directories, copy local adapter-provided asset files, write final Markdown, write metadata JSON when requested, and write <stem>.report.md. It must keep public return types project-owned and must not require raw MinerU-specific Python objects from callers.
Touched Surfaces
Allowed:
src/pdf2md/conversion.pysrc/pdf2md/cli.pysrc/pdf2md/__init__.pysrc/pdf2md/paths.pyonly for narrowly required path/output helper compatibilitysrc/pdf2md/mineru_adapter.pyonly for narrowly required adapter protocol or result compatibilitysrc/pdf2md/metadata.pyonly for narrowly required output-location or summary compatibilitysrc/pdf2md/markdown.pyonly for narrowly required orchestration compatibilitysrc/pdf2md/quality.pyonly for narrowly required orchestration compatibilitysrc/pdf2md/report.pyonly for narrowly required orchestration compatibilitytests/test_conversion.pytests/test_cli.py- Existing focused unit tests only if a touched module requires compatibility updates
README.mdonly if a short usage note is needed forpdf2md convertPLAN.mdPROGRESS.mddocs/V1IMPLEMENTATIONPLAN.mddocs/Sprints/SPRINT7CONTRACT.md
Not allowed:
src/pdf2md/doctor.py- Working
pdf2md doctorbehavior scripts/- MinerU/model installation or download scripts
- Real MinerU invocation in default tests
- Real GPU/CUDA checks
- Real PDF content parsing outside adapter output handling
- Runtime engine selection or alternate engine support
- Cloud OCR, remote LLM/VLM, hosted renderer, remote document parser, remote asset fetching, HTTP client backend, router mode,
--api-url, remote APIs, or remote OpenAI-compatible backend support - A CLI flag that disables strict-local policy
- Committed files under
samples/
Expected Outputs
Sprint 7 should produce:
-
Public conversion records and API
- A project-owned conversion result type containing at least:
- source PDF path
- Markdown output path
- metadata JSON path when written
- report path
- assets directory
- raw output directory when kept
- engine name and version
- final status
- warning count
- warnings
convert_pdf(input_path, output_dir, metadata=True, keep_raw=False, overwrite=False, gpu=None, strict_local=True, adapter=None, clock=None)or an equivalently small API.- The default API path uses the direct local MinerU adapter.
- Tests can inject a fake adapter and deterministic clock.
- The public return type must not expose raw MinerU-specific Python objects as required fields.
- A project-owned conversion result type containing at least:
-
Single-PDF orchestration
- Discover and plan the single PDF using existing path helpers.
- Create required output directories only after preflight path checks pass.
- Run the adapter into a planned temporary or raw work directory.
- Stop the individual conversion on adapter hard failure and return explicit warnings/status.
- Normalize adapter Markdown into Obsidian-friendly Markdown.
- Copy local adapter-provided asset files into the planned assets directory when needed.
- Compute source SHA-256 with local file reads.
- Build metadata from project-owned records.
- Run local quality checks over normalized Markdown and asset context.
- Render report Markdown from metadata and quality results.
- Write final Markdown, optional metadata JSON, and report Markdown.
-
Output writing and overwrite behavior
- Never write outside the planned output root.
- Respect existing-output conflicts unless
overwrite=True. - Keep writes deterministic and UTF-8 encoded for text outputs.
- Preserve
--metadatabehavior: metadata JSON is written when enabled and omitted when disabled. - Always write
<stem>.report.md. --keep-rawpreserves raw MinerU output in the planned raw directory.- Without
--keep-raw, temporary raw work must be cleaned up when the conversion completes, while preserving enough failure context in metadata/report/warnings.
-
Asset materialization
- Copy only local files returned by the adapter.
- Do not fetch remote assets.
- Do not follow asset paths that escape the adapter work directory or the planned output root.
- Handle missing or invalid adapter asset paths with project-owned warnings.
- Normalize final Markdown asset links to stable relative paths.
-
CLI convert command
pdf2md convert INPUT --out OUTPUT_DIR.- Options:
--metadata--keep-raw--recursive--overwrite--gpu GPU_DEVICE--strict-local
- Strict-local must remain enabled in v1; the CLI must not add a supported way to disable it.
- Single PDF conversion returns exit code
0on success or partial success, and non-zero when a hard error prevents conversion. - Directory conversion handles multiple PDFs deterministically and prints a summary.
- Batch conversion should continue to the next file when one PDF fails after planning, then return a non-zero exit code if any PDF failed.
- CLI output must include converted count, failed count, and warning count.
-
Batch conversion
- Directory inputs use existing non-recursive discovery by default.
- Recursive discovery occurs only with
--recursive. - Output paths preserve relative subdirectories from the input root.
- Duplicate planned outputs and overwrite conflicts fail before conversion starts.
- Results are deterministic and ordered by discovery/path planning order.
-
Failure and warning behavior
- MinerU failure must be clear and must not trigger fallback to any other engine.
- Strict-local violations must be hard failures.
- Per-file failures must include project-owned warnings.
- CLI summaries must not suppress warning counts.
- Metadata/report content must reflect warnings emitted during adapter, normalization, asset, and quality steps.
-
Tests
- API test for one successful conversion with a fake adapter.
- API test for adapter failure with no fallback.
- API test for output conflict and overwrite behavior.
- API test for metadata disabled behavior.
- API test for local asset copying and relative Markdown links.
- API test for
keep_rawbehavior. - CLI test for single PDF conversion with a fake adapter.
- CLI test for directory conversion with deterministic summary output.
- CLI test for recursive behavior.
- CLI test for failure summary and non-zero exit code.
- Tests proving default checks do not require real MinerU, GPU, models, network,
samples/, Obsidian, or LaTeX tooling.
-
Handoff
PROGRESS.mdrecords changed files, commands run, tests passed or blocked, known failures, residual risks, and next action.
Non-Goals
- Do not implement
pdf2md doctor. - Do not implement environment diagnostics.
- Do not install MinerU 3.1.0.
- Do not download MinerU models.
- Do not probe real MinerU output with local sample PDFs.
- Do not add setup scripts.
- Do not implement runtime engine selection.
- Do not add alternate engines.
- Do not add cloud, remote API, router, HTTP client backend, remote OpenAI-compatible backend, hosted renderer, or remote asset-fetching support.
- Do not add a CLI flag or API option that disables strict-local policy.
- Do not require real MinerU, CUDA, GPU, model files, network, Obsidian, LaTeX tooling, or
samples/in default tests. - Do not implement real math rendering; use the Sprint 6 local checker boundary.
- Do not commit generated conversion outputs or sample PDFs.
Work Packages
WP7.1: Public API And Result Records
Owner:
feature-generator-agentrequirements-guard-agent
Actions:
- Add
conversion.py. - Define project-owned conversion result records.
- Expose
convert_pdffrom the library surface. - Support fake adapter and deterministic clock injection for tests.
Output:
- Callers can run one conversion without depending on raw MinerU objects.
WP7.2: Single-PDF Orchestration And Output Writing
Owner:
feature-generator-agentmineru-integration-agentmetadata-agentobsidian-markdown-agent
Actions:
- Connect path planning, adapter execution, Markdown normalization, metadata building, quality checks, and report rendering.
- Write final Markdown, optional metadata JSON, and report Markdown.
- Compute source SHA-256.
- Preserve strict-local behavior and no-fallback behavior.
Output:
- One PDF can be converted through mocked adapter outputs in tests and through the real adapter in normal use.
WP7.3: Asset And Raw Output Handling
Owner:
feature-generator-agentobsidian-markdown-agentmetadata-agent
Actions:
- Copy local adapter-provided assets into the planned assets directory.
- Normalize Markdown links relative to the final Markdown file.
- Preserve raw output only when requested.
- Clean temporary work when raw output is not requested.
Output:
- Markdown, assets, metadata, and report paths are stable and local-only.
WP7.4: CLI Convert And Batch Summary
Owner:
feature-generator-agentrequirements-guard-agent
Actions:
- Replace the placeholder CLI with
convertwhile keeping--version. - Add only the agreed v1 options.
- Print deterministic summaries with converted, failed, and warning counts.
- Return non-zero exit code when hard failures occur.
Output:
- Users can run
pdf2md convertfor one PDF or a directory.
WP7.5: Independent Evaluation
Owner:
evaluation-agent
Actions:
- Review completed orchestration behavior against this contract.
- Verify no default test executes real MinerU, uses GPU, downloads models, uses network, or requires
samples/. - Verify no runtime remote/API path or alternate engine is introduced.
- Verify
samples/remains untracked and unstaged.
Output:
- PASS/FAIL notes with any missing acceptance criteria.
Verification Checks
Required:
git status --short --untracked-files=allbefore staging confirmssamples/remains untracked and unstaged.uv --versionis run and result is recorded.uv syncpasses.uv run pytest tests/test_conversion.py tests/test_cli.pypasses.uv run pytestpasses.git diff --checkpasses.- Default tests do not require real MinerU, CUDA, GPU, model files, network, Obsidian, LaTeX tooling, or
samples/. - No model downloads occur.
- No network calls are required.
- No candidate engine comparison is reintroduced.
- No alternate engine or runtime engine selection is added.
- No CLI/API option disables strict-local policy.
- No
--api-url, router mode, HTTP client backend, remote API, or remote OpenAI-compatible backend support is added. - Adapter failures produce explicit failed results and no fallback conversion.
- Output files are written only after path preflight succeeds.
- Existing outputs are protected unless overwrite is enabled.
- CLI summaries include warning counts.
- Metadata/report paths and counts match the files written.
Recommended:
- Keep conversion orchestration small and dependency-injected.
- Prefer local temporary directories from the standard library for raw work when
keep_rawis disabled. - Keep batch conversion a thin loop over single-file conversion.
- Keep CLI formatting simple and stable enough for tests.
- Use fake adapter records in tests rather than monkeypatching subprocess behavior at the CLI layer.
Hard Failure Criteria
Sprint 7 fails and must stop for a user decision if any of these are true:
- Public API requires or exposes raw MinerU-specific Python objects as required return fields.
- The implementation silently falls back to another engine after MinerU failure.
- A CLI/API option disables strict-local policy.
- The implementation adds or permits
--api-url, remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends. - Default tests execute real MinerU, require GPU/CUDA, download models, use network, require Obsidian/LaTeX tooling, or require
samples/. - Output writing can escape the planned output root.
- Existing files are overwritten without explicit overwrite intent.
- CLI writes final outputs after a preflight hard failure.
- CLI summaries suppress warning counts or failed counts.
- Metadata/report content omits warnings emitted during adapter, normalization, asset, or quality steps.
samples/is staged or committed.
Acceptance Criteria
Sprint 7 is complete when:
src/pdf2md/conversion.pyexists and owns conversion orchestration.convert_pdfis available from the public Python package.pdf2md convert INPUT --out OUTPUT_DIRexists.- Single-PDF conversion is tested with a fake adapter.
- Directory and recursive conversion behavior is tested with fake adapters.
- Output conflict and overwrite behavior is tested.
- Adapter failure produces a clear failed result and no fallback.
- Final Markdown, metadata JSON when enabled, and report Markdown are written by product behavior.
- Local adapter-provided assets are copied or warned about deterministically.
--keep-rawbehavior is tested.- CLI summaries include converted, failed, and warning counts.
- Default tests do not require real MinerU, GPU, model files, network, Obsidian, LaTeX tooling, or
samples/. uv syncpasses.- Targeted conversion/CLI tests pass.
uv run pytestpasses.PROGRESS.mdrecords checks performed and residual risks.- Independent evaluation is complete.
- The completed change is committed.
Handoff Fields
Use these fields when Sprint 7 completes:
- Files changed:
- Commands run:
- Tests passed:
- Tests blocked:
- Known failures:
- Residual risks:
- User decisions needed:
- Go/no-go recommendation for Sprint 8:
- Next action: