PDFToMD/docs/Sprints/SPRINT15CONTRACT.md

# Sprint 15 Contract: NVIDIA GPU Detection And Auto MinerU Profile

Status: Implemented
Last updated: 2026-05-12

## Objective

Add a strict-local runtime profiling layer that detects installed NVIDIA GPUs and applies conservative MinerU environment tuning by default.

The default runtime profile is `auto`. In `auto`, the converter keeps pre-Turing GPUs and GPUs with less than 12GB VRAM (including 8GB cards) on safe settings, steps up only slightly for 12GB to 16GB cards, and applies the moderately aggressive local MinerU configuration only when the selected NVIDIA GPU has at least 16GB VRAM and no pre-Turing compatibility warning.

This sprint is motivated by local evidence from `samples\FourNodeQuadrilateralShellElementMITC4.pdf`: Sprint 14's one-page conversion path used `cuda:0` correctly, but the GTX 1070 Ti (8GB) stayed near full VRAM use and stalled on source page 2. The next useful test should run on a stronger NVIDIA GPU with explicit runtime diagnostics and reproducible MinerU environment settings.

## Source Basis

Use these source-backed facts during implementation:

- MinerU CLI supports `mineru -p <input_path> -o <output_path>` and, without `--api-url`, launches a temporary local `mineru-api`: https://opendatalab.github.io/MinerU/usage/cli_tools/
- MinerU CLI documents `-b/--backend`, `-f/--formula`, `-t/--table`, `--api-url`, and related options, but this project must not expose remote/API or backend selection paths in v1: https://opendatalab.github.io/MinerU/usage/cli_tools/
- MinerU environment variables include `MINERU_PDF_RENDER_THREADS`, `MINERU_PROCESSING_WINDOW_SIZE`, `MINERU_API_MAX_CONCURRENT_REQUESTS`, and timeout settings: https://opendatalab.github.io/MinerU/usage/cli_tools/
- MinerU advanced CLI docs support selecting visible GPU devices with `CUDA_VISIBLE_DEVICES`: https://opendatalab.github.io/MinerU/usage/advanced_cli_parameters/
- MinerU local deployment docs list auto-engine GPU requirements around 8GB+ VRAM and GPU acceleration for Volta-or-later devices: https://opendatalab.github.io/MinerU/quick_start/
- MinerU extension docs say `vllm` and `lmdeploy` acceleration extras are alternatives and should not both be installed just for this sprint: https://opendatalab.github.io/MinerU/quick_start/extension_modules/

Access date for the source review: 2026-05-12.

## Current Precondition

- MinerU 3.1.0 remains the only conversion engine.
- Conversion runs through direct local `mineru` CLI execution only.
- Strict-local allows only the direct CLI and MinerU CLI-internal temporary local `mineru-api`; remote API/backend paths remain prohibited.
- `pdf2md convert` defaults to `--gpu cuda:0`.
- `MinerUAdapter` currently maps `cuda:N` to `MINERU_DEVICE_MODE=cuda` and `CUDA_VISIBLE_DEVICES=N`.
- `pdf2md doctor` already reports NVIDIA GPU visibility, PyTorch CUDA visibility, GPU names, and Pascal/pre-Turing warnings.
- Sprint 14 chunk mode runs one source page per MinerU invocation when `--chunk-pages` is active.

## Contract Assumptions

- Keep `--gpu cuda:0` as the default for backward compatibility with PRD and existing docs.
- Add `--gpu auto` as an opt-in GPU selection mode that chooses the visible NVIDIA GPU with the largest reported VRAM.
- Add `--mineru-profile {auto,safe,performance}` with default `auto`.
- Keep all conversion requests sequential in Sprint 15. Do not introduce parallel page conversion.
- Keep formula and table parsing enabled. Do not optimize by disabling required output quality features.
- Do not add `--backend`, `--api-url`, `--url`, router mode, HTTP client backend, remote OpenAI-compatible backend, or remote model server support.
- Treat MinerU environment tuning as best-effort. If GPU inventory cannot be read, continue with safe profile settings and a warning/provenance record rather than guessing aggressive values.

## Touched Surfaces

Allowed during implementation:

- Create `src/pdf2md/gpu.py`
- Create `src/pdf2md/mineru_profile.py`
- Modify `src/pdf2md/mineru_adapter.py`
- Modify `src/pdf2md/conversion.py`
- Modify `src/pdf2md/cli.py`
- Modify `src/pdf2md/doctor.py`
- Modify `src/pdf2md_ui/runner.py` only if the UI command builder needs profile passthrough
- Modify `src/pdf2md_ui/app.py` only if a minimal profile control is necessary
- Add `tests/test_gpu.py`
- Add `tests/test_mineru_profile.py`
- Modify `tests/test_mineru_adapter.py`
- Modify `tests/test_conversion.py`
- Modify `tests/test_cli.py`
- Modify `tests/test_doctor.py`
- Modify `tests/test_ui_runner.py` only if UI command construction changes
- Modify `README.md`
- Modify `ARCHITECTURE.md`
- Modify `PRD.md` if CLI option documentation changes
- Modify `docs/V1IMPLEMENTATIONPLAN.md`
- Modify `PLAN.md`
- Modify `PROGRESS.md`
- Modify `docs/WORKARCHIVE.md` after implementation

Not allowed:

- Adding another conversion engine or runtime engine selector.
- Passing `--api-url`, `--url`, or any remote endpoint to MinerU.
- Adding `mineru-router`, HTTP client backend, or OpenAI-compatible backend usage.
- Installing `vllm`, `lmdeploy`, CUDA packages, models, or any runtime package automatically.
- Changing the default conversion engine or disabling formula/table recognition.
- Making default tests depend on real MinerU, GPU, CUDA, PyTorch, model files, network, Obsidian, MathJax, or `samples/`.
- Committing sample PDFs, generated `outputs/`, retained temporary page outputs, local model files, or `dist/pdf2md-ui.exe`.

## Product Behavior

### CLI

Existing behavior remains valid:

```powershell
uv run pdf2md convert paper.pdf --out outputs
uv run pdf2md convert paper.pdf --out outputs --gpu cuda:0
```

New behavior:

```powershell
uv run pdf2md convert paper.pdf --out outputs --mineru-profile auto
uv run pdf2md convert paper.pdf --out outputs --mineru-profile safe
uv run pdf2md convert paper.pdf --out outputs --mineru-profile performance
uv run pdf2md convert paper.pdf --out outputs --gpu auto --mineru-profile auto
```

Rules:

- `--mineru-profile` defaults to `auto`.
- `--gpu cuda:N` selects a concrete CUDA index and tunes MinerU for that selected GPU when inventory is available.
- `--gpu N` is still normalized to `cuda:N`.
- `--gpu auto` selects the visible NVIDIA GPU with the largest VRAM from local GPU inventory.
- If `--gpu auto` cannot find a visible NVIDIA GPU, fail clearly before conversion rather than silently switching to CPU.
- If `--mineru-profile performance` is requested on a selected GPU below 16GB VRAM or with pre-Turing risk, downgrade to safe settings with a warning in metadata/report. Do not fail solely because performance was unsafe.

### Doctor

`pdf2md doctor` should report:

- All visible NVIDIA GPUs with index, name, total VRAM, and driver version from `nvidia-smi`.
- PyTorch CUDA device names and compute capabilities when available.
- Selected default GPU recommendation for `--gpu auto`.
- Recommended MinerU profile for the detected primary GPU.
- Existing Pascal/pre-Turing warnings.

Doctor must not require a real conversion, model load, network access, or package download.

### Auto Profile Policy

Use a small deterministic policy table. Values are intentionally conservative because the converter runs real PDFs and should prefer completion over peak throughput.

| Selected GPU | Auto policy | MinerU environment |
| --- | --- | --- |
| No GPU inventory, CUDA requested | Safe fallback with warning | `MINERU_PROCESSING_WINDOW_SIZE=1`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=1` |
| Pre-Turing or VRAM < 12GB | Safe | `MINERU_PROCESSING_WINDOW_SIZE=1`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=1` |
| 12GB <= VRAM < 16GB | Auto conservative | `MINERU_PROCESSING_WINDOW_SIZE=4`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=2` |
| VRAM >= 16GB and Turing-or-newer | Auto moderately aggressive | `MINERU_PROCESSING_WINDOW_SIZE=8`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=4` |
| Explicit `safe` | Safe regardless of GPU | `MINERU_PROCESSING_WINDOW_SIZE=1`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=1` |
| Explicit `performance` on VRAM >= 16GB and Turing-or-newer | Performance | `MINERU_PROCESSING_WINDOW_SIZE=16`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=4` |
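| Explicit `performance` on VRAM < 16GB or pre-Turing | Downgraded to safe with warning | `MINERU_PROCESSING_WINDOW_SIZE=1`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=1` |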

Do not set `MINERU_HYBRID_BATCH_RATIO` in Sprint 15 because MinerU docs describe it as commonly used for `hybrid-http-client`, which this project prohibits in v1.

Do not set backend CLI flags in Sprint 15. The default MinerU backend remains MinerU-owned.

## Architecture Plan

### WP15.1: GPU Inventory Boundary

Actions:

- Add `src/pdf2md/gpu.py`.
- Define immutable `GpuInfo` and `GpuInventory` records.
- Parse `nvidia-smi --query-gpu=index,name,memory.total,driver_version --format=csv,noheader,nounits`.
- Parse memory in MiB as an integer.
- Mark pre-Turing risk using the existing name-based heuristic for GTX 10xx and other pre-Turing names.
- Optionally enrich compute capability through PyTorch when available, but keep PyTorch optional and mockable.
- Provide `select_gpu(gpus, requested)` for `cuda:N`, `N`, and `auto`.

Expected output:

- GPU detection is independently testable with captured command output strings.
- No real `nvidia-smi`, GPU, or PyTorch is needed in default tests.
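A minimal sketch of the inventory boundary, assuming a frozen dataclass for `GpuInfo` and a parser that skips malformed rows. The field names, the skip-on-malformed behavior, the lowest-index tie-break, and the sample rows are illustrative assumptions, not the project's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GpuInfo:
    """One row of nvidia-smi output; field names are illustrative."""

    index: int
    name: str
    memory_total_mib: int
    driver_version: str


def parse_nvidia_smi_csv(output: str) -> tuple[GpuInfo, ...]:
    """Parse `--format=csv,noheader,nounits` rows, skipping malformed ones."""
    gpus = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            continue  # blank line or unexpected column count
        index_text, name, memory_text, driver = fields
        try:
            gpus.append(GpuInfo(int(index_text), name, int(memory_text), driver))
        except ValueError:
            continue  # non-numeric index or memory field
    return tuple(gpus)


def select_gpu(gpus: tuple[GpuInfo, ...], requested: str) -> GpuInfo:
    """Resolve `auto`, `cuda:N`, or bare `N` to a GPU from the inventory."""
    if requested == "auto":
        if not gpus:
            raise ValueError("--gpu auto found no visible NVIDIA GPU")
        # Largest VRAM wins; ties go to the lowest index (an assumption).
        return max(gpus, key=lambda gpu: (gpu.memory_total_mib, -gpu.index))
    text = requested[len("cuda:"):] if requested.startswith("cuda:") else requested
    try:
        index = int(text)
    except ValueError:
        raise ValueError(f"unsupported --gpu value: {requested!r}") from None
    for gpu in gpus:
        if gpu.index == index:
            return gpu
    raise ValueError(f"GPU index {index} is not in the visible inventory")


# Hypothetical captured output; the third row has a malformed memory field.
SAMPLE = (
    "0, NVIDIA GeForce GTX 1070 Ti, 8192, 551.86\n"
    "1, NVIDIA GeForce RTX 4090, 24564, 551.86\n"
    "2, Broken GPU, unknown, 551.86\n"
)
inventory = parse_nvidia_smi_csv(SAMPLE)
```

Because the parser takes a plain string, the default tests can feed it captured output without invoking `nvidia-smi`.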

### WP15.2: MinerU Profile Policy

Actions:

- Add `src/pdf2md/mineru_profile.py`.
- Define supported profile names: `auto`, `safe`, `performance`.
- Define a result record containing:
  - requested profile,
  - applied profile,
  - selected GPU index if known,
  - selected GPU name if known,
  - selected GPU VRAM MiB if known,
  - environment variables to set,
  - warnings or info messages as project `WarningRecord` values.
- Implement the policy table above.
- Keep profile environment values in a small allowlist.

Expected output:

- The policy can be tested without running MinerU.
- Performance profile cannot silently overcommit weak GPUs.
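The policy table can be sketched as a pure function. The names `ProfileResult` and `resolve_profile`, the applied-profile labels, and the plain-string warnings (the project would use `WarningRecord`) are hypothetical. The sketch also assumes thresholds are nominal GB converted to MiB; a card sold as 16GB may report slightly fewer MiB, so a real implementation might need a tolerance.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional

MIB_PER_GB = 1024  # assumption: thresholds are nominal GB converted to MiB


def _env(window: int, render_threads: int) -> dict[str, str]:
    # Concurrent requests stay at 1 in every row of the policy table.
    return {
        "MINERU_PROCESSING_WINDOW_SIZE": str(window),
        "MINERU_API_MAX_CONCURRENT_REQUESTS": "1",
        "MINERU_PDF_RENDER_THREADS": str(render_threads),
    }


SAFE_ENV = _env(1, 1)
AUTO_CONSERVATIVE_ENV = _env(4, 2)
AUTO_AGGRESSIVE_ENV = _env(8, 4)
PERFORMANCE_ENV = _env(16, 4)


@dataclass(frozen=True)
class ProfileResult:
    requested: str
    applied: str  # hypothetical label for the row that matched
    env: Mapping[str, str]
    warnings: tuple[str, ...] = ()  # stand-in for project WarningRecord values


def resolve_profile(
    requested: str, vram_mib: Optional[int], pre_turing: bool
) -> ProfileResult:
    """Map a requested profile and GPU facts to MinerU environment values."""
    if requested not in ("auto", "safe", "performance"):
        raise ValueError(f"unknown MinerU profile: {requested!r}")
    if requested == "safe":
        return ProfileResult(requested, "safe", SAFE_ENV)
    if vram_mib is None:
        # No inventory: fall back to safe rather than guessing.
        note = "GPU inventory unavailable; using safe settings"
        return ProfileResult(requested, "safe", SAFE_ENV, (note,))
    strong = not pre_turing and vram_mib >= 16 * MIB_PER_GB
    if requested == "performance":
        if strong:
            return ProfileResult(requested, "performance", PERFORMANCE_ENV)
        note = "performance downgraded to safe for this GPU"
        return ProfileResult(requested, "safe", SAFE_ENV, (note,))
    # requested == "auto"
    if pre_turing or vram_mib < 12 * MIB_PER_GB:
        return ProfileResult(requested, "safe", SAFE_ENV)
    if strong:
        return ProfileResult(requested, "auto-aggressive", AUTO_AGGRESSIVE_ENV)
    return ProfileResult(requested, "auto-conservative", AUTO_CONSERVATIVE_ENV)
```

Keeping the function free of I/O means every row of the table can be asserted directly, for example `resolve_profile("performance", 8192, True)` yields the safe environment plus one warning.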

### WP15.3: Adapter Environment Integration

Actions:

- Extend `MinerUOptions` with `mineru_profile: str = "auto"` and optional resolved profile metadata.
- Keep strict-local validation for every option string.
- Update `_mineru_environment()` to merge:
  - `MINERU_DEVICE_MODE=cuda`,
  - `CUDA_VISIBLE_DEVICES=<selected index>`,
  - profile environment variables from `mineru_profile.py`.
- Preserve previous environment values after subprocess execution.
- Include profile details in `engine_options`.

Expected output:

- Real MinerU still receives only the direct local CLI command shape:

```text
mineru -p <input> -o <output>
```

- Tuning is done through local environment variables, not remote/API/backend flags.
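One way to merge and later restore the environment is a context manager around the subprocess call. This is a sketch under assumptions: `applied_environment` and `mineru_environment` are hypothetical names, and the actual `_mineru_environment()` may be structured differently.

```python
from __future__ import annotations

import os
from contextlib import contextmanager
from typing import Iterator, Mapping

# Only these keys may be written; anything else is rejected up front.
ALLOWED_ENV_KEYS = frozenset(
    {
        "MINERU_DEVICE_MODE",
        "CUDA_VISIBLE_DEVICES",
        "MINERU_PROCESSING_WINDOW_SIZE",
        "MINERU_API_MAX_CONCURRENT_REQUESTS",
        "MINERU_PDF_RENDER_THREADS",
    }
)


def mineru_environment(gpu_index: int, profile_env: Mapping[str, str]) -> dict[str, str]:
    """Combine device selection with the profile's tuning values."""
    return {
        "MINERU_DEVICE_MODE": "cuda",
        "CUDA_VISIBLE_DEVICES": str(gpu_index),
        **profile_env,
    }


@contextmanager
def applied_environment(overrides: Mapping[str, str]) -> Iterator[None]:
    """Set allowlisted variables, then restore the previous values on exit."""
    unknown = sorted(set(overrides) - ALLOWED_ENV_KEYS)
    if unknown:
        raise ValueError(f"environment keys outside the allowlist: {unknown}")
    previous = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, value in previous.items():
            if value is None:
                os.environ.pop(key, None)  # variable was unset before
            else:
                os.environ[key] = value
```

The MinerU subprocess would run inside the `with applied_environment(...)` block. The `finally` clause restores prior values even if the subprocess call raises.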

### WP15.4: Conversion And CLI Wiring

Actions:

- Add `--mineru-profile` to `pdf2md convert`.
- Accept `--gpu auto`.
- Resolve selected GPU and profile before calling the adapter.
- Surface profile warnings in conversion metadata/report warnings.
- Preserve existing `--gpu cuda:0` default.
- Ensure `convert_pdf()` can receive the profile through the Python API.

Expected output:

- Default conversions use `mineru_profile=auto`.
- Existing calls with no new flags continue to work.
- Metadata explains which profile was applied.
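The flag behavior above could be wired roughly as follows. This sketch assumes an `argparse`-based CLI, which the contract does not confirm; `normalize_gpu` and `build_parser` are hypothetical names.

```python
import argparse

PROFILE_CHOICES = ("auto", "safe", "performance")


def normalize_gpu(value: str) -> str:
    """Accept `auto`, `cuda:N`, or bare `N`; return `auto` or `cuda:N`."""
    text = value.strip().lower()
    if text == "auto":
        return "auto"
    if text.startswith("cuda:"):
        text = text[len("cuda:"):]
    if not text.isdecimal():
        raise argparse.ArgumentTypeError(f"unsupported GPU selector: {value!r}")
    return f"cuda:{int(text)}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdf2md")
    commands = parser.add_subparsers(dest="command", required=True)
    convert = commands.add_parser("convert")
    convert.add_argument("pdf")
    convert.add_argument("--out", required=True)
    # Default stays cuda:0 for backward compatibility; auto is opt-in.
    convert.add_argument("--gpu", type=normalize_gpu, default="cuda:0")
    convert.add_argument(
        "--mineru-profile", choices=PROFILE_CHOICES, default="auto"
    )
    return parser
```

With `choices=`, an invalid profile value is rejected by the parser itself before any conversion code runs.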

### WP15.5: Doctor Reporting

Actions:

- Reuse `gpu.py` inventory parsing in `doctor.py`.
- Keep the existing `gpu` and `pytorch` checks, but make GPU details more explicit.
- Add a doctor detail line for auto-selected GPU and recommended profile.
- Keep warning-only behavior for Pascal/pre-Turing GPUs.

Expected output:

- On a stronger PC, `pdf2md doctor` shows enough evidence to decide whether `auto` or `performance` is appropriate.
- On the current GTX 1070 Ti, doctor still warns and recommends safe/conservative behavior.
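To keep doctor testable without hardware, the `nvidia-smi` call can sit behind an injectable runner. The sketch below is illustrative only: `gpu_doctor_lines`, the 10-second timeout, and the output line format are assumptions, not the project's actual doctor report.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

NVIDIA_SMI_QUERY = (
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,driver_version",
    "--format=csv,noheader,nounits",
)


def _run_nvidia_smi(command: Sequence[str]) -> str:
    # Real runner; tests replace it with a function returning canned text.
    completed = subprocess.run(
        list(command), capture_output=True, text=True, check=True, timeout=10
    )
    return completed.stdout


def gpu_doctor_lines(
    run: Callable[[Sequence[str]], str] = _run_nvidia_smi,
) -> list[str]:
    """Return human-readable GPU lines, or a warning when none are visible."""
    try:
        output = run(NVIDIA_SMI_QUERY)
    except (OSError, subprocess.SubprocessError):
        # Missing binary, non-zero exit, or timeout all degrade to a warning.
        return ["WARN gpu: nvidia-smi unavailable; recommended profile: safe"]
    lines = []
    for row in output.splitlines():
        fields = [field.strip() for field in row.split(",")]
        if len(fields) != 4 or not fields[2].isdecimal():
            continue  # skip rows that do not match the queried columns
        index, name, memory_mib, driver = fields
        lines.append(f"gpu {index}: {name}, {memory_mib} MiB, driver {driver}")
    return lines or ["WARN gpu: no NVIDIA GPU reported; recommended profile: safe"]
```

A test can pass a lambda that returns captured CSV text or raises `FileNotFoundError`, so neither a GPU nor the `nvidia-smi` binary is required.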

### WP15.6: Documentation

Actions:

- Update README setup and conversion docs with `--mineru-profile`.
- Update ARCHITECTURE to document that tuning uses strict-local environment variables only.
- Update PRD CLI section if the new public flag is added.
- Update `docs/V1IMPLEMENTATIONPLAN.md`, `PLAN.md`, and `PROGRESS.md`.
- Archive implementation details in `docs/WORKARCHIVE.md` only after implementation and verification.

Expected output:

- Users can move the repo to a stronger NVIDIA GPU PC, run `pdf2md doctor`, and understand the selected profile.

## Tests

Default fast tests:

- GPU inventory parser handles one RTX GPU, multiple GPUs, no GPU lines, and malformed memory fields.
- `select_gpu(..., "auto")` selects the largest VRAM GPU.
- `select_gpu(..., "cuda:1")` selects index 1 and errors when absent.
- `select_gpu(..., "1")` normalizes to index 1.
- `auto` profile returns safe values for GTX 1070 Ti 8GB.
- `auto` profile returns moderately aggressive values for an RTX GPU with 16GB or more.
- `performance` profile returns performance values only for 16GB+ Turing-or-newer GPUs.
- `performance` profile on GTX 1070 Ti downgrades to safe and returns a warning.
- Adapter sets and restores `MINERU_DEVICE_MODE`, `CUDA_VISIBLE_DEVICES`, `MINERU_PROCESSING_WINDOW_SIZE`, `MINERU_API_MAX_CONCURRENT_REQUESTS`, and `MINERU_PDF_RENDER_THREADS`.
- Strict-local validation rejects remote/API/backend-like option strings in profile-related fields.
- CLI default passes `mineru_profile=auto`.
- CLI accepts `--mineru-profile safe` and `--mineru-profile performance`.
- CLI rejects invalid profile values.
- Doctor report includes visible GPU details and recommended profile with mocked command outputs.
- Existing conversion, chunking, metadata, report, and UI tests remain green.

Optional local validation on a stronger NVIDIA GPU PC:

```powershell
uv run pdf2md doctor
$env:MINERU_MODEL_SOURCE='local'
uv run pdf2md convert samples\FourNodeQuadrilateralShellElementMITC4.pdf --out outputs\fournode-sprint15-auto --overwrite --chunk-pages --gpu auto --mineru-profile auto --strict-local
```

Expected optional validation:

- Doctor reports the stronger GPU name, VRAM, and recommended profile.
- Conversion metadata records `mineru_profile` and selected GPU information.
- Generated outputs stay ignored and uncommitted.

## Acceptance Criteria

- `--mineru-profile auto` is the default conversion behavior.
- `auto` uses safe settings on the current GTX 1070 Ti 8GB and stronger settings only on 16GB+ Turing-or-newer NVIDIA GPUs.
- `--gpu auto` can choose the largest visible NVIDIA GPU without adding remote/runtime backend support.
- MinerU command shape remains direct local CLI only.
- Strict-local prohibitions remain enforced.
- `pdf2md doctor` provides actionable GPU/profile information.
- Metadata/report preserve the applied runtime profile.
- Default tests remain fast, mocked, local, and independent of real MinerU/GPU/model files/network/samples.

## Hard Failure Criteria

- Implementation adds runtime backend selection or exposes `--backend`.
- Implementation passes `--api-url`, `--url`, router, HTTP client backend, or remote OpenAI-compatible backend values.
- `auto` profile applies aggressive settings to GTX 1070 Ti 8GB or other pre-Turing/low-VRAM GPUs.
- Existing `--gpu cuda:0` behavior breaks.
- Profile tuning disables formula or table parsing.
- Doctor or tests require real GPU, real MinerU execution, model files, network, Obsidian, MathJax, or `samples/`.
- Sample PDFs, generated outputs, local model files, or `dist/pdf2md-ui.exe` are committed.

## Implementation Task Plan

### Task 1: GPU Inventory

Files:

- Create `src/pdf2md/gpu.py`
- Create `tests/test_gpu.py`

Steps:

- [x] Add failing tests for parsing `nvidia-smi` CSV output.
- [x] Add failing tests for `auto`, `cuda:N`, and numeric GPU selection.
- [x] Implement immutable GPU records and parser helpers.
- [x] Implement selection errors as `ValueError` with clear messages.
- [x] Run `uv run pytest tests/test_gpu.py`.
- [x] Commit GPU inventory boundary.

### Task 2: MinerU Profile Policy

Files:

- Create `src/pdf2md/mineru_profile.py`
- Create `tests/test_mineru_profile.py`

Steps:

- [x] Add failing tests for safe, auto, and performance profile policy.
- [x] Add tests proving 16GB+ Turing-or-newer GPUs get the moderately aggressive auto environment.
- [x] Add tests proving GTX 1070 Ti 8GB stays safe.
- [x] Implement the allowlisted environment mapping.
- [x] Run `uv run pytest tests/test_mineru_profile.py tests/test_gpu.py`.
- [x] Commit profile policy.

### Task 3: Adapter And Conversion Wiring

Files:

- Modify `src/pdf2md/mineru_adapter.py`
- Modify `src/pdf2md/conversion.py`
- Modify `tests/test_mineru_adapter.py`
- Modify `tests/test_conversion.py`

Steps:

- [x] Add failing adapter tests for profile environment variables and environment restoration.
- [x] Add failing conversion tests that metadata receives applied profile information.
- [x] Extend `MinerUOptions` and conversion options minimally.
- [x] Merge GPU and profile environment variables before the MinerU subprocess.
- [x] Run `uv run pytest tests/test_mineru_adapter.py tests/test_conversion.py tests/test_mineru_profile.py tests/test_gpu.py`.
- [x] Commit adapter/conversion wiring.

### Task 4: CLI And Doctor

Files:

- Modify `src/pdf2md/cli.py`
- Modify `src/pdf2md/doctor.py`
- Modify `tests/test_cli.py`
- Modify `tests/test_doctor.py`

Steps:

- [x] Add failing CLI tests for default `auto`, explicit `safe`, explicit `performance`, invalid profile rejection, and `--gpu auto`.
- [x] Add failing doctor tests for GPU inventory and recommended profile details.
- [x] Implement CLI argument parsing and doctor report additions.
- [x] Run `uv run pytest tests/test_cli.py tests/test_doctor.py tests/test_gpu.py tests/test_mineru_profile.py`.
- [x] Commit CLI and doctor wiring.

### Task 5: UI And Documentation

Files:

- Modify `src/pdf2md_ui/runner.py` only if explicit UI profile passthrough is needed
- Modify `src/pdf2md_ui/app.py` only if explicit UI profile control is needed
- Modify `tests/test_ui_runner.py` only if runner command construction changes
- Modify `README.md`
- Modify `ARCHITECTURE.md`
- Modify `PRD.md`
- Modify `docs/V1IMPLEMENTATIONPLAN.md`
- Modify `PLAN.md`
- Modify `PROGRESS.md`
- Modify `docs/WORKARCHIVE.md` after implementation

Steps:

- [x] Keep UI unchanged if default CLI `auto` profile is enough for the first implementation pass.
- [x] If UI exposes a profile control, add tests for fixed argument-list construction with `shell=False`.
- [x] Document `--mineru-profile`, `--gpu auto`, profile policy, strict-local boundaries, and stronger-PC validation command.
- [x] Run focused docs/UI tests if changed.
- [x] Run final verification commands.
- [x] Commit documentation and final coordination updates.

## Verification Commands

```powershell
uv run pytest tests/test_gpu.py tests/test_mineru_profile.py tests/test_mineru_adapter.py tests/test_conversion.py tests/test_cli.py tests/test_doctor.py
uv run pytest
git diff --check
git status --short --untracked-files=all
```

Optional stronger-PC validation is listed in the Tests section and must remain explicit opt-in.

## Handoff Requirements

After implementation:

- Update `PROGRESS.md` with files changed, commands run, test outcomes, optional stronger-PC validation outcome, known failures, residual risks, and next action.
- Archive completed implementation details in `docs/WORKARCHIVE.md`.
- Keep generated outputs, sample PDFs, local model files, and UI build artifacts out of the commit.
- Record the detected GPU, applied profile, and whether `samples\FourNodeQuadrilateralShellElementMITC4.pdf` completed on the stronger PC.

Implementation handoff:

- Files changed: `src/pdf2md/gpu.py`, `src/pdf2md/mineru_profile.py`, `src/pdf2md/mineru_adapter.py`, `src/pdf2md/conversion.py`, `src/pdf2md/cli.py`, `src/pdf2md/doctor.py`, docs, and focused tests.
- Commands run: `uv run pytest tests/test_gpu.py tests/test_mineru_profile.py tests/test_mineru_adapter.py tests/test_conversion.py tests/test_cli.py tests/test_doctor.py`; `uv run pytest`; `uv run pdf2md doctor`.
- Tests passed: targeted Sprint 15 suite passed 101 tests; full default suite passed 225 tests with 1 optional skip; local doctor returned WARN with expected GTX 1070 Ti safe-profile recommendation.
- Known failures: optional stronger-PC real MinerU conversion validation was not run in this workspace.
- Residual risks: GTX 1070 Ti 8GB remains likely to stall on hard pages; stronger-PC behavior still needs local runtime validation.
- Next action: on a stronger NVIDIA GPU PC, run `pdf2md doctor` and an explicit local conversion with `--gpu auto --mineru-profile auto`.

## Future Sprint Boundary

A later sprint may add page-level timeout handling, resumable page caches, or a performance mode that can run multiple page conversions concurrently on GPUs with enough VRAM. Those behaviors are intentionally out of Sprint 15 scope.