baram2584/PDFToMD

김경종 dc11880140 modify pdftomd

2026-05-14 10:16:59 +09:00

Sprint 15 Contract: NVIDIA GPU Detection And Auto MinerU Profile

Status: Implemented
Last updated: 2026-05-12

Objective

Add a strict-local runtime profiling layer that detects installed NVIDIA GPUs and applies conservative MinerU environment tuning by default.

The default runtime profile is auto. In auto, the converter keeps settings conservative on 8GB and pre-Turing GPUs, and allows a moderately more aggressive local MinerU configuration only when the selected NVIDIA GPU has at least 16GB of VRAM and no pre-Turing compatibility warning.

This sprint is motivated by local evidence from samples\FourNodeQuadrilateralShellElementMITC4.pdf: Sprint 14's one-page conversion path used cuda:0 correctly, but the GTX 1070 Ti 8GB stayed near full VRAM use and stalled on source page 2. The next useful test should run on a stronger NVIDIA GPU with explicit runtime diagnostics and reproducible MinerU environment settings.

Source Basis

Use these source-backed facts during implementation:

MinerU CLI supports mineru -p <input_path> -o <output_path> and, without --api-url, launches a temporary local mineru-api: https://opendatalab.github.io/MinerU/usage/cli_tools/
MinerU CLI documents -b/--backend, -f/--formula, -t/--table, --api-url, and related options, but this project must not expose remote/API or backend selection paths in v1: https://opendatalab.github.io/MinerU/usage/cli_tools/
MinerU environment variables include MINERU_PDF_RENDER_THREADS, MINERU_PROCESSING_WINDOW_SIZE, MINERU_API_MAX_CONCURRENT_REQUESTS, and timeout settings: https://opendatalab.github.io/MinerU/usage/cli_tools/
MinerU advanced CLI docs support selecting visible GPU devices with CUDA_VISIBLE_DEVICES: https://opendatalab.github.io/MinerU/usage/advanced_cli_parameters/
MinerU local deployment docs list auto-engine GPU requirements around 8GB+ VRAM and GPU acceleration for Volta-or-later devices: https://opendatalab.github.io/MinerU/quick_start/
MinerU extension docs say vllm and lmdeploy acceleration extras are alternatives and should not both be installed just for this sprint: https://opendatalab.github.io/MinerU/quick_start/extension_modules/

Access date for the source review: 2026-05-12.

Current Precondition

MinerU 3.1.0 remains the only conversion engine.
Conversion runs through direct local mineru CLI execution only.
Strict-local allows only the direct CLI and MinerU CLI-internal temporary local mineru-api; remote API/backend paths remain prohibited.
pdf2md convert defaults to --gpu cuda:0.
MinerUAdapter currently maps cuda:N to MINERU_DEVICE_MODE=cuda and CUDA_VISIBLE_DEVICES=N.
pdf2md doctor already reports NVIDIA GPU visibility, PyTorch CUDA visibility, GPU names, and Pascal/pre-Turing warnings.
Sprint 14 chunk mode runs one source page per MinerU invocation when --chunk-pages is active.

Contract Assumptions

Keep --gpu cuda:0 as the default for backward compatibility with PRD and existing docs.
Add --gpu auto as an opt-in GPU selection mode that chooses the visible NVIDIA GPU with the largest reported VRAM.
Add --mineru-profile {auto,safe,performance} with default auto.
Keep all conversion requests sequential in Sprint 15. Do not introduce parallel page conversion.
Keep formula and table parsing enabled. Do not optimize by disabling required output quality features.
Do not add --backend, --api-url, --url, router mode, HTTP client backend, remote OpenAI-compatible backend, or remote model server support.
Treat MinerU environment tuning as best-effort. If GPU inventory cannot be read, continue with safe profile settings and a warning/provenance record rather than guessing aggressive values.

Touched Surfaces

Allowed during implementation:

Create src/pdf2md/gpu.py
Create src/pdf2md/mineru_profile.py
Modify src/pdf2md/mineru_adapter.py
Modify src/pdf2md/conversion.py
Modify src/pdf2md/cli.py
Modify src/pdf2md/doctor.py
Modify src/pdf2md_ui/runner.py only if the UI command builder needs profile passthrough
Modify src/pdf2md_ui/app.py only if a minimal profile control is necessary
Add tests/test_gpu.py
Add tests/test_mineru_profile.py
Modify tests/test_mineru_adapter.py
Modify tests/test_conversion.py
Modify tests/test_cli.py
Modify tests/test_doctor.py
Modify tests/test_ui_runner.py only if UI command construction changes
Modify README.md
Modify ARCHITECTURE.md
Modify PRD.md if CLI option documentation changes
Modify docs/V1IMPLEMENTATIONPLAN.md
Modify PLAN.md
Modify PROGRESS.md
Modify docs/WORKARCHIVE.md after implementation

Not allowed:

Adding another conversion engine or runtime engine selector.
Passing --api-url, --url, or any remote endpoint to MinerU.
Adding mineru-router, HTTP client backend, or OpenAI-compatible backend usage.
Installing vllm, lmdeploy, CUDA packages, models, or any runtime package automatically.
Changing the default conversion engine or disabling formula/table recognition.
Making default tests depend on real MinerU, GPU, CUDA, PyTorch, model files, network, Obsidian, MathJax, or samples/.
Committing sample PDFs, generated outputs/, retained temporary page outputs, local model files, or dist/pdf2md-ui.exe.

Product Behavior

CLI

Existing behavior remains valid:

uv run pdf2md convert paper.pdf --out outputs
uv run pdf2md convert paper.pdf --out outputs --gpu cuda:0

New behavior:

uv run pdf2md convert paper.pdf --out outputs --mineru-profile auto
uv run pdf2md convert paper.pdf --out outputs --mineru-profile safe
uv run pdf2md convert paper.pdf --out outputs --mineru-profile performance
uv run pdf2md convert paper.pdf --out outputs --gpu auto --mineru-profile auto

Rules:

--mineru-profile defaults to auto.
--gpu cuda:N selects a concrete CUDA index and tunes MinerU for that selected GPU when inventory is available.
--gpu N is still normalized to cuda:N.
--gpu auto selects the visible NVIDIA GPU with the largest VRAM from local GPU inventory.
If --gpu auto cannot find a visible NVIDIA GPU, fail clearly before conversion rather than silently switching to CPU.
If --mineru-profile performance is requested on a selected GPU below 16GB VRAM or with pre-Turing risk, downgrade to safe settings with a warning in metadata/report. Do not fail solely because performance was unsafe.
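The flag rules above can be sketched with the standard-library argparse module. This is only an illustration: the contract does not say which CLI framework pdf2md uses, and `normalize_gpu` and `build_parser` are hypothetical names. Only the flag names, the defaults, and the allowed values come from the rules.

```python
import argparse

PROFILES = ("auto", "safe", "performance")


def normalize_gpu(value: str) -> str:
    """Normalize a --gpu value to 'auto' or 'cuda:N' (hypothetical helper)."""
    text = value.strip().lower()
    if text == "auto":
        return "auto"
    if text.isdigit():
        # A bare index such as "1" is normalized to "cuda:1".
        return f"cuda:{int(text)}"
    if text.startswith("cuda:") and text[5:].isdigit():
        return f"cuda:{int(text[5:])}"
    raise argparse.ArgumentTypeError(f"unsupported --gpu value: {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering only the flags discussed in this section."""
    parser = argparse.ArgumentParser(prog="pdf2md")
    commands = parser.add_subparsers(dest="command", required=True)
    convert = commands.add_parser("convert")
    convert.add_argument("pdf")
    convert.add_argument("--out")
    # cuda:0 stays the default for backward compatibility.
    convert.add_argument("--gpu", type=normalize_gpu, default="cuda:0")
    # argparse rejects any value outside PROFILES with a usage error.
    convert.add_argument("--mineru-profile", choices=PROFILES, default="auto")
    return parser
```

With this sketch, `build_parser().parse_args(["convert", "paper.pdf", "--gpu", "1"])` yields `gpu == "cuda:1"` and `mineru_profile == "auto"`. Failing when `--gpu auto` finds no GPU, and downgrading an unsafe `performance` request, need GPU inventory and therefore belong after parsing.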

Doctor

pdf2md doctor should report:

All visible NVIDIA GPUs with index, name, total VRAM, and driver version from nvidia-smi.
PyTorch CUDA device names and compute capabilities when available.
Selected default GPU recommendation for --gpu auto.
Recommended MinerU profile for the detected primary GPU.
Existing Pascal/pre-Turing warnings.

Doctor must not require a real conversion, model load, network access, or package download.

Auto Profile Policy

Use a small deterministic policy table. Values are intentionally conservative because the converter runs real PDFs and should prefer completion over peak throughput.

| Selected GPU | Auto policy | MinerU environment |
| --- | --- | --- |
| No GPU inventory, CUDA requested | Safe fallback with warning | `MINERU_PROCESSING_WINDOW_SIZE=1`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=1` |
| Pre-Turing or VRAM < 12GB | Safe | `MINERU_PROCESSING_WINDOW_SIZE=1`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=1` |
| 12GB <= VRAM < 16GB | Auto conservative | `MINERU_PROCESSING_WINDOW_SIZE=4`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=2` |
| VRAM >= 16GB and Turing-or-newer | Auto moderately aggressive | `MINERU_PROCESSING_WINDOW_SIZE=8`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=4` |
| Explicit `safe` | Safe regardless of GPU | `MINERU_PROCESSING_WINDOW_SIZE=1`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=1` |
| Explicit `performance` on VRAM >= 16GB and Turing-or-newer | Performance | `MINERU_PROCESSING_WINDOW_SIZE=16`, `MINERU_API_MAX_CONCURRENT_REQUESTS=1`, `MINERU_PDF_RENDER_THREADS=4` |
| Explicit `performance` on weaker GPU | Downgraded safe with warning | safe values |

Do not set MINERU_HYBRID_BATCH_RATIO in Sprint 15 because MinerU docs describe it as commonly used for hybrid-http-client, which this project prohibits in v1.

Do not set backend CLI flags in Sprint 15. The default MinerU backend remains MinerU-owned.

Architecture Plan

WP15.1: GPU Inventory Boundary

Actions:

Add src/pdf2md/gpu.py.
Define immutable GpuInfo and GpuInventory records.
Parse nvidia-smi --query-gpu=index,name,memory.total,driver_version --format=csv,noheader,nounits.
Parse memory in MiB as an integer.
Mark pre-Turing risk using the existing name-based heuristic for GTX 10xx and pre-Turing names.
Optionally enrich compute capability through PyTorch when available, but keep PyTorch optional and mockable.
Provide select_gpu(gpus, requested) for cuda:N, N, and auto.

Expected output:

GPU detection is independently testable with captured command output strings.
No real nvidia-smi, GPU, or PyTorch is needed in default tests.
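The inventory boundary described in WP15.1 might look roughly like the following. It is a minimal sketch, not the actual contents of src/pdf2md/gpu.py. The record fields, the `_PRE_TURING_NAME` regular expression (a stand-in for the project's existing name heuristic), and the error messages are assumptions. `GpuInfo` and `select_gpu` are named in the contract, while `parse_nvidia_smi_csv` is a hypothetical name.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuInfo:
    """Immutable record for one GPU row reported by nvidia-smi."""
    index: int
    name: str
    memory_total_mib: int
    driver_version: str
    pre_turing_risk: bool


# Assumed stand-in for the existing heuristic: GTX 9xx and GTX 10xx names.
_PRE_TURING_NAME = re.compile(r"GTX\s*(9|10)\d\d", re.IGNORECASE)


def parse_nvidia_smi_csv(text: str) -> tuple[GpuInfo, ...]:
    """Parse 'index, name, memory.total, driver_version' rows, skipping bad ones."""
    gpus = []
    for line in text.splitlines():
        fields = [part.strip() for part in line.split(",")]
        if len(fields) != 4:
            continue  # blank line or unexpected column count
        index, name, memory, driver = fields
        try:
            gpus.append(
                GpuInfo(
                    index=int(index),
                    name=name,
                    memory_total_mib=int(memory),
                    driver_version=driver,
                    pre_turing_risk=bool(_PRE_TURING_NAME.search(name)),
                )
            )
        except ValueError:
            continue  # malformed index or memory field
    return tuple(gpus)


def select_gpu(gpus: tuple[GpuInfo, ...], requested: str) -> GpuInfo:
    """Resolve 'auto', 'cuda:N', or 'N' against the parsed inventory."""
    if requested == "auto":
        if not gpus:
            raise ValueError("--gpu auto found no visible NVIDIA GPU")
        return max(gpus, key=lambda gpu: gpu.memory_total_mib)
    index_text = requested.removeprefix("cuda:")
    if not index_text.isdigit():
        raise ValueError(f"unsupported GPU selector: {requested!r}")
    for gpu in gpus:
        if gpu.index == int(index_text):
            return gpu
    raise ValueError(f"GPU index {index_text} is not in the visible inventory")
```

Because the parser takes a plain string, tests can feed it captured command output without touching nvidia-smi, a GPU, or PyTorch.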

WP15.2: MinerU Profile Policy

Actions:

Add src/pdf2md/mineru_profile.py.
Define supported profile names: auto, safe, performance.
Define a result record containing:
- requested profile,
- applied profile,
- selected GPU index if known,
- selected GPU name if known,
- selected GPU VRAM MiB if known,
- environment variables to set,
- warnings or info messages as project WarningRecord values.
Implement the policy table above.
Keep profile environment values in a small allowlist.

Expected output:

The policy can be tested without running MinerU.
Performance profile cannot silently overcommit weak GPUs.
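One way to encode the policy table is a small pure function, sketched below under stated assumptions. `ProfileResult`, `resolve_profile`, the warning strings, and the MiB thresholds are hypothetical. In particular, the contract does not say how "12GB" and "16GB" map to the MiB values reported by nvidia-smi, so a real implementation may need some slack for cards that report slightly under their nominal capacity. Only the environment values are taken from the table.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed MiB thresholds for the contract's "12GB" and "16GB" boundaries.
VRAM_12GB_MIB = 12 * 1024
VRAM_16GB_MIB = 16 * 1024


def _env(window: int, threads: int) -> dict:
    # Concurrent requests stay at 1 in every row of the policy table.
    return {
        "MINERU_PROCESSING_WINDOW_SIZE": str(window),
        "MINERU_API_MAX_CONCURRENT_REQUESTS": "1",
        "MINERU_PDF_RENDER_THREADS": str(threads),
    }


@dataclass(frozen=True)
class ProfileResult:
    requested: str
    applied: str
    environment: dict
    warnings: tuple


def resolve_profile(
    requested: str, vram_mib: Optional[int], pre_turing: bool
) -> ProfileResult:
    """Map a requested profile and GPU facts to allowlisted environment values."""
    if requested not in ("auto", "safe", "performance"):
        raise ValueError(f"unknown profile: {requested!r}")
    if requested == "safe":
        return ProfileResult(requested, "safe", _env(1, 1), ())
    if vram_mib is None:
        # Inventory could not be read: fall back instead of guessing.
        warning = "GPU inventory unavailable; using safe profile"
        return ProfileResult(requested, "safe", _env(1, 1), (warning,))
    strong = vram_mib >= VRAM_16GB_MIB and not pre_turing
    if requested == "performance":
        if strong:
            return ProfileResult(requested, "performance", _env(16, 4), ())
        warning = "performance downgraded to safe for this GPU"
        return ProfileResult(requested, "safe", _env(1, 1), (warning,))
    # requested == "auto"
    if pre_turing or vram_mib < VRAM_12GB_MIB:
        return ProfileResult(requested, "safe", _env(1, 1), ())
    if not strong:
        return ProfileResult(requested, "auto", _env(4, 2), ())
    return ProfileResult(requested, "auto", _env(8, 4), ())
```

Keeping the function free of I/O means every row of the table can be covered by a plain unit test, and an unsafe `performance` request always carries a warning back to the caller.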

WP15.3: Adapter Environment Integration

Actions:

Extend MinerUOptions with mineru_profile: str = "auto" and optional resolved profile metadata.
Keep strict-local validation for every option string.
Update _mineru_environment() to merge:
- MINERU_DEVICE_MODE=cuda,
- CUDA_VISIBLE_DEVICES=<selected index>,
- profile environment variables from mineru_profile.py.
Preserve previous environment values after subprocess execution.
Include profile details in engine_options.

Expected output:

Real MinerU still receives only the direct local CLI command shape:

mineru -p <input> -o <output>

Tuning is done through local environment variables, not remote/API/backend flags.
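The merge-and-restore behavior can be illustrated with a context manager around the subprocess call. This is a sketch only: `build_mineru_environment`, `temporary_environment`, and `run_mineru` are hypothetical names, and the real `_mineru_environment()` may be structured differently. The variable names and the `mineru -p ... -o ...` command shape come from the contract.

```python
import os
import subprocess
from contextlib import contextmanager

# Profile variables the adapter is willing to set (allowlist).
ALLOWED_PROFILE_KEYS = frozenset(
    {
        "MINERU_PROCESSING_WINDOW_SIZE",
        "MINERU_API_MAX_CONCURRENT_REQUESTS",
        "MINERU_PDF_RENDER_THREADS",
    }
)


def build_mineru_environment(gpu_index: int, profile_env: dict) -> dict:
    """Merge device selection with allowlisted profile variables."""
    unknown = set(profile_env) - ALLOWED_PROFILE_KEYS
    if unknown:
        raise ValueError(f"profile variables not allowlisted: {sorted(unknown)}")
    merged = {
        "MINERU_DEVICE_MODE": "cuda",
        "CUDA_VISIBLE_DEVICES": str(gpu_index),
    }
    merged.update(profile_env)
    return merged


@contextmanager
def temporary_environment(overrides: dict):
    """Apply overrides to os.environ and restore prior values on exit."""
    previous = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old_value in previous.items():
            if old_value is None:
                os.environ.pop(key, None)  # variable did not exist before
            else:
                os.environ[key] = old_value


def run_mineru(input_path: str, output_path: str, overrides: dict) -> None:
    """Run the direct local CLI shape only; no remote or backend flags."""
    with temporary_environment(overrides):
        subprocess.run(["mineru", "-p", input_path, "-o", output_path], check=True)
```

Restoring in a `finally` block means the previous values come back even when MinerU exits with an error, which is what the adapter restoration tests are meant to verify.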

WP15.4: Conversion And CLI Wiring

Actions:

Add --mineru-profile to pdf2md convert.
Accept --gpu auto.
Resolve selected GPU and profile before calling the adapter.
Surface profile warnings in conversion metadata/report warnings.
Preserve existing --gpu cuda:0 default.
Ensure convert_pdf() can receive the profile through the Python API.

Expected output:

Default conversions use mineru_profile=auto.
Existing calls with no new flags continue to work.
Metadata explains which profile was applied.

WP15.5: Doctor Reporting

Actions:

Reuse gpu.py inventory parsing in doctor.py.
Keep the existing gpu and pytorch checks, but make GPU details more explicit.
Add a doctor detail line for auto-selected GPU and recommended profile.
Keep warning-only behavior for Pascal/pre-Turing GPUs.

Expected output:

On a stronger PC, pdf2md doctor shows enough evidence to decide whether auto or performance is appropriate.
On the current GTX 1070 Ti, doctor still warns and recommends safe/conservative behavior.

WP15.6: Documentation

Actions:

Update README setup and conversion docs with --mineru-profile.
Update ARCHITECTURE to document that tuning uses strict-local environment variables only.
Update PRD CLI section if the new public flag is added.
Update docs/V1IMPLEMENTATIONPLAN.md, PLAN.md, and PROGRESS.md.
Archive implementation details in docs/WORKARCHIVE.md only after implementation and verification.

Expected output:

Users can move the repo to a stronger NVIDIA GPU PC, run pdf2md doctor, and understand the selected profile.

Tests

Default fast tests:

GPU inventory parser handles one RTX GPU, multiple GPUs, no GPU lines, and malformed memory fields.
select_gpu(..., "auto") selects the largest VRAM GPU.
select_gpu(..., "cuda:1") selects index 1 and errors when absent.
select_gpu(..., "1") normalizes to index 1.
auto profile returns safe values for GTX 1070 Ti 8GB.
auto profile returns moderately aggressive values for an RTX GPU with 16GB or more.
performance profile returns performance values only for 16GB+ Turing-or-newer GPUs.
performance profile on GTX 1070 Ti downgrades to safe and returns a warning.
Adapter sets and restores MINERU_DEVICE_MODE, CUDA_VISIBLE_DEVICES, MINERU_PROCESSING_WINDOW_SIZE, MINERU_API_MAX_CONCURRENT_REQUESTS, and MINERU_PDF_RENDER_THREADS.
Strict-local validation rejects remote/API/backend-like option strings in profile-related fields.
CLI default passes mineru_profile=auto.
CLI accepts --mineru-profile safe and --mineru-profile performance.
CLI rejects invalid profile values.
Doctor report includes visible GPU details and recommended profile with mocked command outputs.
Existing conversion, chunking, metadata, report, and UI tests remain green.
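To keep the default tests independent of real hardware, the nvidia-smi call can be replaced with a mock. The sketch below assumes a hypothetical `read_gpu_csv` helper that shells out through `subprocess.run`. The query arguments are the ones listed in WP15.1, and the CSV row is made-up sample data rather than captured output.

```python
import subprocess
from unittest import mock

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,driver_version",
    "--format=csv,noheader,nounits",
]


def read_gpu_csv() -> str:
    """Return raw nvidia-smi CSV text, or '' when the command is unavailable."""
    try:
        completed = subprocess.run(
            QUERY, capture_output=True, text=True, check=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, non-zero exit, or timeout: report no inventory.
        return ""
    return completed.stdout


def test_read_gpu_csv_uses_mocked_command():
    fake = subprocess.CompletedProcess(
        QUERY, 0, stdout="0, Example RTX GPU, 16384, 000.00\n", stderr=""
    )
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert read_gpu_csv() == "0, Example RTX GPU, 16384, 000.00\n"
    run.assert_called_once()
    assert run.call_args.args[0] == QUERY


def test_read_gpu_csv_handles_missing_binary():
    with mock.patch("subprocess.run", side_effect=FileNotFoundError):
        assert read_gpu_csv() == ""
```

The same pattern extends to the doctor tests: patch the command boundary, return a fixed string, and assert on the report text.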

Optional local validation on a stronger NVIDIA GPU PC:

uv run pdf2md doctor
$env:MINERU_MODEL_SOURCE='local'
uv run pdf2md convert samples\FourNodeQuadrilateralShellElementMITC4.pdf --out outputs\fournode-sprint15-auto --overwrite --chunk-pages --gpu auto --mineru-profile auto --strict-local

Expected optional validation:

Doctor reports the stronger GPU name, VRAM, and recommended profile.
Conversion metadata records mineru_profile and selected GPU information.
Generated outputs stay ignored and uncommitted.

Acceptance Criteria

--mineru-profile auto is the default conversion behavior.
auto uses safe settings on the current GTX 1070 Ti 8GB and stronger settings only on 16GB+ Turing-or-newer NVIDIA GPUs.
--gpu auto can choose the largest visible NVIDIA GPU without adding remote/runtime backend support.
MinerU command shape remains direct local CLI only.
Strict-local prohibitions remain enforced.
pdf2md doctor provides actionable GPU/profile information.
Metadata/report preserve the applied runtime profile.
Default tests remain fast, mocked, local, and independent of real MinerU/GPU/model files/network/samples.

Hard Failure Criteria

Implementation adds runtime backend selection or exposes --backend.
Implementation passes --api-url, --url, router, HTTP client backend, or remote OpenAI-compatible backend values.
auto profile applies aggressive settings to GTX 1070 Ti 8GB or other pre-Turing/low-VRAM GPUs.
Existing --gpu cuda:0 behavior breaks.
Profile tuning disables formula or table parsing.
Doctor or tests require real GPU, real MinerU execution, model files, network, Obsidian, MathJax, or samples/.
Sample PDFs, generated outputs, local model files, or dist/pdf2md-ui.exe are committed.

Implementation Task Plan

Task 1: GPU Inventory

Files:

Create src/pdf2md/gpu.py
Create tests/test_gpu.py

Steps:

Add failing tests for parsing nvidia-smi CSV output.
Add failing tests for auto, cuda:N, and numeric GPU selection.
Implement immutable GPU records and parser helpers.
Implement selection errors as ValueError with clear messages.
Run uv run pytest tests/test_gpu.py.
Commit GPU inventory boundary.

Task 2: MinerU Profile Policy

Files:

Create src/pdf2md/mineru_profile.py
Create tests/test_mineru_profile.py

Steps:

Add failing tests for safe, auto, and performance profile policy.
Add tests proving 16GB+ Turing-or-newer GPUs get the moderately aggressive auto environment.
Add tests proving GTX 1070 Ti 8GB stays safe.
Implement the allowlisted environment mapping.
Run uv run pytest tests/test_mineru_profile.py tests/test_gpu.py.
Commit profile policy.

Task 3: Adapter And Conversion Wiring

Files:

Modify src/pdf2md/mineru_adapter.py
Modify src/pdf2md/conversion.py
Modify tests/test_mineru_adapter.py
Modify tests/test_conversion.py

Steps:

Add failing adapter tests for profile environment variables and environment restoration.
Add failing conversion tests that metadata receives applied profile information.
Extend MinerUOptions and conversion options minimally.
Merge GPU and profile environment variables before the MinerU subprocess.
Run uv run pytest tests/test_mineru_adapter.py tests/test_conversion.py tests/test_mineru_profile.py tests/test_gpu.py.
Commit adapter/conversion wiring.

Task 4: CLI And Doctor

Files:

Modify src/pdf2md/cli.py
Modify src/pdf2md/doctor.py
Modify tests/test_cli.py
Modify tests/test_doctor.py

Steps:

Add failing CLI tests for default auto, explicit safe, explicit performance, invalid profile rejection, and --gpu auto.
Add failing doctor tests for GPU inventory and recommended profile details.
Implement CLI argument parsing and doctor report additions.
Run uv run pytest tests/test_cli.py tests/test_doctor.py tests/test_gpu.py tests/test_mineru_profile.py.
Commit CLI and doctor wiring.

Task 5: UI And Documentation

Files:

Modify src/pdf2md_ui/runner.py only if explicit UI profile passthrough is needed
Modify src/pdf2md_ui/app.py only if explicit UI profile control is needed
Modify tests/test_ui_runner.py only if runner command construction changes
Modify README.md
Modify ARCHITECTURE.md
Modify PRD.md
Modify docs/V1IMPLEMENTATIONPLAN.md
Modify PLAN.md
Modify PROGRESS.md
Modify docs/WORKARCHIVE.md after implementation

Steps:

Keep UI unchanged if default CLI auto profile is enough for the first implementation pass.
If UI exposes a profile control, add tests for fixed argument-list construction with shell=False.
Document --mineru-profile, --gpu auto, profile policy, strict-local boundaries, and stronger-PC validation command.
Run focused docs/UI tests if changed.
Run final verification commands.
Commit documentation and final coordination updates.

Verification Commands

uv run pytest tests/test_gpu.py tests/test_mineru_profile.py tests/test_mineru_adapter.py tests/test_conversion.py tests/test_cli.py tests/test_doctor.py
uv run pytest
git diff --check
git status --short --untracked-files=all

Optional stronger-PC validation is listed in the Tests section and must remain explicit opt-in.

Handoff Requirements

After implementation:

Update PROGRESS.md with files changed, commands run, test outcomes, optional stronger-PC validation outcome, known failures, residual risks, and next action.
Archive completed implementation details in docs/WORKARCHIVE.md.
Keep generated outputs, sample PDFs, local model files, and UI build artifacts out of the commit.
Record the detected GPU, applied profile, and whether samples\FourNodeQuadrilateralShellElementMITC4.pdf completed on the stronger PC.

Implementation handoff:

Files changed: src/pdf2md/gpu.py, src/pdf2md/mineru_profile.py, src/pdf2md/mineru_adapter.py, src/pdf2md/conversion.py, src/pdf2md/cli.py, src/pdf2md/doctor.py, docs, and focused tests.
Commands run: uv run pytest tests/test_gpu.py tests/test_mineru_profile.py tests/test_mineru_adapter.py tests/test_conversion.py tests/test_cli.py tests/test_doctor.py; uv run pytest; uv run pdf2md doctor.
Tests passed: targeted Sprint 15 suite passed 101 tests; full default suite passed 225 tests with 1 optional skip; local doctor returned WARN with expected GTX 1070 Ti safe-profile recommendation.
Known failures: optional stronger-PC real MinerU conversion validation was not run in this workspace.
Residual risks: GTX 1070 Ti 8GB remains likely to stall on hard pages; stronger-PC behavior still needs local runtime validation.
Next action: on a stronger NVIDIA GPU PC, run pdf2md doctor and an explicit local conversion with --gpu auto --mineru-profile auto.

Future Sprint Boundary

A later sprint may add page-level timeout handling, resumable page caches, or a performance mode that can run multiple page conversions concurrently on GPUs with enough VRAM. Those behaviors are intentionally out of Sprint 15 scope.
