PDFToMD/docs/Sprints/SPRINT12CONTRACT.md
2026-05-14 10:16:59 +09:00

Sprint 12 Contract: Minimal Windows UI Launcher

Status: Implemented with residual conversion-smoke risk

Last updated: 2026-05-11

Objective

Build a minimal Windows desktop launcher for the existing pdf2md CLI and package the launcher itself as dist/pdf2md-ui.exe.

The UI must remain a thin local launcher. It must not become a second conversion engine, a hosted app, a manual review workflow, or a bundled redistribution of MinerU, CUDA PyTorch, model weights, Node.js, or MathJax.

Research Basis

  • Primary research document: docs/UI_RESEARCH.md.
  • The recommended implementation path is tkinter/ttk, a subprocess runner around pdf2md or uv run pdf2md, and PyInstaller for the Windows executable.

Current Precondition

  • pdf2md doctor, pdf2md convert, and pdf2md recheck are implemented.
  • Conversion remains strict-local and MinerU-only.
  • Current CLI output is coarse during MinerU execution because the adapter captures MinerU subprocess output internally.
  • UI research is complete.
  • UI implementation exists under src/pdf2md_ui/.
  • dist\pdf2md-ui.exe can be built with PyInstaller.

Touched Surfaces

Allowed during implementation:

  • src/pdf2md_ui/__init__.py
  • src/pdf2md_ui/app.py
  • src/pdf2md_ui/runner.py
  • tests/test_ui_runner.py
  • pyproject.toml
  • uv.lock
  • README.md
  • PLAN.md
  • PROGRESS.md
  • docs/WORKARCHIVE.md
  • docs/V1IMPLEMENTATIONPLAN.md

Generated but not committed unless explicitly requested:

  • build/
  • dist/
  • *.spec
  • generated conversion outputs under outputs/

Not allowed:

  • Runtime document upload paths.
  • Remote OCR, hosted LLM/VLM, hosted renderers, or remote document parsing APIs.
  • --api-url, router mode, HTTP client backends, remote OpenAI-compatible endpoints, or runtime engine selection.
  • Direct UI calls to mineru; the UI must call the project-owned pdf2md CLI.
  • Bundling MinerU, CUDA PyTorch, local model weights, Node.js, or MathJax into the first UI executable.
  • Batch queues, drag/drop, PDF preview, Markdown preview, Obsidian automation, installer generation, or code signing in this sprint.
  • Mandatory default tests that require real MinerU, GPU, model files, network, Obsidian, or samples/.

Product Behavior

The first UI is a single-window launcher:

  • Select one input PDF.
  • Select an output root, defaulting to outputs; the current CLI creates the final <stem>\ folder inside it.
  • Configure only existing CLI options:
    • overwrite
    • keep raw output
    • optional grouped pages with default 20
    • GPU device with default cuda:0, including auto when supported by the CLI
    • MinerU profile auto|safe|performance with default auto
  • Run Doctor.
  • Run Convert.
  • Run Recheck for an existing Markdown output.
  • Cancel a running subprocess.
  • Open the output directory after completion.
  • Show a read-only log and indeterminate progress while a command is running.

Command resolution:

  1. Use a configured command if present.
  2. Else use pdf2md from PATH.
  3. Else use uv run pdf2md from a configured project root containing pyproject.toml.
  4. Else report a setup error and direct the user to run pdf2md doctor.
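
The resolution order above can be sketched as a small pure function. This is a minimal sketch, not the project's runner: `resolve_command`, `SetupError`, the injected `which` parameter, and the extra check that `uv` is itself on PATH are assumptions made for illustration.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Callable, Optional, Sequence


class SetupError(RuntimeError):
    """Raised when no usable pdf2md command can be resolved (hypothetical name)."""


def resolve_command(
    configured: Optional[Sequence[str]] = None,
    project_root: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> tuple[list[str], Optional[Path]]:
    """Return (argv prefix, working directory) following the documented order."""
    # 1. A configured command always wins.
    if configured:
        return list(configured), None
    # 2. Otherwise use pdf2md from PATH.
    found = which("pdf2md")
    if found:
        return [found], None
    # 3. Otherwise use `uv run pdf2md` from a project root containing pyproject.toml.
    #    Checking that uv is on PATH is an assumption, not a documented requirement.
    if project_root is not None and (project_root / "pyproject.toml").is_file():
        if which("uv"):
            return ["uv", "run", "pdf2md"], project_root
    # 4. Nothing usable: report a setup error that points at the doctor command.
    raise SetupError("pdf2md was not found; check the setup and run `pdf2md doctor`.")
```

Returning the working directory separately lets the caller pass it as `cwd` to `subprocess.Popen`, so the `uv run` fallback runs inside the project root without any shell text.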

Architecture Plan

WP12.1: CLI Runner

Actions:

  • Add a runner module that builds fixed argument lists for doctor, convert, and recheck.
  • Use subprocess.Popen with shell=False.
  • Set MINERU_MODEL_SOURCE=local in the child environment unless already set.
  • Merge stderr into stdout for a single UI log stream.
  • Read subprocess output on a worker thread and report status events to the UI.
  • Add a Windows process-tree cancellation helper that falls back to taskkill /pid <pid> /t /f only if normal termination does not finish promptly.
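
The command-construction and launch actions above might look roughly like the following. It is a sketch under stated assumptions: the function names are hypothetical, only `--mineru-profile` is a flag named in this document, and `--output`, `--overwrite`, and the positional PDF argument are placeholders for whatever the real CLI accepts.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional, Sequence

PROFILES = ("auto", "safe", "performance")


def build_convert_args(
    base: Sequence[str],
    pdf: Path,
    output_root: Path,
    *,
    overwrite: bool = False,
    profile: str = "auto",
) -> list[str]:
    """Build a fixed argument list; no free-form shell text is accepted."""
    if profile not in PROFILES:
        raise ValueError(f"unsupported profile: {profile!r}")
    # "--output" and "--overwrite" are placeholder flag names (assumptions).
    args = [*base, "convert", str(pdf), "--output", str(output_root)]
    args += ["--mineru-profile", profile]
    if overwrite:
        args.append("--overwrite")
    return args


def child_env(parent: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment and default MINERU_MODEL_SOURCE to local."""
    env = dict(os.environ if parent is None else parent)
    env.setdefault("MINERU_MODEL_SOURCE", "local")  # keeps an existing value
    return env


def start(args: Sequence[str], cwd: Optional[Path] = None) -> subprocess.Popen[str]:
    """Start the child with shell=False and stderr merged into stdout."""
    return subprocess.Popen(
        list(args),
        shell=False,
        cwd=cwd,
        env=child_env(),
        stdin=subprocess.DEVNULL,  # a windowed build may have no usable stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # one stream for the UI log
        text=True,
        encoding="utf-8",
        errors="replace",
        bufsize=1,
    )
```

Because `build_convert_args` and `child_env` are pure functions, they can be unit-tested without MinerU, a GPU, or any sample PDF.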

Expected output:

  • Testable command-construction and process-management code that never accepts arbitrary shell text from the UI.
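
The cancellation helper described in WP12.1 could follow a two-step shape: ask the child to stop, wait briefly, and only then fall back to `taskkill`. The names `taskkill_args` and `cancel` and the five-second grace period are assumptions for illustration, not the project's actual values.

```python
from __future__ import annotations

import subprocess
import sys


def taskkill_args(pid: int) -> list[str]:
    """Fixed argument list for killing a process tree on Windows."""
    return ["taskkill", "/pid", str(pid), "/t", "/f"]


def cancel(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int | None:
    """Stop proc, escalating only if normal termination is not prompt."""
    if proc.poll() is not None:
        return proc.returncode  # already finished, nothing to cancel
    proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        pass  # normal termination did not finish in time
    if sys.platform == "win32":
        # Kill the whole tree; the helper's own output is discarded here.
        subprocess.run(taskkill_args(proc.pid), shell=False, check=False, capture_output=True)
    else:
        proc.kill()  # non-Windows fallback, included so the sketch runs anywhere
    return proc.wait()
```

On Windows, `Popen.terminate()` calls `TerminateProcess` on the direct child only, so descendants may survive when this first step succeeds. Whether that matters depends on how the real runner defines normal termination.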

WP12.2: Minimal Tk UI

Actions:

  • Add a tkinter/ttk app with file and directory pickers, option controls, command buttons, progress indicator, and log pane.
  • Keep long-running work off the thread that runs Tk's event loop.
  • Disable conflicting controls while a command is running.
  • Surface non-zero exit codes clearly.

Expected output:

  • A simple local GUI for existing CLI workflows.
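
One common way to meet the WP12.2 threading requirement is a worker thread that pushes events into a `queue.Queue`, with the UI polling that queue through `after`. The sketch below assumes this pattern; the function names, the 100 ms poll interval, and the convention that a string is a log line and an integer is the final exit code are all illustrative choices.

```python
from __future__ import annotations

import queue
import subprocess
import threading
from typing import Sequence, Union

Event = Union[str, int]  # str = one log line, int = final exit code


def pump_output(args: Sequence[str], events: "queue.Queue[Event]") -> None:
    """Worker-thread body: stream merged output, then post the exit code."""
    proc = subprocess.Popen(
        list(args), shell=False, stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        text=True, errors="replace",
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        events.put(line.rstrip("\n"))
    events.put(proc.wait())


def start_worker(args: Sequence[str], events: "queue.Queue[Event]") -> threading.Thread:
    """Run pump_output on a daemon thread so the UI thread never blocks."""
    worker = threading.Thread(target=pump_output, args=(args, events), daemon=True)
    worker.start()
    return worker


def drain(events: "queue.Queue[Event]") -> list[Event]:
    """Return everything currently queued without blocking."""
    items: list[Event] = []
    while True:
        try:
            items.append(events.get_nowait())
        except queue.Empty:
            return items


def run_demo(args: Sequence[str]) -> None:
    """Tiny window: read-only log plus indeterminate progress bar."""
    import tkinter as tk
    from tkinter import ttk

    root = tk.Tk()
    log = tk.Text(root, state="disabled", height=12)
    log.pack(fill="both", expand=True)
    bar = ttk.Progressbar(root, mode="indeterminate")
    bar.pack(fill="x")
    events: "queue.Queue[Event]" = queue.Queue()
    start_worker(args, events)
    bar.start()

    def poll() -> None:
        for item in drain(events):
            text = item if isinstance(item, str) else f"[exit code {item}]"
            log.configure(state="normal")
            log.insert("end", text + "\n")
            log.configure(state="disabled")
            if isinstance(item, int):
                bar.stop()  # the command finished; stop polling
                return
        root.after(100, poll)

    root.after(100, poll)
    root.mainloop()
```

Only `poll` touches widgets, and it runs on the Tk thread; the worker communicates solely through the queue. Calling `run_demo([sys.executable, "-c", "print('hello')"])` is enough to try the pattern without pdf2md installed.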

WP12.3: Build

Actions:

  • Add PyInstaller only to a build dependency group such as ui-build.
  • Build the executable with:
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py

Expected output:

  • dist\pdf2md-ui.exe exists after the build.

Verification Checks

Default checks:

  • uv run pytest tests/test_ui_runner.py
  • uv run pytest tests/test_cli.py if shared CLI behavior changes
  • git diff --check
  • git status --short --untracked-files=all

Build check:

uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
Test-Path dist\pdf2md-ui.exe

Manual smoke:

  1. Launch dist\pdf2md-ui.exe.
  2. Run Doctor from the UI.
  3. Convert one small local sample into an ignored outputs/ directory.
  4. Confirm Markdown, report Markdown, and assets are produced as expected for the active output layout.

Acceptance Criteria

  • The UI invokes pdf2md or uv run pdf2md; it never invokes mineru directly.
  • Commands are fixed argument lists and run with shell=False.
  • The UI remains responsive while a conversion is running.
  • Cancel attempts to stop the process tree on Windows.
  • Doctor and conversion exit codes are visible in the UI.
  • PyInstaller produces dist\pdf2md-ui.exe.
  • Default tests stay independent of real MinerU, GPU, model files, network, Obsidian, and samples/.

Hard Failure Criteria

  • UI code exposes arbitrary shell command execution.
  • UI exposes remote/API options or weakens strict-local policy.
  • UI claims conversion success without checking the CLI exit code.
  • UI freezes during a long conversion because the CLI runs on the thread that runs Tk's event loop.
  • The first UI executable bundles MinerU, CUDA PyTorch, model weights, Node.js, or MathJax.
  • Build outputs, generated conversion outputs, local models, or sample PDFs are committed.
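
The exit-code rule above can be kept honest by routing every status message through one function that decides success from the exit code alone. This is a hedged sketch: `describe_result` is a hypothetical helper, and it assumes pdf2md follows the usual convention that exit code 0 means success.

```python
from __future__ import annotations

from typing import Optional


def describe_result(
    command: str, exit_code: Optional[int], cancelled: bool = False
) -> tuple[bool, str]:
    """Return (succeeded, message) for a finished doctor/convert/recheck run."""
    # A cancelled run is never reported as a success, whatever the code.
    if cancelled:
        return False, f"{command} was cancelled."
    # A missing exit code means the process never reported one.
    if exit_code is None:
        return False, f"{command} did not report an exit code."
    # Assumption: 0 is the only success code.
    if exit_code == 0:
        return True, f"{command} finished successfully (exit code 0)."
    return False, f"{command} failed with exit code {exit_code}."
```

Keeping this logic out of the widget code means it can be covered by the default tests without launching a window.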

Handoff Requirements

After implementation:

  • Update PROGRESS.md with files changed, commands run, test outcomes, build outcome, known failures, residual risks, and next action.
  • Move completed implementation details to docs/WORKARCHIVE.md after verification.
  • Keep sample PDFs and generated outputs out of the commit.

Implementation Handoff

Files changed:

  • src/pdf2md_ui/__init__.py
  • src/pdf2md_ui/app.py
  • src/pdf2md_ui/runner.py
  • tests/test_ui_runner.py
  • pyproject.toml
  • uv.lock
  • README.md
  • PLAN.md
  • PROGRESS.md
  • docs/WORKARCHIVE.md
  • docs/V1IMPLEMENTATIONPLAN.md

Verification:

  • uv run pytest tests\test_ui_runner.py: passed 16 tests.
  • uv run pytest: passed 188 tests with 1 optional skip.
  • uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py: passed.
  • Test-Path dist\pdf2md-ui.exe: returned True.
  • uv run pdf2md doctor: returned WARN only for the documented GTX 1070 Ti/Pascal compatibility risk.
  • Launch smoke for dist\pdf2md-ui.exe: process started and was then terminated by the smoke script.

Follow-up refresh on 2026-05-12:

  • Updated the UI command builder and form controls for the Sprint 15 --mineru-profile auto|safe|performance CLI option.
  • Rebuilt dist\pdf2md-ui.exe after Sprint 16 simplified output layout and Sprint 15 profile changes.
  • uv run pytest tests\test_ui_runner.py: passed 17 tests.
  • Launch smoke for the rebuilt dist\pdf2md-ui.exe: process started and was then terminated by the smoke script.

Known failure:

  • A CLI conversion smoke using samples\FourNodeQuadrilateralShellElementMITC4.pdf and the same command shape used by the UI did not finish within the 15-minute timeout. The spawned process tree was terminated with taskkill.

Residual risk:

  • A hands-on UI Doctor click and UI conversion click should still be run when the local MinerU runtime is expected to complete within an acceptable time.