# Sprint 12 Contract: Minimal Windows UI Launcher

Status: Implemented with residual conversion-smoke risk

Last updated: 2026-05-11

## Objective

Build a minimal Windows desktop launcher for the existing `pdf2md` CLI and package the launcher itself as `dist/pdf2md-ui.exe`.

The UI must remain a thin local launcher. It must not become a second conversion engine, a hosted app, a manual review workflow, or a bundled redistribution of MinerU, CUDA PyTorch, model weights, Node.js, or MathJax.

## Research Basis

- Primary research document: `docs/UI_RESEARCH.md`.
- The recommended implementation path is `tkinter`/`ttk`, a subprocess runner around `pdf2md` or `uv run pdf2md`, and PyInstaller for the Windows executable.

## Current Precondition

- `pdf2md doctor`, `pdf2md convert`, and `pdf2md recheck` are implemented.
- Conversion remains strict-local and MinerU-only.
- Current CLI output is coarse during MinerU execution because the adapter captures MinerU subprocess output internally.
- UI research is complete.
- UI implementation exists under `src/pdf2md_ui/`.
- `dist\pdf2md-ui.exe` can be built with PyInstaller.

## Touched Surfaces

Allowed during implementation:

- `src/pdf2md_ui/__init__.py`
- `src/pdf2md_ui/app.py`
- `src/pdf2md_ui/runner.py`
- `tests/test_ui_runner.py`
- `pyproject.toml`
- `uv.lock`
- `README.md`
- `PLAN.md`
- `PROGRESS.md`
- `docs/WORKARCHIVE.md`
- `docs/V1IMPLEMENTATIONPLAN.md`

Generated but not committed unless explicitly requested:

- `build/`
- `dist/`
- `*.spec`
- generated conversion outputs under `outputs/`

Not allowed:

- Runtime document upload paths.
- Remote OCR, hosted LLM/VLM, hosted renderers, or remote document parsing APIs.
- `--api-url`, router mode, HTTP client backends, remote OpenAI-compatible endpoints, or runtime engine selection.
- Direct UI calls to `mineru`; the UI must call the project-owned `pdf2md` CLI.
- Bundling MinerU, CUDA PyTorch, local model weights, Node.js, or MathJax into the first UI executable.
- Batch queues, drag/drop, PDF preview, Markdown preview, Obsidian automation, installer generation, or code signing in this sprint.
- Mandatory default tests that require real MinerU, GPU, model files, network, Obsidian, or `samples/`.

## Product Behavior

The first UI is a single-window launcher:

- Select one input PDF.
- Select an output root, defaulting to `outputs`; the current CLI creates the final `\` folder inside it.
- Configure only existing CLI options:
  - overwrite
  - keep raw output
  - optional grouped pages with default `20`
  - GPU device with default `cuda:0`, including `auto` when supported by the CLI
  - MinerU profile `auto|safe|performance` with default `auto`
- Run `Doctor`.
- Run `Convert`.
- Run `Recheck` for an existing Markdown output.
- Cancel a running subprocess.
- Open the output directory after completion.
- Show a read-only log and indeterminate progress while a command is running.

Command resolution:

1. Use a configured command if present.
2. Else use `pdf2md` from `PATH`.
3. Else use `uv run pdf2md` from a configured project root containing `pyproject.toml`.
4. Else report a setup error and direct the user to run `pdf2md doctor`.

## Architecture Plan

### WP12.1: CLI Runner

Actions:

- Add a runner module that builds fixed argument lists for `doctor`, `convert`, and `recheck`.
- Use `subprocess.Popen` with `shell=False`.
- Set `MINERU_MODEL_SOURCE=local` in the child environment unless already set.
- Merge stderr into stdout for a single UI log stream.
- Read subprocess output on a worker thread and report status events to the UI.
- Add a Windows process-tree cancellation helper that uses `taskkill /pid /t /f` only after normal termination does not finish promptly.

Expected output:

- Testable command-construction and process-management code that never accepts arbitrary shell text from the UI.
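The command-resolution order and the WP12.1 runner actions can be sketched roughly as follows. This is a minimal illustration, not the contents of `src/pdf2md_ui/runner.py`: every function name here (`resolve_base_command`, `build_argv`, `child_environment`, `start`, `stream_output`, `taskkill_command`, `cancel`) is hypothetical, the 5-second grace period is an assumed value, and option flags for `convert` are deliberately left to the caller because this contract does not list them.

```python
import os
import shutil
import subprocess
import threading
from pathlib import Path
from typing import Callable, Mapping, Optional, Sequence

ALLOWED_SUBCOMMANDS = ("doctor", "convert", "recheck")


def resolve_base_command(
    configured: Optional[Sequence[str]] = None,
    project_root: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv prefix for pdf2md, following the four resolution steps."""
    if configured:                      # 1. configured command
        return list(configured)
    found = which("pdf2md")             # 2. pdf2md on PATH
    if found:
        return [found]
    if project_root is not None and (project_root / "pyproject.toml").is_file():
        return ["uv", "run", "pdf2md"]  # 3. uv run inside the project root
    # 4. setup error; the UI would point the user at `pdf2md doctor`
    raise FileNotFoundError("pdf2md was not found; check setup with `pdf2md doctor`")


def build_argv(base: Sequence[str], subcommand: str, extra: Sequence[str] = ()) -> list[str]:
    """Build a fixed argument list; only the three known subcommands are accepted."""
    if subcommand not in ALLOWED_SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    return [*base, subcommand, *extra]


def child_environment(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment and default MINERU_MODEL_SOURCE to 'local' if unset."""
    env = dict(os.environ if base is None else base)
    env.setdefault("MINERU_MODEL_SOURCE", "local")
    return env


def start(argv: Sequence[str], cwd: Optional[Path] = None,
          env: Optional[Mapping[str, str]] = None) -> subprocess.Popen:
    """Start the CLI without a shell, merging stderr into stdout."""
    return subprocess.Popen(
        list(argv), cwd=cwd, env=child_environment(env),
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        text=True, shell=False,
    )


def stream_output(proc: subprocess.Popen, on_line: Callable[[str], None]) -> threading.Thread:
    """Read the merged output on a worker thread, one callback per line."""
    def pump() -> None:
        assert proc.stdout is not None
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
        proc.wait()  # populate proc.returncode once the stream closes

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread


def taskkill_command(pid: int) -> list[str]:
    """Argument list that force-kills a process tree on Windows."""
    return ["taskkill", "/pid", str(pid), "/t", "/f"]


def cancel(proc: subprocess.Popen, grace_seconds: float = 5.0) -> None:
    """Try normal termination first; escalate only if it does not finish promptly."""
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        if os.name == "nt":
            subprocess.run(taskkill_command(proc.pid), check=False)
        else:
            proc.kill()
```

In a Tk app, the `on_line` callback would typically push lines onto a `queue.Queue` that the UI drains with `after(...)`, since widgets should only be touched from the thread that runs the Tk main loop. When step 3 of the resolution applies, `start` would be called with `cwd` set to the project root so that `uv run` finds `pyproject.toml`.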
### WP12.2: Minimal Tk UI

Actions:

- Add a `tkinter`/`ttk` app with file and directory pickers, option controls, command buttons, progress indicator, and log pane.
- Keep long-running work off Tk's event handler thread.
- Disable conflicting controls while a command is running.
- Surface non-zero exit codes clearly.

Expected output:

- A simple local GUI for existing CLI workflows.

### WP12.3: Build

Actions:

- Add PyInstaller only to a build dependency group such as `ui-build`.
- Build the executable with:

```powershell
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
```

Expected output:

- `dist\pdf2md-ui.exe` exists after the build.

## Verification Checks

Default checks:

- `uv run pytest tests/test_ui_runner.py`
- `uv run pytest tests/test_cli.py` if shared CLI behavior changes
- `git diff --check`
- `git status --short --untracked-files=all`

Build check:

```powershell
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
Test-Path dist\pdf2md-ui.exe
```

Manual smoke:

1. Launch `dist\pdf2md-ui.exe`.
2. Run Doctor from the UI.
3. Convert one small local sample into an ignored `outputs/` directory.
4. Confirm Markdown, report Markdown, and assets are produced as expected for the active output layout.

## Acceptance Criteria

- The UI invokes `pdf2md` or `uv run pdf2md`; it never invokes `mineru` directly.
- Commands are fixed argument lists and run with `shell=False`.
- The UI remains responsive while a conversion is running.
- Cancel attempts to stop the process tree on Windows.
- Doctor and conversion exit codes are visible in the UI.
- PyInstaller produces `dist\pdf2md-ui.exe`.
- Default tests stay independent of real MinerU, GPU, model files, network, Obsidian, and `samples/`.

## Hard Failure Criteria

- UI code exposes arbitrary shell command execution.
- UI exposes remote/API options or weakens strict-local policy.
- UI claims conversion success without checking the CLI exit code.
- UI freezes during a long conversion because the CLI runs on Tk's event handler thread.
- The first UI executable bundles MinerU, CUDA PyTorch, model weights, Node.js, or MathJax.
- Build outputs, generated conversion outputs, local models, or sample PDFs are committed.

## Handoff Requirements

After implementation:

- Update `PROGRESS.md` with files changed, commands run, test outcomes, build outcome, known failures, residual risks, and next action.
- Move completed implementation details to `docs/WORKARCHIVE.md` after verification.
- Keep sample PDFs and generated outputs out of the commit.

## Implementation Handoff

Files changed:

- `src/pdf2md_ui/__init__.py`
- `src/pdf2md_ui/app.py`
- `src/pdf2md_ui/runner.py`
- `tests/test_ui_runner.py`
- `pyproject.toml`
- `uv.lock`
- `README.md`
- `PLAN.md`
- `PROGRESS.md`
- `docs/WORKARCHIVE.md`
- `docs/V1IMPLEMENTATIONPLAN.md`

Verification:

- `uv run pytest tests\test_ui_runner.py`: passed 16 tests.
- `uv run pytest`: passed 188 tests with 1 optional skip.
- `uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py`: passed.
- `Test-Path dist\pdf2md-ui.exe`: returned `True`.
- `uv run pdf2md doctor`: returned WARN only for the documented GTX 1070 Ti/Pascal compatibility risk.
- Launch smoke for `dist\pdf2md-ui.exe`: process started and was then terminated by the smoke script.

Follow-up refresh on 2026-05-12:

- Updated the UI command builder and form controls for the Sprint 15 `--mineru-profile auto|safe|performance` CLI option.
- Rebuilt `dist\pdf2md-ui.exe` after Sprint 16 simplified output layout and Sprint 15 profile changes.
- `uv run pytest tests\test_ui_runner.py`: passed 17 tests.
- Launch smoke for the rebuilt `dist\pdf2md-ui.exe`: process started and was then terminated by the smoke script.

Known failure:

- A CLI conversion smoke using `samples\FourNodeQuadrilateralShellElementMITC4.pdf` and the same command shape used by the UI did not finish within the 15-minute timeout. The spawned process tree was terminated with `taskkill`.

Residual risk:

- A hands-on UI Doctor click and UI conversion click should still be run when the local MinerU runtime is expected to complete within an acceptable time.