Sprint 12 Contract: Minimal Windows UI Launcher
Status: Implemented with residual conversion-smoke risk
Last updated: 2026-05-11
Objective
Build a minimal Windows desktop launcher for the existing pdf2md CLI and package the launcher itself as dist/pdf2md-ui.exe.
The UI must remain a thin local launcher. It must not become a second conversion engine, a hosted app, a manual review workflow, or a bundled redistribution of MinerU, CUDA PyTorch, model weights, Node.js, or MathJax.
Research Basis
- Primary research document: docs/UI_RESEARCH.md.
- The recommended implementation path is tkinter/ttk, a subprocess runner around pdf2md or uv run pdf2md, and PyInstaller for the Windows executable.
Current Precondition
- pdf2md doctor, pdf2md convert, and pdf2md recheck are implemented.
- Conversion remains strict-local and MinerU-only.
- Current CLI output is coarse during MinerU execution because the adapter captures MinerU subprocess output internally.
- UI research is complete.
- UI implementation exists under src/pdf2md_ui/.
- dist\pdf2md-ui.exe can be built with PyInstaller.
Touched Surfaces
Allowed during implementation:
- src/pdf2md_ui/__init__.py
- src/pdf2md_ui/app.py
- src/pdf2md_ui/runner.py
- tests/test_ui_runner.py
- pyproject.toml
- uv.lock
- README.md
- PLAN.md
- PROGRESS.md
- docs/WORKARCHIVE.md
- docs/V1IMPLEMENTATIONPLAN.md
Generated but not committed unless explicitly requested:
- build/
- dist/
- *.spec
- generated conversion outputs under outputs/
Not allowed:
- Runtime document upload paths.
- Remote OCR, hosted LLM/VLM, hosted renderers, or remote document parsing APIs.
- --api-url, router mode, HTTP client backends, remote OpenAI-compatible endpoints, or runtime engine selection.
- Direct UI calls to mineru; the UI must call the project-owned pdf2md CLI.
- Bundling MinerU, CUDA PyTorch, local model weights, Node.js, or MathJax into the first UI executable.
- Batch queues, drag/drop, PDF preview, Markdown preview, Obsidian automation, installer generation, or code signing in this sprint.
- Mandatory default tests that require real MinerU, GPU, model files, network, Obsidian, or samples/.
Product Behavior
The first UI is a single-window launcher:
- Select one input PDF.
- Select an output root, defaulting to outputs; the current CLI creates the final <stem>\ folder inside it.
- Configure only existing CLI options:
  - overwrite
  - keep raw output
  - optional grouped pages with default 20
  - GPU device with default cuda:0, including auto when supported by the CLI
  - MinerU profile auto|safe|performance with default auto
- Run Doctor.
- Run Convert.
- Run Recheck for an existing Markdown output.
- Cancel a running subprocess.
- Open the output directory after completion.
- Show a read-only log and indeterminate progress while a command is running.
Command resolution:
- Use a configured command if present.
- Else use pdf2md from PATH.
- Else use uv run pdf2md from a configured project root containing pyproject.toml.
- Else report a setup error and direct the user to run pdf2md doctor.
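The resolution order above can be sketched as a small pure function. This is a minimal illustration, not the code in src/pdf2md_ui/runner.py: the names resolve_command, ResolvedCommand, and SetupError are hypothetical, and requiring uv itself to be on PATH for the third step is an assumption.

```python
from __future__ import annotations

import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Sequence


class SetupError(RuntimeError):
    """Raised when no usable pdf2md command can be resolved (hypothetical name)."""


@dataclass(frozen=True)
class ResolvedCommand:
    argv: list[str]              # argv prefix; the subcommand is appended later
    cwd: Optional[Path] = None   # working directory for the child, if one is needed


def resolve_command(
    configured: Optional[Sequence[str]] = None,
    project_root: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> ResolvedCommand:
    # 1. A configured command wins if present.
    if configured:
        return ResolvedCommand(list(configured))
    # 2. Else pdf2md from PATH.
    found = which("pdf2md")
    if found:
        return ResolvedCommand([found])
    # 3. Else "uv run pdf2md" from a project root containing pyproject.toml.
    #    Assumption: uv must itself be discoverable on PATH.
    if project_root is not None and (project_root / "pyproject.toml").is_file():
        uv = which("uv")
        if uv:
            return ResolvedCommand([uv, "run", "pdf2md"], cwd=project_root)
    # 4. Else report a setup error that points at the doctor command.
    raise SetupError(
        "Could not find pdf2md or a project root with pyproject.toml; "
        "install the CLI and run 'pdf2md doctor'."
    )
```

Passing which as a parameter keeps the function testable without touching the real PATH.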
Architecture Plan
WP12.1: CLI Runner
Actions:
- Add a runner module that builds fixed argument lists for doctor, convert, and recheck.
- Use subprocess.Popen with shell=False.
- Set MINERU_MODEL_SOURCE=local in the child environment unless already set.
- Merge stderr into stdout for a single UI log stream.
- Read subprocess output on a worker thread and report status events to the UI.
- Add a Windows process-tree cancellation helper that uses taskkill /pid <pid> /t /f only after normal termination does not finish promptly.
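A minimal sketch of fixed argument-list construction, under stated assumptions: the function names are hypothetical, the positional placement of the input path is assumed, and only the --mineru-profile flag is taken from this document. The other options (overwrite, keep raw output, grouped pages, GPU device) are left out because their flag spellings are not given here.

```python
from __future__ import annotations

from pathlib import Path
from typing import Sequence

# Profile values named in this contract.
PROFILES = ("auto", "safe", "performance")


def build_doctor_args(prefix: Sequence[str]) -> list[str]:
    return [*prefix, "doctor"]


def build_convert_args(
    prefix: Sequence[str], pdf: Path, profile: str = "auto"
) -> list[str]:
    # Validate against a closed set instead of passing UI text through.
    if profile not in PROFILES:
        raise ValueError(f"unknown MinerU profile: {profile!r}")
    if pdf.suffix.lower() != ".pdf":
        raise ValueError(f"input is not a PDF: {pdf}")
    # Assumption: the input path is a positional argument of 'convert'.
    return [*prefix, "convert", str(pdf), "--mineru-profile", profile]


def build_recheck_args(prefix: Sequence[str], markdown: Path) -> list[str]:
    # Assumption: 'recheck' takes the Markdown path positionally.
    return [*prefix, "recheck", str(markdown)]
```

Because each builder returns a list and every element is either a literal or a validated value, a path containing spaces or shell metacharacters stays a single argument when the list is run with shell=False.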
Expected output:
- Testable command-construction and process-management code that never accepts arbitrary shell text from the UI.
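The process-management half might look like the following sketch. The function name start_command and the ("log", line) / ("exit", code) event tuples are illustrative choices, not the shipped runner API.

```python
from __future__ import annotations

import os
import queue
import subprocess
import threading
from typing import Optional, Sequence


def start_command(
    argv: Sequence[str],
    events: "queue.Queue[tuple[str, object]]",
    cwd: Optional[str] = None,
) -> subprocess.Popen:
    """Start argv without a shell and stream its output as events."""
    env = os.environ.copy()
    # Only set the variable when the user has not already chosen a value.
    env.setdefault("MINERU_MODEL_SOURCE", "local")

    proc = subprocess.Popen(
        list(argv),
        shell=False,
        cwd=cwd,
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # one merged stream for the UI log
        text=True,
        encoding="utf-8",
        errors="replace",
        bufsize=1,
    )

    def pump() -> None:
        # Runs on a worker thread; the UI only ever reads from the queue.
        assert proc.stdout is not None
        for line in proc.stdout:
            events.put(("log", line.rstrip("\n")))
        proc.stdout.close()
        events.put(("exit", proc.wait()))

    threading.Thread(target=pump, daemon=True).start()
    return proc
```

The "exit" event carries the real return code, so the UI can report failure or success from the CLI result rather than from the absence of errors in the log.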
WP12.2: Minimal Tk UI
Actions:
- Add a tkinter/ttk app with file and directory pickers, option controls, command buttons, progress indicator, and log pane.
- Keep long-running work off Tk's event handler thread.
- Disable conflicting controls while a command is running.
- Surface non-zero exit codes clearly.
Expected output:
- A simple local GUI for existing CLI workflows.
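One common way to keep Tk responsive is to let a worker thread push events into a queue and have the UI poll that queue with after(). The sketch below assumes ("log", text) and ("exit", code) event tuples; drain_events and main are hypothetical names, and the widget layout is reduced to a log pane and a progress bar.

```python
from __future__ import annotations

import queue
from typing import Callable


def drain_events(
    events: "queue.Queue", handle: Callable[[object], None], limit: int = 200
) -> int:
    """Handle up to `limit` queued events without blocking; return the count."""
    handled = 0
    while handled < limit:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        handle(event)
        handled += 1
    return handled


def main() -> None:
    # Imported lazily so the queue logic stays testable without a display.
    import tkinter as tk
    from tkinter import ttk
    from tkinter.scrolledtext import ScrolledText

    root = tk.Tk()
    root.title("pdf2md")
    events: "queue.Queue" = queue.Queue()  # filled by the runner's worker thread

    log = ScrolledText(root, state="disabled", height=12)
    log.pack(fill="both", expand=True)
    bar = ttk.Progressbar(root, mode="indeterminate")
    bar.pack(fill="x")

    def handle(event) -> None:
        kind, payload = event
        log.configure(state="normal")  # the pane is read-only between writes
        if kind == "log":
            log.insert("end", f"{payload}\n")
        elif kind == "exit":
            bar.stop()
            log.insert("end", f"[exit code {payload}]\n")
        log.configure(state="disabled")
        log.see("end")

    def poll() -> None:
        drain_events(events, handle)
        root.after(100, poll)  # reschedule on the Tk thread

    bar.start()
    poll()
    root.mainloop()
```

Calling main() opens the window. The limit in drain_events bounds how long a single poll can hold the event loop when a command prints a burst of output.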
WP12.3: Build
Actions:
- Add PyInstaller only to a build dependency group such as ui-build.
- Build the executable with:
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
Expected output:
- dist\pdf2md-ui.exe exists after the build.
Verification Checks
Default checks:
- uv run pytest tests/test_ui_runner.py
- uv run pytest tests/test_cli.py if shared CLI behavior changes
- git diff --check
- git status --short --untracked-files=all
Build check:
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
Test-Path dist\pdf2md-ui.exe
Manual smoke:
- Launch dist\pdf2md-ui.exe.
- Run Doctor from the UI.
- Convert one small local sample into an ignored outputs/ directory.
- Confirm Markdown, report Markdown, and assets are produced as expected for the active output layout.
Acceptance Criteria
- The UI invokes pdf2md or uv run pdf2md; it never invokes mineru directly.
- Commands are fixed argument lists and run with shell=False.
- The UI remains responsive while a conversion is running.
- Cancel attempts to stop the process tree on Windows.
- Doctor and conversion exit codes are visible in the UI.
- PyInstaller produces dist\pdf2md-ui.exe.
- Default tests stay independent of real MinerU, GPU, model files, network, Obsidian, and samples/.
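The cancel criterion could be met with a helper along these lines. It is a sketch only: cancel_process and the 5-second grace period are assumptions, and the non-Windows branch exists so the helper can be exercised in tests on other platforms.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Optional


def cancel_process(
    proc: subprocess.Popen, grace_seconds: float = 5.0
) -> Optional[int]:
    """Stop proc; return its exit code, or None if it could not be confirmed dead."""
    if proc.poll() is not None:
        return proc.returncode  # already finished, nothing to cancel

    # Step 1: normal termination.
    proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        pass

    # Step 2: escalate only after the grace period has elapsed.
    if sys.platform == "win32":
        # /t includes child processes, /f forces termination.
        subprocess.run(
            ["taskkill", "/pid", str(proc.pid), "/t", "/f"],
            shell=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    else:
        proc.kill()

    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        return None
```

On Windows, Popen.terminate() calls TerminateProcess on the direct child only, so whether descendants started by uv or MinerU also stop is worth confirming in the manual smoke.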
Hard Failure Criteria
- UI code exposes arbitrary shell command execution.
- UI exposes remote/API options or weakens strict-local policy.
- UI claims conversion success without checking the CLI exit code.
- UI freezes during a long conversion because the CLI runs on Tk's event handler thread.
- The first UI executable bundles MinerU, CUDA PyTorch, model weights, Node.js, or MathJax.
- Build outputs, generated conversion outputs, local models, or sample PDFs are committed.
Handoff Requirements
After implementation:
- Update PROGRESS.md with files changed, commands run, test outcomes, build outcome, known failures, residual risks, and next action.
- Move completed implementation details to docs/WORKARCHIVE.md after verification.
- Keep sample PDFs and generated outputs out of the commit.
Implementation Handoff
Files changed:
- src/pdf2md_ui/__init__.py
- src/pdf2md_ui/app.py
- src/pdf2md_ui/runner.py
- tests/test_ui_runner.py
- pyproject.toml
- uv.lock
- README.md
- PLAN.md
- PROGRESS.md
- docs/WORKARCHIVE.md
- docs/V1IMPLEMENTATIONPLAN.md
Verification:
- uv run pytest tests\test_ui_runner.py: passed 16 tests.
- uv run pytest: passed 188 tests with 1 optional skip.
- uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py: passed.
- Test-Path dist\pdf2md-ui.exe: returned True.
- uv run pdf2md doctor: returned WARN only for the documented GTX 1070 Ti/Pascal compatibility risk.
- Launch smoke for dist\pdf2md-ui.exe: process started and was then terminated by the smoke script.
Follow-up refresh on 2026-05-12:
- Updated the UI command builder and form controls for the Sprint 15 --mineru-profile auto|safe|performance CLI option.
- Rebuilt dist\pdf2md-ui.exe after Sprint 16 simplified output layout and Sprint 15 profile changes.
- uv run pytest tests\test_ui_runner.py: passed 17 tests.
- Launch smoke for the rebuilt dist\pdf2md-ui.exe: process started and was then terminated by the smoke script.
Known failure:
- A CLI conversion smoke using samples\FourNodeQuadrilateralShellElementMITC4.pdf and the same command shape used by the UI did not finish within the 15-minute timeout. The spawned process tree was terminated with taskkill.
Residual risk:
- A hands-on UI Doctor click and UI conversion click should still be run when the local MinerU runtime is expected to complete within an acceptable time.