UI Research: Minimal Windows Launcher For pdf2md

Last updated: 2026-05-11

Scope

User request:

  • Build a minimal UI that uses the existing pdf2md CLI.
  • Build it into a Windows .exe.
  • Research the implementation path before coding.

This document is research and planning input only. It does not change runtime behavior.

Current Project Fit

The existing converter is already centered on a CLI:

uv run pdf2md doctor
uv run pdf2md convert INPUT --out OUTPUT --overwrite
uv run pdf2md recheck OUTPUT.md

The UI should preserve the current architecture:

  • Use MinerU 3.1.0 through the direct local mineru CLI only.
  • Keep strict-local behavior. Do not expose --api-url, remote endpoints, router mode, cloud OCR, remote LLMs, or external document uploads.
  • Treat the UI .exe as a launcher for the existing local runtime, not as a fully self-contained bundle of MinerU, PyTorch, CUDA DLLs, local models, Node.js, and MathJax.
  • Keep generated Markdown parts, report Markdown, assets, and raw output behavior owned by the existing CLI.

Recommendation

Use a thin Python desktop launcher:

  • UI framework: tkinter plus tkinter.ttk.
  • CLI execution: subprocess.Popen with shell=False, argument lists, a worker thread, and a queue back to the UI thread.
  • Packaging: PyInstaller --onefile --windowed for a lightweight pdf2md-ui.exe.
  • Runtime command: prefer pdf2md if it is on PATH; otherwise run uv run pdf2md with a configured project root.

This is the lowest-risk path because tkinter is in the Python standard library, ttk provides native themed widgets, and PyInstaller directly supports graphical windowed apps on Windows. The UI remains small and avoids bundling the large GPU conversion stack into the UI executable.

Why Not Bundle The Whole Converter Into One EXE

Bundling the full conversion runtime into a single executable is not a good v1 target:

  • The runtime includes CUDA PyTorch, MinerU, model files, optional Node.js/MathJax support, and local cache/config state.
  • Model weights and transitive licenses are already documented as redistribution-sensitive.
  • One-file executables extract at startup; large bundles can start slowly and create antivirus or SmartScreen friction.
  • The project already uses uv and a known local .venv; the UI can call that stable runtime.

Recommended v1 interpretation of ".exe":

  • Build pdf2md-ui.exe as the desktop UI.
  • Require the local converter runtime to be installed and pass pdf2md doctor.
  • Let the UI surface doctor failures clearly instead of pretending to be a complete installer.

Future redistribution can be revisited later as a separate packaging and license sprint.

UI Framework Options

| Option | Fit | Pros | Cons | Decision |
| --- | --- | --- | --- | --- |
| tkinter + ttk | Strong | Standard library, native file dialogs, themed widgets, minimal dependencies, easy PyInstaller build. Python docs warn that long work must not block Tk's single-threaded event loop, which matches a worker-thread runner design. | Visual polish is modest. Advanced drag/drop usually needs extra packages. | Recommended for v1. |
| PySide6 / Qt for Python | Medium | Polished widgets, strong desktop model, official Python bindings. | Adds large Qt dependency, LGPL/commercial considerations, more complex deployment. Qt docs describe PyInstaller and Nuitka paths, plus caveats around virtualenv/system package selection and Qt plugin bundling. | Keep as a later polish option. |
| CustomTkinter | Medium | More modern look on top of Tkinter. | Official wiki notes PyInstaller packaging data-file issues and recommends --onedir instead of --onefile. Adds dependency for mostly visual benefit. | Avoid for v1. |
| Flet | Low/medium | Modern Flutter-based Python UI, official flet build windows. | Windows packaging requires Visual Studio 2022 with Desktop development with C++ workload. Heavier stack than needed for a form/log launcher. | Avoid for v1. |
| Tauri | Low | Sidecar pattern can embed external binaries and produce polished small desktop apps. | Requires Rust and frontend stack, sidecar permissions, target-triple binary naming, and more architecture than needed. | Avoid for v1. |
| Briefcase | Medium | Produces Windows app folders, MSI installers, and ZIPs; useful for installer-style distribution. | More installer-oriented than needed for a first thin launcher. | Consider after v1 UI works. |

Packaging Options

| Tool | Relevant facts | Fit |
| --- | --- | --- |
| PyInstaller | Supports one-folder and one-file bundles. On Windows it can create graphical apps without a console window. --onefile, --windowed, --name, --icon, and spec files cover the expected needs. PyInstaller's license includes an exception allowing bundled applications to be shipped under the application's own license, subject to dependency licenses. | Recommended. |
| Nuitka | Can create standalone, onefile, and app-mode outputs, and emits .exe on Windows. Requires a C compiler/toolchain and has longer, more complex builds. | Good later if PyInstaller output has startup or AV problems. |
| pyside6-deploy | Official Qt for Python deployment tool wrapping Nuitka. Produces .exe on Windows. | Only relevant if choosing PySide6. |
| Briefcase | Windows outputs include app folders plus MSI or ZIP packaging. Uses an embedded Python distribution. | Useful for installer sprint, not the first UI executable. |
| Flet build | Official Windows build path exists but requires Visual Studio C++ workload. | Too much setup for this project. |

CLI Runner Design

The UI should not call MinerU directly. It should call the project-owned CLI:

pdf2md doctor
pdf2md convert <input.pdf> --out <output-dir> --overwrite --gpu cuda:0
pdf2md recheck <output.md>

Command resolution:

  1. If the configured command exists, use it.
  2. Else if pdf2md is on PATH, run pdf2md.
  3. Else if uv is on PATH and a configured project root contains pyproject.toml, run uv run pdf2md with cwd=<project-root>.
  4. Else show a setup error and suggest running pdf2md doctor in the repository.
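
The four resolution steps above could be sketched as a single pure function. This is a minimal sketch, not the project's implementation: the name resolve_command, its parameters, and the injectable which argument (added only so the fast tests can fake PATH) are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def resolve_command(
    configured: Optional[str],
    project_root: Optional[Path],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> tuple[list[str], Optional[Path]]:
    """Return (argv prefix, cwd) by walking the four resolution steps in order."""
    # 1. A configured command wins if it is an existing file or resolves on PATH.
    if configured and (Path(configured).is_file() or which(configured)):
        return [configured], None
    # 2. pdf2md found on PATH.
    found = which("pdf2md")
    if found:
        return [found], None
    # 3. uv on PATH plus a project root that contains pyproject.toml.
    uv = which("uv")
    if uv and project_root and (project_root / "pyproject.toml").is_file():
        return [uv, "run", "pdf2md"], project_root
    # 4. Nothing usable: the caller turns this into a setup error message.
    raise FileNotFoundError(
        "pdf2md was not found; run 'pdf2md doctor' in the repository to check the setup."
    )
```

The returned prefix is extended with the subcommand arguments, and the returned cwd is passed straight to the subprocess call, so the uv case always runs from the project root.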

Subprocess rules:

  • Always pass an argument list with shell=False.
  • Set cwd explicitly when running through uv.
  • Set MINERU_MODEL_SOURCE=local in the child environment unless the user already set it.
  • Merge stderr into stdout for a single UI log stream.
  • Read output line by line in a background thread.
  • Communicate to Tk through queue.Queue and root.after(...).
  • Store the process PID so Cancel can terminate it.
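
A minimal sketch of these rules, assuming Python 3.9 or later. The names child_env, start_process, and drain are hypothetical. Three choices go beyond the list above and are assumptions: stdin is set to DEVNULL and CREATE_NO_WINDOW is passed so a windowed build does not flash a console, and output is decoded as UTF-8 with replacement because the child's real encoding is not known here.

```python
import os
import queue
import subprocess
import threading


def child_env() -> dict[str, str]:
    """Copy the parent environment; default MINERU_MODEL_SOURCE to local."""
    env = os.environ.copy()
    env.setdefault("MINERU_MODEL_SOURCE", "local")  # keeps a user-set value
    return env


def start_process(argv, cwd, out_queue):
    """Start argv without a shell and stream merged output lines to out_queue."""
    proc = subprocess.Popen(
        argv,                      # always a list, never a command string
        cwd=cwd,                   # project root when running through uv
        env=child_env(),
        shell=False,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # single log stream for the UI
        text=True,
        encoding="utf-8",          # assumption: see the note above
        errors="replace",
        bufsize=1,
        # CREATE_NO_WINDOW only exists on Windows; 0 is a no-op elsewhere.
        creationflags=getattr(subprocess, "CREATE_NO_WINDOW", 0),
    )

    def pump():
        # Runs in the worker thread; never touches Tk widgets.
        for line in proc.stdout:
            out_queue.put(("line", line.rstrip("\n")))
        out_queue.put(("exit", proc.wait()))

    threading.Thread(target=pump, daemon=True).start()
    return proc  # the caller keeps proc (and proc.pid) for Cancel


def drain(out_queue, limit=200):
    """Return up to `limit` queued events without blocking the UI thread."""
    items = []
    try:
        while len(items) < limit:
            items.append(out_queue.get_nowait())
    except queue.Empty:
        pass
    return items
```

On the Tk side, a function scheduled with root.after(100, poll) would call drain, append each "line" event to the log pane, stop the progress bar on the "exit" event, and reschedule itself otherwise. Only that function touches widgets.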

Cancellation on Windows:

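  • First call Popen.terminate(). On Windows this calls TerminateProcess on the launched process only, so child processes started through uv run may outlive it.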
  • If the process does not exit promptly, call taskkill /pid <pid> /t /f to end the process tree. Microsoft documents /t as ending child processes and /f as forceful termination.
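
The two-step cancellation could look like the sketch below. The name cancel_process, the three-second grace period, and the injectable run parameter (there so a test can mock the taskkill call) are hypothetical. The taskkill switches are the ones named above.

```python
import subprocess


def cancel_process(proc, grace_seconds: float = 3.0, run=subprocess.run) -> str:
    """Terminate proc, then force-kill its process tree if it is still alive."""
    if proc.poll() is not None:
        return "already-exited"
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        # /t ends child processes as well, /f forces termination.
        run(
            ["taskkill", "/pid", str(proc.pid), "/t", "/f"],
            shell=False,
            check=False,
            capture_output=True,
        )
        return "killed-tree"
```

The return value lets the UI label the run as cancelled rather than failed. Calling this from the worker thread rather than the Tk thread keeps the window responsive during the grace period.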

Current limitation:

  • The existing MinerU adapter uses subprocess.run(..., capture_output=True) inside pdf2md, so detailed MinerU progress may not stream until the CLI completes. The v1 UI should use an indeterminate progress bar plus final CLI output. A future CLI sprint can add streaming progress/events if needed.

Minimal UI Shape

Single window, no landing page:

  • Input PDF: file picker.
  • Output directory: directory picker, defaulting to outputs/<pdf-stem>.
  • Options:
    • Overwrite checkbox.
    • Keep raw MinerU output checkbox.
    • Group pages checkbox plus numeric field, default 20.
    • GPU field, default cuda:0.
  • Buttons:
    • Doctor.
    • Convert.
    • Cancel.
    • Open output.
  • Status:
    • Indeterminate progress bar while running.
    • Read-only log pane.
    • Last output paths from CLI/report when conversion completes.

No v1 drag/drop, batch queue, config editor, PDF preview, Markdown preview, or Obsidian integration. Those would add scope without helping the first .exe workflow.
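
The form fields map onto a fixed argument list. The sketch below covers only the flags this document names (--out, --overwrite, --gpu). The flags behind the keep-raw and group-pages checkboxes are not listed here, so they are left out rather than guessed. The function names and the validation rules are hypothetical.

```python
from pathlib import Path


def default_output_dir(pdf_path: Path, base: Path = Path("outputs")) -> Path:
    """Default the output directory to outputs/<pdf-stem>."""
    return base / pdf_path.stem


def build_convert_args(
    pdf_path: Path,
    out_dir: Path,
    overwrite: bool = True,
    gpu: str = "cuda:0",
) -> list[str]:
    """Build the argument list that follows the resolved pdf2md command."""
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF file: {pdf_path}")
    # Assumption: a GPU value never starts with '-' or contains whitespace,
    # so a free-text field cannot smuggle in an extra option.
    if gpu.startswith("-") or any(ch.isspace() for ch in gpu):
        raise ValueError(f"invalid GPU value: {gpu!r}")
    args = ["convert", str(pdf_path), "--out", str(out_dir)]
    if overwrite:
        args.append("--overwrite")
    if gpu:
        args += ["--gpu", gpu]
    return args
```

Because every value stays a separate list element, paths containing spaces or non-ASCII characters need no quoting.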

Build Shape

Proposed files:

src/
  pdf2md_ui/
    __init__.py
    app.py
    runner.py
tests/
  test_ui_runner.py

Proposed dependency policy:

  • No runtime GUI dependency beyond the standard library.
  • Add PyInstaller only to a local dependency group such as ui-build, not to the converter runtime dependencies.

Proposed build commands:

uv add --group ui-build "pyinstaller>=6.20,<7"
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py

Expected artifact:

dist/pdf2md-ui.exe

The built UI executable should be tested from the repository first, because uv run pdf2md needs a project root. If the executable is moved elsewhere, the UI should ask for and remember the project root in a small settings file under %APPDATA%\pdf2md-ui\settings.json.
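
Remembering the project root could be as small as the sketch below. The function names and the project_root key are hypothetical, and the fallback to the home directory when APPDATA is unset is an assumption that only matters off Windows.

```python
import json
import os
from pathlib import Path
from typing import Optional


def settings_path(appdata: Optional[str] = None) -> Path:
    """Return <APPDATA>/pdf2md-ui/settings.json."""
    base = appdata or os.environ.get("APPDATA") or str(Path.home())
    return Path(base) / "pdf2md-ui" / "settings.json"


def load_settings(path: Path) -> dict:
    """Read settings; a missing or corrupt file yields an empty dict."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def save_project_root(path: Path, project_root: Path) -> None:
    """Persist only the project root; no PDF content or extracted text."""
    path.parent.mkdir(parents=True, exist_ok=True)
    settings = load_settings(path)
    settings["project_root"] = str(project_root)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

Treating an unreadable file as empty settings means a damaged settings.json leads back to the project-root prompt instead of a crash at startup.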

Verification Plan

Fast tests:

  • Command resolution with fake PATH/project-root cases.
  • Command construction for doctor, convert, recheck.
  • No generated command contains tokens prohibited by the strict-local policy, such as --api-url, http://, https://, router, or openai.
  • Output-directory defaulting for ASCII and non-ASCII PDF names using temporary files.
  • Cancel path calls the Windows process-tree termination helper when needed, using a mocked process.
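
The strict-local check in the list above could be a small helper that the fast tests call on every generated command. This is a sketch: the helper name is hypothetical and the token tuple simply mirrors the examples given in this document.

```python
PROHIBITED_TOKENS = ("--api-url", "http://", "https://", "router", "openai")


def find_prohibited_tokens(argv):
    """Return every prohibited token found in any argument (case-insensitive)."""
    hits = []
    for arg in argv:
        lowered = arg.lower()
        hits.extend(token for token in PROHIBITED_TOKENS if token in lowered)
    return hits


# Example: a generated convert command should produce no hits.
command = ["pdf2md", "convert", "sample.pdf", "--out", "outputs/sample",
           "--overwrite", "--gpu", "cuda:0"]
assert find_prohibited_tokens(command) == []
```

Substring matching is deliberately blunt, so a user file named router-manual.pdf would also be flagged. That makes the helper better suited to tests with neutral fixture paths than to a runtime gate on user-selected files.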

Build verification:

uv run pytest tests/test_ui_runner.py
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
Test-Path dist\pdf2md-ui.exe

Manual smoke verification:

  1. Launch dist\pdf2md-ui.exe.
  2. Run Doctor from the UI.
  3. Select a small local sample PDF.
  4. Convert to an ignored outputs/ folder.
  5. Confirm the UI reports completion and the simplified output folder contains *_001.md, images/, and *_report.md.

Security, Privacy, And Distribution Notes

  • The UI must not introduce any network document path.
  • The UI must not expose arbitrary command execution. It should build fixed pdf2md argument lists from validated fields.
  • Use shell=False; never concatenate user-provided paths into a command string.
  • Do not store PDF contents or extracted text in settings.
  • Do not include sample PDFs or generated outputs in the build or commit.
  • Unsigned Windows executables may trigger SmartScreen. Microsoft documents that unsigned files start with no reputation, and even signed new binaries can show warnings until reputation accumulates. Code signing can be planned later if the tool is distributed beyond personal use.
  • If signing is added later, SignTool from the Windows SDK is the documented Microsoft tool. Current SignTool docs require digest options such as /fd and /td, with SHA-256 recommended.

Open Risks

  • A thin launcher depends on an installed and healthy local runtime. The UI must make doctor prominent.
  • Current CLI progress is coarse because pdf2md captures MinerU subprocess output. This is acceptable for v1 but limits progress detail.
  • Cancelling a conversion can leave partially written ignored outputs; the UI should label a cancelled run clearly and not delete user-selected output directories unless a later requirement defines cleanup.
  • If the UI is redistributed, licenses for MinerU, PyTorch, Qt if ever used, model weights, and bundled tools must be reviewed before packaging more than the thin UI launcher.

Sources