# UI Research: Minimal Windows Launcher For pdf2md
Last updated: 2026-05-11
## Scope
User request:
- Build a minimal UI that uses the existing `pdf2md` CLI.
- Build it into a Windows `.exe`.
- Research the implementation path before coding.

This document is research and planning input only. It does not change runtime behavior.
## Current Project Fit
The existing converter is already centered on a CLI:
```powershell
uv run pdf2md doctor
uv run pdf2md convert INPUT --out OUTPUT --overwrite
uv run pdf2md recheck OUTPUT.md
```
The UI should preserve the current architecture:
- Use MinerU 3.1.0 through the direct local `mineru` CLI only.
- Keep strict-local behavior. Do not expose `--api-url`, remote endpoints, router mode, cloud OCR, remote LLMs, or external document uploads.
- Treat the UI `.exe` as a launcher for the existing local runtime, not as a fully self-contained bundle of MinerU, PyTorch, CUDA DLLs, local models, Node.js, and MathJax.
- Keep generated Markdown parts, report Markdown, assets, and raw output behavior owned by the existing CLI.
## Recommendation
Use a thin Python desktop launcher:
- UI framework: `tkinter` plus `tkinter.ttk`.
- CLI execution: `subprocess.Popen` with `shell=False`, argument lists, a worker thread, and a queue back to the UI thread.
- Packaging: PyInstaller `--onefile --windowed` for a lightweight `pdf2md-ui.exe`.
- Runtime command: use the configured command if one is set; otherwise prefer `pdf2md` if it is on `PATH`; otherwise run `uv run pdf2md` with a configured project root.

This is the lowest-risk path because `tkinter` is in the Python standard library, `ttk` provides native themed widgets, and PyInstaller directly supports graphical windowed apps on Windows. The UI remains small and avoids bundling the large GPU conversion stack into the UI executable.
## Why Not Bundle The Whole Converter Into One EXE
Bundling the full conversion runtime into a single executable is not a good v1 target:
- The runtime includes CUDA PyTorch, MinerU, model files, optional Node.js/MathJax support, and local cache/config state.
- Model weights and transitive licenses are already documented as redistribution-sensitive.
- One-file executables extract their bundled files to a temporary directory at startup; large bundles can start slowly and create antivirus or SmartScreen friction.
- The project already uses `uv` and a known local `.venv`; the UI can call that stable runtime.

Recommended v1 interpretation of ".exe":
- Build `pdf2md-ui.exe` as the desktop UI.
- Require the local converter runtime to be installed and pass `pdf2md doctor`.
- Let the UI surface doctor failures clearly instead of pretending to be a complete installer.

Future redistribution can be revisited later as a separate packaging and license sprint.
## UI Framework Options
| Option | Fit | Pros | Cons | Decision |
| --- | --- | --- | --- | --- |
| `tkinter` + `ttk` | Strong | Standard library, native file dialogs, themed widgets, minimal dependencies, easy PyInstaller build. Python docs warn that long work must not block Tk's single-threaded event loop, which matches a worker-thread runner design. | Visual polish is modest. Advanced drag/drop usually needs extra packages. | Recommended for v1. |
| PySide6 / Qt for Python | Medium | Polished widgets, strong desktop model, official Python bindings. | Adds large Qt dependency, LGPL/commercial considerations, more complex deployment. Qt docs describe PyInstaller and Nuitka paths, plus caveats around virtualenv/system package selection and Qt plugin bundling. | Keep as a later polish option. |
| CustomTkinter | Medium | More modern look on top of Tkinter. | Official wiki notes PyInstaller packaging data-file issues and recommends `--onedir` instead of `--onefile`. Adds dependency for mostly visual benefit. | Avoid for v1. |
| Flet | Low/medium | Modern Flutter-based Python UI, official `flet build windows`. | Windows packaging requires Visual Studio 2022 with Desktop development with C++ workload. Heavier stack than needed for a form/log launcher. | Avoid for v1. |
| Tauri | Low | Sidecar pattern can embed external binaries and produce polished small desktop apps. | Requires Rust and frontend stack, sidecar permissions, target-triple binary naming, and more architecture than needed. | Avoid for v1. |
| Briefcase | Medium | Produces Windows app folders, MSI installers, and ZIPs; useful for installer-style distribution. | More installer-oriented than needed for a first thin launcher. | Consider after v1 UI works. |
## Packaging Options
| Tool | Relevant facts | Fit |
| --- | --- | --- |
| PyInstaller | Supports one-folder and one-file bundles. On Windows it can create graphical apps without a console window. `--onefile`, `--windowed`, `--name`, `--icon`, and spec files cover the expected needs. PyInstaller's license includes an exception allowing bundled applications to be shipped under the application's own license, subject to dependency licenses. | Recommended. |
| Nuitka | Can create standalone, onefile, and app-mode outputs, and emits `.exe` on Windows. Requires a C compiler/toolchain, with longer builds and a more complex build setup. | Good later if PyInstaller output has startup or AV problems. |
| `pyside6-deploy` | Official Qt for Python deployment tool wrapping Nuitka. Produces `.exe` on Windows. | Only relevant if choosing PySide6. |
| Briefcase | Windows outputs include app folders plus MSI or ZIP packaging. Uses an embedded Python distribution. | Useful for installer sprint, not the first UI executable. |
| Flet build | Official Windows build path exists but requires Visual Studio C++ workload. | Too much setup for this project. |
## CLI Runner Design
The UI should not call MinerU directly. It should call the project-owned CLI:
```text
pdf2md doctor
pdf2md convert <input.pdf> --out <output-dir> --overwrite --gpu cuda:0
pdf2md recheck <output.md>
```
Command resolution:
1. If the configured command exists, use it.
2. Else if `pdf2md` is on `PATH`, run `pdf2md`.
3. Else if `uv` is on `PATH` and a configured project root contains `pyproject.toml`, run `uv run pdf2md` with `cwd=<project-root>`.
4. Else show a setup error and suggest running `pdf2md doctor` in the repository.
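
The four resolution steps above can be sketched as a small pure function. This is a minimal sketch, not the project's implementation: `resolve_command`, `SetupError`, and the injectable `which` parameter are hypothetical names chosen so the logic can be tested with a fake `PATH`.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


class SetupError(RuntimeError):
    """Raised when no usable pdf2md command can be found (hypothetical name)."""


def resolve_command(
    configured: Optional[str],
    project_root: Optional[Path],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> tuple[list[str], Optional[Path]]:
    """Return (argv_prefix, cwd) following the four resolution steps."""
    # 1. A configured command wins if it resolves to an executable.
    if configured:
        found = which(configured)
        if found:
            return [found], None
    # 2. A pdf2md entry point on PATH.
    found = which("pdf2md")
    if found:
        return [found], None
    # 3. uv on PATH plus a project root that contains pyproject.toml.
    uv = which("uv")
    if uv and project_root and (project_root / "pyproject.toml").is_file():
        return [uv, "run", "pdf2md"], project_root
    # 4. Nothing usable: the UI should show this as a setup error.
    raise SetupError(
        "pdf2md was not found. Run `pdf2md doctor` in the repository to check the setup."
    )
```

The caller appends the subcommand arguments (for example `["doctor"]`) to the returned prefix and passes the returned `cwd` to `subprocess.Popen`.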
Subprocess rules:
- Always pass an argument list with `shell=False`.
- Set `cwd` explicitly when running through `uv`.
- Set `MINERU_MODEL_SOURCE=local` in the child environment unless the user already set it.
- Merge stderr into stdout for a single UI log stream.
- Read output line by line in a background thread.
- Communicate to Tk through `queue.Queue` and `root.after(...)`.
- Store the process PID so Cancel can terminate it.
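
A minimal sketch of these subprocess rules, using only the standard library. The names `start_process` and `drain` and the `("line", ...)` / `("exit", ...)` event tuples are hypothetical, and decoding the child output as UTF-8 with replacement is an assumption about what `pdf2md` emits.

```python
import os
import queue
import subprocess
import threading
from typing import Callable, Optional


def start_process(argv: list[str], cwd: Optional[str], out: queue.Queue) -> subprocess.Popen:
    """Start argv without a shell and pump its merged output into `out`."""
    env = os.environ.copy()
    # Only set the variable when the user has not already set it.
    env.setdefault("MINERU_MODEL_SOURCE", "local")
    proc = subprocess.Popen(
        argv,
        cwd=cwd,
        env=env,
        shell=False,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into one log stream
        text=True,
        encoding="utf-8",  # assumption: the CLI writes UTF-8
        errors="replace",
        bufsize=1,
    )

    def pump() -> None:
        # Runs in the worker thread; never touches Tk widgets.
        assert proc.stdout is not None
        for line in proc.stdout:
            out.put(("line", line.rstrip("\n")))
        out.put(("exit", proc.wait()))

    threading.Thread(target=pump, daemon=True).start()
    return proc  # the caller keeps proc (and proc.pid) for Cancel


def drain(out: queue.Queue, handle: Callable[[str, object], None]) -> None:
    """Deliver every queued event to `handle` without blocking."""
    while True:
        try:
            kind, value = out.get_nowait()
        except queue.Empty:
            return
        handle(kind, value)
```

In the Tk application, a polling function scheduled with `root.after(100, poll)` would call `drain(...)` and then reschedule itself, so that all widget updates happen on the UI thread.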
Cancellation on Windows:
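- First call `Popen.terminate()`. On Windows this calls `TerminateProcess()` on the launched process only, so child processes such as `mineru` are not stopped by it.
- If the process does not exit promptly, call `taskkill /pid <pid> /t /f` to end the process tree. Microsoft documents `/t` as ending child processes and `/f` as forceful termination.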
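
The cancellation order above could look roughly like the following. This is a sketch under assumptions: `cancel_process`, the three-second grace period, and the returned status strings are hypothetical, and `run` is injectable so the `taskkill` call can be mocked in tests.

```python
import subprocess
from typing import Callable


def cancel_process(
    proc: subprocess.Popen,
    grace_seconds: float = 3.0,
    run: Callable[..., object] = subprocess.run,
) -> str:
    """Stop proc; return "already-exited", "terminated", or "tree-killed"."""
    if proc.poll() is not None:
        return "already-exited"
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        # Fall back to ending the whole tree: /t includes child processes,
        # /f forces termination.
        run(
            ["taskkill", "/pid", str(proc.pid), "/t", "/f"],
            shell=False,
            check=False,
            capture_output=True,
        )
        return "tree-killed"
```

Returning a status string lets the UI label a cancelled run in the log without inspecting the process again.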
Current limitation:
- The existing MinerU adapter uses `subprocess.run(..., capture_output=True)` inside `pdf2md`, so detailed MinerU progress may not stream until the CLI completes. The v1 UI should use an indeterminate progress bar plus final CLI output. A future CLI sprint can add streaming progress/events if needed.
## Minimal UI Shape
Single window, no landing page:
- Input PDF: file picker.
- Output directory: directory picker, defaulting to `outputs/<pdf-stem>`.
- Options:
  - `Overwrite` checkbox.
  - `Keep raw MinerU output` checkbox.
  - `Group pages` checkbox plus numeric field, default `20`.
  - `GPU` field, default `cuda:0`.
- Buttons:
  - `Doctor`.
  - `Convert`.
  - `Cancel`.
  - `Open output`.
- Status:
  - Indeterminate progress bar while running.
  - Read-only log pane.
  - Last output paths from CLI/report when conversion completes.

No v1 drag/drop, batch queue, config editor, PDF preview, Markdown preview, or Obsidian integration. Those would add scope without helping the first `.exe` workflow.
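
One way to keep the window thin is to hold the form state in a plain data object that the Tk widgets read and write. The sketch below is illustrative only: `ConvertForm`, `select_pdf`, and `default_output_dir` are hypothetical names, and the unchecked initial state of the three checkboxes is an assumption, since the list above only fixes the `20` and `cuda:0` defaults.

```python
from dataclasses import dataclass
from pathlib import Path


def default_output_dir(pdf_path: str, base: str = "outputs") -> Path:
    """Return outputs/<pdf-stem> for the selected PDF."""
    return Path(base) / Path(pdf_path).stem


@dataclass
class ConvertForm:
    """State behind the single window; one attribute per input widget."""

    input_pdf: str = ""
    output_dir: str = ""
    overwrite: bool = False            # assumption: unchecked by default
    keep_raw: bool = False             # assumption: unchecked by default
    group_pages_enabled: bool = False  # assumption: unchecked by default
    group_pages: int = 20
    gpu: str = "cuda:0"

    def select_pdf(self, pdf_path: str) -> None:
        # Picking a PDF also pre-fills the output directory field.
        self.input_pdf = pdf_path
        self.output_dir = str(default_output_dir(pdf_path))
```

Because the stem is taken with `pathlib`, non-ASCII file names carry through unchanged, which is the case the verification plan calls out.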
## Build Shape
Proposed files:
```text
src/
  pdf2md_ui/
    __init__.py
    app.py
    runner.py
tests/
  test_ui_runner.py
```
Proposed dependency policy:
- No runtime GUI dependency beyond the standard library.
- Add PyInstaller only to a local dependency group such as `ui-build`, not to the converter runtime dependencies.

Proposed build commands:
```powershell
uv add --group ui-build "pyinstaller>=6.20,<7"
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
```
Expected artifact:
```text
dist/pdf2md-ui.exe
```
The built UI executable should be tested from the repository first, because `uv run pdf2md` needs a project root. If the executable is moved elsewhere, the UI should ask for and remember the project root in a small settings file under `%APPDATA%\pdf2md-ui\settings.json`.
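
Remembering the project root could be as small as the following sketch. The function names and the `project_root` key are hypothetical; the fallback to the home directory when `APPDATA` is unset is an assumption added so the code also runs outside Windows.

```python
import json
import os
from pathlib import Path
from typing import Mapping


def settings_path(env: Mapping[str, str] = os.environ) -> Path:
    """Return %APPDATA%\\pdf2md-ui\\settings.json."""
    # Assumption: fall back to the home directory if APPDATA is not set.
    base = env.get("APPDATA") or str(Path.home())
    return Path(base) / "pdf2md-ui" / "settings.json"


def load_settings(path: Path) -> dict:
    """Return the stored settings, or {} if the file is missing or invalid."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def save_project_root(path: Path, project_root: Path) -> None:
    """Store only the project root; never PDF contents or extracted text."""
    path.parent.mkdir(parents=True, exist_ok=True)
    settings = load_settings(path)
    settings["project_root"] = str(project_root)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

Treating an unreadable or malformed file as empty settings means a corrupted file only causes the UI to ask for the project root again.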
## Verification Plan
Fast tests:
- Command resolution with fake PATH/project-root cases.
- Command construction for `doctor`, `convert`, `recheck`.
- No generated command contains prohibited strict-local tokens such as `--api-url`, `http://`, `https://`, `router`, or `openai`.
- Output-directory defaulting for ASCII and non-ASCII PDF names using temporary files.
- Cancel path calls the Windows process-tree termination helper when needed, using a mocked process.
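
The strict-local check in the list above could be expressed as a helper that the fast tests call on every generated command. This is a sketch; `PROHIBITED_TOKENS` mirrors the tokens listed above, and `find_prohibited` is a hypothetical name.

```python
from typing import Sequence

# Tokens that must never appear in a generated command line.
PROHIBITED_TOKENS = ("--api-url", "http://", "https://", "router", "openai")


def find_prohibited(argv: Sequence[str]) -> list[str]:
    """Return every argument that contains a prohibited token."""
    return [
        arg
        for arg in argv
        if any(token in arg.lower() for token in PROHIBITED_TOKENS)
    ]
```

The match is a deliberately blunt substring test, so it would also flag a file path that happens to contain `router`; test fixtures should use paths that avoid those words.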
Build verification:
```powershell
uv run pytest tests/test_ui_runner.py
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
Test-Path dist\pdf2md-ui.exe
```
Manual smoke verification:
1. Launch `dist\pdf2md-ui.exe`.
2. Run Doctor from the UI.
3. Select a small local sample PDF.
4. Convert to an ignored `outputs/` folder.
5. Confirm the UI reports completion and the simplified output folder contains `*_001.md`, `images/`, and `*_report.md`.
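
Step 5 can be partly automated with a small check on the output folder. A minimal sketch, assuming the three patterns named in that step are the only things to verify; `check_output_folder` is a hypothetical helper.

```python
from pathlib import Path


def check_output_folder(folder: Path) -> list[str]:
    """Return the expected entries that are missing from folder."""
    missing = []
    if not any(folder.glob("*_001.md")):
        missing.append("*_001.md")
    if not (folder / "images").is_dir():
        missing.append("images/")
    if not any(folder.glob("*_report.md")):
        missing.append("*_report.md")
    return missing
```

An empty list means all three expected entries are present.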
## Security, Privacy, And Distribution Notes
- The UI must not introduce any network document path.
- The UI must not expose arbitrary command execution. It should build fixed `pdf2md` argument lists from validated fields.
- Use `shell=False`; never concatenate user-provided paths into a command string.
- Do not store PDF contents or extracted text in settings.
- Do not include sample PDFs or generated outputs in the build or commit.
- Unsigned Windows executables may trigger SmartScreen. Microsoft documents that unsigned files start with no reputation, and even signed new binaries can show warnings until reputation accumulates. Code signing can be planned later if the tool is distributed beyond personal use.
- If signing is added later, SignTool from the Windows SDK is the documented Microsoft tool. Current SignTool docs require digest options such as `/fd` and `/td`, with SHA-256 recommended.
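
To illustrate the "fixed argument lists from validated fields" rule, the sketch below builds the `convert` arguments from form values. It only uses flags that appear in this document (`--out`, `--overwrite`, `--gpu`). `build_convert_args` is a hypothetical name, and restricting the GPU field to the `cuda:<n>` form is an assumption based on the documented default.

```python
import re
from pathlib import Path


def build_convert_args(
    input_pdf: str, output_dir: str, overwrite: bool, gpu: str = "cuda:0"
) -> list[str]:
    """Return the argument list for `pdf2md convert`, or raise ValueError."""
    pdf = Path(input_pdf)
    if pdf.suffix.lower() != ".pdf" or not pdf.is_file():
        raise ValueError("Input must be an existing .pdf file.")
    if not output_dir.strip():
        raise ValueError("Output directory is required.")
    # Assumption: only device strings shaped like cuda:0 are accepted.
    if not re.fullmatch(r"cuda:\d+", gpu):
        raise ValueError("GPU must look like cuda:0.")
    # Absolute paths cannot be mistaken for options that start with '-'.
    args = [
        "convert",
        str(pdf.resolve()),
        "--out",
        str(Path(output_dir).resolve()),
        "--gpu",
        gpu,
    ]
    if overwrite:
        args.append("--overwrite")
    return args
```

Each value becomes exactly one list element, so spaces or shell metacharacters in a path are passed to the CLI literally when the list is run with `shell=False`.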
## Open Risks
- A thin launcher depends on an installed and healthy local runtime. The UI must make `doctor` prominent.
- Current CLI progress is coarse because `pdf2md` captures MinerU subprocess output. This is acceptable for v1 but limits progress detail.
- Cancelling a conversion can leave partially written ignored outputs; the UI should label a cancelled run clearly and not delete user-selected output directories unless a later requirement defines cleanup.
- If the UI is redistributed, licenses for MinerU, PyTorch, model weights, bundled tools, and Qt (if it is ever used) must be reviewed before packaging more than the thin UI launcher.
## Sources
- Python `tkinter`: https://docs.python.org/3/library/tkinter.html
- Python `tkinter.ttk`: https://docs.python.org/3/library/tkinter.ttk.html
- Python `subprocess`: https://docs.python.org/3/library/subprocess.html
- PyInstaller usage: https://pyinstaller.org/en/stable/usage.html
- PyInstaller requirements: https://pyinstaller.org/en/stable/requirements.html
- PyInstaller license: https://pyinstaller.org/en/stable/license.html
- PyInstaller runtime information: https://pyinstaller.org/en/stable/runtime-information.html
- Nuitka user manual: https://nuitka.net/user-documentation/user-manual.html
- Qt for Python PyInstaller deployment: https://doc.qt.io/qtforpython-6/deployment/deployment-pyinstaller.html
- `pyside6-deploy`: https://doc.qt.io/qtforpython-6.5/deployment/deployment-pyside6-deploy.html
- Qt for Python licenses: https://doc.qt.io/qtforpython-6/licenses.html
- Flet build: https://flet.dev/docs/cli/flet-build/
- Flet Windows packaging: https://flet.dev/docs/publish/windows/
- Tauri sidecars: https://tauri.app/develop/sidecar/
- Briefcase Windows packaging: https://briefcase.beeware.org/en/latest/reference/platforms/windows/
- uv dependency groups: https://docs.astral.sh/uv/concepts/projects/dependencies/
- Microsoft `taskkill`: https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/taskkill
- Microsoft SmartScreen reputation: https://learn.microsoft.com/en-us/windows/apps/package-and-deploy/smartscreen-reputation
- Microsoft SignTool: https://learn.microsoft.com/en-us/windows/win32/seccrypto/signtool