# Sprint 12 Contract: Minimal Windows UI Launcher

Status: Implemented with residual conversion-smoke risk
Last updated: 2026-05-11

## Objective

Build a minimal Windows desktop launcher for the existing `pdf2md` CLI and package the launcher itself as `dist/pdf2md-ui.exe`.

The UI must remain a thin local launcher. It must not become a second conversion engine, a hosted app, a manual review workflow, or a bundled redistribution of MinerU, CUDA PyTorch, model weights, Node.js, or MathJax.

## Research Basis

- Primary research document: `docs/UI_RESEARCH.md`.
- The recommended implementation path is `tkinter`/`ttk`, a subprocess runner around `pdf2md` or `uv run pdf2md`, and PyInstaller for the Windows executable.

## Current Precondition

- `pdf2md doctor`, `pdf2md convert`, and `pdf2md recheck` are implemented.
- Conversion remains strict-local and MinerU-only.
- Current CLI output is coarse during MinerU execution because the adapter captures MinerU subprocess output internally.
- UI research is complete.
- UI implementation exists under `src/pdf2md_ui/`.
- `dist\pdf2md-ui.exe` can be built with PyInstaller.

## Touched Surfaces

Allowed during implementation:

- `src/pdf2md_ui/__init__.py`
- `src/pdf2md_ui/app.py`
- `src/pdf2md_ui/runner.py`
- `tests/test_ui_runner.py`
- `pyproject.toml`
- `uv.lock`
- `README.md`
- `PLAN.md`
- `PROGRESS.md`
- `docs/WORKARCHIVE.md`
- `docs/V1IMPLEMENTATIONPLAN.md`

Generated but not committed unless explicitly requested:

- `build/`
- `dist/`
- `*.spec`
- generated conversion outputs under `outputs/`

Not allowed:

- Runtime document upload paths.
- Remote OCR, hosted LLM/VLM, hosted renderers, or remote document parsing APIs.
- `--api-url`, router mode, HTTP client backends, remote OpenAI-compatible endpoints, or runtime engine selection.
- Direct UI calls to `mineru`; the UI must call the project-owned `pdf2md` CLI.
- Bundling MinerU, CUDA PyTorch, local model weights, Node.js, or MathJax into the first UI executable.
- Batch queues, drag/drop, PDF preview, Markdown preview, Obsidian automation, installer generation, or code signing in this sprint.
- Mandatory default tests that require real MinerU, GPU, model files, network, Obsidian, or `samples/`.

## Product Behavior

The first UI is a single-window launcher:

- Select one input PDF.
- Select an output root, defaulting to `outputs`; the current CLI creates the final `<stem>\` folder inside it.
- Configure only existing CLI options:
  - overwrite
  - keep raw output
  - optional grouped pages with default `20`
  - GPU device with default `cuda:0`, including `auto` when supported by the CLI
  - MinerU profile `auto|safe|performance` with default `auto`
- Run `Doctor`.
- Run `Convert`.
- Run `Recheck` for an existing Markdown output.
- Cancel a running subprocess.
- Open the output directory after completion.
- Show a read-only log and indeterminate progress while a command is running.

Command resolution:

1. Use a configured command if present.
2. Else use `pdf2md` from `PATH`.
3. Else use `uv run pdf2md` from a configured project root containing `pyproject.toml`.
4. Else report a setup error and direct the user to run `pdf2md doctor`.
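
The four resolution steps above can be sketched as a small pure function. This is a minimal illustration, not the code in `src/pdf2md_ui/runner.py`: the names `ResolvedCommand`, `SetupError`, and `resolve_command` are hypothetical, and the injectable `which` parameter is an assumption made only so the logic can be tested without touching the real `PATH`.

```python
from __future__ import annotations

import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Sequence


class SetupError(RuntimeError):
    """Hypothetical error type: no usable pdf2md command was found."""


@dataclass(frozen=True)
class ResolvedCommand:
    argv: list[str]      # argv prefix; the subcommand is appended later
    cwd: Optional[Path]  # working directory for the child, if one is required


def resolve_command(
    configured: Optional[Sequence[str]] = None,
    project_root: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> ResolvedCommand:
    # 1. An explicitly configured command wins.
    if configured:
        return ResolvedCommand(list(configured), None)
    # 2. Otherwise use pdf2md from PATH.
    found = which("pdf2md")
    if found:
        return ResolvedCommand([found], None)
    # 3. Otherwise run `uv run pdf2md` from a project root that
    #    contains pyproject.toml (the root becomes the child's cwd).
    if project_root is not None and (project_root / "pyproject.toml").is_file():
        return ResolvedCommand(["uv", "run", "pdf2md"], project_root)
    # 4. Nothing usable: report a setup error.
    raise SetupError("pdf2md was not found; run `pdf2md doctor` to check the setup.")
```

Returning an argv list plus an optional working directory keeps the result directly usable with `subprocess.Popen(..., shell=False, cwd=...)` and avoids ever building a shell string.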

## Architecture Plan

### WP12.1: CLI Runner

Actions:

- Add a runner module that builds fixed argument lists for `doctor`, `convert`, and `recheck`.
- Use `subprocess.Popen` with `shell=False`.
- Set `MINERU_MODEL_SOURCE=local` in the child environment unless already set.
- Merge stderr into stdout for a single UI log stream.
- Read subprocess output on a worker thread and report status events to the UI.
- Add a Windows process-tree cancellation helper that uses `taskkill /pid <pid> /t /f` only if normal termination does not finish promptly.
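
A minimal sketch of the launch-and-stream part of these actions, under stated assumptions. The function names (`build_argv`, `child_env`, `start`), the `("log", line)` / `("exit", code)` event tuples, and the UTF-8 decoding with replacement are all hypothetical choices for illustration; the project's actual runner may be organized differently.

```python
from __future__ import annotations

import os
import queue
import subprocess
import threading
from typing import Mapping, Optional, Sequence

ALLOWED_SUBCOMMANDS = ("doctor", "convert", "recheck")


def build_argv(prefix: Sequence[str], subcommand: str,
               options: Sequence[str] = ()) -> list[str]:
    """Build a fixed argument list; only known subcommands are accepted."""
    if subcommand not in ALLOWED_SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    return [*prefix, subcommand, *options]


def child_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment and default MINERU_MODEL_SOURCE to 'local'."""
    env = dict(os.environ if base is None else base)
    env.setdefault("MINERU_MODEL_SOURCE", "local")  # keep a value that is already set
    return env


def start(argv: Sequence[str], events: queue.Queue,
          cwd: Optional[str] = None) -> subprocess.Popen:
    """Start the child and stream its merged output to `events` from a thread."""
    proc = subprocess.Popen(
        list(argv),
        shell=False,                # never hand the command to a shell
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # one merged log stream
        stdin=subprocess.DEVNULL,
        env=child_env(),
        cwd=cwd,
        text=True,
        encoding="utf-8",           # assumption: decode as UTF-8, replacing bad bytes
        errors="replace",
    )

    def pump() -> None:
        assert proc.stdout is not None
        for line in proc.stdout:
            events.put(("log", line.rstrip("\r\n")))
        events.put(("exit", proc.wait()))  # the exit code is always reported last

    threading.Thread(target=pump, daemon=True).start()
    return proc
```

For example, `build_argv(["pdf2md"], "convert", ["--mineru-profile", "safe"])` yields a plain list of arguments. Because UI input only ever becomes individual list elements, there is no place for shell text to be interpreted.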

Expected output:

- Testable command-construction and process-management code that never accepts arbitrary shell text from the UI.
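
The escalation order of the cancellation helper (normal termination first, `taskkill /pid <pid> /t /f` only if the process is still alive) might look like the sketch below. The function name `cancel`, the grace period, the returned status strings, and the non-Windows `kill()` fallback are assumptions added for illustration. Note also that on Windows `Popen.terminate()` stops only the direct child, so what counts as "normal termination" for the whole tree is left open here.

```python
import subprocess
import sys


def cancel(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Stop `proc`, escalating to a forced process-tree kill if needed."""
    if proc.poll() is not None:
        return "already-exited"

    # Step 1: normal termination of the direct child.
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        pass

    # Step 2: the child did not finish promptly, so force the issue.
    if sys.platform == "win32":
        # /t kills the process tree, /f forces termination.
        subprocess.run(
            ["taskkill", "/pid", str(proc.pid), "/t", "/f"],
            shell=False,
            capture_output=True,
            check=False,
        )
    else:
        # Assumption: fallback so the sketch also runs off Windows.
        proc.kill()

    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        return "still-running"
    return "forced"
```

Returning a status string rather than raising lets the UI log which path was taken, which is useful when diagnosing a conversion that had to be killed.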

### WP12.2: Minimal Tk UI

Actions:

- Add a `tkinter`/`ttk` app with file and directory pickers, option controls, command buttons, progress indicator, and log pane.
- Keep long-running work off Tk's event handler thread.
- Disable conflicting controls while a command is running.
- Surface non-zero exit codes clearly.

Expected output:

- A simple local GUI for existing CLI workflows.
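
One common way to keep long-running work off Tk's thread is to let a worker thread put events on a `queue.Queue` and have the UI drain that queue from a periodic `after()` callback. The sketch below assumes a hypothetical `(kind, payload)` event convention with `"log"` and `"exit"` kinds; `pump_events` and its parameters are illustrative names, not the project's API.

```python
import queue
from typing import Any, Callable


def pump_events(
    root: Any,
    events: "queue.Queue[tuple[str, object]]",
    handle: Callable[[str, object], None],
    interval_ms: int = 100,
) -> None:
    """Drain pending events on the Tk thread, then reschedule itself.

    `root` is expected to be a tkinter.Tk or any widget; only its
    `after(ms, func, *args)` method is used.
    """
    while True:
        try:
            kind, payload = events.get_nowait()  # never block the UI thread
        except queue.Empty:
            break
        handle(kind, payload)
        if kind == "exit":
            return  # command finished: stop polling until the next run
    # Nothing more to do right now; check again shortly.
    root.after(interval_ms, pump_events, root, events, handle, interval_ms)
```

In a real app, `handle` would append `"log"` payloads to the read-only log pane and, on `"exit"`, stop the progress indicator, re-enable the controls, and show the exit code. The first call is made once when a command starts, for example from the button callback.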

### WP12.3: Build

Actions:

- Add PyInstaller only to a build dependency group such as `ui-build`.
- Build the executable with:

```powershell
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
```

Expected output:

- `dist\pdf2md-ui.exe` exists after the build.

## Verification Checks

Default checks:

- `uv run pytest tests/test_ui_runner.py`
- `uv run pytest tests/test_cli.py` if shared CLI behavior changes
- `git diff --check`
- `git status --short --untracked-files=all`

Build check:

```powershell
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
Test-Path dist\pdf2md-ui.exe
```

Manual smoke:

1. Launch `dist\pdf2md-ui.exe`.
2. Run Doctor from the UI.
3. Convert one small local sample into an ignored `outputs/` directory.
4. Confirm Markdown, report Markdown, and assets are produced as expected for the active output layout.

## Acceptance Criteria

- The UI invokes `pdf2md` or `uv run pdf2md`; it never invokes `mineru` directly.
- Commands are fixed argument lists and run with `shell=False`.
- The UI remains responsive while a conversion is running.
- Cancel attempts to stop the process tree on Windows.
- Doctor and conversion exit codes are visible in the UI.
- PyInstaller produces `dist\pdf2md-ui.exe`.
- Default tests stay independent of real MinerU, GPU, model files, network, Obsidian, and `samples/`.

## Hard Failure Criteria

- UI code exposes arbitrary shell command execution.
- UI exposes remote/API options or weakens strict-local policy.
- UI claims conversion success without checking the CLI exit code.
- UI freezes during a long conversion because the CLI runs on Tk's event handler thread.
- The first UI executable bundles MinerU, CUDA PyTorch, model weights, Node.js, or MathJax.
- Build outputs, generated conversion outputs, local models, or sample PDFs are committed.

## Handoff Requirements

After implementation:

- Update `PROGRESS.md` with files changed, commands run, test outcomes, build outcome, known failures, residual risks, and next action.
- Move completed implementation details to `docs/WORKARCHIVE.md` after verification.
- Keep sample PDFs and generated outputs out of the commit.

## Implementation Handoff

Files changed:

- `src/pdf2md_ui/__init__.py`
- `src/pdf2md_ui/app.py`
- `src/pdf2md_ui/runner.py`
- `tests/test_ui_runner.py`
- `pyproject.toml`
- `uv.lock`
- `README.md`
- `PLAN.md`
- `PROGRESS.md`
- `docs/WORKARCHIVE.md`
- `docs/V1IMPLEMENTATIONPLAN.md`

Verification:

- `uv run pytest tests\test_ui_runner.py`: passed 16 tests.
- `uv run pytest`: passed 188 tests with 1 optional skip.
- `uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py`: passed.
- `Test-Path dist\pdf2md-ui.exe`: returned `True`.
- `uv run pdf2md doctor`: returned WARN only for the documented GTX 1070 Ti/Pascal compatibility risk.
- Launch smoke for `dist\pdf2md-ui.exe`: process started and was then terminated by the smoke script.

Follow-up refresh on 2026-05-12:

- Updated the UI command builder and form controls for the Sprint 15 `--mineru-profile auto|safe|performance` CLI option.
- Rebuilt `dist\pdf2md-ui.exe` after Sprint 16 simplified output layout and Sprint 15 profile changes.
- `uv run pytest tests\test_ui_runner.py`: passed 17 tests.
- Launch smoke for the rebuilt `dist\pdf2md-ui.exe`: process started and was then terminated by the smoke script.

Known failure:

- A CLI conversion smoke using `samples\FourNodeQuadrilateralShellElementMITC4.pdf` and the same command shape used by the UI did not finish within the 15-minute timeout. The spawned process tree was terminated with `taskkill`.

Residual risk:

- A hands-on UI Doctor click and UI conversion click should still be run when the local MinerU runtime is expected to complete within an acceptable time.