modify pdftomd

김경종
2026-05-14 10:16:59 +09:00
parent 2232b51fc9
commit dc11880140
69 changed files with 7784 additions and 1150 deletions
@@ -0,0 +1,218 @@
# Sprint 12 Contract: Minimal Windows UI Launcher
Status: Implemented with residual conversion-smoke risk
Last updated: 2026-05-11
## Objective
Build a minimal Windows desktop launcher for the existing `pdf2md` CLI and package the launcher itself as `dist/pdf2md-ui.exe`.
The UI must remain a thin local launcher. It must not become a second conversion engine, a hosted app, a manual review workflow, or a bundled redistribution of MinerU, CUDA PyTorch, model weights, Node.js, or MathJax.
## Research Basis
- Primary research document: `docs/UI_RESEARCH.md`.
- The recommended implementation path is `tkinter`/`ttk`, a subprocess runner around `pdf2md` or `uv run pdf2md`, and PyInstaller for the Windows executable.
## Current Precondition
- `pdf2md doctor`, `pdf2md convert`, and `pdf2md recheck` are implemented.
- Conversion remains strict-local and MinerU-only.
- Current CLI output is coarse during MinerU execution because the adapter captures MinerU subprocess output internally.
- UI research is complete.
- UI implementation exists under `src/pdf2md_ui/`.
- `dist\pdf2md-ui.exe` can be built with PyInstaller.
## Touched Surfaces
Allowed during implementation:
- `src/pdf2md_ui/__init__.py`
- `src/pdf2md_ui/app.py`
- `src/pdf2md_ui/runner.py`
- `tests/test_ui_runner.py`
- `pyproject.toml`
- `uv.lock`
- `README.md`
- `PLAN.md`
- `PROGRESS.md`
- `docs/WORKARCHIVE.md`
- `docs/V1IMPLEMENTATIONPLAN.md`
Generated but not committed unless explicitly requested:
- `build/`
- `dist/`
- `*.spec`
- generated conversion outputs under `outputs/`
Not allowed:
- Runtime document upload paths.
- Remote OCR, hosted LLM/VLM, hosted renderers, or remote document parsing APIs.
- `--api-url`, router mode, HTTP client backends, remote OpenAI-compatible endpoints, or runtime engine selection.
- Direct UI calls to `mineru`; the UI must call the project-owned `pdf2md` CLI.
- Bundling MinerU, CUDA PyTorch, local model weights, Node.js, or MathJax into the first UI executable.
- Batch queues, drag/drop, PDF preview, Markdown preview, Obsidian automation, installer generation, or code signing in this sprint.
- Mandatory default tests that require real MinerU, GPU, model files, network, Obsidian, or `samples/`.
## Product Behavior
The first UI is a single-window launcher:
- Select one input PDF.
- Select an output root, defaulting to `outputs`; the current CLI creates the final `<stem>\` folder inside it.
- Configure only existing CLI options:
  - overwrite
  - keep raw output
  - optional grouped pages with default `20`
  - GPU device with default `cuda:0`, including `auto` when supported by the CLI
  - MinerU profile `auto|safe|performance` with default `auto`
- Run `Doctor`.
- Run `Convert`.
- Run `Recheck` for an existing Markdown output.
- Cancel a running subprocess.
- Open the output directory after completion.
- Show a read-only log and indeterminate progress while a command is running.
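The option set above could be held in one small validated value object before any command line is built. This is a minimal sketch, not the project's actual code: the class and field names are hypothetical, the `False` defaults for the two checkboxes are assumptions, and only the `20`, `cuda:0`, and `auto|safe|performance` values come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the option list above; the names are hypothetical.
ALLOWED_PROFILES = ("auto", "safe", "performance")
DEFAULT_GROUPED_PAGES = 20


@dataclass(frozen=True)
class ConvertOptions:
    overwrite: bool = False               # assumption: unchecked by default
    keep_raw: bool = False                # assumption: unchecked by default
    grouped_pages: Optional[int] = None   # None means grouping is switched off
    gpu_device: str = "cuda:0"
    mineru_profile: str = "auto"

    def __post_init__(self) -> None:
        # Reject values the form should never be able to produce.
        if self.mineru_profile not in ALLOWED_PROFILES:
            raise ValueError(f"unknown MinerU profile: {self.mineru_profile!r}")
        if self.grouped_pages is not None and self.grouped_pages < 1:
            raise ValueError("grouped pages must be a positive integer")
```

Keeping validation here, rather than in widget callbacks, would let the default test suite cover it without a display.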
Command resolution:
1. Use a configured command if present.
2. Else use `pdf2md` from `PATH`.
3. Else use `uv run pdf2md` from a configured project root containing `pyproject.toml`.
4. Else report a setup error and direct the user to run `pdf2md doctor`.
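The four-step resolution order can be sketched as a single function. The function name, the `SetupError` type, and the extra check that `uv` itself is on `PATH` are assumptions made for illustration; the ordering is the one listed above.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional, Sequence, Tuple


class SetupError(RuntimeError):
    """Raised when no usable pdf2md command can be found (hypothetical type)."""


def resolve_command(
    configured: Optional[Sequence[str]] = None,
    project_root: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Tuple[List[str], Optional[Path]]:
    """Return (argv_prefix, cwd) following the resolution order above."""
    # 1. An explicitly configured command wins.
    if configured:
        return list(configured), None
    # 2. pdf2md found on PATH.
    found = which("pdf2md")
    if found:
        return [found], None
    # 3. uv run pdf2md, started from a project root containing pyproject.toml.
    if project_root is not None:
        root = Path(project_root)
        uv = which("uv")  # assumption: also require uv to be on PATH
        if uv and (root / "pyproject.toml").is_file():
            return [uv, "run", "pdf2md"], root
    # 4. Nothing usable: report a setup error.
    raise SetupError("pdf2md was not found; check the setup and run `pdf2md doctor`.")
```

Passing `which` as a parameter is a design choice for testability: the lookup can be faked, so the test does not depend on what is installed on the machine.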
## Architecture Plan
### WP12.1: CLI Runner
Actions:
- Add a runner module that builds fixed argument lists for `doctor`, `convert`, and `recheck`.
- Use `subprocess.Popen` with `shell=False`.
- Set `MINERU_MODEL_SOURCE=local` in the child environment unless already set.
- Merge stderr into stdout for a single UI log stream.
- Read subprocess output on a worker thread and report status events to the UI.
- Add a Windows process-tree cancellation helper that falls back to `taskkill /pid <pid> /t /f` only if a normal termination request does not finish within a short grace period.
Expected output:
- Testable command-construction and process-management code that never accepts arbitrary shell text from the UI.
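As an illustration of the runner actions above, the following sketch starts a fixed argument list without a shell, defaults `MINERU_MODEL_SOURCE` to `local`, merges stderr into stdout, and pumps lines to a queue from a worker thread. All function names and the `("log", ...)` / `("exit", ...)` event shapes are hypothetical, and no `pdf2md` flags are assumed beyond the three subcommand names.

```python
import os
import queue
import subprocess
import threading

SUBCOMMANDS = ("doctor", "convert", "recheck")


def build_args(base, subcommand, extra=()):
    """Build an argv list; arguments are never joined into shell text."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    return [*base, subcommand, *map(str, extra)]


def child_env(base=None):
    """Copy the environment and default MINERU_MODEL_SOURCE to 'local'."""
    env = dict(os.environ if base is None else base)
    env.setdefault("MINERU_MODEL_SOURCE", "local")  # keep a value that is already set
    return env


def start_command(argv, events, cwd=None):
    """Start argv with shell=False and stream its output into `events`."""
    proc = subprocess.Popen(
        list(argv),
        shell=False,
        cwd=cwd,
        env=child_env(),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # one merged stream for the UI log
        text=True,
        encoding="utf-8",
        errors="replace",
        bufsize=1,
    )

    def pump():
        # Runs on the worker thread; the UI only ever reads from the queue.
        for line in proc.stdout:
            events.put(("log", line.rstrip("\n")))
        proc.stdout.close()
        events.put(("exit", proc.wait()))

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return proc, thread
```

The final `("exit", code)` event is what would let the UI show the exit code instead of inferring success from log text.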
### WP12.2: Minimal Tk UI
Actions:
- Add a `tkinter`/`ttk` app with file and directory pickers, option controls, command buttons, progress indicator, and log pane.
- Keep long-running work off Tk's event handler thread.
- Disable conflicting controls while a command is running.
- Surface non-zero exit codes clearly.
Expected output:
- A simple local GUI for existing CLI workflows.
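One common way to keep long-running work off the Tk thread is for the worker to push events into a `queue.Queue` and for the UI to poll that queue with Tk's `after(ms, func, *args)`. The sketch below assumes that pattern; the function names are hypothetical, and `root` can be any object exposing an `after` method, which keeps the logic testable without a display.

```python
import queue


def drain_events(events, handle, max_items=200):
    """Hand queued events to `handle` without blocking; return the count."""
    handled = 0
    while handled < max_items:   # cap the work done per tick so the UI stays responsive
        try:
            item = events.get_nowait()
        except queue.Empty:
            break
        handle(item)
        handled += 1
    return handled


def poll_events(root, events, handle, interval_ms=100):
    """Drain once, then ask Tk to call this function again later."""
    drain_events(events, handle)
    root.after(interval_ms, poll_events, root, events, handle, interval_ms)
```

In the real window, `handle` would append log lines to the read-only text widget and re-enable the command buttons when the exit event arrives.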
### WP12.3: Build
Actions:
- Add PyInstaller only to a build dependency group such as `ui-build`.
- Build the executable with:
```powershell
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
```
Expected output:
- `dist\pdf2md-ui.exe` exists after the build.
## Verification Checks
Default checks:
- `uv run pytest tests/test_ui_runner.py`
- `uv run pytest tests/test_cli.py` if shared CLI behavior changes
- `git diff --check`
- `git status --short --untracked-files=all`
Build check:
```powershell
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
Test-Path dist\pdf2md-ui.exe
```
Manual smoke:
1. Launch `dist\pdf2md-ui.exe`.
2. Run Doctor from the UI.
3. Convert one small local sample into an ignored `outputs/` directory.
4. Confirm that the converted Markdown, the report Markdown, and the assets are produced as expected for the active output layout.
## Acceptance Criteria
- The UI invokes `pdf2md` or `uv run pdf2md`; it never invokes `mineru` directly.
- Commands are fixed argument lists and run with `shell=False`.
- The UI remains responsive while a conversion is running.
- Cancel attempts to stop the process tree on Windows.
- Doctor and conversion exit codes are visible in the UI.
- PyInstaller produces `dist\pdf2md-ui.exe`.
- Default tests stay independent of real MinerU, GPU, model files, network, Obsidian, and `samples/`.
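The cancel criterion above could be met with a two-stage helper: request termination, wait for a grace period, and only then fall back to `taskkill /pid <pid> /t /f`. This is a sketch under assumptions: the function names, the 5-second grace period, and the returned status strings are all hypothetical.

```python
import subprocess
import sys


def taskkill_args(pid):
    """Argument list for a forced process-tree kill on Windows."""
    return ["taskkill", "/pid", str(pid), "/t", "/f"]


def cancel_process(proc, grace_seconds=5.0, run=subprocess.run):
    """Try a normal terminate first; escalate only if it does not finish."""
    if proc.poll() is not None:
        return "already-exited"
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        pass
    if sys.platform == "win32":
        # Fallback: kill the whole tree rooted at the child's PID.
        run(taskkill_args(proc.pid), shell=False, check=False, capture_output=True)
    else:
        proc.kill()  # non-Windows fallback so the sketch runs anywhere
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        return "still-running"
    return "killed"
```

One caveat: on Windows, `Popen.terminate()` ends only the direct child, so descendants such as a MinerU worker may outlive it. Whether the tree kill should run earlier in that case is a design decision this sketch does not settle.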
## Hard Failure Criteria
- UI code exposes arbitrary shell command execution.
- UI exposes remote/API options or weakens strict-local policy.
- UI claims conversion success without checking the CLI exit code.
- UI freezes during a long conversion because the CLI runs on Tk's event handler thread.
- The first UI executable bundles MinerU, CUDA PyTorch, model weights, Node.js, or MathJax.
- Build outputs, generated conversion outputs, local models, or sample PDFs are committed.
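To avoid the third failure above, the status shown to the user can be derived from the exit code alone. The helper below is a hypothetical sketch; it assumes only the usual convention that exit code `0` means success and does not assume any particular `pdf2md` exit-code scheme.

```python
def describe_result(command_name, returncode, cancelled=False):
    """Map a finished command to a (status, message) pair for the UI."""
    if cancelled:
        return "cancelled", f"{command_name} was cancelled (exit code {returncode})."
    if returncode == 0:
        return "ok", f"{command_name} finished (exit code 0)."
    # Any non-zero code is reported as a failure, never as success.
    return "error", f"{command_name} failed with exit code {returncode}."
```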
## Handoff Requirements
After implementation:
- Update `PROGRESS.md` with files changed, commands run, test outcomes, build outcome, known failures, residual risks, and next action.
- Move completed implementation details to `docs/WORKARCHIVE.md` after verification.
- Keep sample PDFs and generated outputs out of the commit.
## Implementation Handoff
Files changed:
- `src/pdf2md_ui/__init__.py`
- `src/pdf2md_ui/app.py`
- `src/pdf2md_ui/runner.py`
- `tests/test_ui_runner.py`
- `pyproject.toml`
- `uv.lock`
- `README.md`
- `PLAN.md`
- `PROGRESS.md`
- `docs/WORKARCHIVE.md`
- `docs/V1IMPLEMENTATIONPLAN.md`
Verification:
- `uv run pytest tests\test_ui_runner.py`: passed 16 tests.
- `uv run pytest`: passed 188 tests with 1 optional skip.
- `uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py`: passed.
- `Test-Path dist\pdf2md-ui.exe`: returned `True`.
- `uv run pdf2md doctor`: returned WARN only for the documented GTX 1070 Ti/Pascal compatibility risk.
- Launch smoke for `dist\pdf2md-ui.exe`: process started and was then terminated by the smoke script.
Follow-up refresh on 2026-05-12:
- Updated the UI command builder and form controls for the Sprint 15 `--mineru-profile auto|safe|performance` CLI option.
- Rebuilt `dist\pdf2md-ui.exe` after the Sprint 16 simplified output layout and the Sprint 15 profile changes.
- `uv run pytest tests\test_ui_runner.py`: passed 17 tests.
- Launch smoke for the rebuilt `dist\pdf2md-ui.exe`: process started and was then terminated by the smoke script.
Known failure:
- A CLI conversion smoke using `samples\FourNodeQuadrilateralShellElementMITC4.pdf` and the same command shape used by the UI did not finish within the 15-minute timeout. The spawned process tree was terminated with `taskkill`.
Residual risk:
- A hands-on UI Doctor click and UI conversion click should still be run when the local MinerU runtime is expected to complete within an acceptable time.