modify pdftomd

김경종
2026-05-14 10:16:59 +09:00
parent 2232b51fc9
commit dc11880140
69 changed files with 7784 additions and 1150 deletions
@@ -0,0 +1,33 @@
# UI Folder Batch Conversion Design
## Goal
Add a minimal UI workflow that lets the user select one folder and convert every PDF directly inside that folder to Markdown.
## Scope
- Include only `*.pdf` files directly under the selected folder.
- Exclude PDFs in nested folders.
- Reuse the existing `pdf2md convert` CLI command for each PDF.
- Keep conversion sequential to avoid GPU and MinerU runtime contention.
- Apply the existing UI conversion options to every PDF in the batch: output directory, overwrite, keep raw, grouped pages, GPU, and MinerU profile.
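
The discovery rules above could be sketched as follows. This is a minimal illustration, not the project's code: the name `find_direct_pdfs` is hypothetical, and treating the `.pdf` extension case-insensitively is an assumption that the scope does not state.

```python
from pathlib import Path


def find_direct_pdfs(folder: Path) -> list[Path]:
    """Return PDFs directly inside `folder`, in name order (hypothetical helper)."""
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a folder: {folder}")
    # iterdir() lists only direct children, so PDFs in nested folders are excluded.
    pdfs = [
        path
        for path in folder.iterdir()
        # Assumption: match ".pdf" regardless of case (e.g. "REPORT.PDF").
        if path.is_file() and path.suffix.lower() == ".pdf"
    ]
    # Sort by lowercased name, then exact name, so the order is deterministic.
    return sorted(pdfs, key=lambda path: (path.name.lower(), path.name))
```

Using `iterdir()` rather than a recursive glob keeps the "direct children only" rule structural instead of relying on a filter applied afterwards.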
## Design
The runner layer owns folder discovery and batch command construction. It will expose two small helpers: one returns direct-child PDF paths in deterministic name order, and the other builds one fixed-argument `CommandSpec` per PDF by calling the existing `build_convert_command()`.
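
A rough sketch of the batch command helper, under stated assumptions: the `CommandSpec` and `build_convert_command()` shown here are stand-ins, because their real fields and signature are not given in this document. The flag names `--output-dir` and `--overwrite` and the helper name `build_folder_convert_commands` are placeholders as well.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Sequence


@dataclass(frozen=True)
class CommandSpec:
    """Stand-in for the project's CommandSpec; the real fields are assumed."""
    args: tuple[str, ...]


def build_convert_command(pdf: Path, output_dir: Path, overwrite: bool = False) -> CommandSpec:
    """Stand-in for the existing build_convert_command(); signature and flags are assumed."""
    args = ["pdf2md", "convert", str(pdf), "--output-dir", str(output_dir)]
    if overwrite:
        args.append("--overwrite")  # placeholder flag name
    return CommandSpec(args=tuple(args))


def build_folder_convert_commands(
    pdfs: Sequence[Path], output_dir: Path, overwrite: bool = False
) -> list[CommandSpec]:
    """Build one fixed-argument command per PDF, preserving the input order."""
    # Every PDF gets the same options; no shell string is ever assembled.
    return [build_convert_command(pdf, output_dir, overwrite) for pdf in pdfs]
```

Keeping each command as an argument tuple rather than a shell string is consistent with the non-goal of avoiding arbitrary shell command execution.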
The Tk UI adds an input-folder row and a folder-convert button. When the user starts folder conversion, the UI validates the selected folder, builds the command list, and runs the commands one at a time using the existing worker-thread pattern. It logs each PDF before its conversion starts and stops on the first non-zero exit code. Cancel terminates the currently running process and prevents any later PDFs from starting.
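
The sequential run, stop-on-failure, and Cancel behavior described above might look roughly like this. It is a sketch only: `SequentialRunner` and its methods are hypothetical names, commands are represented as plain argument lists rather than the project's `CommandSpec`, and the Tk wiring is omitted. `run_all()` is meant to be called from a worker thread, and `cancel()` from the UI thread.

```python
import subprocess
import threading
from typing import Callable, Optional, Sequence


class SequentialRunner:
    """Hypothetical runner: one process at a time, stop on failure or cancel."""

    def __init__(self, log: Callable[[str], None] = print) -> None:
        self._log = log
        self._cancelled = threading.Event()
        self._lock = threading.Lock()
        self._current: Optional[subprocess.Popen] = None

    def cancel(self) -> None:
        """Terminate the running process, if any, and block later commands."""
        self._cancelled.set()
        with self._lock:
            if self._current is not None and self._current.poll() is None:
                self._current.terminate()

    def run_all(self, commands: Sequence[Sequence[str]]) -> Optional[int]:
        """Return 0 on success, the first non-zero exit code, or None if cancelled."""
        total = len(commands)
        for index, args in enumerate(commands, start=1):
            with self._lock:
                # Checked under the lock so a cancel cannot slip in before Popen.
                if self._cancelled.is_set():
                    self._log("Cancelled; remaining PDFs were not started.")
                    return None
                self._log(f"[{index}/{total}] Starting: {' '.join(args)}")
                process = subprocess.Popen(list(args))
                self._current = process
            code = process.wait()
            with self._lock:
                self._current = None
            if self._cancelled.is_set():
                self._log("Cancelled; remaining PDFs were not started.")
                return None
            if code != 0:
                self._log(f"Stopped: exit code {code}.")
                return code
        return 0
```

Holding the lock while the process is created means `cancel()` either sees the new process and terminates it, or sets the flag before the next command is started.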
## Non-Goals
- No recursive folder conversion.
- No parallel conversion.
- No new CLI command.
- No direct MinerU invocation from the UI.
- No remote/API options or arbitrary shell command execution.
## Verification
- Add focused runner tests for direct-child PDF discovery, nested PDF exclusion, deterministic ordering, and batch command construction.
- Run `uv run pytest tests/test_ui_runner.py`.
- Rebuild the UI executable with PyInstaller and confirm `dist/pdf2md-ui.exe` exists.