modify pdftomd
# UI Folder Batch Conversion Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add a minimal UI folder workflow that converts every direct-child PDF in a selected folder by sequentially invoking the existing `pdf2md convert` CLI.

**Architecture:** Keep the converter and CLI unchanged. Add deterministic folder discovery and batch command construction to `src/pdf2md_ui/runner.py`, then make `src/pdf2md_ui/app.py` run a list of `CommandSpec` objects sequentially on the existing worker-thread/event-queue pattern.

**Tech Stack:** Python 3.12, tkinter/ttk, pytest, PyInstaller, existing `pdf2md_ui.runner` subprocess wrapper.

---
### Task 1: Runner Batch Helpers

**Files:**
- Modify: `tests/test_ui_runner.py`
- Modify: `src/pdf2md_ui/runner.py`

- [x] **Step 1: Write failing tests**
```python
def test_list_direct_pdf_files_returns_sorted_direct_children_only(tmp_path: Path) -> None:
    # Mixed-case suffixes must both match; creation order is deliberately unsorted.
    (tmp_path / "b.PDF").write_text("", encoding="utf-8")
    (tmp_path / "a.pdf").write_text("", encoding="utf-8")
    # A PDF in a subfolder and a non-PDF sibling must both be ignored.
    nested = tmp_path / "nested"
    nested.mkdir()
    (nested / "c.pdf").write_text("", encoding="utf-8")
    (tmp_path / "notes.txt").write_text("", encoding="utf-8")

    assert [path.name for path in list_direct_pdf_files(tmp_path)] == ["a.pdf", "b.PDF"]
```
```python
def test_build_batch_convert_commands_reuses_convert_options(tmp_path: Path) -> None:
    resolved = ResolvedCommand(("pdf2md",), cwd=None, source="path")
    pdfs = [tmp_path / "a.pdf", tmp_path / "b.pdf"]

    commands = build_batch_convert_commands(
        resolved,
        pdfs,
        tmp_path / "out",
        overwrite=True,
        keep_raw=True,
        chunk_pages=5,
        gpu="auto",
        mineru_profile="safe",
    )

    # One command per PDF, in input order; index 2 is expected to hold the input path.
    assert [command.args[2] for command in commands] == [str(pdfs[0]), str(pdfs[1])]
    # Every command carries the same shared convert options.
    assert all("--chunk-pages" in command.args for command in commands)
    assert all("--mineru-profile" in command.args for command in commands)
```
- [x] **Step 2: Run tests to verify RED**

Run: `uv run pytest tests/test_ui_runner.py::test_list_direct_pdf_files_returns_sorted_direct_children_only tests/test_ui_runner.py::test_build_batch_convert_commands_reuses_convert_options -q`

Expected: FAIL because the new helpers are not defined.

- [x] **Step 3: Implement minimal runner helpers**

Add `list_direct_pdf_files(folder)` using `Path.iterdir()` and case-insensitive `.pdf` suffix matching. Add `build_batch_convert_commands()` that loops over the provided PDF paths and delegates to `build_convert_command()`.
- [x] **Step 4: Run tests to verify GREEN**

Run: `uv run pytest tests/test_ui_runner.py -q`

Expected: all UI runner tests pass.
### Task 2: Tk UI Batch Execution

**Files:**
- Modify: `src/pdf2md_ui/app.py`

- [x] **Step 1: Add folder state and controls**

Add `input_folder_var`, a path row labeled `Input folder`, and a `Convert folder` button beside the existing action buttons.

- [x] **Step 2: Add batch command startup**

Implement `_choose_folder()`, `_run_folder_convert()`, and `_start_command_sequence()`. `_run_folder_convert()` validates the folder and output directory, parses `chunk_pages`, builds commands through the runner helper, and starts the sequence.
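The validation half of `_run_folder_convert()` can be kept free of Tk so it stays easy to test. The sketch below is one possible shape, not the code in `app.py`: `FolderConvertInputError`, `FolderConvertRequest`, and `parse_folder_convert_inputs()` are hypothetical names, and the rules themselves (folder must exist, output path must be non-empty, `chunk_pages` must be a positive integer) are assumptions, since this plan does not spell them out.

```python
from dataclasses import dataclass
from pathlib import Path


class FolderConvertInputError(ValueError):
    """Hypothetical error type whose message can be shown to the user."""


@dataclass(frozen=True)
class FolderConvertRequest:
    """Hypothetical container for the validated form values."""
    folder: Path
    output_dir: Path
    chunk_pages: int


def parse_folder_convert_inputs(
    folder_text: str, output_text: str, chunk_pages_text: str
) -> FolderConvertRequest:
    """Validate the raw strings read from the form fields."""
    folder_text = folder_text.strip()
    output_text = output_text.strip()

    if not folder_text:
        raise FolderConvertInputError("Choose an input folder.")
    folder = Path(folder_text)
    if not folder.is_dir():
        raise FolderConvertInputError(f"Input folder does not exist: {folder}")

    if not output_text:
        raise FolderConvertInputError("Choose an output directory.")

    try:
        chunk_pages = int(chunk_pages_text.strip())
    except ValueError:
        raise FolderConvertInputError("Chunk pages must be a whole number.") from None
    # Assumption: zero or negative chunk sizes are rejected.
    if chunk_pages < 1:
        raise FolderConvertInputError("Chunk pages must be at least 1.")

    return FolderConvertRequest(folder, Path(output_text), chunk_pages)
```

Under this split, the Tk handler would read the variables, call the parser, show the error message on failure, and otherwise pass the result to `list_direct_pdf_files()` and `build_batch_convert_commands()` before calling `_start_command_sequence()`.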
- [x] **Step 3: Add sequential worker behavior**

Run each command synchronously on the worker thread. Emit log messages before each file starts. Stop after the first non-zero exit code. If Cancel is requested, terminate the active command and do not start later commands.
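The sequential behavior described in Step 3 can be sketched as a plain function suitable as a worker-thread target. This is an illustration under assumptions: it calls `subprocess.Popen` directly instead of the existing `pdf2md_ui.runner` wrapper, and the function name `run_command_sequence()`, the event tuple shapes, and the 0.1 second poll interval are all hypothetical.

```python
import queue
import subprocess
import sys
import threading
from collections.abc import Sequence


def run_command_sequence(
    commands: Sequence[Sequence[str]],
    events: "queue.Queue[tuple]",
    cancel: threading.Event,
) -> None:
    """Run commands one at a time, reporting progress through the event queue."""
    total = len(commands)
    for index, args in enumerate(commands, start=1):
        # Never start a later command once Cancel has been requested.
        if cancel.is_set():
            events.put(("cancelled", index))
            return

        events.put(("log", f"[{index}/{total}] starting: {' '.join(args)}"))
        process = subprocess.Popen(list(args))

        # Wait in short slices so a cancel request is noticed promptly.
        while True:
            try:
                exit_code = process.wait(timeout=0.1)
                break
            except subprocess.TimeoutExpired:
                if cancel.is_set():
                    process.terminate()
                    process.wait()
                    events.put(("cancelled", index))
                    return

        # Stop the whole batch at the first failure.
        if exit_code != 0:
            events.put(("failed", index, exit_code))
            return

    events.put(("done", total))
```

In a Tk app the function would typically be started with `threading.Thread(target=run_command_sequence, args=(commands, events, cancel), daemon=True)`, while the UI thread drains `events` from a periodic `after()` callback and the Cancel button calls `cancel.set()`. Keeping all widget updates on the UI thread is the reason for the queue.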
- [x] **Step 4: Run focused tests**

Run: `uv run pytest tests/test_ui_runner.py -q`

Expected: all UI runner tests pass; UI app imports without syntax errors through test collection.
### Task 3: Build and Handoff

**Files:**
- Modify: `PROGRESS.md`
- Generated ignored output: `dist/pdf2md-ui.exe`

- [x] **Step 1: Rebuild the UI executable**

Run: `uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py`

Expected: exit code 0 and `dist\pdf2md-ui.exe` exists.

- [x] **Step 2: Update progress**

Record the new UI folder batch feature and verification commands in `PROGRESS.md`.

- [x] **Step 3: Check and commit**

Run: `git diff --check`, `git status --short`, then commit only the scoped source, test, and documentation changes.