Files
PDFToMD/docs/V1IMPLEMENTATIONPLAN.md
2026-05-14 10:16:59 +09:00

163 lines
6.5 KiB
Markdown

# V1 Implementation Plan: Local PDF-to-Markdown Converter
Last updated: 2026-05-13
This document tracks the current v1 implementation state and open future decisions. It does not replace `PRD.md` or `ARCHITECTURE.md`; use those files as the source of product requirements and system design. Completed sprint details are archived in `docs/WORKARCHIVE.md`, and detailed acceptance criteria remain in `docs/Sprints/*.md`.
## 1. Current V1 State
The core v1 converter is implemented through Sprint 16. The implemented system includes:
- Python 3.12 package and `pdf2md` CLI.
- Direct local MinerU 3.1.0 CLI adapter with strict-local enforcement.
- Obsidian-friendly Markdown normalization.
- Internal provenance, structured warnings, quality checks, and one human-readable report.
- `pdf2md doctor`.
- Optional grouped page conversion through `--chunk-pages`.
- Local MathJax render checking and conservative failed-span repair.
- pypdf-based text fidelity diagnostics.
- NVIDIA GPU inventory, `--gpu auto`, and `--mineru-profile auto|safe|performance`.
- Simplified output layout: `<out>/<stem>/<stem>_001.md`, shared `<out>/<stem>/images/`, and `<out>/<stem>/<stem>_report.md`.
- No public metadata JSON for new conversions.
- Minimal Windows UI launcher over the existing CLI, including direct-folder PDF batch conversion through sequential `pdf2md convert` subprocesses.
Historical implementation evidence, verification commands, and sample conversion results are in `docs/WORKARCHIVE.md`.
## 2. V1 Outcome
v1 is complete when a local user can run:
```bash
uv run pdf2md doctor
uv run pdf2md convert paper.pdf --out out
uv run pdf2md convert pdfs --out out --recursive
```
and receive, for each PDF:
- Obsidian-friendly Markdown parts under `<out>/<stem>/<stem>_001.md`, `<stem>_002.md`, and so on.
- A stable shared image/media directory under `<out>/<stem>/images/`.
- One human-readable report under `<out>/<stem>/<stem>_report.md`.
- No persisted metadata JSON for new conversions.
- Clear warnings when math, tables, assets, reading order, text fidelity, GPU availability, or MinerU execution are uncertain.
Long PDFs can be chunked explicitly:
```bash
uv run pdf2md convert paper.pdf --out out --chunk-pages
uv run pdf2md convert paper.pdf --out out --chunk-pages 20
```
When `--chunk-pages` is active, MinerU receives one-page temporary PDFs and final Markdown files are grouped by the configured page count. Temporary one-page PDFs and intermediate per-page outputs are deleted.
The Windows UI launcher is a convenience wrapper over `pdf2md`; it is not a separate conversion pipeline. UI folder batch conversion runs direct-child PDFs sequentially through the same CLI conversion path.
## 3. Non-Negotiable Constraints
- Python 3.12 and `uv`.
- MinerU 3.1.0 is the only conversion engine.
- Direct local MinerU CLI execution only.
- MinerU 3.1.0 may launch a temporary local `mineru-api` internally when CLI runs without `--api-url`.
- No cloud OCR, hosted LLM/VLM, remote document parser, `--api-url`, remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends.
- Target hardware: NVIDIA GTX 1070 Ti 8GB.
- Digital PDFs with text layers are the v1 priority.
- `samples/` is local fixture context and must not be committed unless explicitly requested.
- UI launcher must invoke `pdf2md` or `uv run pdf2md`; it must not call MinerU directly or bundle the full conversion runtime.
- Every substantial implementation chunk needs a sprint contract and independent evaluation.
## 4. Current Repository Layout
```text
pyproject.toml
README.md
src/
pdf2md/
__init__.py
cli.py
conversion.py
pdf_splitter.py
paths.py
mineru_adapter.py
ir.py
markdown.py
metadata.py
quality.py
report.py
doctor.py
gpu.py
mineru_profile.py
math_render.py
math_repair.py
text_fidelity.py
pdf2md_ui/
__init__.py
app.py
runner.py
tests/
integration/
docs/
Sprints/
superpowers/
```
Do not scaffold unused modules before a sprint needs them.
## 5. Active Next Sprint
Status:
- No active implementation sprint.
Next implementation work should start from a new user-approved requirement and, if substantial, a new sprint contract.
## 6. Abandoned Planning
### Sprint 17: Offline Windows Installer
Status:
- Abandoned at the user's request on 2026-05-13.
Historical references:
- `docs/Sprints/SPRINT17CONTRACT.md`.
- `docs/superpowers/plans/2026-05-12-offline-installer.md`.
Do not implement or extend Sprint 17 unless the user explicitly reopens offline installer work.
## 7. Future Decisions
- Decide whether simplified outputs need a metadata-free `pdf2md recheck`; current `recheck` remains legacy-only for outputs with adjacent metadata JSON.
- Validate `--gpu auto --mineru-profile auto` on a stronger NVIDIA GPU PC.
## 8. Harness Operating Model
Use the project long-running harness only for substantial implementation work.
1. `harness-planner-agent` turns the next user request into a sprint contract.
2. `evaluation-agent` reviews the contract before code changes start.
3. `feature-generator-agent` implements one approved contract at a time.
4. `feature-generator-agent` runs self-checks and records residual risks.
5. `evaluation-agent` independently verifies the result against the contract.
6. The parent agent updates `PROGRESS.md`, commits the completed change, and leaves a handoff.
After a chunk is no longer active, archive completed-work details in `docs/WORKARCHIVE.md` and keep `PROGRESS.md` focused on current status, blockers, and next actions.
## 9. Completed Sprint Archive
Completed sprint details have been moved out of this active implementation plan.
- Summary and verification evidence: `docs/WORKARCHIVE.md`.
- Detailed historical contracts: `docs/Sprints/SPRINT0CONTRACT.md` through `docs/Sprints/SPRINT16CONTRACT.md`.
- UI folder batch design and execution record: `docs/superpowers/specs/2026-05-13-ui-folder-batch-conversion-design.md` and `docs/superpowers/plans/2026-05-13-ui-folder-batch-conversion.md`.
- Abandoned Sprint 17 planning record: `docs/Sprints/SPRINT17CONTRACT.md` and `docs/superpowers/plans/2026-05-12-offline-installer.md`.
Facts carried forward from completed work:
- MinerU is fixed to version 3.1.0.
- Direct local CLI command shape is `mineru -p <input> -o <output>`.
- Python 3.12 is compatible with the pinned MinerU package range.
- GTX 1070 Ti CUDA/PyTorch support needs explicit doctor validation.
- Formula reconstruction remains best effort and must keep warnings/provenance visible.
- MinerU/model license posture is acceptable for personal local use. Redistribution remains gated by license review.