# V1 Implementation Plan: Local PDF-to-Markdown Converter

Last updated: 2026-05-13

This document tracks the current v1 implementation state and open future decisions. It does not replace `PRD.md` or `ARCHITECTURE.md`; use those files as the source of product requirements and system design. Completed sprint details are archived in `docs/WORKARCHIVE.md`, and detailed acceptance criteria remain in `docs/Sprints/*.md`.

## 1. Current V1 State

The core v1 converter is implemented through Sprint 16. The implemented system includes:

- Python 3.12 package and `pdf2md` CLI.
- Direct local MinerU 3.1.0 CLI adapter with strict-local enforcement.
- Obsidian-friendly Markdown normalization.
- Internal provenance, structured warnings, quality checks, and one human-readable report.
- `pdf2md doctor`.
- Optional grouped page conversion through `--chunk-pages`.
- Local MathJax render checking and conservative failed-span repair.
- pypdf-based text fidelity diagnostics.
- NVIDIA GPU inventory, `--gpu auto`, and `--mineru-profile auto|safe|performance`.
- Simplified output layout: `<out>/<stem>/<stem>_001.md`, shared `<out>/<stem>/images/`, and `<out>/<stem>/<stem>_report.md`.
- No public metadata JSON for new conversions.
- Minimal Windows UI launcher over the existing CLI, including direct-folder PDF batch conversion through sequential `pdf2md convert` subprocesses.

Historical implementation evidence, verification commands, and sample conversion results are in `docs/WORKARCHIVE.md`.

## 2. V1 Outcome

v1 is complete when a local user can run:

```bash
uv run pdf2md doctor
uv run pdf2md convert paper.pdf --out out
uv run pdf2md convert pdfs --out out --recursive
```

and receive, for each PDF:

- Obsidian-friendly Markdown parts under `<out>/<stem>/<stem>_001.md`, `<stem>_002.md`, and so on.
- A stable shared image/media directory under `<out>/<stem>/images/`.
- One human-readable report under `<out>/<stem>/<stem>_report.md`.
- No persisted metadata JSON for new conversions.
- Clear warnings when math, tables, assets, reading order, text fidelity, GPU availability, or MinerU execution are uncertain.

Long PDFs can be chunked explicitly:

```bash
uv run pdf2md convert paper.pdf --out out --chunk-pages
uv run pdf2md convert paper.pdf --out out --chunk-pages 20
```

When `--chunk-pages` is active, MinerU receives one-page temporary PDFs and final Markdown files are grouped by the configured page count. Temporary one-page PDFs and intermediate per-page outputs are deleted.
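
The grouping described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `group_pages` and `part_name` are hypothetical helpers, and the assumption that part N holds the Nth page group is inferred from the `<stem>_001.md` naming rather than stated in this plan.

```python
def group_pages(total_pages: int, pages_per_part: int) -> list[tuple[int, int]]:
    """Return inclusive 1-based (first_page, last_page) ranges, one per Markdown part."""
    if total_pages < 1 or pages_per_part < 1:
        raise ValueError("total_pages and pages_per_part must be positive")
    return [
        (first, min(first + pages_per_part - 1, total_pages))
        for first in range(1, total_pages + 1, pages_per_part)
    ]


def part_name(stem: str, part_index: int) -> str:
    """Build a part file name such as 'paper_001.md' (part_index is 1-based)."""
    return f"{stem}_{part_index:03d}.md"


# Assumed example: a 45-page PDF grouped 20 pages at a time gives three parts,
# with the last part holding the remaining 5 pages.
for index, (first, last) in enumerate(group_pages(45, 20), start=1):
    print(part_name("paper", index), f"pages {first}-{last}")
```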

The Windows UI launcher is a convenience wrapper over `pdf2md`; it is not a separate conversion pipeline. UI folder batch conversion runs direct-child PDFs sequentially through the same CLI conversion path.
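
As a rough sketch of that batch behavior, a launcher could list the PDFs directly inside the chosen folder and run one `pdf2md convert` subprocess at a time. The helper names below are hypothetical and the real launcher code may differ; only the `uv run pdf2md convert <pdf> --out <out>` command shape comes from this plan.

```python
import subprocess
from pathlib import Path


def direct_child_pdfs(folder: Path) -> list[Path]:
    """List PDFs directly inside folder; subfolders are deliberately not searched."""
    return sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


def convert_command(pdf: Path, out_dir: Path) -> list[str]:
    """Build the CLI invocation for one PDF, the same path a manual CLI run takes."""
    return ["uv", "run", "pdf2md", "convert", str(pdf), "--out", str(out_dir)]


def run_batch(folder: Path, out_dir: Path) -> dict[Path, int]:
    """Convert each PDF sequentially and record the exit code per file."""
    results: dict[Path, int] = {}
    for pdf in direct_child_pdfs(folder):
        # One subprocess at a time: the next conversion starts only after this one exits.
        # Assumption: the batch continues after a failure; the plan does not specify this.
        completed = subprocess.run(convert_command(pdf, out_dir), check=False)
        results[pdf] = completed.returncode
    return results
```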

## 3. Non-Negotiable Constraints

- Python 3.12 and `uv`.
- MinerU 3.1.0 is the only conversion engine.
- Direct local MinerU CLI execution only.
- MinerU 3.1.0 may launch a temporary local `mineru-api` internally when CLI runs without `--api-url`.
- No cloud OCR, hosted LLM/VLM, remote document parser, `--api-url`, remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends.
- Target hardware: NVIDIA GTX 1070 Ti 8GB.
- Digital PDFs with text layers are the v1 priority.
- `samples/` is local fixture context and must not be committed unless explicitly requested.
- UI launcher must invoke `pdf2md` or `uv run pdf2md`; it must not call MinerU directly or bundle the full conversion runtime.
- Every substantial implementation chunk needs a sprint contract and independent evaluation.
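
To illustrate the strict-local constraint, a guard along these lines could reject remote-capable arguments before MinerU is launched. This is a hedged sketch: `assert_strict_local` is a hypothetical helper, the deny-list contains only the `--api-url` flag named in the list above, and the project's actual enforcement may be broader.

```python
FORBIDDEN_FLAGS = ("--api-url",)  # only the flag this plan names explicitly


def assert_strict_local(argv: list[str]) -> None:
    """Raise ValueError if argv uses a forbidden flag as '--flag value' or '--flag=value'."""
    for arg in argv:
        for flag in FORBIDDEN_FLAGS:
            if arg == flag or arg.startswith(flag + "="):
                raise ValueError(f"strict-local mode forbids {flag}")


# A plain local invocation (input and output paths only) passes the guard silently.
assert_strict_local(["mineru", "-p", "paper.pdf", "-o", "work"])
```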

## 4. Current Repository Layout

```text
pyproject.toml
README.md
src/
  pdf2md/
    __init__.py
    cli.py
    conversion.py
    pdf_splitter.py
    paths.py
    mineru_adapter.py
    ir.py
    markdown.py
    metadata.py
    quality.py
    report.py
    doctor.py
    gpu.py
    mineru_profile.py
    math_render.py
    math_repair.py
    text_fidelity.py
  pdf2md_ui/
    __init__.py
    app.py
    runner.py
tests/
  integration/
docs/
  Sprints/
  superpowers/
```

Do not scaffold unused modules before a sprint needs them.

## 5. Active Next Sprint

Status:

- No active implementation sprint.

Next implementation work should start from a new user-approved requirement and, if substantial, a new sprint contract.

## 6. Abandoned Planning

### Sprint 17: Offline Windows Installer

Status:

- Abandoned at the user's request on 2026-05-13.

Historical references:

- `docs/Sprints/SPRINT17CONTRACT.md`.
- `docs/superpowers/plans/2026-05-12-offline-installer.md`.

Do not implement or extend Sprint 17 unless the user explicitly reopens offline installer work.

## 7. Future Decisions

- Decide whether simplified outputs need a metadata-free `pdf2md recheck`; current `recheck` remains legacy-only for outputs with adjacent metadata JSON.
- Validate `--gpu auto --mineru-profile auto` on a stronger NVIDIA GPU PC.

## 8. Harness Operating Model

Use the project long-running harness only for substantial implementation work.

1. `harness-planner-agent` turns the next user request into a sprint contract.
2. `evaluation-agent` reviews the contract before code changes start.
3. `feature-generator-agent` implements one approved contract at a time.
4. `feature-generator-agent` runs self-checks and records residual risks.
5. `evaluation-agent` independently verifies the result against the contract.
6. The parent agent updates `PROGRESS.md`, commits the completed change, and leaves a handoff.

After a chunk is no longer active, archive completed-work details in `docs/WORKARCHIVE.md` and keep `PROGRESS.md` focused on current status, blockers, and next actions.

## 9. Completed Sprint Archive

Completed sprint details have been moved out of this active implementation plan.

- Summary and verification evidence: `docs/WORKARCHIVE.md`.
- Detailed historical contracts: `docs/Sprints/SPRINT0CONTRACT.md` through `docs/Sprints/SPRINT16CONTRACT.md`.
- UI folder batch design and execution record: `docs/superpowers/specs/2026-05-13-ui-folder-batch-conversion-design.md` and `docs/superpowers/plans/2026-05-13-ui-folder-batch-conversion.md`.
- Abandoned Sprint 17 planning record: `docs/Sprints/SPRINT17CONTRACT.md` and `docs/superpowers/plans/2026-05-12-offline-installer.md`.

Facts carried forward from completed work:

- MinerU is fixed to version 3.1.0.
- Direct local CLI command shape is `mineru -p <input> -o <output>`.
- Python 3.12 is compatible with the pinned MinerU package range.
- GTX 1070 Ti CUDA/PyTorch support needs explicit doctor validation.
- Formula reconstruction remains best effort and must keep warnings/provenance visible.
- MinerU/model license posture is acceptable for personal local use. Redistribution remains gated by license review.
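
Because GPU support has to be validated explicitly, a doctor-style check might begin by listing the visible NVIDIA GPUs. The sketch below assumes `nvidia-smi` is on `PATH` and reads its CSV query output. The function names are hypothetical, and the sketch does not confirm CUDA/PyTorch compatibility, which still needs its own check.

```python
import shutil
import subprocess


def parse_gpu_inventory(csv_text: str) -> list[dict[str, object]]:
    """Parse 'name, memory_mib' lines as printed with --format=csv,noheader,nounits."""
    gpus: list[dict[str, object]] = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name cannot break parsing.
        name, memory_text = (field.strip() for field in line.rsplit(",", 1))
        memory_mib = int(memory_text) if memory_text.isdigit() else None
        gpus.append({"name": name, "memory_mib": memory_mib})
    return gpus


def query_gpus() -> list[dict[str, object]]:
    """Return visible NVIDIA GPUs, or an empty list when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_inventory(result.stdout) if result.returncode == 0 else []
```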