V1 Implementation Plan: Local PDF-to-Markdown Converter
Last updated: 2026-05-13
This document tracks the current v1 implementation state and open future decisions. It does not replace PRD.md or ARCHITECTURE.md; use those files as the source of product requirements and system design. Completed sprint details are archived in docs/WORKARCHIVE.md, and detailed acceptance criteria remain in docs/Sprints/*.md.
1. Current V1 State
The core v1 converter is implemented through Sprint 16. The implemented system includes:
- Python 3.12 package and pdf2md CLI.
- Direct local MinerU 3.1.0 CLI adapter with strict-local enforcement.
- Obsidian-friendly Markdown normalization.
- Internal provenance, structured warnings, quality checks, and one human-readable report.
- pdf2md doctor.
- Optional grouped page conversion through --chunk-pages.
- Local MathJax render checking and conservative failed-span repair.
- pypdf-based text fidelity diagnostics.
- NVIDIA GPU inventory, --gpu auto, and --mineru-profile auto|safe|performance.
- Simplified output layout: <out>/<stem>/<stem>_001.md, shared <out>/<stem>/images/, and <out>/<stem>/<stem>_report.md.
- No public metadata JSON for new conversions.
- Minimal Windows UI launcher over the existing CLI, including direct-folder PDF batch conversion through sequential pdf2md convert subprocesses.
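As an illustration of the NVIDIA GPU inventory item, here is a minimal sketch that assumes the inventory is read from the CSV query output of nvidia-smi. The GpuInfo type and both function names are hypothetical and are not taken from gpu.py.

```python
from dataclasses import dataclass
import subprocess


@dataclass(frozen=True)
class GpuInfo:
    """Hypothetical record for one detected GPU."""
    name: str
    memory_mib: int


def parse_gpu_inventory(csv_text: str) -> list[GpuInfo]:
    """Parse `name, memory.total` rows produced with csv,noheader,nounits."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name is preserved.
        name, memory = (field.strip() for field in line.rsplit(",", 1))
        gpus.append(GpuInfo(name=name, memory_mib=int(memory)))
    return gpus


def query_gpu_inventory() -> list[GpuInfo]:
    """Return an empty list when nvidia-smi is missing or exits with an error."""
    try:
        result = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=name,memory.total",
                "--format=csv,noheader,nounits",
            ],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_inventory(result.stdout)
```

Keeping the parser separate from the subprocess call lets the parsing logic be tested on machines without an NVIDIA driver.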
Historical implementation evidence, verification commands, and sample conversion results are in docs/WORKARCHIVE.md.
2. V1 Outcome
v1 is complete when a local user can run:
uv run pdf2md doctor
uv run pdf2md convert paper.pdf --out out
uv run pdf2md convert pdfs --out out --recursive
and receive, for each PDF:
- Obsidian-friendly Markdown parts under <out>/<stem>/<stem>_001.md, <stem>_002.md, and so on.
- A stable shared image/media directory under <out>/<stem>/images/.
- One human-readable report under <out>/<stem>/<stem>_report.md.
- No persisted metadata JSON for new conversions.
- Clear warnings when math, tables, assets, reading order, text fidelity, GPU availability, or MinerU execution are uncertain.
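The per-PDF output layout listed above can be sketched with pathlib. The function names are hypothetical and are not taken from paths.py, and the three-digit zero padding is an assumption inferred from the _001 and _002 examples.

```python
from pathlib import Path


def part_path(out_dir: Path, stem: str, part_number: int) -> Path:
    """Markdown part file, e.g. <out>/<stem>/<stem>_001.md (padding is assumed)."""
    return out_dir / stem / f"{stem}_{part_number:03d}.md"


def images_dir(out_dir: Path, stem: str) -> Path:
    """Image directory shared by all parts of one PDF."""
    return out_dir / stem / "images"


def report_path(out_dir: Path, stem: str) -> Path:
    """Single human-readable report for one PDF."""
    return out_dir / stem / f"{stem}_report.md"
```

For example, part_path(Path("out"), "paper", 2) resolves to out/paper/paper_002.md.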
Long PDFs can be chunked explicitly:
uv run pdf2md convert paper.pdf --out out --chunk-pages
uv run pdf2md convert paper.pdf --out out --chunk-pages 20
When --chunk-pages is active, MinerU receives one-page temporary PDFs and final Markdown files are grouped by the configured page count. Temporary one-page PDFs and intermediate per-page outputs are deleted.
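The grouping step can be sketched as follows. This is only an illustration of grouping per-page results by a configured page count; the function name and the 1-based page numbering are assumptions, and the sketch does not describe how pdf_splitter.py is written.

```python
def group_pages(page_count: int, chunk_pages: int) -> list[range]:
    """Split pages 1..page_count into consecutive groups of at most chunk_pages."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if chunk_pages < 1:
        raise ValueError("chunk_pages must be at least 1")
    return [
        # The last group is clipped so it never runs past the final page.
        range(start, min(start + chunk_pages, page_count + 1))
        for start in range(1, page_count + 1, chunk_pages)
    ]
```

With a 45-page PDF and a page count of 20, group_pages(45, 20) gives three groups covering pages 1-20, 21-40, and 41-45, so each group would map to one final Markdown part.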
The Windows UI launcher is a convenience wrapper over pdf2md; it is not a separate conversion pipeline. UI folder batch conversion runs direct-child PDFs sequentially through the same CLI conversion path.
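A minimal sketch of the sequential folder batch described above, assuming the launcher shells out to uv run pdf2md convert once per PDF. All function names are hypothetical and are not taken from runner.py; the case-insensitive .pdf match and the sorted order are also assumptions.

```python
from pathlib import Path
import subprocess


def direct_child_pdfs(folder: Path) -> list[Path]:
    """List PDFs directly inside folder; subfolders are not searched."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def build_convert_command(pdf: Path, out_dir: Path) -> list[str]:
    """Build the same CLI call a user would type by hand."""
    return ["uv", "run", "pdf2md", "convert", str(pdf), "--out", str(out_dir)]


def convert_folder(folder: Path, out_dir: Path) -> dict[Path, int]:
    """Run one subprocess per PDF, one after another, and collect exit codes."""
    results: dict[Path, int] = {}
    for pdf in direct_child_pdfs(folder):
        completed = subprocess.run(build_convert_command(pdf, out_dir))
        results[pdf] = completed.returncode
    return results
```

Running the conversions one at a time keeps a single MinerU process on the GPU, which matters on an 8GB card.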
3. Non-Negotiable Constraints
- Python 3.12 and uv.
- MinerU 3.1.0 is the only conversion engine.
- Direct local MinerU CLI execution only.
- MinerU 3.1.0 may launch a temporary local mineru-api internally when CLI runs without --api-url.
- No cloud OCR, hosted LLM/VLM, remote document parser, --api-url, remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends.
- Target hardware: NVIDIA GTX 1070 Ti 8GB.
- Digital PDFs with text layers are the v1 priority.
- samples/ is local fixture context and must not be committed unless explicitly requested.
- UI launcher must invoke pdf2md or uv run pdf2md; it must not call MinerU directly or bundle the full conversion runtime.
- Every substantial implementation chunk needs a sprint contract and independent evaluation.
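One way to picture the ban on --api-url is a guard that inspects the MinerU argument list before launch. This is a hedged sketch only: the function name and the FORBIDDEN_FLAGS tuple are hypothetical, only the --api-url flag named above is checked, and the actual enforcement in mineru_adapter.py may work differently.

```python
# Hypothetical list; only the flag named in the constraints is included.
FORBIDDEN_FLAGS = ("--api-url",)


def assert_strict_local(mineru_args: list[str]) -> None:
    """Raise ValueError if any argument would point MinerU at a remote API."""
    for arg in mineru_args:
        # Handle both "--api-url URL" and "--api-url=URL" spellings.
        flag = arg.split("=", 1)[0]
        if flag in FORBIDDEN_FLAGS:
            raise ValueError(f"strict-local mode forbids {flag}")
```

Failing before the subprocess starts means a misconfigured call never sends document content off the machine.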
4. Current Repository Layout
pyproject.toml
README.md
src/
  pdf2md/
    __init__.py
    cli.py
    conversion.py
    pdf_splitter.py
    paths.py
    mineru_adapter.py
    ir.py
    markdown.py
    metadata.py
    quality.py
    report.py
    doctor.py
    gpu.py
    mineru_profile.py
    math_render.py
    math_repair.py
    text_fidelity.py
  pdf2md_ui/
    __init__.py
    app.py
    runner.py
tests/
  integration/
docs/
  Sprints/
  superpowers/
Do not scaffold unused modules before a sprint needs them.
5. Active Next Sprint
Status:
- No active implementation sprint.
Next implementation work should start from a new user-approved requirement and, if substantial, a new sprint contract.
6. Abandoned Planning
Sprint 17: Offline Windows Installer
Status:
- Abandoned at the user's request on 2026-05-13.
Historical references:
- docs/Sprints/SPRINT17CONTRACT.md.
- docs/superpowers/plans/2026-05-12-offline-installer.md.
Do not implement or extend Sprint 17 unless the user explicitly reopens offline installer work.
7. Future Decisions
- Decide whether simplified outputs need a metadata-free pdf2md recheck; current recheck remains legacy-only for outputs with adjacent metadata JSON.
- Validate --gpu auto --mineru-profile auto on a stronger NVIDIA GPU PC.
8. Harness Operating Model
Use the project long-running harness only for substantial implementation work.
- harness-planner-agent turns the next user request into a sprint contract.
- evaluation-agent reviews the contract before code changes start.
- feature-generator-agent implements one approved contract at a time.
- feature-generator-agent runs self-checks and records residual risks.
- evaluation-agent independently verifies the result against the contract.
- The parent agent updates PROGRESS.md, commits the completed change, and leaves a handoff.
After a chunk is no longer active, archive completed-work details in docs/WORKARCHIVE.md and keep PROGRESS.md focused on current status, blockers, and next actions.
9. Completed Sprint Archive
Completed sprint details have been moved out of this active implementation plan.
- Summary and verification evidence: docs/WORKARCHIVE.md.
- Detailed historical contracts: docs/Sprints/SPRINT0CONTRACT.md through docs/Sprints/SPRINT16CONTRACT.md.
- UI folder batch design and execution record: docs/superpowers/specs/2026-05-13-ui-folder-batch-conversion-design.md and docs/superpowers/plans/2026-05-13-ui-folder-batch-conversion.md.
- Abandoned Sprint 17 planning record: docs/Sprints/SPRINT17CONTRACT.md and docs/superpowers/plans/2026-05-12-offline-installer.md.
Facts carried forward from completed work:
- MinerU is fixed to version 3.1.0.
- Direct local CLI command shape is mineru -p <input> -o <output>.
- Python 3.12 is compatible with the pinned MinerU package range.
- GTX 1070 Ti CUDA/PyTorch support needs explicit doctor validation.
- Formula reconstruction remains best effort and must keep warnings/provenance visible.
- MinerU/model license posture is acceptable for personal local use. Redistribution remains gated by license review.
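The command shape recorded in the facts above, mineru -p <input> -o <output>, can be sketched as a small wrapper. The function names are hypothetical and the PATH check is an assumption added for illustration; neither is taken from mineru_adapter.py.

```python
from pathlib import Path
import shutil
import subprocess


def build_mineru_command(input_pdf: Path, output_dir: Path) -> list[str]:
    """Build the direct local CLI call: mineru -p <input> -o <output>."""
    return ["mineru", "-p", str(input_pdf), "-o", str(output_dir)]


def run_mineru(input_pdf: Path, output_dir: Path) -> subprocess.CompletedProcess[str]:
    """Run MinerU locally and return the completed process for inspection."""
    if shutil.which("mineru") is None:
        raise FileNotFoundError("mineru executable not found on PATH")
    return subprocess.run(
        build_mineru_command(input_pdf, output_dir),
        capture_output=True,
        text=True,
    )
```

Passing the command as a list avoids shell quoting problems with PDF paths that contain spaces.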