V1 Implementation Plan: Local PDF-to-Markdown Converter

Last updated: 2026-05-13

This document tracks the current v1 implementation state and the decisions that remain open. It does not replace PRD.md or ARCHITECTURE.md; those files remain the source of truth for product requirements and system design. Completed sprint details are archived in docs/WORKARCHIVE.md, and detailed acceptance criteria remain in docs/Sprints/*.md.

1. Current V1 State

The core v1 converter is implemented through Sprint 16. The implemented system includes:

  • Python 3.12 package and pdf2md CLI.
  • Direct local MinerU 3.1.0 CLI adapter with strict-local enforcement.
  • Obsidian-friendly Markdown normalization.
  • Internal provenance, structured warnings, quality checks, and one human-readable report.
  • pdf2md doctor.
  • Optional grouped page conversion through --chunk-pages.
  • Local MathJax render checking and conservative failed-span repair.
  • pypdf-based text fidelity diagnostics.
  • NVIDIA GPU inventory, --gpu auto, and --mineru-profile auto|safe|performance.
  • Simplified output layout: <out>/<stem>/<stem>_001.md, shared <out>/<stem>/images/, and <out>/<stem>/<stem>_report.md.
  • No public metadata JSON for new conversions.
  • Minimal Windows UI launcher over the existing CLI, including batch conversion of the PDFs directly inside a folder through sequential pdf2md convert subprocesses.

Historical implementation evidence, verification commands, and sample conversion results are in docs/WORKARCHIVE.md.

2. V1 Outcome

v1 is complete when a local user can run:

uv run pdf2md doctor
uv run pdf2md convert paper.pdf --out out
uv run pdf2md convert pdfs --out out --recursive

and receive, for each PDF:

  • Obsidian-friendly Markdown parts under <out>/<stem>/<stem>_001.md, <stem>_002.md, and so on.
  • A stable shared image/media directory under <out>/<stem>/images/.
  • One human-readable report under <out>/<stem>/<stem>_report.md.
  • No persisted metadata JSON for new conversions.
  • Clear warnings when math, tables, assets, reading order, text fidelity, GPU availability, or MinerU execution are uncertain.
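The output layout listed above can be sketched as a small path helper. This is a minimal illustration, not the project's code: the function name output_paths and its dictionary keys are hypothetical, and the three-digit zero padding is inferred from the <stem>_001.md and <stem>_002.md examples.

```python
from pathlib import Path


def output_paths(out_dir, pdf_path, part_count):
    """Return the expected output locations for one converted PDF.

    Hypothetical helper: only the path shapes come from the plan.
    """
    stem = Path(pdf_path).stem
    doc_dir = Path(out_dir) / stem
    return {
        # <out>/<stem>/<stem>_001.md, <stem>_002.md, and so on
        "parts": [doc_dir / f"{stem}_{i:03d}.md" for i in range(1, part_count + 1)],
        # <out>/<stem>/images/
        "images": doc_dir / "images",
        # <out>/<stem>/<stem>_report.md
        "report": doc_dir / f"{stem}_report.md",
    }
```

For example, output_paths("out", "paper.pdf", 2) yields two Markdown part paths, one images directory, and one report path, all under out/paper/.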

Long PDFs can be chunked explicitly:

uv run pdf2md convert paper.pdf --out out --chunk-pages
uv run pdf2md convert paper.pdf --out out --chunk-pages 20

When --chunk-pages is active, MinerU receives temporary one-page PDFs, and the final Markdown files are grouped by the configured page count. The temporary one-page PDFs and the intermediate per-page outputs are deleted.
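The grouping step can be illustrated with a short sketch. It assumes 1-based page numbers and a final group that may be shorter than the others; the function name group_pages is hypothetical, and how each group maps onto a numbered Markdown part is not specified here.

```python
def group_pages(page_count, chunk_size):
    """Split pages 1..page_count into consecutive groups of chunk_size pages.

    Hypothetical helper: the last group holds whatever pages remain.
    """
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        list(range(start, min(start + chunk_size, page_count + 1)))
        for start in range(1, page_count + 1, chunk_size)
    ]
```

With a 45-page PDF and a page count of 20, this produces three groups covering pages 1 to 20, 21 to 40, and 41 to 45.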

The Windows UI launcher is a convenience wrapper over pdf2md; it is not a separate conversion pipeline. UI folder batch conversion runs direct-child PDFs sequentially through the same CLI conversion path.
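A sequential folder batch over direct-child PDFs might look like the sketch below. The names direct_child_pdfs and run_batch are hypothetical, the uv run pdf2md form of the command is one of the two invocation styles the plan allows, and the runner parameter exists only so the sketch can be exercised without launching real conversions.

```python
import subprocess
from pathlib import Path


def direct_child_pdfs(folder):
    """List PDFs directly inside folder, ignoring subfolders."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def run_batch(folder, out_dir, runner=subprocess.run):
    """Run one pdf2md convert subprocess per PDF, one after another."""
    results = []
    for pdf in direct_child_pdfs(folder):
        cmd = ["uv", "run", "pdf2md", "convert", str(pdf), "--out", str(out_dir)]
        completed = runner(cmd)  # blocks until this conversion finishes
        results.append((pdf.name, completed.returncode))
    return results
```

Because each call blocks until its subprocess exits, a failure in one PDF is recorded by its return code and does not stop the remaining conversions in this sketch.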

3. Non-Negotiable Constraints

  • Python 3.12 and uv.
  • MinerU 3.1.0 is the only conversion engine.
  • Direct local MinerU CLI execution only.
  • MinerU 3.1.0 may launch a temporary local mineru-api internally when the CLI runs without --api-url.
  • No cloud OCR, hosted LLM/VLM, remote document parser, --api-url, remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends.
  • Target hardware: NVIDIA GTX 1070 Ti 8GB.
  • Digital PDFs with text layers are the v1 priority.
  • samples/ is local fixture context and must not be committed unless explicitly requested.
  • The UI launcher must invoke pdf2md or uv run pdf2md; it must not call MinerU directly or bundle the full conversion runtime.
  • Every substantial implementation chunk needs a sprint contract and independent evaluation.
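One way to picture the strict-local constraint is a guard that rejects a MinerU argument list containing --api-url before anything is executed. This is a sketch under assumptions: the names FORBIDDEN_FLAGS and assert_strict_local are hypothetical, and only --api-url is checked because it is the only flag named above.

```python
# Hypothetical guard; the adapter's real enforcement may differ.
FORBIDDEN_FLAGS = ("--api-url",)


def assert_strict_local(args):
    """Return args unchanged, or raise if a remote-only flag is present."""
    for arg in args:
        for flag in FORBIDDEN_FLAGS:
            # Catch both "--api-url value" and "--api-url=value" forms.
            if arg == flag or arg.startswith(flag + "="):
                raise ValueError(f"strict-local mode forbids {flag}")
    return list(args)
```

Calling assert_strict_local(["mineru", "-p", "paper.pdf", "-o", "work"]) returns the list unchanged, while any argument list containing --api-url raises ValueError.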

4. Current Repository Layout

pyproject.toml
README.md
src/
  pdf2md/
    __init__.py
    cli.py
    conversion.py
    pdf_splitter.py
    paths.py
    mineru_adapter.py
    ir.py
    markdown.py
    metadata.py
    quality.py
    report.py
    doctor.py
    gpu.py
    mineru_profile.py
    math_render.py
    math_repair.py
    text_fidelity.py
  pdf2md_ui/
    __init__.py
    app.py
    runner.py
tests/
  integration/
docs/
  Sprints/
  superpowers/

Do not scaffold unused modules before a sprint needs them.

5. Active Next Sprint

Status:

  • No active implementation sprint.

Next implementation work should start from a new user-approved requirement and, if substantial, a new sprint contract.

6. Abandoned Planning

Sprint 17: Offline Windows Installer

Status:

  • Abandoned at the user's request on 2026-05-13.

Historical references:

  • docs/Sprints/SPRINT17CONTRACT.md.
  • docs/superpowers/plans/2026-05-12-offline-installer.md.

Do not implement or extend Sprint 17 unless the user explicitly reopens offline installer work.

7. Future Decisions

  • Decide whether simplified outputs need a metadata-free pdf2md recheck; the current recheck works only on legacy outputs that have adjacent metadata JSON.
  • Validate --gpu auto --mineru-profile auto on a PC with a stronger NVIDIA GPU.

8. Harness Operating Model

Use the project long-running harness only for substantial implementation work.

  1. harness-planner-agent turns the next user request into a sprint contract.
  2. evaluation-agent reviews the contract before code changes start.
  3. feature-generator-agent implements one approved contract at a time.
  4. feature-generator-agent runs self-checks and records residual risks.
  5. evaluation-agent independently verifies the result against the contract.
  6. The parent agent updates PROGRESS.md, commits the completed change, and leaves a handoff.

After a chunk is no longer active, archive completed-work details in docs/WORKARCHIVE.md and keep PROGRESS.md focused on current status, blockers, and next actions.

9. Completed Sprint Archive

Completed sprint details have been moved out of this active implementation plan.

  • Summary and verification evidence: docs/WORKARCHIVE.md.
  • Detailed historical contracts: docs/Sprints/SPRINT0CONTRACT.md through docs/Sprints/SPRINT16CONTRACT.md.
  • UI folder batch design and execution record: docs/superpowers/specs/2026-05-13-ui-folder-batch-conversion-design.md and docs/superpowers/plans/2026-05-13-ui-folder-batch-conversion.md.
  • Abandoned Sprint 17 planning record: docs/Sprints/SPRINT17CONTRACT.md and docs/superpowers/plans/2026-05-12-offline-installer.md.

Facts carried forward from completed work:

  • MinerU is fixed to version 3.1.0.
  • Direct local CLI command shape is mineru -p <input> -o <output>.
  • Python 3.12 is compatible with the pinned MinerU package range.
  • GTX 1070 Ti CUDA/PyTorch support must be validated explicitly with pdf2md doctor.
  • Formula reconstruction remains best effort and must keep warnings/provenance visible.
  • MinerU/model license posture is acceptable for personal local use. Redistribution remains gated by license review.
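The direct local command shape recorded above, mineru -p <input> -o <output>, can be built as an argument list suitable for a subprocess call. The function name mineru_command is hypothetical, and the sketch deliberately adds no flags beyond the two stated in the plan.

```python
from pathlib import Path


def mineru_command(input_pdf, output_dir):
    """Build the argument list for: mineru -p <input> -o <output>."""
    return ["mineru", "-p", str(Path(input_pdf)), "-o", str(Path(output_dir))]
```

Passing a list rather than a single shell string keeps paths with spaces intact when the command is later handed to subprocess.run.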