11 KiB
PRD: Local PDF-to-Markdown Converter
Last updated: 2026-05-13
1. Summary
Build a local-only CLI and Python library that converts math-heavy digital PDFs into Obsidian-friendly Markdown. The product prioritizes accurate LaTeX reconstruction for equations, preservation of document structure, stable asset links, and traceable page-level provenance in the human-readable report.
The first version is for personal/research use, targets NVIDIA GPU machines, and uses MinerU 3.1.0 as the fixed conversion engine. It should process digital PDFs with existing text layers first. Scanned books, cloud OCR APIs, hosted web apps, and manual review workflows are out of scope for v1. A thin local Windows desktop launcher exists as a convenience wrapper over the existing pdf2md CLI.
2. Goals
- Convert a single PDF into a PDF-stem output folder containing Markdown part files, shared assets, and one human-readable quality report.
- Convert a folder of PDFs in batch mode.
- Allow the thin local Windows UI launcher to convert direct-child PDFs in a selected folder by sequentially invoking existing CLI commands.
- Preserve inline math as
$...$and display math as$$...$$. - Produce Markdown that opens cleanly in Obsidian.
- Use MinerU 3.1.0 locally.
- Keep enough internal provenance to diagnose formula, layout, and reading-order errors through warnings and the report.
- Continue conversion automatically when a page or formula is low-confidence, while logging warnings.
3. Non-Goals
- No cloud OCR, cloud LLM, or third-party document upload in v1.
- No hosted web app, manual review UI, or alternate GUI conversion pipeline in v1.
- A thin local desktop launcher is allowed only when it invokes the existing
pdf2mdCLI and preserves strict-local behavior. - No manual review queue in v1.
- No optimization for low-quality scanned books in v1.
- No guaranteed perfect LaTeX reconstruction.
- No multi-user server or hosted API in v1.
- No commercial redistribution assumptions until dependency licenses are reviewed.
4. Target Users
Primary user:
- A researcher, student, or developer converting math-heavy papers/books into Obsidian notes.
5. Input Scope
Supported in v1:
- Local
.pdffiles. - Directories containing
.pdffiles. - Digital PDFs with embedded text layers.
- Academic papers with sections, references, figures, captions, tables, inline math, and display equations.
Best effort in v1:
- Multi-column academic layouts.
- Tables containing math.
- Figures and captions.
- Page numbers, headers, and footers.
Out of scope for v1 optimization:
- Poor-quality scans.
- Handwritten math.
- Camera photos.
- Password-protected PDFs.
- Damaged PDFs that cannot be opened by local tooling.
6. Output Scope
For each input PDF, the converter writes:
- One or more normalized Markdown part files named
<stem>_001.md,<stem>_002.md, and so on. - A shared
images/directory when MinerU extracts images or other media. - A human-readable quality report named
<stem>_report.md. - Optional raw MinerU outputs for debugging.
Markdown rules:
- Inline equations use
$...$. - Display equations use
$$...$$on separate lines. - Simple tables use Markdown pipe tables.
- Complex tables may use HTML when Markdown would lose structure.
- Images use relative links to
images/...under the PDF output folder. - Visible page markers should be avoided by default; grouped page conversion may use invisible HTML comments for page provenance.
- Obsidian compatibility is the output standard.
Detailed Markdown normalization rules are defined in ARCHITECTURE.md.
7. CLI Requirements
The CLI binary should be named pdf2md.
Required commands:
pdf2md convert INPUT --out OUTPUT_DIR
pdf2md doctor
convert behavior:
- If
INPUTis a PDF, convert that file. - If
INPUTis a directory, convert PDFs in that directory. - Directory conversion requires
--recursiveto descend into subdirectories. - Output folders default to
<output>/<stem>/. - Markdown part filenames default to
<stem>_001.md,<stem>_002.md, and so on. - Asset directories default to
<output>/<stem>/images/. - Existing outputs are not overwritten unless
--overwriteis passed.
Required convert options:
--out PATH: output directory.--metadata: accepted for compatibility; no metadata JSON is written in the simplified output layout.--keep-raw: keep raw MinerU output for debugging.--recursive: recursively process directory inputs.--overwrite: replace existing outputs.--gpu DEVICE: select CUDA device. Default:cuda:0;autoselects the visible NVIDIA GPU with the most VRAM.--mineru-profile {auto,safe,performance}: select MinerU runtime tuning. Default:auto.--strict-local: forbid remote network/cloud execution during conversion. Default: true.
doctor behavior:
- Report Python version.
- Report
uvavailability. - Report CUDA/PyTorch GPU availability when detectable.
- Report visible NVIDIA GPU index, VRAM, driver version,
--gpu autorecommendation, and recommended MinerU profile. - Report MinerU availability.
- Report local model/cache paths when detectable.
- Warn if no NVIDIA GPU is available.
- Fail if required v1 runtime dependencies are missing.
UI launcher behavior:
- The UI is a local convenience wrapper over the existing
pdf2mdCLI. - The UI may convert a selected folder by discovering direct-child PDFs only and running one
pdf2md convertcommand per PDF sequentially. - The UI must not invoke MinerU directly, add recursive folder conversion outside the existing CLI behavior, run conversions in parallel by default, or expose remote/API options.
8. Python Library Requirements
The library should expose a stable API suitable for scripts and tests.
Required high-level API:
from pdf2md import convert_pdf
result = convert_pdf(
input_path="paper.pdf",
output_dir="out",
metadata=True,
)
Required return fields:
markdown_pathmetadata_path, which isNonefor new simplified outputs.assets_dirwarningsenginepages_processed
The public API should not expose raw MinerU objects as required return types. MinerU-specific data may be stored in internal report/provenance structures.
9. Provenance Requirements
New conversions must not write a public metadata JSON sidecar. Internal metadata-like records may still be built in memory to derive reports, warnings, counts, and ConversionResult fields.
Required top-level fields:
source_pdfsource_sha256created_atengineengine_versionengine_optionspagesassetswarningssummary
Required summary fields:
pages_processedwarning_countasset_countdisplay_formula_countinline_formula_countmath_render_error_count
Warnings must be non-fatal unless the source file cannot be read or no output can be produced.
Detailed internal provenance fields, block types, and warning codes are defined in ARCHITECTURE.md.
10. Quality Report Requirements
For every conversion, write <stem>/<stem>_report.md.
The report must be readable as the primary human-facing quality artifact and include:
- Source PDF path.
- Output folder path and Markdown part paths.
- MinerU version.
- Page count.
- Warning count.
- Asset count.
- Inline formula count.
- Display formula count.
- Math render error count.
- Missing asset link count.
- A short list of pages with warnings.
11. Quality Policy
The product is fully automatic in v1.
- Low-confidence formulas are included in the output as best effort.
- Low-confidence pages are included in the output as best effort.
- The converter logs warnings and internal provenance records.
- Conversion uses MinerU's default local CLI execution. If MinerU cannot run or fails, the converter must emit a clear error/warning instead of silently falling back to another backend.
- Conversion fails only when the input cannot be opened, MinerU cannot run, output cannot be written, no usable output can be produced, or local-only policy is violated.
The CLI summary must report warning counts clearly.
12. Local-Only Policy
The implementation must not upload PDFs or page images to cloud APIs.
Prohibited in v1 runtime:
- Any cloud OCR API.
- Any hosted document parsing API.
- Any remote LLM or VLM call.
- Remote model inference endpoints.
Allowed:
- Local model files.
- Local Python packages.
- Local CLI tools.
- Documentation links.
- Explicit installation downloads initiated by the user during setup.
--strict-local is on by default. The MinerU adapter must not use remote endpoints in strict-local mode.
Allowed in v1 runtime:
- Direct
mineruCLI execution. - The temporary local
mineru-apiprocess that MinerU 3.1.0 starts internally when the CLI runs without--api-url.
Prohibited in v1 runtime:
--api-url.- Remote APIs.
- Router mode.
- HTTP client backends.
- Remote OpenAI-compatible backends or inference endpoints.
Detailed strict-local enforcement rules are defined in ARCHITECTURE.md.
13. Installation Requirements
Use uv as the primary project workflow.
Expected setup commands:
uv sync
uv run pdf2md doctor
MinerU/model setup requires explicit user-initiated local setup commands documented in README.md. Do not reference setup helper scripts unless they actually exist in the repository.
The project should document NVIDIA GPU/CUDA expectations and provide clear errors when GPU acceleration is unavailable.
The default MinerU profile must be conservative on GTX 1070 Ti 8GB and other weak or pre-Turing GPUs. Stronger profile settings are allowed only through local environment tuning on selected 16GB+ Turing-or-newer NVIDIA GPUs.
14. Test Requirements
Required test categories:
- Unit tests for Markdown math delimiter normalization.
- Unit tests for asset path normalization.
- Unit tests for internal metadata/provenance schema creation.
- Unit tests for warning aggregation.
- MinerU adapter contract tests with mocked outputs.
- CLI tests for single PDF, directory input, overwrite behavior, and simplified output layout.
Fixture categories:
- Small digital PDF with simple text and math.
- Math-heavy academic paper page.
- Multi-column paper page.
- Table with formulas.
- Figure with caption.
Acceptance checks:
- Markdown exists after conversion.
- No metadata JSON is written for new conversions.
- Quality report exists after conversion.
- Asset links resolve.
- Inline/display math delimiters match Obsidian expectations.
- Math render checks report failures instead of silently passing.
- No cloud calls are made.
- Warnings do not stop conversion unless MinerU cannot produce output.
- MinerU failure produces a clear error/warning and does not silently switch backend.
15. Release Criteria for v1
v1 is acceptable when:
pdf2md convert paper.pdf --out outworks on a representative digital academic PDF.pdf2md convert pdfs --out out --recursiveworks on a small folder.pdf2md doctorreports MinerU/GPU status clearly.- The default output opens in Obsidian with math blocks rendered.
- The report links pages, warnings, output parts, and assets to the source PDF.
<stem>/<stem>_report.mdsummarizes warnings, formulas, assets, and render/link check results.- The README or setup docs explain local-only behavior and GPU expectations.