7.6 KiB
Architecture: Local PDF-to-Markdown Converter
Last updated: 2026-05-07
1. Overview
The system converts math-heavy digital PDFs into Obsidian-friendly Markdown using MinerU 3.1.0 as the fixed local conversion engine. Product requirements live in PRD.md; agent workflow rules live in AGENTS.md; research notes live in docs/KNOWLEDGEBASE.md.
The architecture separates MinerU execution from project-owned normalization and metadata. This boundary exists only to isolate MinerU I/O; it is not a pluggable engine system.
2. System Layers
-
CLI/API layer
- Parse command arguments and Python API parameters.
- Discover input PDFs.
- Plan output paths.
- Enforce overwrite behavior.
- Print conversion summaries.
-
MinerU adapter layer
- Validate MinerU 3.1.0 installation and version.
- Run MinerU through direct local CLI execution.
- Capture raw Markdown, structured output, assets, logs, and exit status.
- Enforce strict-local execution.
-
Intermediate representation layer
- Convert MinerU-specific output into project-owned document/page/block objects.
- Preserve page index, bbox, confidence, source engine, and asset references returned by MinerU.
- Prevent raw MinerU objects from becoming public API return types.
-
Normalization layer
- Convert project-owned objects and MinerU Markdown into Obsidian-friendly Markdown.
- Normalize math delimiters, display math spacing, headings, tables, and asset links.
-
Quality and metadata layer
- Run link checks and math renderability checks with local tooling.
- Aggregate structured warnings.
- Write metadata JSON, quality report Markdown, and optional raw MinerU diagnostics.
3. Conversion Pipeline
-
Input discovery
- Accept a single PDF or a directory.
- Require
--recursivefor subdirectory traversal. - Validate that each selected input is a local PDF.
- Compute source SHA-256.
-
MinerU conversion
- Create an isolated work directory per input PDF.
- Run the MinerU 3.1.0 adapter through the direct
mineruCLI. - Capture raw Markdown, raw JSON/structured output when available, extracted assets, warnings, and logs.
-
Intermediate representation
- Build document/page/block records from MinerU output.
- Preserve provenance data instead of relying only on final Markdown text.
-
Obsidian normalization
- Normalize inline math to
$...$. - Normalize display math to
$$...$$blocks on separate lines. - Normalize image links to stable relative asset paths.
- Normalize tables without destroying complex table structure.
- Normalize inline math to
-
Quality checks
- Verify generated asset links.
- Check math renderability when local tooling is available.
- Emit warnings without stopping conversion unless no usable output can be produced.
-
Output writing
- Write final Markdown.
- Write extracted assets.
- Write metadata JSON.
- Write
<stem>.report.md. - Keep raw MinerU output when requested.
4. MinerU Adapter Contract
The MinerU adapter exposes:
nameis_available()version()doctor()convert(input_pdf, work_dir, options)
Adapter conversion output contains:
raw_markdownraw_structuredassetspageswarningsengineengine_versionengine_optionsexit_codestderr
The adapter must fail fast if it cannot run in strict-local mode. Runtime engine selection is not part of v1.
The default conversion device is cuda:0. Because MinerU 3.1.0 selects its local device through environment/config rather than a dedicated CLI GPU flag, the adapter must set the MinerU subprocess environment to request CUDA by default while keeping the command shape direct and local.
Allowed MinerU execution in v1:
- Direct local
mineruCLI execution. - The temporary local
mineru-apiprocess that MinerU 3.1.0 starts internally when the CLI runs without--api-url.
Prohibited MinerU execution in v1:
- Passing
--api-url. - Remote APIs.
- Router mode.
- HTTP client backends.
- Remote OpenAI-compatible backends or inference endpoints.
5. Intermediate Representation
The project uses a small internal representation for normalization and metadata.
Required concepts:
- Document
- Page
- Block
- Asset
- Warning
Required block types:
headingparagraphinline_formuladisplay_formulatablefigurecaptionfootnotereferenceunknown
Record these page/block fields when MinerU returns them. Do not invent missing values.
- Page index.
- Page dimensions.
- Bounding boxes.
- Confidence.
- Source engine, fixed to MinerU 3.1.0 in v1.
- Markdown character span.
6. Markdown Normalization
Final Markdown must prioritize Obsidian.
- Use
$...$for inline math. - Use display math blocks with
$$on their own lines. - Keep blank lines around display math.
- Do not escape underscores or carets inside math unnecessarily.
- Prefer Markdown tables for simple tables.
- Use HTML tables for complex tables when Markdown would lose structure.
- Store figures/images in a stable relative assets directory.
- Do not add visible page separators in v1.
- Preserve captions and references when MinerU provides them.
7. Metadata Schema
When metadata is enabled, write <stem>.metadata.json.
Required top-level fields:
source_pdfsource_sha256created_atengineengine_versionengine_optionspagesassetswarningssummary
Required summary fields:
pages_processedwarning_countasset_countdisplay_formula_countinline_formula_countmath_render_error_count
Warning records include:
codeseveritypage_indexbboxmessage
Stable warning code examples:
ENGINE_MISSINGGPU_UNAVAILABLELOW_CONFIDENCE_FORMULAMATH_RENDER_FAILEDASSET_LINK_MISSINGREADING_ORDER_UNCERTAINSTRICT_LOCAL_VIOLATIONMINERU_CLI_FAILED
8. Quality Report
Every conversion writes <stem>.report.md.
The report is derived from metadata and local quality checks. It contains:
- Source and output paths.
- MinerU version and execution mode.
- Pages processed.
- Warning count.
- Asset count and missing asset link count.
- Inline and display formula counts.
- Math render error count.
- Pages with warnings.
- Final status:
success,partial, orfailed.
9. Local-Only Enforcement
The implementation must not upload PDFs, page images, or extracted text to remote services.
Strict-local mode is on by default. The MinerU adapter must not call cloud OCR APIs, hosted document parsing APIs, hosted LLM/VLM APIs, or remote model inference endpoints.
Local-only execution means direct mineru CLI execution in v1. MinerU 3.1.0's CLI-internal temporary local mineru-api process is allowed because it is local orchestration owned by the CLI invocation. User-specified API URLs, router mode, HTTP client backends, remote APIs, and remote OpenAI-compatible backends are not allowed.
Allowed network activity is limited to documentation, package/model installation initiated by setup commands, or tests explicitly marked as network tests and disabled by default.
10. Failure Policy
Conversion should continue automatically when possible.
- Low-confidence formulas are included as best effort and recorded as warnings.
- Low-confidence pages are included as best effort and recorded as warnings.
- Run MinerU with its default local CLI behavior first.
- If MinerU cannot run or returns a failure, report the failure clearly and do not silently switch backend.
- Conversion fails only when the input cannot be opened, MinerU cannot run, no usable output can be produced, output cannot be written, or strict-local policy is violated.
- CLI summaries must report warning counts clearly.