# Architecture: Local PDF-to-Markdown Converter

Last updated: 2026-05-07

## 1. Overview

The system converts math-heavy digital PDFs into Obsidian-friendly Markdown using MinerU 3.1.0 as the fixed local conversion engine.

Product requirements live in `PRD.md`; agent workflow rules live in `AGENTS.md`; research notes live in `docs/KNOWLEDGEBASE.md`.

The architecture separates MinerU execution from project-owned normalization and metadata. This boundary exists only to isolate MinerU I/O; it is not a pluggable engine system.

## 2. System Layers

1. CLI/API layer
   - Parse command arguments and Python API parameters.
   - Discover input PDFs.
   - Plan output paths.
   - Enforce overwrite behavior.
   - Print conversion summaries.
2. MinerU adapter layer
   - Validate MinerU 3.1.0 installation and version.
   - Run MinerU through direct local CLI execution.
   - Capture raw Markdown, structured output, assets, logs, and exit status.
   - Enforce strict-local execution.
3. Intermediate representation layer
   - Convert MinerU-specific output into project-owned document/page/block objects.
   - Preserve page index, bbox, confidence, source engine, and asset references returned by MinerU.
   - Prevent raw MinerU objects from becoming public API return types.
4. Normalization layer
   - Convert project-owned objects and MinerU Markdown into Obsidian-friendly Markdown.
   - Normalize math delimiters, display math spacing, headings, tables, and asset links.
5. Quality and metadata layer
   - Run link checks and math renderability checks with local tooling.
   - Aggregate structured warnings.
   - Write metadata JSON, quality report Markdown, and optional raw MinerU diagnostics.

## 3. Conversion Pipeline

1. Input discovery
   - Accept a single PDF or a directory.
   - Require `--recursive` for subdirectory traversal.
   - Validate that each selected input is a local PDF.
   - Compute source SHA-256.
2. MinerU conversion
   - Create an isolated work directory per input PDF.
   - Run the MinerU 3.1.0 adapter through the direct `mineru` CLI.
   - Capture raw Markdown, raw JSON/structured output when available, extracted assets, warnings, and logs.
3. Intermediate representation
   - Build document/page/block records from MinerU output.
   - Preserve provenance data instead of relying only on final Markdown text.
4. Obsidian normalization
   - Normalize inline math to `$...$`.
   - Normalize display math to `$$...$$` blocks on separate lines.
   - Normalize image links to stable relative asset paths.
   - Normalize tables without destroying complex table structure.
5. Quality checks
   - Verify generated asset links.
   - Check math renderability when local tooling is available.
   - Emit warnings without stopping conversion unless no usable output can be produced.
6. Output writing
   - Write final Markdown.
   - Write extracted assets.
   - Write metadata JSON.
   - Write `.report.md`.
   - Keep raw MinerU output when requested.

## 4. MinerU Adapter Contract

The MinerU adapter exposes:

- `name`
- `is_available()`
- `version()`
- `doctor()`
- `convert(input_pdf, work_dir, options)`

Adapter conversion output contains:

- `raw_markdown`
- `raw_structured`
- `assets`
- `pages`
- `warnings`
- `engine`
- `engine_version`
- `engine_options`
- `exit_code`
- `stderr`

The adapter must fail fast if it cannot run in strict-local mode. Runtime engine selection is not part of v1.

The default conversion device is `cuda:0`. Because MinerU 3.1.0 selects its local device through environment/config rather than a dedicated CLI GPU flag, the adapter must set the MinerU subprocess environment to request CUDA by default while keeping the command shape direct and local.

Allowed MinerU execution in v1:

- Direct local `mineru` CLI execution.
- The temporary local `mineru-api` process that MinerU 3.1.0 starts internally when the CLI runs without `--api-url`.

Prohibited MinerU execution in v1:

- Passing `--api-url`.
- Remote APIs.
- Router mode.
- HTTP client backends.
- Remote OpenAI-compatible backends or inference endpoints.

## 5. Intermediate Representation

The project uses a small internal representation for normalization and metadata.

Required concepts:

- Document
- Page
- Block
- Asset
- Warning

Required block types:

- `heading`
- `paragraph`
- `inline_formula`
- `display_formula`
- `table`
- `figure`
- `caption`
- `footnote`
- `reference`
- `unknown`

Record these page/block fields when MinerU returns them. Do not invent missing values.

- Page index.
- Page dimensions.
- Bounding boxes.
- Confidence.
- Source engine, fixed to MinerU 3.1.0 in v1.
- Markdown character span.

## 6. Markdown Normalization

Final Markdown must prioritize Obsidian.

- Use `$...$` for inline math.
- Use display math blocks with `$$` on their own lines.
- Keep blank lines around display math.
- Do not escape underscores or carets inside math unnecessarily.
- Prefer Markdown tables for simple tables.
- Use HTML tables for complex tables when Markdown would lose structure.
- Store figures/images in a stable relative assets directory.
- Do not add visible page separators in v1.
- Preserve captions and references when MinerU provides them.

## 7. Metadata Schema

When metadata is enabled, write `.metadata.json`.

Required top-level fields:

- `source_pdf`
- `source_sha256`
- `created_at`
- `engine`
- `engine_version`
- `engine_options`
- `pages`
- `assets`
- `warnings`
- `summary`

Required summary fields:

- `pages_processed`
- `warning_count`
- `asset_count`
- `display_formula_count`
- `inline_formula_count`
- `math_render_error_count`

Warning records include:

- `code`
- `severity`
- `page_index`
- `bbox`
- `message`

Stable warning code examples:

- `ENGINE_MISSING`
- `GPU_UNAVAILABLE`
- `LOW_CONFIDENCE_FORMULA`
- `MATH_RENDER_FAILED`
- `MATH_RENDER_REPAIRED`
- `ASSET_LINK_MISSING`
- `READING_ORDER_UNCERTAIN`
- `STRICT_LOCAL_VIOLATION`
- `MINERU_CLI_FAILED`

## 8. Quality Report

Every conversion writes `.report.md`. The report is derived from metadata and local quality checks.

It contains:

- Source and output paths.
- MinerU version and execution mode.
- Pages processed.
- Warning count.
- Asset count and missing asset link count.
- Inline and display formula counts.
- Math render error count.
- Pages with warnings.
- Final status: `success`, `partial`, or `failed`.

## 9. Local-Only Enforcement

The implementation must not upload PDFs, page images, or extracted text to remote services.

Strict-local mode is on by default. The MinerU adapter must not call cloud OCR APIs, hosted document parsing APIs, hosted LLM/VLM APIs, or remote model inference endpoints.

Local-only execution means direct `mineru` CLI execution in v1. MinerU 3.1.0's CLI-internal temporary local `mineru-api` process is allowed because it is local orchestration owned by the CLI invocation. User-specified API URLs, router mode, HTTP client backends, remote APIs, and remote OpenAI-compatible backends are not allowed.

Allowed network activity is limited to documentation, package/model installation initiated by setup commands, or tests explicitly marked as network tests and disabled by default.

## 10. Failure Policy

Conversion should continue automatically when possible.

- Low-confidence formulas are included as best effort and recorded as warnings.
- Low-confidence pages are included as best effort and recorded as warnings.
- Run MinerU with its default local CLI behavior first.
- If MinerU cannot run or returns a failure, report the failure clearly and do not silently switch backend.
- Conversion fails only when the input cannot be opened, MinerU cannot run, no usable output can be produced, output cannot be written, or strict-local policy is violated.
- CLI summaries must report warning counts clearly.
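
The failure policy above can be sketched as a small status classifier. This is a minimal illustration, not the project's implementation: `ConversionOutcome`, its field names, `final_status`, and `summary_line` are hypothetical names introduced here. The mapping of "at least one warning" to `partial` is also an assumption, since the document names the three status values without defining when `partial` applies.

```python
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    """The three final status values named in the quality report section."""
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILED = "failed"


@dataclass(frozen=True)
class ConversionOutcome:
    """Hypothetical per-PDF outcome record; field names are illustrative."""
    input_opened: bool
    mineru_succeeded: bool
    usable_output: bool
    output_written: bool
    strict_local_violated: bool
    warning_count: int = 0


def final_status(outcome: ConversionOutcome) -> Status:
    # The five fatal conditions listed in the failure policy.
    fatal = (
        not outcome.input_opened
        or not outcome.mineru_succeeded
        or not outcome.usable_output
        or not outcome.output_written
        or outcome.strict_local_violated
    )
    if fatal:
        return Status.FAILED
    # Assumption: any recorded warning downgrades the result to "partial".
    if outcome.warning_count > 0:
        return Status.PARTIAL
    return Status.SUCCESS


def summary_line(outcome: ConversionOutcome) -> str:
    # CLI summaries must report warning counts clearly.
    status = final_status(outcome).value
    return f"status={status} warnings={outcome.warning_count}"
```

Under these assumptions, a conversion that completes with low-confidence formula warnings still produces output and is reported as `partial`, while a strict-local violation is `failed` regardless of how much output exists.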