modify pdftomd

김경종
2026-05-14 10:16:59 +09:00
parent 2232b51fc9
commit dc11880140
69 changed files with 7784 additions and 1150 deletions
+3 -3
@@ -1,13 +1,13 @@
# Knowledge Base: Local PDF-to-Markdown Converter for Math-Heavy Documents
-Last updated: 2026-05-07
+Last updated: 2026-05-11
## 1. Product Direction
This project will build a local-first PDF-to-Markdown converter for math-heavy academic PDFs and books. The v1 target is intentionally narrow:
- Processing policy: local-only. Do not send user PDFs to cloud OCR or external AI APIs.
-- Primary interface: CLI plus Python library.
+- Primary interface: CLI plus Python library. A later thin local desktop launcher may wrap the CLI, but it must not become a separate conversion pipeline.
- Primary output: Obsidian-friendly Markdown.
- Main conversion engine: MinerU 3.1.0.
- Math output: inline math as `$...$`, display math as `$$...$$`.
@@ -73,7 +73,7 @@ Rules:
- Inline math: `$...$`.
- Display math: `$$...$$` on separate lines.
-- Store extracted images in a sibling assets directory, for example `paper.assets/page-003-figure-01.png`.
+- Store extracted images in the PDF output folder's shared `images/` directory, for example `paper/images/page-003_figure-01.png`.
- Use relative links from the Markdown file to assets.
- Preserve page boundaries in metadata, not by noisy visible page markers in the main Markdown.
- Prefer normal Markdown tables for simple tables.
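
A minimal sketch of the updated image rule, for illustration only. It assumes the Markdown file sits directly inside the PDF output folder (for example `paper/paper.md`), which this diff does not state. The helper names `image_asset_path` and `markdown_image_link` are hypothetical and are not part of the project or of MinerU.

```python
from pathlib import PurePosixPath


def image_asset_path(pdf_stem: str, page: int, kind: str, index: int) -> PurePosixPath:
    """Build the path of one extracted image under the shared images/ directory.

    Hypothetical helper: the naming pattern follows the example in the rule,
    page-003_figure-01.png (zero-padded page, underscore, kind, zero-padded index).
    """
    name = f"page-{page:03d}_{kind}-{index:02d}.png"
    return PurePosixPath(pdf_stem) / "images" / name


def markdown_image_link(markdown_file: PurePosixPath, asset: PurePosixPath, alt: str = "") -> str:
    """Return a Markdown image link whose target is relative to the Markdown file."""
    # Assumption: the asset lives below the folder that contains the Markdown file.
    relative = asset.relative_to(markdown_file.parent)
    return f"![{alt}]({relative})"


asset = image_asset_path("paper", page=3, kind="figure", index=1)
link = markdown_image_link(PurePosixPath("paper/paper.md"), asset, alt="Figure 1")
print(asset)  # paper/images/page-003_figure-01.png
print(link)   # ![Figure 1](images/page-003_figure-01.png)
```

Keeping the link relative to the Markdown file means the output folder can be moved into an Obsidian vault as a unit without rewriting links.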