modify pdftomd
@@ -1,13 +1,13 @@
 # Knowledge Base: Local PDF-to-Markdown Converter for Math-Heavy Documents

-Last updated: 2026-05-07
+Last updated: 2026-05-11

 ## 1. Product Direction

 This project will build a local-first PDF-to-Markdown converter for math-heavy academic PDFs and books. The v1 target is intentionally narrow:

 - Processing policy: local-only. Do not send user PDFs to cloud OCR or external AI APIs.
-- Primary interface: CLI plus Python library.
+- Primary interface: CLI plus Python library. A later thin local desktop launcher may wrap the CLI, but it must not become a separate conversion pipeline.
 - Primary output: Obsidian-friendly Markdown.
 - Main conversion engine: MinerU 3.1.0.
 - Math output: inline math as `$...$`, display math as `$$...$$`.
@@ -73,7 +73,7 @@ Rules:

 - Inline math: `$...$`.
 - Display math: `$$...$$` on separate lines.
-- Store extracted images in a sibling assets directory, for example `paper.assets/page-003-figure-01.png`.
+- Store extracted images in the PDF output folder's shared `images/` directory, for example `paper/images/page-003_figure-01.png`.
 - Use relative links from the Markdown file to assets.
 - Preserve page boundaries in metadata, not by noisy visible page markers in the main Markdown.
 - Prefer normal Markdown tables for simple tables.
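The math rules in this diff (inline `$...$`, display `$$...$$` with the delimiters on separate lines) can be illustrated with a minimal Python sketch. It assumes the converter already holds each formula as a bare LaTeX string without delimiters; `format_math` is a hypothetical helper, not part of MinerU's API or of this project's code.

```python
def format_math(latex: str, display: bool) -> str:
    """Wrap a bare LaTeX formula in the project's Markdown math delimiters.

    Inline math becomes $...$; display math becomes a $$ block with each
    delimiter on its own line. Hypothetical helper for illustration only.
    """
    body = latex.strip()  # keep the delimiters flush against the formula
    if display:
        return f"$$\n{body}\n$$"
    return f"${body}$"


if __name__ == "__main__":
    print("Energy: " + format_math("E = mc^2", display=False))
    # Energy: $E = mc^2$
    print(format_math(r"\int_0^1 x^2\,dx = \frac{1}{3}", display=True))
```

Stripping whitespace matters mainly for inline math: many Markdown renderers, Pandoc among them, only treat `$...$` as math when there is no space just inside the delimiters.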
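The new image rule and the relative-link rule can be combined in a small Python sketch. It assumes the Markdown file sits inside the PDF's output folder (for example `paper/paper.md`, a hypothetical name) and infers the zero-padding widths (three digits for pages, two for figures) from the single example path `paper/images/page-003_figure-01.png`; `image_path` and `markdown_image_link` are hypothetical helpers.

```python
import posixpath
from pathlib import PurePosixPath


def image_path(output_dir: str, page: int, figure: int) -> PurePosixPath:
    """Path of one extracted figure in the output folder's shared images/ dir.

    Padding widths (3-digit page, 2-digit figure) are inferred from the
    example `paper/images/page-003_figure-01.png`; they are an assumption.
    """
    name = f"page-{page:03d}_figure-{figure:02d}.png"
    return PurePosixPath(output_dir) / "images" / name


def markdown_image_link(markdown_file: str, image: PurePosixPath, alt: str = "") -> str:
    """Markdown image link whose target is relative to the Markdown file."""
    md_dir = posixpath.dirname(markdown_file) or "."
    # posixpath produces forward-slash links regardless of the host OS.
    rel = posixpath.relpath(str(image), start=md_dir)
    return f"![{alt}]({rel})"


if __name__ == "__main__":
    img = image_path("paper", page=3, figure=1)
    print(img)  # paper/images/page-003_figure-01.png
    print(markdown_image_link("paper/paper.md", img, alt="Figure 1"))
    # ![Figure 1](images/page-003_figure-01.png)
```

Computing the link with `posixpath.relpath` instead of hard-coding `images/...` keeps links correct if a Markdown file is ever written to a subfolder of the output folder.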
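The page-boundary rule asks that the Markdown body carry no visible page markers while page structure stays recoverable. One way to do that, under assumptions the knowledge base does not state, is to record each page's character span alongside the joined text. `join_pages`, `PageSpan`, and the idea of a JSON sidecar file are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PageSpan:
    """Character range of one PDF page inside the combined Markdown body."""
    page: int   # 1-based PDF page number
    start: int  # offset of the page's first character
    end: int    # offset one past the page's last character


def join_pages(pages: list[str]) -> tuple[str, list[PageSpan]]:
    """Concatenate per-page Markdown without visible page markers.

    Returns the Markdown body plus one span per page, so page boundaries
    live in metadata rather than in the text itself.
    """
    separator = "\n\n"
    parts: list[str] = []
    spans: list[PageSpan] = []
    offset = 0
    for number, text in enumerate(pages, start=1):
        if parts:
            parts.append(separator)
            offset += len(separator)
        parts.append(text)
        spans.append(PageSpan(page=number, start=offset, end=offset + len(text)))
        offset += len(text)
    return "".join(parts), spans


if __name__ == "__main__":
    body, spans = join_pages(["# Intro\n\nFirst page.", "Second page with $x^2$."])
    print(body)
    # A hypothetical sidecar such as paper/paper.pages.json could store this list.
    print(json.dumps([asdict(s) for s in spans]))
```

Joining pages with a blank line is a simplification: a real converter would also need to merge a paragraph that continues across a page break, which this sketch does not attempt.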