Add Markdown recheck command

This commit is contained in:
NINI
2026-05-11 01:00:26 +09:00
parent b69c03c206
commit 03927a26a1
9 changed files with 276 additions and 11 deletions
+11 -1
View File
@@ -4,7 +4,7 @@ Local-only PDF-to-Markdown converter for math-heavy digital documents.
## Status
The project currently provides a Python package, `pdf2md convert`, metadata/report output, mocked MinerU adapter tests, `pdf2md doctor` setup diagnostics, and Sprint 9 release-gate documentation. Real local MinerU sample validation remains optional and may be blocked until MinerU 3.1.0 and local model/cache setup are available.
The project currently provides a Python package, `pdf2md convert`, Markdown recheck via `pdf2md recheck`, metadata/report output, mocked MinerU adapter tests, `pdf2md doctor` setup diagnostics, and Sprint 9 release-gate documentation. Real local MinerU sample validation remains optional and may be blocked until MinerU 3.1.0 and local model/cache setup are available.
## Setup
@@ -76,6 +76,16 @@ The model/cache check looks for these environment variables when present:
It also checks for `%USERPROFILE%\mineru.json`, which MinerU documents as its default user config location. Missing model/cache paths are warnings because model download and cache population must be explicit setup actions.
## Rechecking Markdown
After editing a generated Markdown file, rerun local quality checks and regenerate the adjacent metadata/report files:
```powershell
uv run pdf2md recheck outputs/MITC공부/MITC공부.md
```
`recheck` reads the existing `<stem>.metadata.json` for source PDF, engine, page, and asset provenance. It replaces quality warnings that can be recalculated from the current Markdown, including MathJax render failures and local asset-link warnings, then rewrites `<stem>.metadata.json` and `<stem>.report.md`.
## Runtime Policy
Runtime conversion is strict-local. Allowed: direct `mineru` CLI execution and the CLI-internal temporary local `mineru-api` that MinerU starts when `--api-url` is omitted. Prohibited: `--api-url`, remote APIs, router mode, HTTP client backends, remote OpenAI-compatible backends, hosted renderers, and cloud fallbacks.