feat: mitigate MathJax formula warnings

NINI
2026-05-11 02:08:46 +09:00
parent 005f17bac1
commit 71e6fbcc51
12 changed files with 625 additions and 41 deletions
+9 -9
@@ -6,9 +6,9 @@ This file records current progress for agents. Read it before starting work, the
- Project direction is documented in `PRD.md`, `ARCHITECTURE.md`, `AGENTS.md`, and `docs/KNOWLEDGEBASE.md`.
- MinerU 3.1.0 is fixed as the only conversion engine.
-- The converter currently includes path planning, project-owned records, metadata, direct local MinerU adapter boundary, Obsidian Markdown normalization, local quality checks, report rendering, conversion orchestration, `pdf2md convert`, `pdf2md recheck`, `pdf2md doctor`, local MathJax render checking, release-gate tests, and opt-in pre-conversion PDF chunking.
+- The converter currently includes path planning, project-owned records, metadata, direct local MinerU adapter boundary, Obsidian Markdown normalization, local quality checks, report rendering, conversion orchestration, `pdf2md convert`, `pdf2md recheck`, `pdf2md doctor`, local MathJax render checking, conservative MathJax warning mitigation, release-gate tests, and opt-in pre-conversion PDF chunking.
- `docs/V1IMPLEMENTATIONPLAN.md` defines the v1 implementation sequence.
-- `docs/Sprints/` contains completed sprint contracts through Sprint 10.
+- `docs/Sprints/` contains completed sprint contracts through Sprint 11.
- `docs/WORKARCHIVE.md` contains completed sprint history, historical verification results, runtime setup notes, and sample conversion evidence.
- `samples/` exists locally as fixture context.
- `outputs/` is ignored and contains local generated conversion outputs.
@@ -48,7 +48,9 @@ This file records current progress for agents. Read it before starting work, the
- Added `recheck_markdown()` and `pdf2md recheck <markdown.md>` to rerun local quality checks for an existing generated Markdown file and rewrite the adjacent metadata JSON and `.report.md` without rerunning MinerU.
- Verified `uv run pdf2md recheck outputs\MITC공부\MITC공부.md`; the command regenerated metadata/report and still reported 2 warnings because the current Markdown still contains the two MathJax-invalid expressions.
- Reconverted `samples/MITC공부.pdf` with `--overwrite` to ignored `outputs/MITC공부/`; report status remains `partial`: 13 pages, 107 assets, 23 inline formulas, 103 display formulas, 2 MathJax render warnings, and 0 missing or invalid asset links.
-- Added a `PLAN.md` Sprint 11 proposal for conservative MathJax warning mitigation after validation; no implementation code has been started.
+- Sprint 11 implemented conservative MathJax warning mitigation with failed-expression details, `src/pdf2md/math_repair.py`, shared `convert`/`recheck` repair integration, and `MATH_RENDER_REPAIRED` info warnings.
+- Verified default fast suite: `uv run pytest` passed 172 tests with 1 skipped.
+- Verified requested real sample: `uv run pdf2md convert samples\MITC공부.pdf --out outputs\sprint11-MITC공부 --overwrite` succeeded with 13 pages, 107 assets, 23 inline formulas, 103 display formulas, 0 MathJax render errors, and 2 `MATH_RENDER_REPAIRED` info warnings.
## In Progress
@@ -60,9 +62,7 @@ This file records current progress for agents. Read it before starting work, the
## Next Actions
-1. If implementation is requested, write `docs/Sprints/SPRINT11CONTRACT.md` for MathJax warning mitigation before code changes start.
-2. Inspect the current MathJax failure messages from `outputs/MITC공부/MITC공부.md` to choose the narrow initial cleanup rule set.
-3. Manually fix the two MathJax-invalid expressions in `outputs/MITC공부/MITC공부.md` only if a warning-free local report is desired before Sprint 11 exists, then run `uv run pdf2md recheck outputs\MITC공부\MITC공부.md`.
-4. Review generated sample Markdown outputs in Obsidian if visual quality needs manual assessment.
-5. Run optional real local chunked conversion on a long sample only if requested.
-6. Preserve strict-local runtime behavior: use local model paths, direct CLI execution, and no user-specified API or remote backend.
+1. Review generated sample Markdown outputs in Obsidian if visual quality needs manual assessment.
+2. Run additional real local sample validation only if requested, especially for new MathJax failure messages not covered by Sprint 11's narrow repair rules.
+3. Run optional real local chunked conversion on a long sample only if requested.
+4. Preserve strict-local runtime behavior: use local model paths, direct CLI execution, and no user-specified API or remote backend.
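
Reviewer note: the diff does not show `src/pdf2md/math_repair.py` itself, so the following is only a minimal sketch of what a "conservative" repair with narrow rules could look like. It assumes that a repair is accepted only when a render check passes on the rewritten expression. Every name here (`balance_braces`, `try_repair`, `renders_ok`) is hypothetical and is not taken from the project.

```python
from typing import Callable, Iterable, Optional

# Hypothetical type: a rule maps one LaTeX expression to a candidate rewrite.
Rule = Callable[[str], str]


def balance_braces(expr: str) -> str:
    """Illustrative narrow rule: append missing closing braces, never delete text."""
    depth = 0
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return expr  # stray closer: out of scope for this rule
        i += 1
    return expr + "}" * depth


def try_repair(
    expr: str,
    renders_ok: Callable[[str], bool],
    rules: Iterable[Rule],
) -> Optional[str]:
    """Return the first rewrite that changes expr and passes the render check.

    Returns None when no rule helps, so the caller can keep the original
    expression and leave the render warning in place.
    """
    for rule in rules:
        candidate = rule(expr)
        if candidate != expr and renders_ok(candidate):
            return candidate
    return None
```

In this sketch `renders_ok` stands in for whatever local MathJax render check the project runs. Passing it in as a callable keeps the rules testable without a renderer, and returning `None` for unrepaired input keeps the behavior conservative: an expression is changed only when the change is confirmed to render.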