feat: mitigate MathJax formula warnings
@@ -4,7 +4,7 @@ Last updated: 2026-05-08

 This document is the implementation plan for v1. It does not replace `PRD.md` or `ARCHITECTURE.md`; use those files as the source of product requirements and system design. This plan explains the order of work, sprint contracts, verification gates, and agent ownership for implementing the converter.

-Sprint 1 created the Python package scaffold and CLI placeholder. Sprint 2 created path planning. Sprint 3 created project-owned records and metadata construction. Sprint 4 created the mocked direct local MinerU adapter boundary. Sprint 5 created the Obsidian Markdown normalization boundary. Sprint 6 created local quality-check and report-rendering boundaries. Sprint 7 implemented conversion orchestration, the public conversion API, and the `pdf2md convert` CLI path with fake-adapter tests. Sprint 8 implemented mockable doctor diagnostics, the `pdf2md doctor` CLI path, and setup documentation. Sprint 9 implemented fast mocked integration tests, explicit opt-in local MinerU fixture evaluation, and the v1 release checklist. Sprint 10 implemented opt-in pre-conversion PDF chunking for long documents.
+Sprint 1 created the Python package scaffold and CLI placeholder. Sprint 2 created path planning. Sprint 3 created project-owned records and metadata construction. Sprint 4 created the mocked direct local MinerU adapter boundary. Sprint 5 created the Obsidian Markdown normalization boundary. Sprint 6 created local quality-check and report-rendering boundaries. Sprint 7 implemented conversion orchestration, the public conversion API, and the `pdf2md convert` CLI path with fake-adapter tests. Sprint 8 implemented mockable doctor diagnostics, the `pdf2md doctor` CLI path, and setup documentation. Sprint 9 implemented fast mocked integration tests, explicit opt-in local MinerU fixture evaluation, and the v1 release checklist. Sprint 10 implemented opt-in pre-conversion PDF chunking for long documents. Sprint 11 implemented conservative MathJax warning mitigation for failed math spans.

 ## 1. V1 Outcome

@@ -599,6 +599,48 @@ Hard failure criteria:
 - Chunk outputs are merged.
 - Default tests require real MinerU, GPU, model files, network, Obsidian, LaTeX tooling, or `samples/`.

+### Sprint 11: MathJax Warning Mitigation
+
+Active contract:
+
+- `docs/Sprints/SPRINT11CONTRACT.md`
+
+Status:
+
+- Implemented.
+
+Objective:
+
+- Repair narrow MathJax-invalid formula artifacts after initial local validation and before final output writing.
+
+Touched surfaces:
+
+- `quality.py`
+- `math_repair.py`
+- `conversion.py`
+- `ir.py`
+- Unit tests for quality details, repair rules, conversion, and recheck behavior
+
+Expected outputs:
+
+- Failed math expression records expose body, display mode, span, and checker message.
+- Repair candidates are generated only for failed math spans.
+- Repeated same-direction scripts are disambiguated with an empty group.
+- Truncated `\end{a}` array endings are repaired when array environments are unbalanced.
+- `convert` and `recheck` share the same repair behavior.
+- Applied repairs are recorded as `MATH_RENDER_REPAIRED` info warnings and do not count as math render errors.
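The two repair rules in the expected outputs above can be sketched as follows. This is a minimal illustration, not the contents of `math_repair.py`: the function names `disambiguate_repeated_scripts` and `repair_truncated_array_end` are hypothetical, and the assumption that a truncated ending appears literally as `\end{a}` is taken from the bullet wording.

```python
import re

# Hypothetical sketch of the two repair rules; names and regexes are
# assumptions, not the project's actual implementation.

# A script argument: a flat braced group, a control word, or a single character.
_SCRIPT_ATOM = r"(?:\{[^{}]*\}|\\[A-Za-z]+|[^\s{}\\^_])"
_REPEATED_SCRIPT = re.compile(rf"([\^_])({_SCRIPT_ATOM})(?=\1)")


def disambiguate_repeated_scripts(body: str) -> str:
    """Insert an empty group between consecutive same-direction scripts.

    For example, x^a^b (a double-superscript error) becomes x^a{}^b.
    A mixed pair such as x_i^2 is left unchanged.
    """
    return _REPEATED_SCRIPT.sub(r"\1\2{}", body)


def repair_truncated_array_end(body: str) -> str:
    """Rewrite truncated \\end{a} endings only when arrays are unbalanced."""
    missing = body.count(r"\begin{array}") - body.count(r"\end{array}")
    if missing <= 0:
        return body  # balanced already: do not touch the formula
    # Replace at most as many truncated endings as there are unclosed arrays.
    return body.replace(r"\end{a}", r"\end{array}", missing)
```

Both functions return the input unchanged when their trigger condition is absent, which keeps them safe to run in sequence on a failed span. Neither result should be trusted until the candidate has been revalidated.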
+
+Verification checks:
+
+- Default fast tests pass without real MinerU, GPU, Node.js, MathJax, network, Obsidian, or `samples/`.
+- `samples/MITC공부.pdf` validates locally with `Math render error count: 0`.
+
+Hard failure criteria:
+
+- Repair changes math spans that did not fail local MathJax validation.
+- Repair claims success without candidate revalidation.
+- Repair introduces remote services, alternate engines, or mandatory sample-dependent default tests.
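The first two hard failure criteria amount to a gate around the repair step: only spans that already failed validation are considered, and a candidate is applied only if it passes revalidation. A minimal sketch of that gate is below. The `FailedMath` record, the `apply_repairs` function, and the convention that `span` covers only the formula body are all hypothetical; the validator is injected as a callable so the sketch runs without Node.js or MathJax.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FailedMath:
    """Hypothetical record for one math span that failed local validation."""
    body: str
    display: bool
    span: Tuple[int, int]  # assumed: (start, end) offsets of the body in the text
    message: str


Rule = Callable[[str], str]
Validator = Callable[[str, bool], bool]  # (body, display) -> renders cleanly?


def choose_repair(failed: FailedMath, rules: Iterable[Rule],
                  is_valid: Validator) -> Optional[str]:
    """Return the first candidate that differs from the body and revalidates."""
    for rule in rules:
        candidate = rule(failed.body)
        if candidate != failed.body and is_valid(candidate, failed.display):
            return candidate
    return None  # no verified repair: leave the span and its error in place


def apply_repairs(text: str, failures: List[FailedMath], rules: List[Rule],
                  is_valid: Validator) -> Tuple[str, List[str]]:
    """Rewrite only failed spans, and only with revalidated candidates."""
    warnings: List[str] = []
    # Work from the end of the text so earlier offsets stay valid.
    for failed in sorted(failures, key=lambda f: f.span[0], reverse=True):
        candidate = choose_repair(failed, rules, is_valid)
        if candidate is None:
            continue
        start, end = failed.span
        text = text[:start] + candidate + text[end:]
        warnings.append("MATH_RENDER_REPAIRED")
    return text, warnings
```

Because the function iterates over failure records rather than over every math span in the document, spans that passed validation are never rewritten. In tests, `is_valid` can be a plain lambda, which fits the requirement that default tests do not need MathJax.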
+
 ## 6. Cross-Cutting Acceptance Criteria

 Every implementation sprint must preserve these acceptance criteria:
@@ -645,7 +687,7 @@ Handoff fields:
 - MinerU 3.1.0 install and CLI behavior are source-verified, but real local output still needs a later local probe before release.
 - GTX 1070 Ti 8GB is visible locally, but it is Pascal compute capability 6.1; `doctor` and setup docs must make CUDA/PyTorch limits clear.
 - `uv` is installed per-user at `C:\Users\user\.local\bin`, but a new shell may need PATH refresh before `uv` is visible.
-- Formula renderability checks need a local tool choice; the implementation should start with an interface and graceful unavailable-tool warning if needed.
+- Formula renderability checks and conservative warning mitigation are implemented, but formula reconstruction remains best effort and should keep warnings/provenance visible.
 - Some PDFs will have tables or formulas that cannot be faithfully represented in Markdown; metadata and `.report.md` must surface this instead of hiding it.
 - Redistribution license obligations must be reviewed before packaging, redistribution, or bundling model weights.