feat: mitigate MathJax formula warnings
+9
-9
@@ -6,9 +6,9 @@ This file records current progress for agents. Read it before starting work, the
 
 - Project direction is documented in `PRD.md`, `ARCHITECTURE.md`, `AGENTS.md`, and `docs/KNOWLEDGEBASE.md`.
 - MinerU 3.1.0 is fixed as the only conversion engine.
-- The converter currently includes path planning, project-owned records, metadata, direct local MinerU adapter boundary, Obsidian Markdown normalization, local quality checks, report rendering, conversion orchestration, `pdf2md convert`, `pdf2md recheck`, `pdf2md doctor`, local MathJax render checking, release-gate tests, and opt-in pre-conversion PDF chunking.
+- The converter currently includes path planning, project-owned records, metadata, direct local MinerU adapter boundary, Obsidian Markdown normalization, local quality checks, report rendering, conversion orchestration, `pdf2md convert`, `pdf2md recheck`, `pdf2md doctor`, local MathJax render checking, conservative MathJax warning mitigation, release-gate tests, and opt-in pre-conversion PDF chunking.
 - `docs/V1IMPLEMENTATIONPLAN.md` defines the v1 implementation sequence.
-- `docs/Sprints/` contains completed sprint contracts through Sprint 10.
+- `docs/Sprints/` contains completed sprint contracts through Sprint 11.
 - `docs/WORKARCHIVE.md` contains completed sprint history, historical verification results, runtime setup notes, and sample conversion evidence.
 - `samples/` exists locally as fixture context.
 - `outputs/` is ignored and contains local generated conversion outputs.
@@ -48,7 +48,9 @@ This file records current progress for agents. Read it before starting work, the
 - Added `recheck_markdown()` and `pdf2md recheck <markdown.md>` to rerun local quality checks for an existing generated Markdown file and rewrite the adjacent metadata JSON and `.report.md` without rerunning MinerU.
 - Verified `uv run pdf2md recheck outputs\MITC공부\MITC공부.md`; the command regenerated metadata/report and still reported 2 warnings because the current Markdown still contains the two MathJax-invalid expressions.
 - Reconverted `samples/MITC공부.pdf` with `--overwrite` to ignored `outputs/MITC공부/`; report status remains `partial`: 13 pages, 107 assets, 23 inline formulas, 103 display formulas, 2 MathJax render warnings, and 0 missing or invalid asset links.
-- Added a `PLAN.md` Sprint 11 proposal for conservative MathJax warning mitigation after validation; no implementation code has been started.
+- Sprint 11 implemented conservative MathJax warning mitigation with failed-expression details, `src/pdf2md/math_repair.py`, shared `convert`/`recheck` repair integration, and `MATH_RENDER_REPAIRED` info warnings.
+- Verified default fast suite: `uv run pytest` passed 172 tests with 1 skipped.
+- Verified requested real sample: `uv run pdf2md convert samples\MITC공부.pdf --out outputs\sprint11-MITC공부 --overwrite` succeeded with 13 pages, 107 assets, 23 inline formulas, 103 display formulas, 0 MathJax render errors, and 2 `MATH_RENDER_REPAIRED` info warnings.
 
 ## In Progress
 
@@ -60,9 +62,7 @@ This file records current progress for agents. Read it before starting work, the
 
 ## Next Actions
 
-1. If implementation is requested, write `docs/Sprints/SPRINT11CONTRACT.md` for MathJax warning mitigation before code changes start.
-2. Inspect the current MathJax failure messages from `outputs/MITC공부/MITC공부.md` to choose the narrow initial cleanup rule set.
-3. Manually fix the two MathJax-invalid expressions in `outputs/MITC공부/MITC공부.md` only if a warning-free local report is desired before Sprint 11 exists, then run `uv run pdf2md recheck outputs\MITC공부\MITC공부.md`.
-4. Review generated sample Markdown outputs in Obsidian if visual quality needs manual assessment.
-5. Run optional real local chunked conversion on a long sample only if requested.
-6. Preserve strict-local runtime behavior: use local model paths, direct CLI execution, and no user-specified API or remote backend.
+1. Review generated sample Markdown outputs in Obsidian if visual quality needs manual assessment.
+2. Run additional real local sample validation only if requested, especially for new MathJax failure messages not covered by Sprint 11's narrow repair rules.
+3. Run optional real local chunked conversion on a long sample only if requested.
+4. Preserve strict-local runtime behavior: use local model paths, direct CLI execution, and no user-specified API or remote backend.
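Reviewer note: the progress log describes the Sprint 11 mitigation as conservative, limited to narrow repair rules, and reported through `MATH_RENDER_REPAIRED` info warnings. The sketch below illustrates one way such a flow could look; it is not the contents of `src/pdf2md/math_repair.py`. Every name in it (`RepairResult`, `balance_braces`, `braces_ok`, `repair_failed_expression`) is hypothetical, and the brace check is only a stand-in for the real local MathJax render check.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# All names here are hypothetical illustrations, not the project's actual API.


@dataclass
class RepairResult:
    text: str             # expression to keep (repaired, or the untouched original)
    code: Optional[str]   # "MATH_RENDER_REPAIRED" if a repair was accepted, else None


def balance_braces(expr: str) -> str:
    """Example narrow rule: append closing braces that are missing at the end.

    Escaped braces such as \\{ are not treated specially in this sketch.
    """
    depth = 0
    for ch in expr:
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
    return expr + "}" * depth


def braces_ok(expr: str) -> bool:
    """Stand-in validator; the real check is assumed to be a local MathJax render."""
    depth = 0
    for ch in expr:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def repair_failed_expression(
    expr: str,
    rules: Sequence[Callable[[str], str]],
    renders: Callable[[str], bool],
) -> RepairResult:
    """Try each rule on an expression that already failed to render.

    A candidate is accepted only if it differs from the input and passes
    the validator; otherwise the original text is returned unchanged.
    """
    for rule in rules:
        candidate = rule(expr)
        if candidate != expr and renders(candidate):
            return RepairResult(candidate, "MATH_RENDER_REPAIRED")
    return RepairResult(expr, None)


fixed = repair_failed_expression(r"\frac{a}{b", [balance_braces], braces_ok)
untouched = repair_failed_expression("a}b", [balance_braces], braces_ok)
```

In this sketch a repair is kept only when the candidate passes revalidation, so an expression that no rule can fix stays byte-for-byte identical and continues to surface as a render warning rather than being silently altered.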