feat: mitigate MathJax formula warnings
@@ -4,7 +4,7 @@ This file is the shared work plan for agents. Read it before starting work, then

 ## Current Goal

-Completed work history is archived in `docs/WORKARCHIVE.md`. Sprint 10 pre-conversion PDF chunking is implemented. On this PC, full local runtime setup is complete in `.venv`; Markdown quality recheck for existing outputs is implemented. Next planned work is MathJax warning mitigation: after local MathJax validation, conservatively clean only warning-causing math spans, rerun validation, and preserve provenance for changed or still-failing formulas. Manual Obsidian quality review and sample validation remain optional fallback tasks.
+Completed work history is archived in `docs/WORKARCHIVE.md`. Sprint 11 MathJax warning mitigation is implemented. On this PC, full local runtime setup is complete in `.venv`; Markdown quality recheck for existing outputs is implemented and now shares the same conservative MathJax repair path as fresh conversion. Next work is optional manual Obsidian quality review, additional sample validation, or broader repair rules if future samples expose new deterministic MathJax failure patterns.

 ## Active Constraints

@@ -35,15 +35,14 @@ Completed work history is archived in `docs/WORKARCHIVE.md`. Sprint 10 pre-conve
 12. Follow `docs/V1IMPLEMENTATIONPLAN.md` for the v1 implementation sprint sequence.
 13. Use `docs/Sprints/SPRINT10CONTRACT.md` for the implemented long-PDF pre-conversion chunking sprint.
 14. Use `docs/WORKARCHIVE.md` for completed sprint history, prior verification, runtime setup evidence, and sample conversion evidence.
-15. Plan Sprint 11 for MathJax warning mitigation before code changes start.
-16. Create `docs/Sprints/SPRINT11CONTRACT.md` for the mitigation sprint if implementation is requested.
-17. Keep the mitigation path shared by `pdf2md convert` and `pdf2md recheck` so existing Markdown outputs can be cleaned without rerunning MinerU.
+15. Use `docs/Sprints/SPRINT11CONTRACT.md` for the implemented MathJax warning mitigation sprint.
+16. Keep the mitigation path shared by `pdf2md convert` and `pdf2md recheck` so existing Markdown outputs can be cleaned without rerunning MinerU.

-## Proposed Sprint 11: MathJax Warning Mitigation
+## Sprint 11: MathJax Warning Mitigation

 Objective:

-- Add a conservative local post-validation cleanup pass that attempts to remove only the specific math-span artifacts responsible for MathJax warnings, then reruns MathJax validation before writing final Markdown, metadata JSON, and report Markdown.
+- Implemented a conservative local post-validation cleanup pass that attempts to remove only the specific math-span artifacts responsible for MathJax warnings, then reruns MathJax validation before writing final Markdown, metadata JSON, and report Markdown.

 Assumptions:

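The cleanup pass described under Objective in this hunk (validate, try a conservative repair, revalidate, and only then replace the formula) can be sketched roughly as follows. This is a minimal illustration, not the project's code: `mitigate`, `MathNotice`, `Validator`, and `RepairRule` are hypothetical names, and the validator is passed in as a plain callable standing in for the local MathJax checker. Only the warning codes `MATH_RENDER_REPAIRED` and `MATH_RENDER_FAILED` come from the plan itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical types: the real checker and rule interfaces are not shown in this diff.
Validator = Callable[[str], bool]             # True when the formula renders cleanly
RepairRule = Callable[[str], Optional[str]]   # returns a candidate, or None if not applicable


@dataclass
class MathNotice:
    """Provenance record for a formula that failed or was repaired."""
    code: str                      # "MATH_RENDER_REPAIRED" or "MATH_RENDER_FAILED"
    original: str
    repaired: Optional[str] = None


def mitigate(
    formula: str,
    validate: Validator,
    rules: Iterable[RepairRule],
) -> Tuple[str, Optional[MathNotice]]:
    """Return the formula to write out and an optional provenance notice."""
    if validate(formula):
        return formula, None  # nothing to repair, nothing to report

    for rule in rules:
        candidate = rule(formula)
        # A candidate replaces the original only after it passes revalidation.
        if candidate is not None and candidate != formula and validate(candidate):
            return candidate, MathNotice("MATH_RENDER_REPAIRED", formula, candidate)

    # No candidate passed: keep the original text and keep the failure visible.
    return formula, MathNotice("MATH_RENDER_FAILED", formula)
```

Because the validator and rules are injected, the same function could in principle be called from both a fresh conversion and a recheck of existing Markdown, which is the sharing the constraints ask for.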
@@ -97,8 +96,7 @@ Hard failure criteria:

 ## Open Questions

-- Which exact cleanup rules should Sprint 11 allow after inspecting current MathJax failure messages? Recommendation: start with deterministic non-semantic artifacts only.
-- Should applied mitigations use a new stable warning/info code or be represented through existing metadata/report fields? Recommendation: make repair provenance visible without counting a successfully repaired expression as a render failure.
+- None.

 ## Decisions

@@ -116,6 +114,8 @@ Hard failure criteria:
 - Candidate math cleanup must be revalidated with the local MathJax checker before replacing Markdown.
 - If no candidate passes validation, keep the original formula and retain the `MATH_RENDER_FAILED` warning.
 - Successfully mitigated formulas must remain traceable in metadata/report output; warning reduction must not hide that a formula was changed.
+- Sprint 11 uses `MATH_RENDER_REPAIRED` info warnings for applied repair provenance.
+- Sprint 11 initial repair rules cover repeated same-direction scripts and truncated array `\end{a}` endings only.
 - Project-scoped custom agents live in `.codex/agents/*.toml`.
 - Project prompt commands live in `.codex/commands/*.md`.
 - Project-specific skills live in `.codex/skills/*/SKILL.md`.
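
The two initial repair rules named in the Decisions list above might look something like the sketch below. The diff does not show the actual rules, so both functions are assumptions: `repair_repeated_script` reads "repeated same-direction scripts" narrowly as an identical braced script duplicated back to back (for example `x^{2}^{2}`), and `repair_truncated_array_end` completes a trailing `\end{a}` only when exactly one `\begin{array}` is still open. Any candidate either rule produces would still need to pass MathJax revalidation before use.

```python
import re
from typing import Optional

# Assumption: only an identical braced script repeated in the same direction
# (e.g. "^{2}^{2}") is treated as a duplicate; differing scripts are left alone.
_DUPLICATE_SCRIPT = re.compile(r"([_^])(\{[^{}]*\})(?:\s*\1\2)+")


def repair_repeated_script(formula: str) -> Optional[str]:
    """Collapse back-to-back identical scripts; return None if nothing changed."""
    candidate = _DUPLICATE_SCRIPT.sub(r"\1\2", formula)
    return candidate if candidate != formula else None


def repair_truncated_array_end(formula: str) -> Optional[str]:
    """Complete a trailing '\\end{a}' to '\\end{array}' when one array is unclosed."""
    truncated = r"\end{a}"
    stripped = formula.rstrip()
    if not stripped.endswith(truncated):
        return None
    opened = stripped.count(r"\begin{array}")
    closed = stripped.count(r"\end{array}")
    if opened != closed + 1:
        return None  # ambiguous structure: leave the formula untouched
    return stripped[: -len(truncated)] + r"\end{array}"
```

Each rule returns `None` when it does not apply, so a caller can tell "no candidate" apart from "candidate produced" and fall back to keeping the original formula.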