2026-05-14 10:16:59 +09:00

Evaluation Metrics

Use these metrics for local fixture plans and future tests.

Fixture Categories

  • Simple digital PDF with text layer.
  • Math-heavy paper or chapter.
  • Multi-column paper.
  • Table with formulas.
  • Figure with caption and asset extraction.
  • Korean filename/path handling.
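
The Korean filename/path fixture is easy to get subtly wrong, because some platforms may return Hangul file names in decomposed (NFD) form while source code and reports usually contain the composed (NFC) form. The sketch below shows one way a fixture test could compare names safely. The helper name `normalized_name` is hypothetical, and choosing NFC as the canonical form is an assumption, not something this document prescribes.

```python
import unicodedata
from pathlib import Path


def normalized_name(path: Path) -> str:
    """Return the file name in NFC form so composed and decomposed Hangul compare equal."""
    return unicodedata.normalize("NFC", path.name)


# The same visible name in two Unicode forms.
composed = "논문.pdf"
decomposed = unicodedata.normalize("NFD", composed)

# The raw strings differ, so a naive comparison of paths would fail.
assert composed != decomposed

# After normalization both forms map to the same name.
assert normalized_name(Path("fixtures") / decomposed) == composed
```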

Fast Checks

  • Output files are planned at deterministic paths.
  • Internal provenance includes source PDF, page count, engine, warnings, and output paths.
  • _report.md can be generated from internal provenance without re-running MinerU.
  • Markdown math delimiter normalization is deterministic.
  • Asset links resolve relative to the Markdown file.
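
As a rough illustration of the fast checks above, the following sketch covers three of them without invoking MinerU: deterministic path planning, math delimiter normalization, and asset link resolution. All function names, the directory layout, and the target delimiter convention (`$...$` for inline, `$$...$$` for display) are assumptions made for the example; only the `_report.md` file name comes from this document.

```python
import re
from pathlib import Path

# Assumed target convention: $...$ for inline math, $$...$$ for display math.
_DISPLAY = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)
_INLINE = re.compile(r"\\\((.+?)\\\)", re.DOTALL)
_IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def plan_output_paths(source_pdf: Path, out_root: Path) -> dict[str, Path]:
    """Derive output paths from the inputs alone (no timestamps, no random IDs)."""
    base = out_root / source_pdf.stem  # hypothetical layout: one folder per PDF
    return {
        "markdown": base / f"{source_pdf.stem}.md",
        "report": base / "_report.md",
        "assets": base / "assets",
    }


def normalize_math_delimiters(markdown: str) -> str:
    """Rewrite \\[...\\] and \\(...\\) to dollar delimiters; a second pass changes nothing."""
    text = _DISPLAY.sub(lambda m: "$$" + m.group(1).strip() + "$$", markdown)
    return _INLINE.sub(lambda m: "$" + m.group(1).strip() + "$", text)


def unresolved_asset_links(markdown_path: Path, markdown: str) -> list[str]:
    """Return relative image targets that do not exist next to the Markdown file."""
    targets = _IMAGE_LINK.findall(markdown)
    local = [t for t in targets if "://" not in t]  # skip remote URLs
    return [t for t in local if not (markdown_path.parent / t).exists()]
```

A test can call each function twice with the same input and compare the results, which is a cheap way to assert determinism.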

Optional MinerU Checks

  • MinerU CLI execution succeeds or produces a clear failure warning.
  • Page coverage equals source PDF page count.
  • Math renderability failures are counted.
  • Table degradation warnings are counted.
  • Reading-order uncertainty is surfaced.
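
The first two optional checks could be sketched as follows. The code deliberately takes the command line as a parameter instead of hard-coding MinerU flags, since this document does not specify them. The function names, the warning dictionary shape, and the warning codes (`ENGINE_NOT_FOUND`, `ENGINE_TIMEOUT`, `ENGINE_FAILED`) are hypothetical.

```python
import subprocess


def run_engine(command: list[str], timeout_s: float = 600.0) -> list[dict]:
    """Run an external command and return warnings; an empty list means success."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except FileNotFoundError:
        return [{"severity": "error", "code": "ENGINE_NOT_FOUND",
                 "message": f"executable not found: {command[0]}"}]
    except subprocess.TimeoutExpired:
        return [{"severity": "error", "code": "ENGINE_TIMEOUT",
                 "message": f"no result after {timeout_s} seconds"}]
    if result.returncode != 0:
        # Keep the message short; the full stderr can go to a log file.
        detail = result.stderr.strip()[:200]
        return [{"severity": "error", "code": "ENGINE_FAILED",
                 "message": f"exit code {result.returncode}: {detail}"}]
    return []


def missing_pages(source_page_count: int, covered_pages: set[int]) -> list[int]:
    """Return 1-based page numbers present in the source but absent from the output."""
    return [p for p in range(1, source_page_count + 1) if p not in covered_pages]
```

Page coverage equals the source page count exactly when `missing_pages` returns an empty list.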

Report Sections

  • Summary: source file, pages, output files, engine, start/end time.
  • Warnings: grouped by severity and code.
  • Math: counts for inline, display, low-confidence, and render failures.
  • Assets: counts for extracted assets, missing assets, and broken links.
  • Tables: counts for extracted, degraded, and fallback tables.
  • Environment: Python, uv, MinerU version, GPU visibility when available.
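
To make the Summary and Warnings sections concrete, here is a minimal sketch of rendering part of `_report.md` from a provenance record, without re-running the engine. The provenance keys (`source_pdf`, `page_count`, `engine`, `output_paths`, `warnings`), the severity ordering, and the Markdown layout are assumptions for illustration, not the project's actual schema.

```python
from collections import Counter

# Assumed severity ordering; unknown severities sort last.
SEVERITY_ORDER = ("error", "warning", "info")


def _rank(severity: str) -> int:
    if severity in SEVERITY_ORDER:
        return SEVERITY_ORDER.index(severity)
    return len(SEVERITY_ORDER)


def group_warnings(warnings: list[dict]) -> dict[str, dict[str, int]]:
    """Count warnings per (severity, code), ordered by severity and then code."""
    counts = Counter((w["severity"], w["code"]) for w in warnings)
    grouped: dict[str, dict[str, int]] = {}
    for severity, code in sorted(counts, key=lambda k: (_rank(k[0]), k[1])):
        grouped.setdefault(severity, {})[code] = counts[(severity, code)]
    return grouped


def render_report(provenance: dict) -> str:
    """Render the Summary and Warnings sections from stored provenance only."""
    lines = ["# Report", "", "## Summary", ""]
    lines.append(f"- Source: {provenance['source_pdf']}")
    lines.append(f"- Pages: {provenance['page_count']}")
    lines.append(f"- Engine: {provenance['engine']}")
    for path in provenance["output_paths"]:
        lines.append(f"- Output: {path}")
    lines += ["", "## Warnings", ""]
    grouped = group_warnings(provenance["warnings"])
    if not grouped:
        lines.append("None.")
    for severity, codes in grouped.items():
        lines.append(f"### {severity}")
        for code, count in codes.items():
            lines.append(f"- {code}: {count}")
    return "\n".join(lines) + "\n"
```

Because the function reads only the provenance record, calling it twice on the same record yields identical text, which keeps the report itself testable as a fast check.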