Sprint 6 Contract: Quality Checks And Report Generation
Status: Implemented
Last updated: 2026-05-08
Objective
Build local quality-check and human-readable report generation boundaries from project-owned metadata and normalized Markdown, before they are connected to conversion orchestration.
Sprint 6 must establish:
- A project-owned quality module for local asset-link and math-renderability signals.
- A report module that renders <stem>.report.md content from metadata and quality results.
- Deterministic final status calculation: success, partial, or failed.
- Summary fields needed by reports, including missing asset links and math render failures.
- Fast unit tests that do not require real MinerU, model files, GPU, sample PDFs, Obsidian, LaTeX tooling, network, or a working conversion CLI.
Sprint 6 is a quality/report contract sprint. It may generate report Markdown content as a string, but it must not connect to the CLI, conversion orchestration, real MinerU execution, file output writing, setup scripts, or end-to-end conversion.
Current Precondition
Sprint 5 is complete:
- src/pdf2md/paths.py owns input discovery and output path planning.
- src/pdf2md/ir.py owns project records, block types, warning codes, and warning severities.
- src/pdf2md/metadata.py builds JSON-serializable metadata and summary counts from project-owned records.
- src/pdf2md/mineru_adapter.py owns the mocked direct local MinerU CLI adapter boundary.
- src/pdf2md/markdown.py owns Obsidian Markdown normalization, asset link warnings, and table fallback warnings.
- uv run pytest passed 89 tests.
Sprint 6 may use metadata dictionaries produced by build_metadata, project-owned WarningRecord values, and normalized Markdown text. It must not require raw MinerU-specific Python objects as public or required inputs.
Touched Surfaces
Allowed:
- src/pdf2md/quality.py
- src/pdf2md/report.py
- src/pdf2md/metadata.py only for narrowly required summary fields or helper functions that keep metadata/report consistency
- src/pdf2md/ir.py only for narrowly required warning codes discovered while implementing quality checks
- tests/test_quality.py
- tests/test_report.py
- tests/test_metadata.py only if metadata.py changes
- README.md only if a small note is needed to clarify mocked/local quality and report behavior
- PLAN.md only for current-goal coordination updates required by the shared agent workflow
- PROGRESS.md
- docs/V1IMPLEMENTATIONPLAN.md only if sequencing or constraints need adjustment
- docs/Sprints/SPRINT6CONTRACT.md
Not allowed:
- src/pdf2md/conversion.py
- src/pdf2md/cli.py
- src/pdf2md/mineru_adapter.py
- Working pdf2md convert behavior
- Full pdf2md doctor behavior
- scripts/
- Any real MinerU invocation in default tests
- Any MinerU/model installation or download script
- Any PDF content parsing
- Any final Markdown file writing
- Any metadata JSON file writing
- Any .report.md file writing as product behavior
- Any asset copying or moving
- Any runtime engine selection or alternate engine support
- Any remote asset fetch, HTTP client, cloud/API integration, hosted renderer, or remote math-render service
- Any committed file under samples/
Expected Outputs
Sprint 6 should produce:
- Quality result records and API
  - A small project-owned quality result type containing at least:
    - missing asset link count
    - invalid asset link count when available
    - math render error count
    - warnings produced by quality checks
  - A local asset-link check function that accepts normalized Markdown and local asset context without writing files.
  - A math renderability check interface that accepts a local checker callable or reports tool-unavailable behavior gracefully.
  - No public or required field should expose raw MinerU-specific Python objects.
- Asset-link quality checks
  - Count missing local asset links in Markdown.
  - Count invalid links that are absolute, parent-escaping, remote, or otherwise non-local according to project policy.
  - Produce project-owned warnings for missing or invalid asset links.
  - Keep all checks local and deterministic.
  - Do not fetch remote URLs, copy assets, move assets, or write files.
- Math renderability checks
  - Provide a boundary for local math renderability checking.
  - Default tests must use fake/local checker callables.
  - Tool-unavailable behavior must be explicit and non-fatal.
  - Render failures must produce MATH_RENDER_FAILED warnings and count toward the report.
  - The checker must not call network services or require a LaTeX/Obsidian install in default tests.
- Metadata summary consistency
  - Preserve existing required metadata summary fields.
  - Add or derive report-needed counts without breaking existing metadata tests:
    - missing asset link count
    - invalid asset link count
    - math render error count
  - Warning order and warning counts must remain deterministic.
  - Reports must be derived from metadata and quality results, not independently duplicated state.
- Report Markdown generation
  - Render a human-readable <stem>.report.md content string from metadata and quality results.
  - Include at least:
    - source PDF path
    - output Markdown path when provided
    - metadata path when provided
    - report path when provided
    - MinerU engine/version and execution mode/options
    - pages processed
    - warning count
    - asset count
    - missing asset link count
    - inline formula count
    - display formula count
    - math render error count
    - pages with warnings
    - final status: success, partial, or failed
  - The report must not invent facts that are absent from metadata; absent optional paths should be omitted or clearly shown as unavailable.
  - The report generator must not write files in Sprint 6.
- Final status policy
  - failed: metadata or quality warnings contain at least one error-severity warning.
  - partial: no error-severity warnings, but warnings or quality failures exist.
  - success: no warnings and no quality failures.
  - The status function must be unit-tested and reusable by later orchestration.
- Tests
  - Unit tests for missing asset link counting.
  - Unit tests for invalid/remote/escaping asset link warnings.
  - Unit tests for math render failure aggregation with a fake checker.
  - Unit tests for math checker unavailable behavior.
  - Unit tests for report content and required sections.
  - Unit tests proving report content is derived from metadata and quality results.
  - Unit tests for pages-with-warnings summary.
  - Unit tests for final status calculation.
  - Unit tests proving no real MinerU binary, model files, GPU, samples/, Obsidian, LaTeX install, or network are required by default.
- Handoff
  - PROGRESS.md records changed files, commands run, tests passed or blocked, known failures, residual risks, and next action.
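
The final status policy listed above can be sketched as a small pure function. This is only an illustration under stated assumptions: SketchWarning is a hypothetical stand-in for the project-owned WarningRecord, and the severity strings and the quality_failures parameter are assumed names, not the implemented API.

```python
from dataclasses import dataclass
from typing import Iterable, Literal

FinalStatus = Literal["success", "partial", "failed"]


@dataclass(frozen=True)
class SketchWarning:
    """Hypothetical stand-in for the project-owned WarningRecord."""
    code: str
    severity: str  # assumed values: "info", "warning", "error"


def final_status(warnings: Iterable[SketchWarning], quality_failures: int = 0) -> FinalStatus:
    """Apply the contract's policy: failed > partial > success."""
    collected = list(warnings)
    # Any error-severity warning forces "failed".
    if any(w.severity == "error" for w in collected):
        return "failed"
    # Remaining warnings or quality failures downgrade to "partial".
    if collected or quality_failures > 0:
        return "partial"
    return "success"
```

Keeping the function free of I/O means later orchestration could call it with the combined metadata and quality warnings without changing its behavior.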
Non-Goals
- Do not implement conversion orchestration.
- Do not implement convert_pdf.
- Do not implement pdf2md convert.
- Do not implement full pdf2md doctor.
- Do not invoke MinerU.
- Do not install MinerU 3.1.0.
- Do not download MinerU models.
- Do not parse real PDFs.
- Do not write final Markdown files.
- Do not copy or move assets.
- Do not write metadata JSON files.
- Do not write .report.md files as product behavior.
- Do not compute source SHA-256.
- Do not implement real LaTeX, KaTeX, MathJax, or Obsidian rendering in default tests.
- Do not add setup scripts.
- Do not implement full local environment diagnostics.
- Do not implement alternate engines or runtime engine selection.
- Do not add cloud, remote API, router, HTTP client backend, remote OpenAI-compatible backend, hosted renderer, or remote asset-fetching support.
Work Packages
WP6.1: Quality Result Types And Asset Checks
Owner:
- metadata-agent
- feature-generator-agent
Actions:
- Define a small project-owned quality result type.
- Add deterministic local asset link checks over normalized Markdown.
- Count missing, invalid, escaping, absolute, and remote asset references.
- Return project-owned warnings without writing files.
Output:
- Later orchestration can add local quality results to metadata/report flow without duplicating asset-link logic.
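
As a rough illustration of WP6.1, the sketch below counts missing and invalid image links in normalized Markdown without touching the filesystem. Everything named here is an assumption made for the example: the AssetLinkResult type, the check_asset_links signature, the warning codes ASSET_LINK_MISSING and ASSET_LINK_INVALID, the choice to model "local asset context" as a set of known relative paths, and the decision to scan only standard `![alt](target)` image syntax.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches standard Markdown images: ![alt](target)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


@dataclass
class AssetLinkResult:
    """Hypothetical result type; field names are assumptions."""
    missing_asset_link_count: int = 0
    invalid_asset_link_count: int = 0
    warnings: list[tuple[str, str]] = field(default_factory=list)  # (code, target)


def is_invalid_target(target: str) -> bool:
    """True for remote, absolute, or parent-escaping targets."""
    if urlparse(target).scheme or target.startswith("//"):
        return True  # http:, https:, file:, drive letters, protocol-relative
    path = PurePosixPath(target)
    return path.is_absolute() or ".." in path.parts


def check_asset_links(markdown: str, known_assets: frozenset[str]) -> AssetLinkResult:
    """Classify each image link in document order; no I/O is performed."""
    result = AssetLinkResult()
    for match in IMAGE_LINK.finditer(markdown):
        target = match.group(1)
        if is_invalid_target(target):
            result.invalid_asset_link_count += 1
            result.warnings.append(("ASSET_LINK_INVALID", target))
        elif target not in known_assets:
            result.missing_asset_link_count += 1
            result.warnings.append(("ASSET_LINK_MISSING", target))
    return result
```

Because warnings are appended in document order and the input is plain data, the result stays deterministic across runs.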
WP6.2: Math Renderability Boundary
Owner:
- obsidian-markdown-agent
- metadata-agent
- feature-generator-agent
Actions:
- Define a local math render checker interface.
- Support fake checkers in tests.
- Treat checker-unavailable as explicit non-fatal warning/info according to the implementation design.
- Treat render failures as MATH_RENDER_FAILED warnings and count them.
Output:
- Math renderability is represented as a local, testable boundary without external dependencies.
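
One possible shape for the WP6.2 boundary is a checker callable that is injected by the caller, so tests can pass a fake. The sketch below is an assumption-laden illustration: the MathChecker signature, the MathCheckResult type, the simplified `$...$` / `$$...$$` extraction, and the MATH_CHECKER_UNAVAILABLE code are all hypothetical. Only MATH_RENDER_FAILED comes from this contract.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumed interface: return None when the formula renders, else an error message.
MathChecker = Callable[[str], Optional[str]]

DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
INLINE_MATH = re.compile(r"\$([^$\n]+?)\$")


@dataclass
class MathCheckResult:
    math_render_error_count: int = 0
    checker_available: bool = True
    warnings: list[tuple[str, str, str]] = field(default_factory=list)  # (code, severity, message)


def extract_formulas(markdown: str) -> list[str]:
    """Return display formulas first, then inline formulas (simplified parsing)."""
    display = [m.group(1).strip() for m in DISPLAY_MATH.finditer(markdown)]
    remainder = DISPLAY_MATH.sub(" ", markdown)  # avoid re-matching display math as inline
    inline = [m.group(1).strip() for m in INLINE_MATH.finditer(remainder)]
    return display + inline


def check_math(markdown: str, checker: MathChecker | None) -> MathCheckResult:
    result = MathCheckResult()
    if checker is None:
        # Tool unavailable: explicit and non-fatal, no failures counted.
        result.checker_available = False
        result.warnings.append(("MATH_CHECKER_UNAVAILABLE", "info", "no local math checker configured"))
        return result
    for formula in extract_formulas(markdown):
        error = checker(formula)
        if error is not None:
            result.math_render_error_count += 1
            result.warnings.append(("MATH_RENDER_FAILED", "warning", f"{formula}: {error}"))
    return result
```

A fake checker in a unit test can be as small as a function that rejects unbalanced braces, which keeps default tests free of any LaTeX, KaTeX, MathJax, or Obsidian dependency.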
WP6.3: Metadata Summary Extensions
Owner:
- metadata-agent
- feature-generator-agent
Actions:
- Preserve existing required metadata summary fields.
- Add or derive counts needed by reports in a backward-compatible way.
- Keep metadata JSON serializable and deterministic.
Output:
- Metadata remains the source of truth for report counts and warning summaries.
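
A backward-compatible summary extension along the lines of WP6.3 might look like the following sketch. The metadata layout (a "summary" dict and a "warnings" list whose entries carry an optional "page") and the function name extend_summary are assumptions for illustration; the real structure is whatever build_metadata produces.

```python
from __future__ import annotations

import copy
from typing import Any, Mapping

QUALITY_COUNT_KEYS = (
    "missing_asset_link_count",
    "invalid_asset_link_count",
    "math_render_error_count",
)


def extend_summary(metadata: Mapping[str, Any], quality_counts: Mapping[str, int]) -> dict[str, Any]:
    """Return a copy of metadata with report-needed counts added to its summary."""
    extended = copy.deepcopy(dict(metadata))  # never mutate the caller's metadata
    summary = extended.setdefault("summary", {})
    for key in QUALITY_COUNT_KEYS:
        summary[key] = int(quality_counts.get(key, 0))
    # Derive pages-with-warnings from the warnings already in metadata.
    pages = {w["page"] for w in extended.get("warnings", []) if w.get("page") is not None}
    summary["pages_with_warnings"] = sorted(pages)
    return extended
```

Existing summary keys are left untouched, and sorting the page set keeps the output deterministic and JSON-serializable.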
WP6.4: Report Markdown Rendering
Owner:
- metadata-agent
- feature-generator-agent
Actions:
- Implement report content rendering from metadata plus quality results.
- Include required report sections and final status.
- Generate content only; do not write files.
Output:
- Later orchestration can write <stem>.report.md by using the tested report renderer.
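
To make the WP6.4 "content only" rule concrete, here is a minimal sketch of a renderer that returns the report as a string and never opens a file. The function name render_report, the metadata key names, and the report layout are all assumptions for this example, not the implemented format.

```python
from __future__ import annotations

from typing import Any, Mapping

# (label, key) pairs; the key names are assumptions for this sketch.
PATH_FIELDS = (
    ("Source PDF", "source_pdf"),
    ("Output Markdown", "markdown_path"),
    ("Metadata", "metadata_path"),
    ("Report", "report_path"),
)
COUNT_FIELDS = (
    ("Pages processed", "pages_processed"),
    ("Warning count", "warning_count"),
    ("Asset count", "asset_count"),
    ("Missing asset links", "missing_asset_link_count"),
    ("Inline formulas", "inline_formula_count"),
    ("Display formulas", "display_formula_count"),
    ("Math render errors", "math_render_error_count"),
)


def render_report(metadata: Mapping[str, Any], status: str) -> str:
    """Return report Markdown as a string; performs no file or network I/O."""
    summary = metadata.get("summary", {})
    engine = metadata.get("engine", {})
    lines = ["# Conversion Report", "", f"Final status: {status}", "", "## Paths"]
    for label, key in PATH_FIELDS:
        # Absent optional paths are shown as unavailable rather than invented.
        lines.append(f"- {label}: {metadata.get(key) or 'unavailable'}")
    lines += ["", "## Engine"]
    lines.append(f"- Engine: {engine.get('name', 'unavailable')} {engine.get('version', '')}".rstrip())
    lines.append(f"- Mode: {engine.get('mode', 'unavailable')}")
    lines += ["", "## Counts"]
    for label, key in COUNT_FIELDS:
        lines.append(f"- {label}: {summary.get(key, 'unavailable')}")
    pages = summary.get("pages_with_warnings", [])
    lines.append(f"- Pages with warnings: {', '.join(map(str, pages)) or 'none'}")
    return "\n".join(lines) + "\n"
```

Driving the layout from fixed field tuples keeps the output stable enough for snapshot-like assertions, and every number shown is read from the metadata rather than recomputed.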
WP6.5: Independent Evaluation
Owner:
- evaluation-agent
Actions:
- Review completed quality/report behavior against this contract.
- Verify no conversion orchestration, real MinerU dependency in default tests, remote runtime path, alternate engine, final output writing, CLI behavior, or sample dependency was added.
- Verify samples/ remains untracked and unstaged.
Output:
- PASS/FAIL notes with any missing acceptance criteria.
Verification Checks
Required:
- git status --short before staging confirms samples/ remains untracked.
- uv --version is run and result is recorded.
- uv sync passes.
- uv run pytest passes.
- Targeted quality/report tests pass.
- Tests do not require real MinerU, CUDA, GPU, model files, Obsidian, LaTeX tooling, samples/, or network.
- No model downloads occur.
- No network calls are required.
- No candidate engine comparison is reintroduced.
- No conversion orchestration is implemented.
- No working pdf2md convert or full pdf2md doctor behavior is implemented.
- No final Markdown, metadata JSON, or .report.md files are written as product behavior.
- No remote asset fetching is implemented.
- No real math renderer dependency is required by default tests.
- Report counts match metadata and quality results.
- Report generation does not re-run MinerU.
- git diff --check passes.
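
One way a test suite could back the "no network" checks above is to block socket creation while quality and report code runs. The helper below is a hedged sketch using only the standard library; the name no_network is hypothetical, and it is not stated anywhere in this contract that the project uses this technique.

```python
import socket
from contextlib import contextmanager
from unittest import mock


@contextmanager
def no_network():
    """Make any attempt to open a socket raise while the block is active."""

    def _blocked(*args, **kwargs):
        raise RuntimeError("network access is not allowed in default tests")

    # Patch both the socket constructor and the common convenience helper.
    with mock.patch.object(socket, "socket", _blocked), \
            mock.patch.object(socket, "create_connection", _blocked):
        yield
```

Wrapping a test body in `with no_network():` turns an accidental network call into an immediate failure instead of a silent dependency; the original socket functions are restored when the block exits.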
Recommended:
- Keep quality helpers pure and deterministic.
- Use fake checkers for math renderability tests.
- Keep report rendering stable enough for snapshot-like unit assertions.
- Use requirements-guard-agent if warning codes, summary fields, or report wording conflict across documents.
Hard Failure Criteria
Sprint 6 fails and must stop for a user decision if any of these are true:
- Report content diverges from metadata or quality result counts.
- Math render failures are silently ignored.
- Quality checks require network access.
- The implementation fetches remote assets or adds any HTTP/network client path.
- The implementation requires a real LaTeX/Obsidian/MathJax/KaTeX install in default tests.
- The implementation connects quality/report behavior to a working conversion CLI/API.
- The implementation writes final Markdown, metadata JSON, .report.md, or copied assets as product behavior.
- The implementation invokes MinerU, downloads models, adds setup scripts, or parses real PDFs.
- Default tests require real MinerU, CUDA, GPU, model files, network, Obsidian, LaTeX tooling, or samples/.
- samples/ is staged or committed.
Acceptance Criteria
Sprint 6 is complete when:
- src/pdf2md/quality.py exists and owns local quality-check behavior.
- src/pdf2md/report.py exists and owns human-readable report content rendering.
- Missing asset link counting is unit-tested.
- Invalid, escaping, absolute, or remote asset link warning behavior is unit-tested.
- Math render failure aggregation is unit-tested with fake checkers.
- Math checker unavailable behavior is unit-tested and non-fatal.
- Report content includes the required sections and counts.
- Pages-with-warnings summary is unit-tested.
- Final status calculation is unit-tested.
- Report generation is proven not to write files or re-run MinerU.
- Default tests do not require MinerU, GPU, model files, network, Obsidian, LaTeX tooling, or samples/.
- No conversion orchestration, final output file writing, working CLI behavior, real MinerU execution, or setup script is implemented.
- uv sync passes.
- uv run pytest passes.
- PROGRESS.md records checks performed and residual risks.
- Independent evaluation is complete.
- The completed change is committed.
Handoff Fields
Use these fields when Sprint 6 completes:
- Files changed:
- Commands run:
- Tests passed:
- Tests blocked:
- Known failures:
- Residual risks:
- User decisions needed:
- Go/no-go recommendation for Sprint 7:
- Next action: