# Sprint 6 Contract: Quality Checks And Report Generation Status: Implemented Last updated: 2026-05-08 ## Objective Build local quality-check and human-readable report generation boundaries from project-owned metadata and normalized Markdown, before they are connected to conversion orchestration. Sprint 6 must establish: - A project-owned quality module for local asset-link and math-renderability signals. - A report module that renders `.report.md` content from metadata and quality results. - Deterministic final status calculation: `success`, `partial`, or `failed`. - Summary fields needed by reports, including missing asset links and math render failures. - Fast unit tests that do not require real MinerU, model files, GPU, sample PDFs, Obsidian, LaTeX tooling, network, or a working conversion CLI. Sprint 6 is a quality/report contract sprint. It may generate report Markdown content as a string, but it must not connect to the CLI, conversion orchestration, real MinerU execution, file output writing, setup scripts, or end-to-end conversion. ## Current Precondition Sprint 5 is complete: - `src/pdf2md/paths.py` owns input discovery and output path planning. - `src/pdf2md/ir.py` owns project records, block types, warning codes, and warning severities. - `src/pdf2md/metadata.py` builds JSON-serializable metadata and summary counts from project-owned records. - `src/pdf2md/mineru_adapter.py` owns the mocked direct local MinerU CLI adapter boundary. - `src/pdf2md/markdown.py` owns Obsidian Markdown normalization, asset link warnings, and table fallback warnings. - `uv run pytest` passed 89 tests. Sprint 6 may use metadata dictionaries produced by `build_metadata`, project-owned `WarningRecord` values, and normalized Markdown text. It must not require raw MinerU-specific Python objects as public or required inputs. ## Touched Surfaces Allowed: - `src/pdf2md/quality.py` - `src/pdf2md/report.py` - `src/pdf2md/metadata.py` only for narrowly required summary fields or helper functions that keep metadata/report consistency - `src/pdf2md/ir.py` only for narrowly required warning codes discovered while implementing quality checks - `tests/test_quality.py` - `tests/test_report.py` - `tests/test_metadata.py` only if `metadata.py` changes - `README.md` only if a small note is needed to clarify mocked/local quality and report behavior - `PLAN.md` only for current-goal coordination updates required by the shared agent workflow - `PROGRESS.md` - `docs/V1IMPLEMENTATIONPLAN.md` only if sequencing or constraints need adjustment - `docs/Sprints/SPRINT6CONTRACT.md` Not allowed: - `src/pdf2md/conversion.py` - `src/pdf2md/cli.py` - `src/pdf2md/mineru_adapter.py` - Working `pdf2md convert` behavior - Full `pdf2md doctor` behavior - `scripts/` - Any real MinerU invocation in default tests - Any MinerU/model installation or download script - Any PDF content parsing - Any final Markdown file writing - Any metadata JSON file writing - Any `.report.md` file writing as product behavior - Any asset copying or moving - Any runtime engine selection or alternate engine support - Any remote asset fetch, HTTP client, cloud/API integration, hosted renderer, or remote math-render service - Any committed file under `samples/` ## Expected Outputs Sprint 6 should produce: 1. Quality result records and API - A small project-owned quality result type containing at least: - missing asset link count - invalid asset link count when available - math render error count - warnings produced by quality checks - A local asset-link check function that accepts normalized Markdown and local asset context without writing files. - A math renderability check interface that accepts a local checker callable or reports tool-unavailable behavior gracefully. - No public or required field should expose raw MinerU-specific Python objects. 2. Asset-link quality checks - Count missing local asset links in Markdown. - Count invalid links that are absolute, parent-escaping, remote, or otherwise non-local according to project policy. - Produce project-owned warnings for missing or invalid asset links. - Keep all checks local and deterministic. - Do not fetch remote URLs, copy assets, move assets, or write files. 3. Math renderability checks - Provide a boundary for local math renderability checking. - Default tests must use fake/local checker callables. - Tool-unavailable behavior must be explicit and non-fatal. - Render failures must produce `MATH_RENDER_FAILED` warnings and count toward the report. - The checker must not call network services or require a LaTeX/Obsidian install in default tests. 4. Metadata summary consistency - Preserve existing required metadata summary fields. - Add or derive report-needed counts without breaking existing metadata tests: - missing asset link count - invalid asset link count - math render error count - Warning order and warning counts must remain deterministic. - Reports must be derived from metadata and quality results, not independently duplicated state. 5. Report Markdown generation - Render a human-readable `.report.md` content string from metadata and quality results. - Include at least: - source PDF path - output Markdown path when provided - metadata path when provided - report path when provided - MinerU engine/version and execution mode/options - pages processed - warning count - asset count - missing asset link count - inline formula count - display formula count - math render error count - pages with warnings - final status: `success`, `partial`, or `failed` - The report must not invent facts that are absent from metadata; absent optional paths should be omitted or clearly shown as unavailable. - The report generator must not write files in Sprint 6. 6. Final status policy - `failed`: metadata or quality warnings contain at least one `error` severity warning. - `partial`: no error severity warnings, but warnings or quality failures exist. - `success`: no warnings and no quality failures. - The status function must be unit-tested and reusable by later orchestration. 7. Tests - Unit tests for missing asset link counting. - Unit tests for invalid/remote/escaping asset link warnings. - Unit tests for math render failure aggregation with a fake checker. - Unit tests for math checker unavailable behavior. - Unit tests for report content and required sections. - Unit tests proving report content is derived from metadata and quality results. - Unit tests for pages-with-warnings summary. - Unit tests for final status calculation. - Unit tests proving no real MinerU binary, model files, GPU, `samples/`, Obsidian, LaTeX install, or network are required by default. 8. Handoff - `PROGRESS.md` records changed files, commands run, tests passed or blocked, known failures, residual risks, and next action. ## Non-Goals - Do not implement conversion orchestration. - Do not implement `convert_pdf`. - Do not implement `pdf2md convert`. - Do not implement full `pdf2md doctor`. - Do not invoke MinerU. - Do not install MinerU 3.1.0. - Do not download MinerU models. - Do not parse real PDFs. - Do not write final Markdown files. - Do not copy or move assets. - Do not write metadata JSON files. - Do not write `.report.md` files as product behavior. - Do not compute source SHA-256. - Do not implement real LaTeX, KaTeX, MathJax, or Obsidian rendering in default tests. - Do not add setup scripts. - Do not implement full local environment diagnostics. - Do not implement alternate engines or runtime engine selection. - Do not add cloud, remote API, router, HTTP client backend, remote OpenAI-compatible backend, hosted renderer, or remote asset-fetching support. ## Work Packages ### WP6.1: Quality Result Types And Asset Checks Owner: - `metadata-agent` - `feature-generator-agent` Actions: - Define a small project-owned quality result type. - Add deterministic local asset link checks over normalized Markdown. - Count missing, invalid, escaping, absolute, and remote asset references. - Return project-owned warnings without writing files. Output: - Later orchestration can add local quality results to metadata/report flow without duplicating asset-link logic. ### WP6.2: Math Renderability Boundary Owner: - `obsidian-markdown-agent` - `metadata-agent` - `feature-generator-agent` Actions: - Define a local math render checker interface. - Support fake checkers in tests. - Treat checker-unavailable as explicit non-fatal warning/info according to the implementation design. - Treat render failures as `MATH_RENDER_FAILED` warnings and count them. Output: - Math renderability is represented as a local, testable boundary without external dependencies. ### WP6.3: Metadata Summary Extensions Owner: - `metadata-agent` - `feature-generator-agent` Actions: - Preserve existing required metadata summary fields. - Add or derive counts needed by reports in a backward-compatible way. - Keep metadata JSON serializable and deterministic. Output: - Metadata remains the source of truth for report counts and warning summaries. ### WP6.4: Report Markdown Rendering Owner: - `metadata-agent` - `feature-generator-agent` Actions: - Implement report content rendering from metadata plus quality results. - Include required report sections and final status. - Generate content only; do not write files. Output: - Later orchestration can write `.report.md` by using the tested report renderer. ### WP6.5: Independent Evaluation Owner: - `evaluation-agent` Actions: - Review completed quality/report behavior against this contract. - Verify no conversion orchestration, real MinerU dependency in default tests, remote runtime path, alternate engine, final output writing, CLI behavior, or sample dependency was added. - Verify `samples/` remains untracked and unstaged. Output: - PASS/FAIL notes with any missing acceptance criteria. ## Verification Checks Required: - `git status --short` before staging confirms `samples/` remains untracked. - `uv --version` is run and result is recorded. - `uv sync` passes. - `uv run pytest` passes. - Targeted quality/report tests pass. - Tests do not require real MinerU, CUDA, GPU, model files, Obsidian, LaTeX tooling, `samples/`, or network. - No model downloads occur. - No network calls are required. - No candidate engine comparison is reintroduced. - No conversion orchestration is implemented. - No working `pdf2md convert` or full `pdf2md doctor` behavior is implemented. - No final Markdown, metadata JSON, or `.report.md` files are written as product behavior. - No remote asset fetching is implemented. - No real math renderer dependency is required by default tests. - Report counts match metadata and quality results. - Report generation does not re-run MinerU. - `git diff --check` passes. Recommended: - Keep quality helpers pure and deterministic. - Use fake checkers for math renderability tests. - Keep report rendering stable enough for snapshot-like unit assertions. - Use `requirements-guard-agent` if warning codes, summary fields, or report wording conflict across documents. ## Hard Failure Criteria Sprint 6 fails and must stop for a user decision if any of these are true: - Report content diverges from metadata or quality result counts. - Math render failures are silently ignored. - Quality checks require network access. - The implementation fetches remote assets or adds any HTTP/network client path. - The implementation requires a real LaTeX/Obsidian/MathJax/KaTeX install in default tests. - The implementation connects quality/report behavior to a working conversion CLI/API. - The implementation writes final Markdown, metadata JSON, `.report.md`, or copied assets as product behavior. - The implementation invokes MinerU, downloads models, adds setup scripts, or parses real PDFs. - Default tests require real MinerU, CUDA, GPU, model files, network, Obsidian, LaTeX tooling, or `samples/`. - `samples/` is staged or committed. ## Acceptance Criteria Sprint 6 is complete when: - `src/pdf2md/quality.py` exists and owns local quality-check behavior. - `src/pdf2md/report.py` exists and owns human-readable report content rendering. - Missing asset link counting is unit-tested. - Invalid, escaping, absolute, or remote asset link warning behavior is unit-tested. - Math render failure aggregation is unit-tested with fake checkers. - Math checker unavailable behavior is unit-tested and non-fatal. - Report content includes the required sections and counts. - Pages-with-warnings summary is unit-tested. - Final status calculation is unit-tested. - Report generation is proven not to write files or re-run MinerU. - Default tests do not require MinerU, GPU, model files, network, Obsidian, LaTeX tooling, or `samples/`. - No conversion orchestration, final output file writing, working CLI behavior, real MinerU execution, or setup script is implemented. - `uv sync` passes. - `uv run pytest` passes. - `PROGRESS.md` records checks performed and residual risks. - Independent evaluation is complete. - The completed change is committed. ## Handoff Fields Use these fields when Sprint 6 completes: - Files changed: - Commands run: - Tests passed: - Tests blocked: - Known failures: - Residual risks: - User decisions needed: - Go/no-go recommendation for Sprint 7: - Next action: