Files
PDFToMD/PLAN.md
T
2026-05-08 16:42:19 +09:00

7.1 KiB

PLAN.md

This file is the shared work plan for agents. Read it before starting work, then update it when the plan changes.

Current Goal

CUDA-enabled PyTorch and MinerU 3.1.0 runtime setup is complete in the project .venv. Sprint 10 pre-conversion PDF chunking is implemented; next work is optional real local sample validation only if requested.

Active Constraints

  • Do not implement additional program code beyond the active user-approved sprint.
  • Keep MinerU 3.1.0 as the only conversion engine.
  • Keep processing local-only.
  • Target Python 3.12.
  • Target GPU: GTX 1070 Ti 8GB.
  • Default conversion device: cuda:0.
  • Run MinerU through direct local CLI execution only.
  • On MinerU failure, report a clear error/warning and do not silently fallback.
  • Write both metadata JSON and a human-readable .report.md quality report for conversions.
  • Use samples/ only as local fixture context; do not commit sample files unless explicitly requested.

Planned Work

  1. Use research-agent for MinerU 3.1.0 source tracking and official-doc verification.
  2. Use requirements-guard-agent for cross-document consistency reviews.
  3. Use mineru-integration-agent for direct local MinerU CLI adapter planning.
  4. Use obsidian-markdown-agent for math-heavy Obsidian Markdown output planning.
  5. Use metadata-agent for provenance, warning, JSON metadata, and .report.md planning.
  6. Use evaluation-agent for local fixture coverage and regression criteria.
  7. Use local-setup-agent for Python 3.12, uv, CUDA, GTX 1070 Ti 8GB, and doctor-check planning.
  8. Use license-privacy-agent for license and strict-local privacy review.
  9. Use harness-planner-agent to turn substantial implementation requests into scoped contracts before code work starts.
  10. Use feature-generator-agent to implement one approved contract at a time after the user explicitly requests implementation.
  11. Use evaluation-agent as the independent contract reviewer and QA evaluator before and after each implementation chunk.
  12. Follow docs/V1IMPLEMENTATIONPLAN.md for the v1 implementation sprint sequence.
  13. Use docs/Sprints/SPRINT10CONTRACT.md for the implemented long-PDF pre-conversion chunking sprint.

Open Questions

  • None.

Decisions

  • Use PLAN.md for intended work and ownership.
  • Use PROGRESS.md for completed work, current status, blockers, and next actions.
  • MinerU default local CLI execution is the only v1 execution mode.
  • MinerU 3.1.0 may launch a temporary local mineru-api internally when mineru CLI runs without --api-url.
  • Strict-local mode forbids --api-url, remote APIs, router mode, HTTP client backends, and remote OpenAI-compatible backends.
  • No silent fallback after MinerU failure.
  • Conversion output includes both metadata JSON and <stem>.report.md.
  • Local MathJax render checking is optional and nonfatal; missing Node.js or MathJax must produce a clear warning instead of blocking conversion.
  • Project-scoped custom agents live in .codex/agents/*.toml.
  • Project prompt commands live in .codex/commands/*.md.
  • Project-specific skills live in .codex/skills/*/SKILL.md.
  • Project hooks live in .codex/hooks.json and .codex/hooks/*.py.
  • Agent, command, skill, and hook assets are written in English for Codex compatibility.
  • Long-running implementation should use a planner/generator/evaluator harness only when the task complexity justifies the overhead.
  • Each substantial implementation chunk should have a sprint contract with objective, scope, verification, failure thresholds, and handoff fields.
  • Generator agents may self-check, but independent evaluation is required before marking a chunk complete.
  • V1 implementation sequencing and sprint contracts live in docs/V1IMPLEMENTATIONPLAN.md.
  • Concrete sprint contract documents live under docs/Sprints/.
  • Sprint 2 path planning contract lives at docs/Sprints/SPRINT2CONTRACT.md.
  • Sprint 3 domain records and metadata contract lives at docs/Sprints/SPRINT3CONTRACT.md.
  • Sprint 4 MinerU adapter contract lives at docs/Sprints/SPRINT4CONTRACT.md.
  • Sprint 4 fixes the v1 adapter executable to the direct mineru CLI; user-specified alternate executables, including mineru-api, are prohibited.
  • Sprint 5 Obsidian Markdown normalization and asset link contract lives at docs/Sprints/SPRINT5CONTRACT.md.
  • Sprint 5 owns Markdown normalization only; it does not write final Markdown files, copy assets, run MinerU, or connect to conversion orchestration.
  • Sprint 6 quality checks and report generation contract lives at docs/Sprints/SPRINT6CONTRACT.md.
  • Sprint 6 owns quality/report boundaries only; it does not write final files, run MinerU, or connect to conversion orchestration.
  • Sprint 7 conversion orchestration, CLI, and Python API contract lives at docs/Sprints/SPRINT7CONTRACT.md.
  • Sprint 7 will be the first implementation sprint allowed to write final Markdown, metadata JSON, report Markdown, and local copied assets as product behavior.
  • Sprint 7 implemented conversion orchestration, convert_pdf, batch conversion, pdf2md convert, output writing, metadata/report writing, and fake-adapter CLI/API tests.
  • Sprint 8 should cover pdf2md doctor and setup documentation; Sprint 7 intentionally did not add doctor behavior.
  • Sprint 8 doctor and setup documentation contract lives at docs/Sprints/SPRINT8CONTRACT.md.
  • Sprint 8 owns doctor diagnostics and setup docs only; it must not run real MinerU, download models, run sample PDFs, or add runtime remote/API paths in default tests.
  • Sprint 8 implements pdf2md doctor, local setup diagnostics, and setup documentation without running real MinerU, downloading models, or touching samples/ in default tests.
  • Sprint 9 local fixture evaluation and v1 release gate contract lives at docs/Sprints/SPRINT9CONTRACT.md.
  • Sprint 9 must keep default tests independent of real MinerU, GPU, models, network, Obsidian, LaTeX tooling, and samples/; real MinerU fixture checks must be explicit opt-in only.
  • Sprint 9 implements fast mocked integration tests, explicit opt-in local MinerU fixture evaluation, and docs/V1RELEASECHECKLIST.md.
  • pdf2md convert defaults to --gpu cuda:0.
  • The MinerU adapter maps CUDA device requests to local subprocess environment variables instead of adding speculative MinerU CLI flags.
  • GTX 1070 Ti local runtime uses PyTorch 2.6.0+cu126 and torchvision 0.21.0+cu126 installed after uv sync, followed by mineru[core]==3.1.0.
  • MinerU models are downloaded with mineru-models-download -s huggingface -m all, and runtime model loading uses MINERU_MODEL_SOURCE=local.
  • Sprint 10 should use pypdf for local 20-page PDF chunk creation if implementation is approved.
  • Sprint 10 uses pypdf for local PDF page chunk planning and temporary chunk PDF writing.
  • Sprint 10 converts chunk PDFs independently and does not merge generated Markdown outputs.
  • Chunking is opt-in through --chunk-pages; if the option is present without a value, the CLI uses 20 pages per chunk.
  • convert_pdf() keeps returning ConversionResult without chunking and returns BatchConversionResult when chunk_pages is set.
  • Chunk PDFs are temporary local files and are deleted after conversion completes, including when raw MinerU output is retained.