# PLAN.md
This file is the shared work plan for agents. Read it before starting work, then update it when the plan changes.
## Current Goal
Completed work history is archived in `docs/WORKARCHIVE.md`. Sprint 10 (pre-conversion PDF chunking) is implemented. On this PC, the base uv/Python project setup is complete in `.venv`; the next runtime work, if requested, is local setup of MinerU, PyTorch, models, and MathJax.
## Active Constraints
- Do not implement additional program code beyond the active user-approved sprint.
- Keep MinerU 3.1.0 as the only conversion engine.
- Keep processing local-only.
- Target Python 3.12.
- Target GPU: GTX 1070 Ti 8GB.
- Default conversion device: `cuda:0`.
- Run MinerU through direct local CLI execution only.
- On MinerU failure, report a clear error/warning and do not silently fall back.
- Write both metadata JSON and a human-readable `.report.md` quality report for conversions.
- Use `samples/` only as local fixture context; do not commit sample files unless explicitly requested.
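
The local-only, direct-CLI, and no-silent-fallback constraints can be illustrated with a minimal sketch. It is not the project's adapter: `MinerUError`, `run_mineru_strict_local`, and the injectable `runner` parameter are hypothetical names introduced here, and the arguments passed to `mineru` are left to the caller because this plan does not list them.

```python
import subprocess
from typing import Callable, Sequence


class MinerUError(RuntimeError):
    """Hypothetical error type: the local MinerU run was rejected or failed."""


def run_mineru_strict_local(
    args: Sequence[str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run the local `mineru` CLI once, with no retry and no fallback engine."""
    for arg in args:
        # Strict-local mode: refuse anything that points MinerU at an API URL.
        if arg == "--api-url" or arg.startswith("--api-url="):
            raise MinerUError("strict-local mode forbids --api-url")

    # The executable is fixed to the direct `mineru` CLI; callers cannot swap it.
    completed = runner(["mineru", *args], capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the failure instead of trying another engine or mode.
        stderr = (completed.stderr or "").strip()
        raise MinerUError(
            f"mineru exited with code {completed.returncode}: {stderr}"
        )
    return completed
```

Passing `runner` explicitly keeps the sketch testable without a real MinerU install, GPU, or models: a fake callable that returns a `subprocess.CompletedProcess` stands in for the real subprocess.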
## Planned Work
1. Use `research-agent` for MinerU 3.1.0 source tracking and official-doc verification.
2. Use `requirements-guard-agent` for cross-document consistency reviews.
3. Use `mineru-integration-agent` for direct local MinerU CLI adapter planning.
4. Use `obsidian-markdown-agent` for math-heavy Obsidian Markdown output planning.
5. Use `metadata-agent` for provenance, warning, JSON metadata, and `.report.md` planning.
6. Use `evaluation-agent` for local fixture coverage and regression criteria.
7. Use `local-setup-agent` for Python 3.12, uv, CUDA, GTX 1070 Ti 8GB, and doctor-check planning.
8. Use `license-privacy-agent` for license and strict-local privacy review.
9. Use `harness-planner-agent` to turn substantial implementation requests into scoped contracts before code work starts.
10. Use `feature-generator-agent` to implement one approved contract at a time after the user explicitly requests implementation.
11. Use `evaluation-agent` as the independent contract reviewer and QA evaluator before and after each implementation chunk.
12. Follow `docs/V1IMPLEMENTATIONPLAN.md` for the v1 implementation sprint sequence.
13. Use `docs/Sprints/SPRINT10CONTRACT.md` for the implemented long-PDF pre-conversion chunking sprint.
14. Use `docs/WORKARCHIVE.md` for completed sprint history, prior verification, runtime setup evidence, and sample conversion evidence.
## Open Questions
- None.
## Decisions
- Use `PLAN.md` for intended work and ownership.
- Use `PROGRESS.md` for completed work, current status, blockers, and next actions.
- Use `docs/WORKARCHIVE.md` for archived completed work and historical handoff details.
- MinerU default local CLI execution is the only v1 execution mode.
- MinerU 3.1.0 may launch a temporary local `mineru-api` internally when `mineru` CLI runs without `--api-url`.
- Strict-local mode forbids `--api-url`, remote APIs, router mode, HTTP client backends, and remote OpenAI-compatible backends.
- No silent fallback after MinerU failure.
- Conversion output includes both metadata JSON and `<stem>.report.md`.
- Local MathJax render checking is optional and nonfatal; missing Node.js or MathJax must produce a clear warning instead of blocking conversion.
- Project-scoped custom agents live in `.codex/agents/*.toml`.
- Project prompt commands live in `.codex/commands/*.md`.
- Project-specific skills live in `.codex/skills/*/SKILL.md`.
- Project hooks live in `.codex/hooks.json` and `.codex/hooks/*.py`.
- Agent, command, skill, and hook assets are written in English for Codex compatibility.
- Long-running implementation should use a planner/generator/evaluator harness only when the task complexity justifies the overhead.
- Each substantial implementation chunk should have a sprint contract with objective, scope, verification, failure thresholds, and handoff fields.
- Generator agents may self-check, but independent evaluation is required before marking a chunk complete.
- V1 implementation sequencing and sprint contracts live in `docs/V1IMPLEMENTATIONPLAN.md`.
- Concrete sprint contract documents live under `docs/Sprints/`.
- Sprint 2 path planning contract lives at `docs/Sprints/SPRINT2CONTRACT.md`.
- Sprint 3 domain records and metadata contract lives at `docs/Sprints/SPRINT3CONTRACT.md`.
- Sprint 4 MinerU adapter contract lives at `docs/Sprints/SPRINT4CONTRACT.md`.
- Sprint 4 fixes the v1 adapter executable to the direct `mineru` CLI; user-specified alternate executables, including `mineru-api`, are prohibited.
- Sprint 5 Obsidian Markdown normalization and asset link contract lives at `docs/Sprints/SPRINT5CONTRACT.md`.
- Sprint 5 owns Markdown normalization only; it does not write final Markdown files, copy assets, run MinerU, or connect to conversion orchestration.
- Sprint 6 quality checks and report generation contract lives at `docs/Sprints/SPRINT6CONTRACT.md`.
- Sprint 6 owns quality/report boundaries only; it does not write final files, run MinerU, or connect to conversion orchestration.
- Sprint 7 conversion orchestration, CLI, and Python API contract lives at `docs/Sprints/SPRINT7CONTRACT.md`.
- Sprint 7 will be the first implementation sprint allowed to write final Markdown, metadata JSON, report Markdown, and local copied assets as product behavior.
- Sprint 7 implemented conversion orchestration, `convert_pdf`, batch conversion, `pdf2md convert`, output writing, metadata/report writing, and fake-adapter CLI/API tests.
- Sprint 8 should cover `pdf2md doctor` and setup documentation; Sprint 7 intentionally did not add doctor behavior.
- Sprint 8 doctor and setup documentation contract lives at `docs/Sprints/SPRINT8CONTRACT.md`.
- Sprint 8 owns doctor diagnostics and setup docs only; it must not run real MinerU, download models, run sample PDFs, or add runtime remote/API paths in default tests.
- Sprint 8 implements `pdf2md doctor`, local setup diagnostics, and setup documentation without running real MinerU, downloading models, or touching `samples/` in default tests.
- Sprint 9 local fixture evaluation and v1 release gate contract lives at `docs/Sprints/SPRINT9CONTRACT.md`.
- Sprint 9 must keep default tests independent of real MinerU, GPU, models, network, Obsidian, LaTeX tooling, and `samples/`; real MinerU fixture checks must be explicit opt-in only.
- Sprint 9 implements fast mocked integration tests, explicit opt-in local MinerU fixture evaluation, and `docs/V1RELEASECHECKLIST.md`.
- `pdf2md convert` defaults to `--gpu cuda:0`.
- The MinerU adapter maps CUDA device requests to local subprocess environment variables instead of adding speculative MinerU CLI flags.
- GTX 1070 Ti local runtime uses PyTorch `2.6.0+cu126` and torchvision `0.21.0+cu126`, installed after `uv sync` and followed by `mineru[core]==3.1.0`.
- MinerU models are downloaded with `mineru-models-download -s huggingface -m all`, and runtime model loading uses `MINERU_MODEL_SOURCE=local`.
- Sprint 10 uses `pypdf` for local PDF page chunk planning and temporary chunk PDF writing.
- Sprint 10 converts chunk PDFs independently and does not merge generated Markdown outputs.
- Chunking is opt-in through `--chunk-pages`; if the option is present without a value, the CLI uses 20 pages per chunk.
- `convert_pdf()` still returns `ConversionResult` when chunking is off and returns `BatchConversionResult` when `chunk_pages` is set.
- Chunk PDFs are temporary local files and are deleted after conversion completes, including when raw MinerU output is retained.
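
The opt-in `--chunk-pages` behavior and the page-range planning behind it could look roughly like the sketch below. It assumes an `argparse`-based CLI, which this plan does not confirm; `build_parser`, `PageChunk`, and `plan_chunks` are hypothetical names, and only the 20-page default comes from the decisions above.

```python
import argparse
from dataclasses import dataclass

# From the plan: pages per chunk when --chunk-pages is given without a value.
DEFAULT_CHUNK_PAGES = 20


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing only the opt-in chunking option."""
    parser = argparse.ArgumentParser(prog="pdf2md convert")
    parser.add_argument(
        "--chunk-pages",
        type=int,
        nargs="?",                  # the value is optional
        const=DEFAULT_CHUNK_PAGES,  # used when the flag has no value
        default=None,               # flag absent: chunking stays off
    )
    return parser


@dataclass(frozen=True)
class PageChunk:
    index: int       # 1-based chunk number
    start_page: int  # 0-based, inclusive
    end_page: int    # 0-based, exclusive


def plan_chunks(total_pages: int, chunk_pages: int = DEFAULT_CHUNK_PAGES) -> list[PageChunk]:
    """Split a page count into consecutive, non-overlapping page ranges."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if chunk_pages < 1:
        raise ValueError("chunk_pages must be at least 1")
    return [
        PageChunk(index=i + 1, start_page=start, end_page=min(start + chunk_pages, total_pages))
        for i, start in enumerate(range(0, total_pages, chunk_pages))
    ]
```

With these definitions, a 45-page PDF and the default size yield three ranges: pages 0 to 19, 20 to 39, and 40 to 44. Each range would then be written to a temporary chunk PDF, converted independently, and deleted afterwards, as the decisions above describe.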