# PLAN.md

This file is the shared work plan for agents. Read it before starting work, then update it when the plan changes.

## Current Goal

Completed work history is archived in `docs/WORKARCHIVE.md`. CUDA-enabled PyTorch and MinerU 3.1.0 runtime setup is complete in the project `.venv`, and Sprint 10 pre-conversion PDF chunking is implemented. The next step, validation against real local samples, is optional and starts only if requested.

## Active Constraints

- Do not implement additional program code beyond the active user-approved sprint.
- Keep MinerU 3.1.0 as the only conversion engine.
- Keep processing local-only.
- Target Python 3.12.
- Target GPU: GTX 1070 Ti 8GB.
- Default conversion device: `cuda:0`.
- Run MinerU through direct local CLI execution only.
- On MinerU failure, report a clear error or warning and do not silently fall back.
- Write both metadata JSON and a human-readable `.report.md` quality report for conversions.
- Use `samples/` only as local fixture context; do not commit sample files unless explicitly requested.

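
The direct-CLI, strict-local, and no-silent-fallback constraints above could be enforced by a thin wrapper along these lines. This is a minimal sketch, not the project's adapter: `assert_strict_local`, `run_local_cli`, and `MineruFailure` are hypothetical names, and the only flag taken from this plan is `--api-url`, which the Decisions section forbids in strict-local mode.

```python
import subprocess
from typing import Sequence


class MineruFailure(RuntimeError):
    """Raised when the local CLI exits non-zero (hypothetical name)."""


def assert_strict_local(argv: Sequence[str]) -> None:
    """Reject arguments that would route work away from the local CLI."""
    for arg in argv:
        if arg == "--api-url" or arg.startswith("--api-url="):
            raise ValueError("strict-local mode forbids --api-url")


def run_local_cli(argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the command once and surface any failure instead of falling back."""
    assert_strict_local(argv)
    result = subprocess.run(list(argv), capture_output=True, text=True)
    if result.returncode != 0:
        # No retry and no alternate engine: the caller sees the error.
        raise MineruFailure(
            f"{argv[0]} exited with code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result
```

Raising a dedicated exception keeps the failure visible to the caller, which can then record it in the metadata JSON and the `.report.md` file.
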
## Planned Work

1. Use `research-agent` for MinerU 3.1.0 source tracking and official-doc verification.
2. Use `requirements-guard-agent` for cross-document consistency reviews.
3. Use `mineru-integration-agent` for direct local MinerU CLI adapter planning.
4. Use `obsidian-markdown-agent` for math-heavy Obsidian Markdown output planning.
5. Use `metadata-agent` for provenance, warning, JSON metadata, and `.report.md` planning.
6. Use `evaluation-agent` for local fixture coverage and regression criteria.
7. Use `local-setup-agent` for Python 3.12, uv, CUDA, GTX 1070 Ti 8GB, and doctor-check planning.
8. Use `license-privacy-agent` for license and strict-local privacy review.
9. Use `harness-planner-agent` to turn substantial implementation requests into scoped contracts before code work starts.
10. Use `feature-generator-agent` to implement one approved contract at a time after the user explicitly requests implementation.
11. Use `evaluation-agent` as the independent contract reviewer and QA evaluator before and after each implementation chunk.
12. Follow `docs/V1IMPLEMENTATIONPLAN.md` for the v1 implementation sprint sequence.
13. Use `docs/Sprints/SPRINT10CONTRACT.md` for the implemented long-PDF pre-conversion chunking sprint.
14. Use `docs/WORKARCHIVE.md` for completed sprint history, prior verification, runtime setup evidence, and sample conversion evidence.

## Open Questions

- None.

## Decisions

- Use `PLAN.md` for intended work and ownership.
- Use `PROGRESS.md` for completed work, current status, blockers, and next actions.
- Use `docs/WORKARCHIVE.md` for archived completed work and historical handoff details.
- MinerU default local CLI execution is the only v1 execution mode.
- MinerU 3.1.0 may launch a temporary local `mineru-api` internally when `mineru` CLI runs without `--api-url`.
- Strict-local mode forbids `--api-url`, remote APIs, router mode, HTTP client backends, and remote OpenAI-compatible backends.
- No silent fallback after MinerU failure.
- Conversion output includes both metadata JSON and `<stem>.report.md`.
- Local MathJax render checking is optional and nonfatal; missing Node.js or MathJax must produce a clear warning instead of blocking conversion.
- Project-scoped custom agents live in `.codex/agents/*.toml`.
- Project prompt commands live in `.codex/commands/*.md`.
- Project-specific skills live in `.codex/skills/*/SKILL.md`.
- Project hooks live in `.codex/hooks.json` and `.codex/hooks/*.py`.
- Agent, command, skill, and hook assets are written in English for Codex compatibility.
- Long-running implementation should use a planner/generator/evaluator harness only when the task complexity justifies the overhead.
- Each substantial implementation chunk should have a sprint contract with objective, scope, verification, failure thresholds, and handoff fields.
- Generator agents may self-check, but independent evaluation is required before marking a chunk complete.
- V1 implementation sequencing and sprint contracts live in `docs/V1IMPLEMENTATIONPLAN.md`.
- Concrete sprint contract documents live under `docs/Sprints/`.
- Sprint 2 path planning contract lives at `docs/Sprints/SPRINT2CONTRACT.md`.
- Sprint 3 domain records and metadata contract lives at `docs/Sprints/SPRINT3CONTRACT.md`.
- Sprint 4 MinerU adapter contract lives at `docs/Sprints/SPRINT4CONTRACT.md`.
- Sprint 4 fixes the v1 adapter executable to the direct `mineru` CLI; user-specified alternate executables, including `mineru-api`, are prohibited.
- Sprint 5 Obsidian Markdown normalization and asset link contract lives at `docs/Sprints/SPRINT5CONTRACT.md`.
- Sprint 5 owns Markdown normalization only; it does not write final Markdown files, copy assets, run MinerU, or connect to conversion orchestration.
- Sprint 6 quality checks and report generation contract lives at `docs/Sprints/SPRINT6CONTRACT.md`.
- Sprint 6 owns quality/report boundaries only; it does not write final files, run MinerU, or connect to conversion orchestration.
- Sprint 7 conversion orchestration, CLI, and Python API contract lives at `docs/Sprints/SPRINT7CONTRACT.md`.
- Sprint 7 will be the first implementation sprint allowed to write final Markdown, metadata JSON, report Markdown, and local copied assets as product behavior.
- Sprint 7 implemented conversion orchestration, `convert_pdf`, batch conversion, `pdf2md convert`, output writing, metadata/report writing, and fake-adapter CLI/API tests.
- Sprint 8 should cover `pdf2md doctor` and setup documentation; Sprint 7 intentionally did not add doctor behavior.
- Sprint 8 doctor and setup documentation contract lives at `docs/Sprints/SPRINT8CONTRACT.md`.
- Sprint 8 owns doctor diagnostics and setup docs only; it must not run real MinerU, download models, run sample PDFs, or add runtime remote/API paths in default tests.
- Sprint 8 implements `pdf2md doctor`, local setup diagnostics, and setup documentation without running real MinerU, downloading models, or touching `samples/` in default tests.
- Sprint 9 local fixture evaluation and v1 release gate contract lives at `docs/Sprints/SPRINT9CONTRACT.md`.
- Sprint 9 must keep default tests independent of real MinerU, GPU, models, network, Obsidian, LaTeX tooling, and `samples/`; real MinerU fixture checks must be explicit opt-in only.
- Sprint 9 implements fast mocked integration tests, explicit opt-in local MinerU fixture evaluation, and `docs/V1RELEASECHECKLIST.md`.
- `pdf2md convert` defaults to `--gpu cuda:0`.
- The MinerU adapter maps CUDA device requests to local subprocess environment variables instead of adding speculative MinerU CLI flags.
- GTX 1070 Ti local runtime uses PyTorch `2.6.0+cu126` and `torchvision 0.21.0+cu126` installed after `uv sync`, followed by `mineru[core]==3.1.0`.
- MinerU models are downloaded with `mineru-models-download -s huggingface -m all`, and runtime model loading uses `MINERU_MODEL_SOURCE=local`.
- Sprint 10 uses `pypdf` for local PDF page chunk planning and temporary chunk PDF writing.
- Sprint 10 converts chunk PDFs independently and does not merge generated Markdown outputs.
- Chunking is opt-in through `--chunk-pages`; if the option is present without a value, the CLI uses 20 pages per chunk.
- `convert_pdf()` keeps returning `ConversionResult` without chunking and returns `BatchConversionResult` when `chunk_pages` is set.
- Chunk PDFs are temporary local files and are deleted after conversion completes, including when raw MinerU output is retained.
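
The `--chunk-pages` behavior and the page chunk planning recorded above can be illustrated as follows. This is a hedged sketch rather than the Sprint 10 code: `build_parser` and `plan_chunks` are hypothetical names, the page ranges are 0-based and half-open by assumption, and the `pypdf` step that writes each temporary chunk PDF is left out.

```python
import argparse
from typing import List, Optional, Tuple

DEFAULT_CHUNK_PAGES = 20  # used when --chunk-pages appears without a value


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with an opt-in --chunk-pages option."""
    parser = argparse.ArgumentParser(prog="pdf2md-sketch")
    # nargs="?" allows a bare flag; const supplies the value in that case,
    # and default=None means chunking stays off when the flag is absent.
    parser.add_argument(
        "--chunk-pages",
        nargs="?",
        type=int,
        const=DEFAULT_CHUNK_PAGES,
        default=None,
    )
    return parser


def plan_chunks(
    total_pages: int, chunk_pages: Optional[int]
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) page ranges covering the document."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if chunk_pages is None:
        # Chunking not requested: the whole document is one range.
        return [(0, total_pages)]
    if chunk_pages < 1:
        raise ValueError("chunk_pages must be at least 1")
    return [
        (start, min(start + chunk_pages, total_pages))
        for start in range(0, total_pages, chunk_pages)
    ]
```

For a 45-page PDF and a bare `--chunk-pages`, the sketch plans the ranges `(0, 20)`, `(20, 40)`, and `(40, 45)`; each range would then be converted independently, with no merging of the generated Markdown.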