PLAN.md
This file is the shared work plan for agents. Read it before starting work, then update it when the plan changes.
Current Goal
CUDA-enabled PyTorch and MinerU 3.1.0 runtime setup is complete in the project `.venv`. Sprint 10 pre-conversion PDF chunking is implemented. The only remaining work is optional validation against real local samples, to be done only if requested.
Active Constraints
- Do not implement additional program code beyond the active user-approved sprint.
- Keep MinerU 3.1.0 as the only conversion engine.
- Keep processing local-only.
- Target Python 3.12.
- Target GPU: GTX 1070 Ti 8GB.
- Default conversion device: `cuda:0`.
- Run MinerU through direct local CLI execution only.
- On MinerU failure, report a clear error/warning and do not silently fall back.
- Write both metadata JSON and a human-readable `.report.md` quality report for conversions.
- Use `samples/` only as local fixture context; do not commit sample files unless explicitly requested.
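
The "no silent fallback" constraint above can be illustrated with a small subprocess wrapper. This is a minimal sketch, not the project's adapter: `ConversionEngineError` and `run_engine` are hypothetical names, and the command passed in is whatever local executable the caller chooses.

```python
import subprocess


class ConversionEngineError(RuntimeError):
    """Hypothetical error type for a failed local engine run."""


def run_engine(command: list[str]) -> str:
    """Run a local command and raise a clear error instead of falling back."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, check=False
        )
    except FileNotFoundError as exc:
        # The executable is missing: surface that directly.
        raise ConversionEngineError(f"executable not found: {command[0]}") from exc
    if completed.returncode != 0:
        # Non-zero exit: report the code and stderr, never try another engine.
        raise ConversionEngineError(
            f"command exited with code {completed.returncode}: "
            f"{completed.stderr.strip()}"
        )
    return completed.stdout
```

The caller decides how to present the error; the wrapper only guarantees that a failure is never swallowed or replaced by a different code path.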
Planned Work
- Use `research-agent` for MinerU 3.1.0 source tracking and official-doc verification.
- Use `requirements-guard-agent` for cross-document consistency reviews.
- Use `mineru-integration-agent` for direct local MinerU CLI adapter planning.
- Use `obsidian-markdown-agent` for math-heavy Obsidian Markdown output planning.
- Use `metadata-agent` for provenance, warning, JSON metadata, and `.report.md` planning.
- Use `evaluation-agent` for local fixture coverage and regression criteria.
- Use `local-setup-agent` for Python 3.12, uv, CUDA, GTX 1070 Ti 8GB, and doctor-check planning.
- Use `license-privacy-agent` for license and strict-local privacy review.
- Use `harness-planner-agent` to turn substantial implementation requests into scoped contracts before code work starts.
- Use `feature-generator-agent` to implement one approved contract at a time after the user explicitly requests implementation.
- Use `evaluation-agent` as the independent contract reviewer and QA evaluator before and after each implementation chunk.
- Follow `docs/V1IMPLEMENTATIONPLAN.md` for the v1 implementation sprint sequence.
- Use `docs/Sprints/SPRINT10CONTRACT.md` for the implemented long-PDF pre-conversion chunking sprint.
Open Questions
- None.
Decisions
- Use `PLAN.md` for intended work and ownership.
- Use `PROGRESS.md` for completed work, current status, blockers, and next actions.
- MinerU default local CLI execution is the only v1 execution mode.
- MinerU 3.1.0 may launch a temporary local `mineru-api` internally when the `mineru` CLI runs without `--api-url`.
- Strict-local mode forbids `--api-url`, remote APIs, router mode, HTTP client backends, and remote OpenAI-compatible backends.
- No silent fallback after MinerU failure.
- Conversion output includes both metadata JSON and `<stem>.report.md`.
- Local MathJax render checking is optional and nonfatal; missing Node.js or MathJax must produce a clear warning instead of blocking conversion.
- Project-scoped custom agents live in `.codex/agents/*.toml`.
- Project prompt commands live in `.codex/commands/*.md`.
- Project-specific skills live in `.codex/skills/*/SKILL.md`.
- Project hooks live in `.codex/hooks.json` and `.codex/hooks/*.py`.
- Agent, command, skill, and hook assets are written in English for Codex compatibility.
- Long-running implementation should use a planner/generator/evaluator harness only when the task complexity justifies the overhead.
- Each substantial implementation chunk should have a sprint contract with objective, scope, verification, failure thresholds, and handoff fields.
- Generator agents may self-check, but independent evaluation is required before marking a chunk complete.
- V1 implementation sequencing and sprint contracts live in `docs/V1IMPLEMENTATIONPLAN.md`.
- Concrete sprint contract documents live under `docs/Sprints/`.
- Sprint 2 path planning contract lives at `docs/Sprints/SPRINT2CONTRACT.md`.
- Sprint 3 domain records and metadata contract lives at `docs/Sprints/SPRINT3CONTRACT.md`.
- Sprint 4 MinerU adapter contract lives at `docs/Sprints/SPRINT4CONTRACT.md`.
- Sprint 4 fixes the v1 adapter executable to the direct `mineru` CLI; user-specified alternate executables, including `mineru-api`, are prohibited.
- Sprint 5 Obsidian Markdown normalization and asset link contract lives at `docs/Sprints/SPRINT5CONTRACT.md`.
- Sprint 5 owns Markdown normalization only; it does not write final Markdown files, copy assets, run MinerU, or connect to conversion orchestration.
- Sprint 6 quality checks and report generation contract lives at `docs/Sprints/SPRINT6CONTRACT.md`.
- Sprint 6 owns quality/report boundaries only; it does not write final files, run MinerU, or connect to conversion orchestration.
- Sprint 7 conversion orchestration, CLI, and Python API contract lives at `docs/Sprints/SPRINT7CONTRACT.md`.
- Sprint 7 will be the first implementation sprint allowed to write final Markdown, metadata JSON, report Markdown, and local copied assets as product behavior.
- Sprint 7 implemented conversion orchestration, `convert_pdf`, batch conversion, `pdf2md convert`, output writing, metadata/report writing, and fake-adapter CLI/API tests.
- Sprint 8 should cover `pdf2md doctor` and setup documentation; Sprint 7 intentionally did not add doctor behavior.
- Sprint 8 doctor and setup documentation contract lives at `docs/Sprints/SPRINT8CONTRACT.md`.
- Sprint 8 owns doctor diagnostics and setup docs only; it must not run real MinerU, download models, run sample PDFs, or add runtime remote/API paths in default tests.
- Sprint 8 implements `pdf2md doctor`, local setup diagnostics, and setup documentation without running real MinerU, downloading models, or touching `samples/` in default tests.
- Sprint 9 local fixture evaluation and v1 release gate contract lives at `docs/Sprints/SPRINT9CONTRACT.md`.
- Sprint 9 must keep default tests independent of real MinerU, GPU, models, network, Obsidian, LaTeX tooling, and `samples/`; real MinerU fixture checks must be explicit opt-in only.
- Sprint 9 implements fast mocked integration tests, explicit opt-in local MinerU fixture evaluation, and `docs/V1RELEASECHECKLIST.md`.
- `pdf2md convert` defaults to `--gpu cuda:0`.
- The MinerU adapter maps CUDA device requests to local subprocess environment variables instead of adding speculative MinerU CLI flags.
- GTX 1070 Ti local runtime uses PyTorch `2.6.0+cu126` and `torchvision 0.21.0+cu126` installed after `uv sync`, followed by `mineru[core]==3.1.0`.
- MinerU models are downloaded with `mineru-models-download -s huggingface -m all`, and runtime model loading uses `MINERU_MODEL_SOURCE=local`.
- Sprint 10 should use `pypdf` for local 20-page PDF chunk creation if implementation is approved.
- Sprint 10 uses `pypdf` for local PDF page chunk planning and temporary chunk PDF writing.
- Sprint 10 converts chunk PDFs independently and does not merge generated Markdown outputs.
- Chunking is opt-in through `--chunk-pages`; if the option is present without a value, the CLI uses 20 pages per chunk.
- `convert_pdf()` keeps returning `ConversionResult` without chunking and returns `BatchConversionResult` when `chunk_pages` is set.
- Chunk PDFs are temporary local files and are deleted after conversion completes, including when raw MinerU output is retained.
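
The chunking decisions above imply splitting a PDF into fixed-size page ranges before conversion. A minimal sketch of that planning step follows, assuming 1-based inclusive page ranges and the 20-page default. `plan_chunks` is a hypothetical name, not the project's actual function, and writing the chunk PDFs with `pypdf` is deliberately left out.

```python
def plan_chunks(total_pages: int, chunk_pages: int = 20) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges for each chunk.

    Assumption: the last chunk may be shorter than chunk_pages.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if chunk_pages < 1:
        raise ValueError("chunk_pages must be at least 1")
    return [
        (start, min(start + chunk_pages - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_pages)
    ]


# A 45-page document with the default chunk size yields three ranges.
print(plan_chunks(45))  # [(1, 20), (21, 40), (41, 45)]
```

Each range would then be written to a temporary chunk PDF and converted independently, matching the decision not to merge the generated Markdown outputs.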