add pdftomd
+437
@@ -0,0 +1,437 @@
# PROGRESS.md

This file records actual progress for agents. Read it before starting work, then update it after meaningful changes.

## Current Status

- Project direction is documented.
- MinerU 3.1.0 is fixed as the only conversion engine.
- `PRD.md`, `ARCHITECTURE.md`, `AGENTS.md`, and `docs/KNOWLEDGEBASE.md` exist.
- `samples/` exists locally and is untracked by git.
- Converter implementation exists through Sprint 9: path planning, project-owned records, metadata, a mocked direct local MinerU adapter boundary, Obsidian Markdown normalization, local quality checks, report content rendering, conversion orchestration, the public conversion API, `pdf2md convert`, `pdf2md doctor`, fast mocked integration tests, optional local MinerU fixture evaluation, and the v1 release checklist.
- Default conversion now requests `cuda:0`; the MinerU adapter sets the GPU-related environment variables `MINERU_DEVICE_MODE` and `CUDA_VISIBLE_DEVICES` for the MinerU subprocess.
- Project-local Codex workflow assets now live under `.codex/`.
- `docs/V1IMPLEMENTATIONPLAN.md` now defines the v1 implementation sequence.
- `docs/Sprints/SPRINT0CONTRACT.md` now defines the Sprint 0 contract.
- `docs/Sprints/SPRINT1CONTRACT.md` now defines the Sprint 1 scaffold contract.
- `docs/Sprints/SPRINT2CONTRACT.md` now defines the Sprint 2 path planning contract.
- `docs/Sprints/SPRINT3CONTRACT.md` now defines the Sprint 3 domain records and metadata contract.
- `docs/Sprints/SPRINT4CONTRACT.md` now defines the Sprint 4 mocked MinerU adapter contract.
- `docs/Sprints/SPRINT5CONTRACT.md` now defines the Sprint 5 Obsidian Markdown normalization and asset link contract.
- `docs/Sprints/SPRINT6CONTRACT.md` now defines the Sprint 6 quality checks and report generation contract.
- `docs/Sprints/SPRINT7CONTRACT.md` now defines the Sprint 7 conversion orchestration, CLI, and Python API contract.
- `docs/Sprints/SPRINT8CONTRACT.md` now defines the Sprint 8 doctor and setup documentation contract.
- `docs/Sprints/SPRINT9CONTRACT.md` now defines the Sprint 9 local fixture evaluation and v1 release gate contract.
- Relevant `.codex/agents/*.toml` files now reference the v1 plan and sprint contract paths directly.
- Sprint 10 is implemented with opt-in pre-conversion PDF chunking, temporary chunk PDF cleanup, chunk metadata/report context, and mocked tests.
- Sprint 0 source, environment, license, privacy, and contract verification is complete with a `go-with-risks` recommendation.
- Sprint 1 is complete with a minimal Python package, CLI placeholder, and fast pytest loop.
- Sprint 4 is implemented with a mock-tested direct local MinerU CLI adapter.
- Sprint 5 is implemented with a pure Markdown normalizer and local-only unit tests.
- Sprint 6 is implemented with local quality checks and report string rendering.
- Sprint 7 is implemented with `convert_pdf`, `convert_input`, output writing, metadata/report writing, local asset copying, batch conversion, and `pdf2md convert`.
- Sprint 7 is covered by fake-adapter CLI/API tests.
- Sprint 8 is implemented and committed.
- Sprint 9 is implemented, independently evaluated, and committed.
- The project `.venv` has been rebuilt with CUDA-enabled PyTorch and MinerU 3.1.0.
- Latest `samples/MITC공부.pdf` conversion completed on GPU and wrote Markdown, metadata JSON, report Markdown, and assets under ignored `outputs/MITC공부/`.
- `docs/MATHJAXCHECKERPLAN.md` now documents the local MathJax render checker plan and implementation status.
- Local MathJax render checker code now exists with optional local Node.js/`mathjax` setup, default conversion integration, and `doctor` diagnostics.
- `docs/Sprints/SPRINT10CONTRACT.md` now documents the implemented long-PDF pre-conversion chunking sprint.

## Environment Notes

- OS/workspace: Windows PowerShell in `D:\Work\Repos\AICoding\ConvertPDFToMD`.
- Python target: 3.12.
- Local Python observed during Sprint 0: 3.12.7.
- `uv` observed during Sprint 0: not available on PATH.
- `uv` installed during Sprint 1: 0.11.11 at `C:\Users\user\.local\bin`.
- If a new shell cannot find `uv`, restart the shell or add `C:\Users\user\.local\bin` to PATH.
- GPU target: GTX 1070 Ti 8GB.
- Local GPU observed during Sprint 0: NVIDIA GeForce GTX 1070 Ti, driver 577.00, 8192 MiB VRAM, WDDM.
- Sample PDFs are in `samples/` and include Korean filenames.
- MinerU execution mode: direct local CLI only.
- The temporary local `mineru-api` that the MinerU 3.1.0 CLI starts internally is allowed when the CLI runs without `--api-url`.
- The strict-local policy prohibits `--api-url`, remote APIs, router mode, HTTP client backends, and remote OpenAI-compatible backends.
- MinerU planning pin: `mineru[core]==3.1.0` unless Sprint 1 or Sprint 8 proves another 3.1.0 extra is required.
- MinerU 3.1.0 was installed in the local `.venv` with `uv pip install "mineru[core]==3.1.0"` for real CLI probing.
- Current `pdf2md doctor` status is WARN: MinerU CLI is present, GTX 1070 Ti is visible with Pascal/pre-Turing risk, PyTorch is `2.6.0+cu126` with CUDA available, local MinerU model config is detected, local MathJax checker passes after `npm install`, and strict-local policy passes.
- User-level environment variable `MINERU_MODEL_SOURCE=local` is set so MinerU uses the downloaded local model paths in `C:\Users\user\mineru.json`.
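
The strict-local notes above can be read as a small validation step that runs before any MinerU command is launched. The sketch below is an illustration under stated assumptions: `validate_strict_local` and `StrictLocalError` are hypothetical names, and only the `mineru` executable name and the `--api-url` option come from this file. The URL check is an added assumption about what a remote backend argument might look like.

```python
from collections.abc import Sequence

ALLOWED_EXECUTABLE = "mineru"          # only the direct CLI name is accepted
PROHIBITED_OPTIONS = ("--api-url",)    # named in the strict-local policy
REMOTE_MARKERS = ("http://", "https://")  # assumption: remote backends are passed as URLs


class StrictLocalError(ValueError):
    """Raised when a command would violate the strict-local policy."""


def validate_strict_local(command: Sequence[str]) -> None:
    """Raise StrictLocalError unless the command is a direct, local-only MinerU call."""
    if not command or command[0] != ALLOWED_EXECUTABLE:
        raise StrictLocalError(f"only the direct {ALLOWED_EXECUTABLE!r} executable is allowed")
    for arg in command[1:]:
        # Handle both "--api-url value" and "--api-url=value" spellings.
        option = arg.split("=", 1)[0]
        if option in PROHIBITED_OPTIONS:
            raise StrictLocalError(f"prohibited option: {option}")
        if any(marker in arg.lower() for marker in REMOTE_MARKERS):
            raise StrictLocalError(f"remote endpoint is not allowed: {arg}")
```

Rejecting anything other than the bare `mineru` name mirrors the Sprint 4 finding that a caller-controlled executable could otherwise bypass the policy.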

## Completed Work

- Created initial project documents.
- Originally selected MinerU 2.5, then changed the fixed engine target to MinerU 3.1.0 after user approval.
- Split architecture details into `ARCHITECTURE.md`.
- Aligned documents with `Project Guidelines`.
- Added this shared planning/progress workflow.
- Decided MinerU failures must produce clear warnings/errors without silent fallback.
- Decided every conversion should produce metadata JSON and a human-readable `.report.md`.
- Created custom agent specs for research, requirements, MinerU integration, Obsidian Markdown, metadata, evaluation, local setup, and license/privacy work.
- Created project prompt commands for startup, MinerU research, document review, integration planning, and quality evaluation planning.
- Created project skills for MinerU research, math Markdown review, and fixture evaluation.
- Created project hooks for startup context, pre-tool policy checks, and stop-time completion reminders.
- Read Anthropic's long-running harness design article and adapted its planner/generator/evaluator pattern for this repository.
- Added `harness-planner-agent` and `feature-generator-agent`.
- Strengthened `evaluation-agent` as an independent contract reviewer and skeptical QA evaluator.
- Added long-running harness workflow guidance to `AGENTS.md`.
- Created `docs/V1IMPLEMENTATIONPLAN.md` with v1 sprint sequencing, contracts, verification gates, and agent ownership.
- Created `docs/Sprints/SPRINT0CONTRACT.md` for source and environment verification before implementation.
- Added direct `docs/V1IMPLEMENTATIONPLAN.md` and `docs/Sprints/SPRINT0CONTRACT.md` references to the agents that need them.
- Completed Sprint 0 contract evaluation; result was PASS.
- Completed the original Sprint 0 MinerU 2.5.4 package, CLI shape, output layout, model/cache, and strict-local risk verification from primary sources.
- Verified local Python, `uv`, and GPU facts using the allowed Sprint 0 commands.
- Verified MinerU/model license and privacy posture for personal/research local use versus redistribution.
- Updated `docs/KNOWLEDGEBASE.md`, `docs/V1IMPLEMENTATIONPLAN.md`, and `docs/Sprints/SPRINT0CONTRACT.md` with Sprint 0 findings.
- Completed post-output Sprint 0 evaluation. The only missing acceptance item at review time was the final commit.
- Redefined strict-local policy for MinerU 3.1.0: allow direct `mineru` CLI and CLI-internal temporary local `mineru-api`; prohibit `--api-url`, remote APIs, router mode, HTTP client backends, and remote OpenAI-compatible backends.
- Updated core project documents and `.codex` workflow assets to reflect MinerU 3.1.0 and the redefined strict-local policy.
- Checked MinerU 3.1.0 sources: PyPI 3.1.0 metadata, MinerU release notes, quick usage docs, CLI tools docs, output file docs, and model source docs.
- Created `docs/Sprints/SPRINT1CONTRACT.md` for project scaffold and fast test loop planning.
- Added direct `docs/Sprints/SPRINT1CONTRACT.md` references to the agents that need Sprint 1 scaffold context.
- Started Sprint 1 implementation and amended the contract to include `uv.lock`, which is generated by `uv sync`.
- Installed `uv` per-user using the official Astral installer.
- Created Sprint 1 scaffold files: `pyproject.toml`, `uv.lock`, `.gitignore`, `README.md`, `src/pdf2md/__init__.py`, `src/pdf2md/cli.py`, `tests/test_package.py`, and `tests/test_cli.py`.
- Verified `uv sync` with a temporary project environment outside the repo.
- Verified `uv run pytest` passes with 4 tests.
- Verified `uv run pdf2md --version` prints `pdf2md 0.1.0`.
- Verified `git diff --check` passes.
- Checked the scaffold for disallowed MinerU, remote API, router, HTTP, or OpenAI backend references.
- Completed independent Sprint 1 evaluation once; the only scope failure was that `PLAN.md` was updated for shared workflow coordination before the Sprint 1 contract listed it as an allowed touched surface.
- Amended the Sprint 1 contract to allow minimal `PLAN.md` current-goal coordination updates.
- Completed the final independent Sprint 1 evaluation; result was PASS.
- Created `docs/Sprints/SPRINT2CONTRACT.md` for paths, input discovery, and overwrite planning.
- Added direct `docs/Sprints/SPRINT2CONTRACT.md` references to the agents that need Sprint 2 path planning context.
- Updated `docs/V1IMPLEMENTATIONPLAN.md` to point Sprint 2 at the new contract and current scaffold state.
- Verified the Sprint 2 contract documentation change with `git diff --check` and `uv run pytest` passing 4 tests.
- Started Sprint 2 implementation after user approval and pre-implementation contract review PASS.
- Added `src/pdf2md/paths.py` for input discovery, output path planning, overwrite conflict detection, duplicate output detection, and output-root escape prevention.
- Added `tests/test_paths.py` with temporary-file coverage for single PDF discovery, directory discovery, recursive discovery, deterministic ordering, Korean filenames, output path planning, overwrite behavior, duplicate planned outputs, and output-root escape prevention.
- Completed independent Sprint 2 evaluation once; the only hard failure was a Windows rooted/drive-relative `relative_parent` escape case.
- Fixed output-root escape prevention by rejecting absolute, rooted, drive-qualified, and `..` relative parents and validating resolved planned outputs stay under the output root.
- Verified `uv run pytest tests/test_paths.py` passes 17 tests.
- Verified `uv run pytest` passes 21 tests.
- Verified `git diff --check` passes for the Sprint 2 implementation.
- Checked the implementation for disallowed MinerU, remote API, router, HTTP, OpenAI backend, or network client references.
- Completed the final independent Sprint 2 evaluation; result was PASS.
- Created `docs/Sprints/SPRINT3CONTRACT.md` for domain records, metadata construction, and warning aggregation planning.
- Added direct `docs/Sprints/SPRINT3CONTRACT.md` references to the agents that need Sprint 3 metadata context.
- Updated `docs/V1IMPLEMENTATIONPLAN.md` to point Sprint 3 at the new contract and current path-planning state.
- Verified the Sprint 3 contract documentation change with `git diff --check` and `uv run pytest` passing 21 tests.
- Started Sprint 3 implementation after user approval and pre-implementation contract review PASS.
- Added `src/pdf2md/ir.py` for project-owned document, page, block, asset, and warning records with stable block types, warning codes, and severities.
- Added `src/pdf2md/metadata.py` for JSON-serializable metadata construction and summary counts from project-owned records.
- Added `tests/test_ir.py` and `tests/test_metadata.py` covering record serialization, optional field preservation/omission, invalid enum/severity validation, metadata top-level fields, summary counts, warning order, JSON serializability, and required input validation.
- Verified `uv run pytest tests/test_ir.py tests/test_metadata.py` passes 25 tests.
- Verified `uv run pytest` passes 46 tests.
- Verified `git diff --check` passes for the Sprint 3 implementation.
- Checked the implementation for disallowed remote API, router, HTTP, OpenAI backend, network client, MinerU adapter, and doctor references.
- Completed the final independent Sprint 3 evaluation; result was PASS.
- Created `docs/Sprints/SPRINT4CONTRACT.md` for direct local MinerU CLI adapter boundary planning with mocked subprocess/output tests.
- Added direct `docs/Sprints/SPRINT4CONTRACT.md` references to the agents that need Sprint 4 MinerU adapter context.
- Updated `docs/V1IMPLEMENTATIONPLAN.md` to point Sprint 4 at the new contract and current metadata-model state.
- Verified the Sprint 4 contract documentation change with `git diff --check` and `uv run pytest` passing 46 tests.
- Started Sprint 4 implementation after user approval and pre-implementation contract review PASS.
- Added `src/pdf2md/mineru_adapter.py` for the direct local MinerU CLI adapter boundary, mockable availability/version checks, deterministic command construction, subprocess result capture, strict-local option validation, optional mocked-output parsing, and adapter warning mapping.
- Added `tests/test_mineru_adapter.py` with fake-runner coverage for availability, missing MinerU, version success/failure/empty output, fixed command shape, custom executable rejection, strict-local rejection, mocked success, non-zero exit, missing output, and invalid JSON.
- Fixed an independent evaluation finding that a caller-controlled executable could bypass strict-local policy; v1 now accepts only the direct `mineru` executable name, and user-exposed `mineru-api` execution is rejected.
- Verified `uv sync` passes.
- Verified `uv run pytest tests/test_mineru_adapter.py` passes 26 tests.
- Verified `uv run pytest` passes 72 tests.
- Verified `git diff --check` passes for the Sprint 4 implementation.
- Checked the implementation for network client imports; none were found.
- Checked strict-local prohibited tokens in `src/pdf2md`; matches are limited to deliberate validation literals in `mineru_adapter.py`.
- Completed the final independent Sprint 4 evaluation; result was PASS.
- Created `docs/Sprints/SPRINT5CONTRACT.md` for Obsidian Markdown normalization, math delimiter handling, asset link normalization, and conservative table fallback planning.
- Added direct `docs/Sprints/SPRINT5CONTRACT.md` references to the agents that need Sprint 5 Markdown, warning, implementation, planning, or evaluation context.
- Updated `docs/V1IMPLEMENTATIONPLAN.md` to point Sprint 5 at the new contract and current Sprint 4 implementation state.
- Verified the Sprint 5 contract documentation change with agent TOML parsing, `git diff --check`, and `uv run pytest` passing 72 tests.
- Started Sprint 5 implementation after user approval and pre-implementation contract review PASS.
- Added `src/pdf2md/markdown.py` for project-owned Obsidian Markdown normalization, inline/display math delimiter handling, code fence and inline code protection, relative asset link normalization, local asset warning behavior, and conservative table fallback warnings.
- Added `tests/test_markdown.py` covering inline math, display math spacing, idempotency, math body preservation, code protection, asset path normalization, invalid/missing/remote asset warnings, simple table preservation, and complex HTML table fallback warnings.
- Added narrow warning codes `ASSET_LINK_INVALID` and `TABLE_FALLBACK` to `src/pdf2md/ir.py`.
- Verified `uv sync` passes.
- Verified `uv run pytest tests/test_markdown.py tests/test_ir.py` passes 30 tests.
- Verified `uv run pytest` passes 89 tests.
- Verified `git diff --check` passes for the Sprint 5 implementation.
- Checked the implementation for network client imports; none were found.
- Checked the implementation for conversion orchestration, metadata writing, report generation, and CLI convert behavior; no Sprint 5 code introduced those paths.
- Completed the final independent Sprint 5 evaluation; result was PASS.
- Created `docs/Sprints/SPRINT6CONTRACT.md` for local quality checks, math renderability boundary, metadata summary extensions, report content rendering, and final status planning.
- Added direct `docs/Sprints/SPRINT6CONTRACT.md` references to the agents that need Sprint 6 quality, reporting, metadata, math renderability, implementation, planning, or evaluation context.
- Updated `docs/V1IMPLEMENTATIONPLAN.md` to point Sprint 6 at the new contract and current Sprint 5 implementation state.
- Verified the Sprint 6 contract documentation change with agent TOML parsing, `git diff --check`, and `uv run pytest` passing 89 tests.
- Started Sprint 6 implementation after user approval and pre-implementation contract review PASS.
- Added `src/pdf2md/quality.py` for local asset-link checks, math renderability checker boundaries, nonfatal checker-unavailable behavior, and quality result aggregation.
- Added `src/pdf2md/report.py` for human-readable quality report content rendering from metadata and quality results, pages-with-warnings derivation, and final status calculation.
- Added `tests/test_quality.py` covering missing/invalid asset links, code-block exclusions, fake math checker failures, checker-unavailable behavior, and quality result merging.
- Added `tests/test_report.py` covering required report content, optional path handling, pages-with-warnings, final status policy, metadata/quality count use, and no report-file creation.
- Verified `uv sync` passes.
- Verified `uv run pytest tests/test_quality.py tests/test_report.py tests/test_metadata.py` passes 26 tests.
- Verified `uv run pytest` passes 103 tests.
- Verified `git diff --check` passes for the Sprint 6 implementation.
- Checked the implementation for network client imports; none were found.
- Checked the implementation for conversion orchestration, final output writing, metadata JSON writing, `.report.md` file writing, real MinerU invocation, setup scripts, and CLI convert behavior; no Sprint 6 code introduced those paths.
- Completed the final independent Sprint 6 evaluation; result was PASS.
- Completed the final independent Sprint 7 evaluation after fixing math renderability metadata counts; result was PASS.
- Started Sprint 8 implementation after user approval.
- Added `src/pdf2md/doctor.py` for mockable setup diagnostics covering Python 3.12, `uv`, MinerU availability/version, NVIDIA GPU visibility, PyTorch CUDA visibility, local model/cache/config detection, and strict-local policy reporting.
- Added `pdf2md doctor` CLI integration without changing `pdf2md convert` or `pdf2md --version` behavior.
- Updated `README.md` with Windows PowerShell setup, `uv`, MinerU 3.1.0 direct CLI expectations, model/cache environment notes, GTX 1070 Ti risk, and strict-local runtime policy.
- Added mocked doctor and CLI tests for success, warning-only success, hard dependency failure, missing `uv`, missing MinerU, MinerU version warnings, missing GPU/PyTorch warnings, GTX 1070 Ti/Pascal risk, and missing model/cache warnings.
- Verified `uv run pytest tests/test_doctor.py tests/test_cli.py` passes 22 tests.
- Verified `uv sync` passes.
- Verified `uv run pytest` passes 133 tests.
- Verified `uv run pdf2md --version` prints `pdf2md 0.1.0`.
- Verified local `uv run pdf2md doctor` returns exit code 1 because MinerU is not installed; it reports Python and `uv` pass, GTX 1070 Ti/Pascal risk warning, PyTorch missing warning, model/cache missing warning, and strict-local pass.
- Completed independent Sprint 8 evaluation; result was PASS.
- Committed Sprint 8 implementation as `7d965e3 feat: implement sprint 8 doctor diagnostics`.
- Created `docs/Sprints/SPRINT9CONTRACT.md` for local fixture evaluation and the v1 release gate.
- Added direct `docs/Sprints/SPRINT9CONTRACT.md` references to the agents that need Sprint 9 fixture evaluation, release gate, strict-local, or implementation context.
- Updated `docs/V1IMPLEMENTATIONPLAN.md` to point Sprint 9 at the new contract and current Sprint 8 completion state.
- Started Sprint 9 implementation after user approval and pre-implementation contract review PASS.
- Added fast mocked v1 release-gate integration tests in `tests/integration/test_v1_fast_release_gate.py`.
- Added explicit opt-in local MinerU fixture evaluation in `tests/integration/test_optional_mineru_fixtures.py`, gated by `PDF2MD_RUN_MINERU_FIXTURES=1`.
- Added `docs/V1RELEASECHECKLIST.md` with default fast gates, strict-local release gates, doctor hard-failure handling, optional sample gates, fixture coverage notes, and no-sample-commit checks.
- Updated `README.md` to point at the v1 release checklist and optional fixture evaluation gate.
- Verified `uv run pytest tests/integration tests/test_conversion.py tests/test_cli.py` passes 24 tests with 1 optional skip.
- Verified `uv run pytest tests/integration` passes 3 fast tests with 1 optional skip.
- Verified opt-in `PDF2MD_RUN_MINERU_FIXTURES=1 uv run pytest -rs tests/integration/test_optional_mineru_fixtures.py` is skipped with a clear doctor blocker because MinerU is not installed.
- Verified `uv run pytest` passes 136 tests with 1 optional skip.
- Completed independent Sprint 9 evaluation; result was PASS.
- Committed Sprint 9 implementation as `466abcf feat: implement sprint 9 release gate`.
- Attempted `samples/MITC공부.pdf` conversion after installing MinerU; the run did not produce a successful conversion and was stopped after the user observed CPU-bound execution.
- Added `outputs/` to `.gitignore` and removed the leftover generated output directory from the stopped sample run.
- Updated default conversion behavior so `convert_pdf`, `convert_input`, and `pdf2md convert` default to `cuda:0`.
- Updated the MinerU adapter to map CUDA requests to the MinerU subprocess environment with `MINERU_DEVICE_MODE` and `CUDA_VISIBLE_DEVICES`, preserving strict-local direct CLI execution.
- Updated README, PRD, and architecture docs to document GPU default behavior and the remaining CUDA/PyTorch requirement.
- Verified the GPU-default change with targeted tests, full tests, `git diff --check`, CLI help, and `pdf2md doctor`.
- Re-ran `uv run pdf2md convert samples\MITC공부.pdf --out outputs\MITC공부 --overwrite`; the CLI reported `converted: 0`, `failed: 1`, `warnings: 1`, and wrote no Markdown, metadata JSON, or `.report.md`.
- Confirmed the failure with a direct adapter probe using an ASCII work directory: MinerU 3.1.0 started its allowed temporary local `mineru-api`, used `hybrid-auto-engine`, attempted to load the VLM model on CUDA, and failed with `AssertionError: Torch not compiled with CUDA enabled`.
- Left the product conversion stdout/stderr logs under ignored `outputs/MITC공부.logs`; removed temporary diagnostic probe directories.
- Removed and recreated the project `.venv` with `uv sync`.
- Installed CUDA-enabled PyTorch runtime: `torch==2.6.0+cu126` and `torchvision==0.21.0+cu126`.
- Verified CUDA with an actual tensor operation on `NVIDIA GeForce GTX 1070 Ti`, compute capability `6.1`.
- Installed `mineru[core]==3.1.0`; verified `mineru, version 3.1.0`.
- Downloaded MinerU pipeline and VLM models with `uv run mineru-models-download -s huggingface -m all`; MinerU wrote model paths to `C:\Users\user\mineru.json`.
- Set `MINERU_MODEL_SOURCE=local` at user scope and current process scope.
- Verified `uv run pdf2md doctor` reports PyTorch CUDA available and model config detected; remaining WARN status is the intentional Pascal/pre-Turing GPU risk warning.
- Verified `uv run pytest` passes 138 tests with 1 optional skip in the rebuilt environment.
- Fixed real MinerU nested `images/...` asset-link rewriting so copied assets under `<stem>.assets/.../images/` resolve from the final Markdown.
- Fixed page-count extraction for MinerU-style structured lists with `page_idx` values.
- Verified `uv run pytest tests/test_conversion.py` passes 12 tests.
- Verified `uv run pytest` passes 139 tests with 1 optional skip after the asset/page-count fix.
- Re-ran `samples/MITC공부.pdf` conversion with `MINERU_MODEL_SOURCE=local`; MinerU used GPU via the direct local CLI and CLI-internal temporary local `mineru-api`.
- The final sample outputs are `outputs/MITC공부/MITC공부.md`, `outputs/MITC공부/MITC공부.metadata.json`, `outputs/MITC공부/MITC공부.report.md`, and `outputs/MITC공부/MITC공부.assets/`.
- The final sample report status is `partial` only because the local math render checker is unavailable; asset link checks pass with 0 missing and 0 invalid links.
- Sample summary: 13 pages processed, 107 assets, 23 inline formulas, 103 display formulas, 1 info warning.
- Added `docs/MATHJAXCHECKERPLAN.md` with the local MathJax checker objective, touched surfaces, Node helper contract, Python wrapper behavior, tests, acceptance criteria, and open implementation decisions.
- Implemented the local MathJax render checker with `MathExpression` extraction, a local Node.js helper, a Python wrapper, default conversion integration, `doctor` diagnostics, setup documentation, and mocked tests.
- Verified `npm run mathjax-checker:health` returns `{"ok":true}` after local `npm install`.
- Verified direct helper JSON stdin reports one valid expression as ok and a malformed display formula as `Missing close brace`.
- Verified `create_default_math_checker()` finds the local checker and records one render failure for malformed display math.
- Verified targeted tests pass: `uv run pytest tests/test_quality.py tests/test_math_render.py tests/test_conversion.py tests/test_doctor.py tests/test_cli.py`.
- Verified full tests pass: `uv run pytest` passed 150 tests with 1 optional skip.
- Verified `git diff --check` passes.
- Researched local PDF chunking packages and MinerU page-range behavior for Sprint 10.
- Created `docs/Sprints/SPRINT10CONTRACT.md` recommending `pypdf>=6.10.2,<7` for 20-page local chunk PDFs, with chunk outputs converted independently and no Markdown merge.
- Implemented Sprint 10 with `pypdf>=6.10.2,<7`, `src/pdf2md/pdf_splitter.py`, `--chunk-pages [PAGES]`, chunk-aware conversion orchestration, and chunk report context.
- `--chunk-pages` is opt-in; when present without a value it uses 20 pages.
- `convert_pdf()` returns `BatchConversionResult` when `chunk_pages` is set and keeps returning `ConversionResult` when chunking is unset.
- Temporary chunk PDFs are deleted after conversion completes, including when raw MinerU output is retained.
- Verified targeted Sprint 10 tests: `uv run pytest tests/test_pdf_splitter.py tests/test_conversion.py tests/test_cli.py tests/test_report.py` passed 42 tests.
- Verified full default test suite: `uv run pytest` passed 163 tests with 1 optional skip.
- Verified `git diff --check` passed with line-ending warnings only.
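
The Sprint 10 chunking recorded above splits a long PDF into fixed-size page ranges before conversion. The following is a minimal sketch of that range computation only. `chunk_page_ranges` is a hypothetical name, the 1-based inclusive range convention is an assumption, and the actual `pdf_splitter.py` may differ. Writing the chunk PDFs with `pypdf` is not shown.

```python
def chunk_page_ranges(total_pages: int, chunk_pages: int = 20) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges that cover the document."""
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    if chunk_pages < 1:
        raise ValueError("chunk_pages must be at least 1")
    return [
        # The last chunk is clamped so it never runs past the final page.
        (start, min(start + chunk_pages - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_pages)
    ]
```

Under these assumptions a 45-page PDF yields three ranges, the last one shorter than 20 pages, and each range would then be converted independently with no Markdown merge.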
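
The Sprint 2 output-root escape fix recorded above rejects absolute, rooted, drive-qualified, and `..` relative parents, then checks that the resolved output stays under the output root. A minimal sketch of that two-stage check follows. `plan_output_path` and `OutputPathError` are hypothetical names, not the API of `src/pdf2md/paths.py`, and the `.md` suffix handling is an assumption.

```python
from pathlib import Path, PureWindowsPath


class OutputPathError(ValueError):
    """Raised when a planned output would land outside the output root."""


def plan_output_path(output_root: Path, relative_parent: str, stem: str) -> Path:
    # Parse with Windows rules so drive-qualified ("C:foo") and rooted ("\\foo", "/foo")
    # forms are detected on any operating system.
    parent = PureWindowsPath(relative_parent)
    if parent.drive or parent.root:
        raise OutputPathError(f"relative parent must not be absolute or rooted: {relative_parent!r}")
    if ".." in parent.parts:
        raise OutputPathError(f"relative parent must not contain '..': {relative_parent!r}")

    root = output_root.resolve()
    planned = root.joinpath(*parent.parts, f"{stem}.md").resolve()
    # Final guard: the resolved path must still be inside the output root.
    if not planned.is_relative_to(root):
        raise OutputPathError(f"planned output escapes the output root: {planned}")
    return planned
```

The syntactic checks give clear error messages for the common cases, while the resolved-path check catches anything they miss, such as a stem that itself contains path separators.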

## In Progress

- No active implementation chunk.

## Blockers

- No active blocker for the completed `samples/MITC공부.pdf` conversion.
- GTX 1070 Ti remains an 8GB Pascal GPU; larger PDFs may still hit VRAM or model compatibility limits even though this sample completed.

## Next Actions

1. Review the generated `outputs/MITC공부/MITC공부.md` in Obsidian if visual quality needs manual assessment.
2. Run optional real local chunked conversion on a long sample only if requested.
3. Run `npm install` and `npm run mathjax-checker:health` when real local MathJax checker validation is desired.
4. Preserve the strict-local rule: setup downloads may be explicit, but runtime conversion must use local model paths, direct CLI execution, and no user-specified API or remote backend.

## Sprint 9 Handoff

- Files changed: `tests/integration/test_v1_fast_release_gate.py`, `tests/integration/test_optional_mineru_fixtures.py`, `docs/V1RELEASECHECKLIST.md`, `README.md`, `PLAN.md`, `PROGRESS.md`, `docs/V1IMPLEMENTATIONPLAN.md`, and `docs/Sprints/SPRINT9CONTRACT.md`.
- Commands run: `uv run pytest tests/integration tests/test_conversion.py tests/test_cli.py`, `uv run pytest tests/integration`, `PDF2MD_RUN_MINERU_FIXTURES=1 uv run pytest -rs tests/integration/test_optional_mineru_fixtures.py`, `uv run pytest`, `git diff --check`, and `git status --short --untracked-files=all`.
- Tests passed: targeted integration/CLI/conversion run passed 24 tests with 1 optional skip; integration-only run passed 3 fast tests with 1 optional skip; full `uv run pytest` passed 136 tests with 1 optional skip.
- Tests blocked: optional real MinerU fixture conversion is blocked by `pdf2md doctor` because the `mineru` CLI is not installed on PATH.
- Optional local MinerU status: explicitly gated by `PDF2MD_RUN_MINERU_FIXTURES=1`; the current opt-in run skips with a doctor blocker rather than reporting a false pass.
- Fixture coverage: release checklist maps local samples to math-heavy, table/formula, figures/assets, reading-order, and Korean filename/path risk categories; simple one-page, table-dominant, and figure-heavy known-baseline gaps remain.
- Generated output locations: none persisted; optional output path uses pytest `tmp_path`.
- Known failures: local doctor fails on missing MinerU CLI.
- Independent evaluation: PASS.
- Residual risks: no real MinerU output has been validated yet; GTX 1070 Ti/PyTorch acceleration and model/cache setup remain unproven; optional fixture quality still requires local MinerU setup.
- User decisions needed: decide whether to install/configure MinerU 3.1.0 and run optional fixture validation.
- V1 release recommendation: default fast gates are healthy, but full real-MinerU v1 validation is blocked until doctor passes or the user records a waiver.
- Go/no-go recommendation for next sprint: go only for real setup/fixture validation if the user wants to proceed with local MinerU installation.
- Next action: commit Sprint 9 implementation.

## Sprint 9 Contract Handoff

- Files changed: `docs/Sprints/SPRINT9CONTRACT.md`, `docs/V1IMPLEMENTATIONPLAN.md`, relevant `.codex/agents/*.toml`, `PLAN.md`, and `PROGRESS.md`.
|
||||
- Commands run: `uv --version`, `uv sync`, agent TOML parse check, `uv run pytest`, `git diff --check`, `git status --short --untracked-files=all`, and local sample filename listing.
|
||||
- Tests passed: `uv run pytest` passed 133 tests.
|
||||
- Tests blocked: None expected for the contract-only change.
|
||||
- Known failures: local `pdf2md doctor` still fails until MinerU is installed on PATH.
|
||||
- Residual risks: Sprint 9 is contract-only; fast mocked integration tests, optional local MinerU fixture harness, fixture coverage manifest, and release checklist are not implemented yet.
|
||||
- User decisions needed: None before Sprint 9 pre-implementation review.
|
||||
- Go/no-go recommendation for Sprint 9 implementation: review the contract first, then go if the user explicitly requests implementation.
|
||||
- Next action: verify and commit the Sprint 9 contract update.

## Sprint 8 Handoff

- Files changed: `src/pdf2md/doctor.py`, `src/pdf2md/cli.py`, `tests/test_doctor.py`, `tests/test_cli.py`, `README.md`, `PLAN.md`, `PROGRESS.md`, `docs/V1IMPLEMENTATIONPLAN.md`, and `docs/Sprints/SPRINT8CONTRACT.md`.
- Commands run: `uv --version`, `uv sync`, `uv run pytest tests/test_doctor.py tests/test_cli.py`, `uv run pytest tests/test_doctor.py`, `uv run pytest`, `uv run pdf2md --version`, `uv run pdf2md doctor`, `git diff --check`, `git status --short --untracked-files=all`, and PowerShell strict-local/network pattern checks.
- Tests passed: `uv run pytest tests/test_doctor.py tests/test_cli.py` passed 22 tests; `uv run pytest tests/test_doctor.py` passed 11 tests; `uv run pytest` passed 133 tests.
- Tests blocked: None.
- Known failures: local `uv run pdf2md doctor` correctly fails because the `mineru` CLI is not installed on PATH.
- Known warnings: `uv` ignored Miniforge's invalid `SSL_CERT_DIR` path during sync/test commands, but the commands completed successfully; local doctor warns for GTX 1070 Ti/Pascal risk, missing PyTorch, and missing MinerU model/cache/config path.
- Independent evaluation: PASS.
- Residual risks: Sprint 8 does not install MinerU, download models, validate real MinerU output, run sample PDFs, or prove GTX 1070 Ti PyTorch acceleration. Those remain Sprint 9/local setup work.
- User decisions needed: None for Sprint 8.
- Go/no-go recommendation for Sprint 9: go after a Sprint 9 contract is written and reviewed.
- Next action at completion: prepare the Sprint 9 contract when requested.
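
Illustrative sketch of a doctor-style check matching the failure and warnings recorded above. This is not the contents of `src/pdf2md/doctor.py`: the `Check` record, `check_mineru_cli`, and `exit_code` are hypothetical names, and the rule that warnings do not change the exit code is an assumption.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """Hypothetical result record for one diagnostic."""
    name: str
    status: str  # "pass", "warn", or "fail"
    detail: str


def check_mineru_cli(which=shutil.which) -> Check:
    # shutil.which returns None when the executable is not on PATH.
    path = which("mineru")
    if path is None:
        return Check("mineru-cli", "fail", "mineru CLI is not installed on PATH")
    return Check("mineru-cli", "pass", f"found at {path}")


def exit_code(checks) -> int:
    # Assumption: only failures block; warnings are reported but pass.
    return 1 if any(c.status == "fail" for c in checks) else 0
```

Under these assumptions a GPU-risk warning alone leaves the exit code at 0, while a missing `mineru` CLI makes it 1.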

## Sprint 8 Contract Handoff

Historical note: this contract-only handoff was superseded by the implemented Sprint 8 handoff above.

- Files changed: `docs/Sprints/SPRINT8CONTRACT.md`, `docs/V1IMPLEMENTATIONPLAN.md`, relevant `.codex/agents/*.toml`, `PLAN.md`, and `PROGRESS.md`.
- Commands run: `uv --version`, agent TOML parse check, `uv sync`, `uv run pytest`, `git diff --check`, `git status --short --untracked-files=all`, and PowerShell reference checks.
- Tests passed: `uv run pytest` passed 119 tests.
- Tests blocked: None.
- Known failures: none.
- Known warnings: `uv` ignored Miniforge's invalid `SSL_CERT_DIR` path during sync/test commands, but the commands completed successfully.
- Residual risks: Sprint 8 is contract-only; `pdf2md doctor`, doctor diagnostics, setup docs, and setup helper scripts are not implemented yet.
- User decisions needed: None before Sprint 8 pre-implementation review.
- Go/no-go recommendation for Sprint 8 implementation: review the contract first, then go if the user explicitly requests implementation.
- Next action: commit the contract update, then wait for an explicit Sprint 8 implementation request.

## Sprint 7 Handoff

- Files changed: `src/pdf2md/conversion.py`, `src/pdf2md/cli.py`, `src/pdf2md/__init__.py`, `tests/test_conversion.py`, `tests/test_cli.py`, `tests/test_package.py`, `PLAN.md`, `PROGRESS.md`, `docs/V1IMPLEMENTATIONPLAN.md`, and `docs/Sprints/SPRINT7CONTRACT.md`.
- Commands run: `uv --version`, `uv sync`, `uv run pytest tests/test_conversion.py tests/test_cli.py`, `uv run pytest tests/test_conversion.py tests/test_cli.py tests/test_package.py`, `uv run pytest tests/test_conversion.py tests/test_metadata.py tests/test_report.py`, `uv run pytest`, `git diff --check`, `git status --short --untracked-files=all`, and PowerShell strict-local/network pattern checks.
- Tests passed: `uv run pytest tests/test_conversion.py tests/test_cli.py` passed 18 tests; `uv run pytest tests/test_conversion.py tests/test_cli.py tests/test_package.py` passed 16 tests before the math renderability fix; `uv run pytest tests/test_conversion.py tests/test_metadata.py tests/test_report.py` passed 29 tests; `uv run pytest` passed 119 tests after the metadata math count fix.
- Tests blocked: None.
- Known failures: none.
- Known warnings: `uv` ignored Miniforge's invalid `SSL_CERT_DIR` path during sync/test commands, but the commands completed successfully.
- Independent evaluation: PASS after fixing math renderability metadata counts.
- Residual risks: Sprint 7 uses fake adapters in default tests; it does not run real MinerU, probe real MinerU output, implement `pdf2md doctor`, validate CUDA/GPU, install models, or run sample PDFs.
- User decisions needed: None for Sprint 7.
- Go/no-go recommendation for Sprint 8: go.
- Next action: prepare Sprint 8 contract when requested.
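
Illustrative sketch of the fake-adapter pattern noted in the residual risks above. Every name here (`Engine`, `EngineResult`, `FakeEngine`, `convert`) is hypothetical and does not describe the real `src/pdf2md/conversion.py` API; it only shows how orchestration can be tested without starting MinerU.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class EngineResult:
    """Hypothetical engine output: raw Markdown only."""
    markdown: str


class Engine(Protocol):
    def run(self, pdf_path: Path) -> EngineResult: ...


class FakeEngine:
    """Returns canned Markdown and records calls; never starts MinerU."""

    def __init__(self, markdown: str) -> None:
        self.markdown = markdown
        self.calls: list[Path] = []

    def run(self, pdf_path: Path) -> EngineResult:
        self.calls.append(pdf_path)
        return EngineResult(self.markdown)


def convert(pdf_path: Path, engine: Engine) -> str:
    # Stand-in orchestration step: trim the engine output and end it
    # with exactly one newline.
    return engine.run(pdf_path).markdown.strip() + "\n"
```

Tests built this way prove the orchestration logic only; real MinerU output still needs the optional fixture run.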

## Sprint 7 Contract Handoff (Historical)

- Files changed: `docs/Sprints/SPRINT7CONTRACT.md`, `docs/V1IMPLEMENTATIONPLAN.md`, `.codex/agents/feature-generator-agent.toml`, `.codex/agents/evaluation-agent.toml`, `.codex/agents/requirements-guard-agent.toml`, `.codex/agents/harness-planner-agent.toml`, `.codex/agents/mineru-integration-agent.toml`, `.codex/agents/metadata-agent.toml`, `.codex/agents/obsidian-markdown-agent.toml`, `PLAN.md`, and `PROGRESS.md`.
- Commands run: `uv --version`, agent TOML parse check, `uv sync`, `uv run pytest`, `git diff --check`, and `git status --short --untracked-files=all`.
- Tests passed: `uv run pytest` passed 103 tests.
- Tests blocked: None.
- Known failures: none.
- Known warnings: `uv` ignored Miniforge's invalid `SSL_CERT_DIR` path during sync/test commands, but the commands completed successfully.
- Residual risks at that time: Sprint 7 was contract-only before implementation. Superseded by the Sprint 7 Handoff above.
- User decisions needed at that time: None before Sprint 7 pre-implementation review.
- Go/no-go recommendation at that time: review the contract first, then go if the user explicitly requests implementation.
- Next action at that time: commit the contract update, then wait for an explicit Sprint 7 implementation request.

## Sprint 6 Handoff

- Files changed: `src/pdf2md/quality.py`, `src/pdf2md/report.py`, `tests/test_quality.py`, `tests/test_report.py`, `PLAN.md`, `PROGRESS.md`, `docs/V1IMPLEMENTATIONPLAN.md`, and `docs/Sprints/SPRINT6CONTRACT.md`.
- Commands run: `uv --version`, `uv sync`, `uv run pytest tests/test_quality.py tests/test_report.py tests/test_metadata.py`, `uv run pytest`, `git diff --check`, `git status --short --untracked-files=all`, and PowerShell file/pattern checks.
- Tests passed: `uv run pytest tests/test_quality.py tests/test_report.py tests/test_metadata.py` passed 26 tests; `uv run pytest` passed 103 tests.
- Tests blocked: None.
- Known failures: none in Sprint 6 implementation.
- Known warnings: `uv` ignored Miniforge's invalid `SSL_CERT_DIR` path during sync/test commands, but the commands completed successfully.
- Residual risks: Sprint 6 intentionally does not run real MinerU, run a real math renderer, parse PDFs, write final Markdown files, copy assets, write metadata JSON files, write `.report.md` files, expose working `convert`, or implement `doctor`.
- User decisions needed: None for Sprint 6.
- Go/no-go recommendation for Sprint 7: go.
- Next action: prepare Sprint 7 contract when requested.

## Sprint 5 Handoff

- Files changed: `src/pdf2md/markdown.py`, `src/pdf2md/ir.py`, `tests/test_markdown.py`, `PLAN.md`, `PROGRESS.md`, `docs/V1IMPLEMENTATIONPLAN.md`, and `docs/Sprints/SPRINT5CONTRACT.md`.
- Commands run: `uv --version`, `uv sync`, `uv run pytest tests/test_markdown.py tests/test_ir.py`, `uv run pytest`, `git diff --check`, `git status --short --untracked-files=all`, and PowerShell file/pattern checks.
- Tests passed: `uv run pytest tests/test_markdown.py tests/test_ir.py` passed 30 tests; `uv run pytest` passed 89 tests.
- Tests blocked: None.
- Known failures: none in Sprint 5 implementation.
- Known warnings: `uv` ignored Miniforge's invalid `SSL_CERT_DIR` path during sync/test commands, but the commands completed successfully.
- Residual risks: Sprint 5 intentionally does not run real MinerU, probe real MinerU Markdown, parse PDFs, write final Markdown files, copy assets, write metadata JSON, generate `.report.md`, expose working `convert`, or implement `doctor`.
- User decisions needed: None for Sprint 5.
- Go/no-go recommendation for Sprint 6: go.
- Next action: prepare Sprint 6 contract when requested.

## Sprint 4 Handoff

- Files changed: `src/pdf2md/mineru_adapter.py`, `tests/test_mineru_adapter.py`, `PLAN.md`, and `PROGRESS.md`.
- Commands run: `uv --version`, `uv sync`, `uv run pytest tests/test_mineru_adapter.py`, `uv run pytest`, `git diff --check`, `git status --short --untracked-files=all`, and PowerShell file/pattern checks.
- Tests passed: `uv run pytest tests/test_mineru_adapter.py` passed 26 tests; `uv run pytest` passed 72 tests.
- Tests blocked: None.
- Known failures: none in Sprint 4 implementation after fixing the independent evaluation finding.
- Known warnings: `uv` ignored Miniforge's invalid `SSL_CERT_DIR` path during sync/test commands, but the commands completed successfully.
- Residual risks: Sprint 4 intentionally does not run real MinerU, install models, probe real MinerU output layout, parse PDFs, normalize Markdown, write metadata JSON, generate `.report.md`, expose working `convert`, or implement `doctor`.
- User decisions needed: None for Sprint 4.
- Go/no-go recommendation for Sprint 5: go.
- Next action: prepare Sprint 5 contract when requested.

## Sprint 3 Handoff

- Files changed: `src/pdf2md/ir.py`, `src/pdf2md/metadata.py`, `tests/test_ir.py`, `tests/test_metadata.py`, `PLAN.md`, `PROGRESS.md`, and `docs/Sprints/SPRINT3CONTRACT.md`.
- Commands run: `uv --version`, `uv sync`, `uv run pytest tests/test_ir.py tests/test_metadata.py`, `uv run pytest`, `git diff --check`, `git status --short`, and PowerShell file/pattern checks.
- Tests passed: `uv run pytest tests/test_ir.py tests/test_metadata.py` passed 25 tests; `uv run pytest` passed 46 tests.
- Tests blocked: None.
- Known failures: none in Sprint 3 implementation.
- Known warnings: `uv` ignored Miniforge's invalid `SSL_CERT_DIR` path during sync/test commands, but the commands completed successfully.
- Residual risks: Sprint 3 intentionally does not parse PDFs, compute SHA-256, invoke MinerU, write conversion outputs, normalize Markdown, create full report content, run quality checks, or expose working `convert` or `doctor` commands.
- User decisions needed: None for Sprint 3.
- Go/no-go recommendation for Sprint 4: go.
- Next action: prepare Sprint 4 contract when requested.

## Sprint 2 Handoff

- Files changed: `src/pdf2md/paths.py`, `tests/test_paths.py`, `PLAN.md`, and `PROGRESS.md`.
- Commands run: `uv --version`, `uv sync`, `uv run pytest tests/test_paths.py`, `uv run pytest`, `git diff --check`, `git status --short`, and PowerShell file/pattern checks.
- Tests passed: `uv run pytest tests/test_paths.py` passed 17 tests; `uv run pytest` passed 21 tests.
- Tests blocked: None.
- Known failures: none in Sprint 2 implementation.
- Known warnings: `uv` ignored Miniforge's invalid `SSL_CERT_DIR` path during sync/test commands, but the commands completed successfully.
- Residual risks: Sprint 2 intentionally does not parse PDFs, compute SHA-256, invoke MinerU, write conversion outputs, normalize Markdown, create metadata/report content, or expose a working `convert` command.
- User decisions needed: None for Sprint 2.
- Go/no-go recommendation for Sprint 3: go.
- Next action: prepare Sprint 3 contract when requested.

## Sprint 1 Handoff

- Files changed: `pyproject.toml`, `uv.lock`, `.gitignore`, `README.md`, `src/pdf2md/__init__.py`, `src/pdf2md/cli.py`, `tests/test_package.py`, `tests/test_cli.py`, `PROGRESS.md`, `PLAN.md`, and `docs/Sprints/SPRINT1CONTRACT.md`.
- Commands run: `uv --version`, `uv sync`, `uv run pytest`, `uv run pdf2md --version`, `git diff --check`, `git status --short`, and PowerShell file/pattern checks after `rg.exe` returned access denied.
- Tests passed: `uv run pytest` passed 4 tests.
- Tests blocked: None.
- Known failures: `uv` may not be visible to a newly opened shell until PATH is refreshed; `rg.exe` returned access denied in this environment, so PowerShell checks were used instead.
- Known warnings: `uv` ignored Miniforge's invalid `SSL_CERT_DIR` path during sync/test commands, but the commands completed successfully.
- Residual risks: the scaffold intentionally does not validate MinerU, CUDA, model paths, conversion output, metadata, or quality reports.
- User decisions needed: None for Sprint 1.
- Go/no-go recommendation for Sprint 2: go.
- Next action: prepare Sprint 2 contract when requested.

## Sprint 0 Handoff

Superseded note: the following Sprint 0 facts describe the completed 2.5.4 verification pass. The current engine decision is MinerU 3.1.0.

- Files changed: `docs/KNOWLEDGEBASE.md`, `docs/Sprints/SPRINT0CONTRACT.md`, `docs/V1IMPLEMENTATIONPLAN.md`, `PROGRESS.md`.
- Sources checked: MinerU 2.5.4 PyPI, MinerU 2.5.4 tag files, MinerU output/model docs, Python/uv/PyTorch/NVIDIA docs, MinerU/model license sources.
- Local commands run: `python --version`, `uv --version`, `nvidia-smi`.
- Facts confirmed at that time: Python 3.12.7 is present; `uv` is missing; GTX 1070 Ti 8GB is visible; MinerU 2.5.4 direct CLI path is source-verified; MinerU/model 2.5-era licenses should be treated as AGPL-3.0.
- Inferences made at that time: v1 should pin MinerU to `mineru[core]==2.5.4`; strict-local runtime should require local model source configuration; current MinerU 3.x docs should not drive the older 2.5 adapter behavior.
- Known failures: `uv --version` failed because `uv` is not on PATH.
- Residual risks: GTX 1070 Ti/PyTorch CUDA compatibility, real MinerU output layout until local probe, AGPL redistribution obligations, setup-download versus runtime-local separation.
- Go/no-go recommendation: `go-with-risks`.
- Next action: resolve `uv` availability or include bootstrap handling in Sprint 1, then create the Sprint 1 contract.