# PROGRESS.md This file records current progress for agents. Read it before starting work, then update it after meaningful changes. Completed historical work is archived in `docs/WORKARCHIVE.md`. ## Current Status - Project direction is documented in `PRD.md`, `ARCHITECTURE.md`, `AGENTS.md`, and `docs/KNOWLEDGEBASE.md`. - MinerU 3.1.0 is fixed as the only conversion engine. - The converter currently includes path planning, project-owned records, metadata, direct local MinerU adapter boundary, Obsidian Markdown normalization, local quality checks, report rendering, conversion orchestration, `pdf2md convert`, `pdf2md doctor`, local MathJax render checking, release-gate tests, and opt-in pre-conversion PDF chunking. - `docs/V1IMPLEMENTATIONPLAN.md` defines the v1 implementation sequence. - `docs/Sprints/` contains completed sprint contracts through Sprint 10. - `docs/WORKARCHIVE.md` contains completed sprint history, historical verification results, runtime setup notes, and sample conversion evidence. - `samples/` exists locally and is untracked by git. - `outputs/` is ignored and contains local generated conversion outputs. ## Environment Notes - OS/workspace: Windows PowerShell in `D:\Work\Repos\AICoding\ConvertPDFToMD`. - Python target: 3.12. - Local Python observed: 3.12.7. - `uv` is installed per-user at `C:\Users\user\.local\bin`. - Target GPU: NVIDIA GTX 1070 Ti 8GB. - Default conversion device: `cuda:0`. - MinerU execution mode: direct local `mineru` CLI only. - Strict-local allows MinerU 3.1.0's CLI-internal temporary local `mineru-api` when the CLI runs without `--api-url`. - Strict-local prohibits `--api-url`, remote APIs, router mode, HTTP client backends, and remote OpenAI-compatible backends. - Current local runtime has CUDA-enabled PyTorch `2.6.0+cu126`, `torchvision 0.21.0+cu126`, `mineru[core]==3.1.0`, local MinerU models, and `MINERU_MODEL_SOURCE=local`. - Current `pdf2md doctor` status is WARN only because GTX 1070 Ti is Pascal/pre-Turing; MinerU, CUDA PyTorch, local model config, MathJax, and strict-local checks pass. ## Recent Completed Work - Archived completed sprint and setup history into `docs/WORKARCHIVE.md`. - Added `docs/WORKARCHIVE.md` references to `AGENTS.md`, `PLAN.md`, `docs/V1IMPLEMENTATIONPLAN.md`, relevant `.codex/agents/*.toml`, `.codex/commands/*.md`, and project skills. - Sprint 10 is implemented with `pypdf>=6.10.2,<7`, `src/pdf2md/pdf_splitter.py`, `--chunk-pages [PAGES]`, chunk-aware conversion orchestration, temporary chunk cleanup, and chunk report context. - `--chunk-pages` is opt-in; when present without a value it uses 20 pages. - `convert_pdf()` returns `BatchConversionResult` when `chunk_pages` is set and keeps returning `ConversionResult` when chunking is unset. - Converted `samples/FourNodeQuadrilateralShellElementMITC4.pdf` with `MINERU_MODEL_SOURCE=local` and default `--gpu cuda:0`; output was written to ignored `outputs/FourNodeQuadrilateralShellElementMITC4/`. - The FourNode sample conversion report status was `success`: 7 pages, 22 assets, 38 inline formulas, 16 display formulas, 0 math render errors, and 0 warnings. ## In Progress - No active implementation chunk. ## Blockers - No active blocker. - GTX 1070 Ti remains an 8GB Pascal GPU; larger PDFs may still hit VRAM or model compatibility limits. ## Next Actions 1. Review generated sample Markdown outputs in Obsidian if visual quality needs manual assessment. 2. Run optional real local chunked conversion on a long sample only if requested. 3. Preserve strict-local runtime behavior: use local model paths, direct CLI execution, and no user-specified API or remote backend.