# Sprint 17 Contract: Offline Windows Installer

Status: Abandoned

Last updated: 2026-05-13

## Abandonment Note

Sprint 17 was abandoned at the user's request on 2026-05-13 before implementation began. This document remains as a historical planning record only. Do not implement or extend this contract unless the user explicitly reopens offline installer work.

## Objective

Create a large offline Windows installer that can install the existing local `pdf2md` runtime on another Windows PC without internet access.

The installer must install or stage all application-owned files needed after download time: the minimal UI executable, the project runtime, a target-local Python virtual environment created from bundled wheels, CUDA PyTorch wheels, MinerU 3.1.0 wheels and dependencies, local MinerU model files, optional local Node.js/MathJax assets, Start Menu shortcuts, setup logs, and a post-install `pdf2md doctor` verification path.

This sprint does not change conversion behavior. It packages the already implemented CLI/UI/runtime for offline use.

## Product Decision

The offline package should create the target PC virtual environment during installation instead of copying the current development `.venv`.

Reasoning:

- Python virtual environments and console entry points often contain absolute paths and are not a reliable redistribution unit.
- A target-local `.venv` created from a bundled wheelhouse is more reproducible and easier to repair.
- The installer can keep the wheelhouse for offline repair, uninstall/reinstall, and audit.

## Installer Shape

Recommended installer technology:

- Inno Setup for the Windows installer shell because it can compile scripts from the command line with `ISCC.exe`, returns deterministic exit codes, and is simple enough for a per-user installer.
- PowerShell scripts for payload build, target runtime install, and target verification.
- PyInstaller remains only the UI executable builder. It must not become the full MinerU/PyTorch/model bundler.


Default install root:

```text
%LOCALAPPDATA%\Programs\ConvertPDFToMD\
```

Installed layout:

```text
ConvertPDFToMD/
  app/
    pdf2md-ui.exe
  runtime/
    pyproject.toml
    uv.lock
    README.md
    src/
    tools/
    package.json
    package-lock.json
    .venv/
  payload/
    python/
    uv/
    wheelhouse/
    requirements-runtime-cu126.txt
    models/
    node/
    node_modules/
    payload-manifest.json
    SHA256SUMS.txt
    THIRD_PARTY_NOTICES.md
  scripts/
    install-runtime.ps1
    repair-runtime.ps1
    run-doctor.ps1
  logs/
```

Generated artifacts that must remain untracked:

```text
dist/offline-installer/
dist/Pdf2MdOfflineSetup-*.exe
```

## Payload Contents

The first offline payload targets Windows x64, Python 3.12, CUDA PyTorch `2.6.0+cu126`, `torchvision 0.21.0+cu126`, and `mineru[core]==3.1.0`.

Required:

- `dist/pdf2md-ui.exe` from the existing PyInstaller build.
- Tracked project runtime files needed to run `uv run pdf2md`.
- A Windows x64 Python 3.12 installer or an equivalent approved Python runtime package.
- A Windows x64 `uv.exe`.
- A wheelhouse containing:
  - the current project wheel,
  - `pypdf`,
  - `torch==2.6.0`,
  - `torchvision==0.21.0`,
  - `mineru[core]==3.1.0`,
  - all transitive Python runtime dependencies.
- Local MinerU model files and the model config template needed for `MINERU_MODEL_SOURCE=local`.
- A manifest listing every payload file, size, SHA-256 hash, source URL or local source, and license family.

Optional but recommended:

- Portable local Node.js runtime.
- `node_modules/` containing the locked MathJax checker dependencies from `package-lock.json`.

Explicitly excluded:

- `samples/`.
- `outputs/`.
- `.git/`.
- The development `.venv/`.
- Local generated PyInstaller `build/` folders and `.spec` files unless the implementation deliberately adds a stable project-owned spec file.
- NVIDIA GPU drivers and CUDA Toolkit installers. The installer may check for a compatible NVIDIA driver through `nvidia-smi`, but it should not redistribute GPU drivers in this sprint.

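
The manifest and hash requirements above could be prototyped along the following lines. This is a minimal sketch, not the contract's implementation: the function names, the entry field names, and the `source`/`license_family` defaults are all hypothetical, and the excluded directory set simply mirrors the "Explicitly excluded" list.

```python
import hashlib
import json
from pathlib import Path

# Directory names taken from the "Explicitly excluded" list (assumed to be
# matched by name anywhere in the staging tree).
EXCLUDED_DIRS = {".git", ".venv", "samples", "outputs", "build"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels and model weights are not read whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, source: str = "local",
                   license_family: str = "unreviewed") -> list[dict]:
    """List every file under root with relative path, size, and SHA-256.

    `source` and `license_family` are placeholder labels (hypothetical);
    a real builder would record them per file.
    """
    entries = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories themselves and anything inside an excluded directory.
        if not path.is_file() or EXCLUDED_DIRS.intersection(rel.parts[:-1]):
            continue
        entries.append({
            "path": rel.as_posix(),
            "size": path.stat().st_size,
            "sha256": sha256_of(path),
            "source": source,
            "license_family": license_family,
        })
    return entries


def verify_manifest(root: Path, entries: list[dict]) -> list[str]:
    """Return the relative paths that are missing or whose size or hash changed."""
    bad = []
    for entry in entries:
        path = root / entry["path"]
        if (not path.is_file()
                or path.stat().st_size != entry["size"]
                or sha256_of(path) != entry["sha256"]):
            bad.append(entry["path"])
    return bad


def manifest_json(entries: list[dict]) -> str:
    """Serialize entries in a form that PowerShell's ConvertFrom-Json can also read."""
    return json.dumps({"files": entries}, indent=2)
```

A builder would call `build_manifest` on the staging folder and write `manifest_json(...)` next to the payload. The target installer would call `verify_manifest` and refuse to continue if the returned list is non-empty.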
## Touched Surfaces

Allowed during implementation:

- Create `packaging/offline/build-offline-payload.ps1`.
- Create `packaging/offline/verify-offline-payload.ps1`.
- Create `packaging/offline/install-runtime.ps1`.
- Create `packaging/offline/repair-runtime.ps1`.
- Create `packaging/offline/run-doctor.ps1`.
- Create `packaging/offline/Pdf2MdOffline.iss`.
- Create `packaging/offline/requirements-runtime-cu126.txt`.
- Create `packaging/offline/README.md`.
- Create `packaging/offline/THIRD_PARTY_NOTICES.md`.
- Create `src/pdf2md/packaging_manifest.py` only if a Python helper is simpler than repeating manifest logic in PowerShell.
- Modify `src/pdf2md_ui/runner.py` so the UI can resolve an installed target-local `.venv\Scripts\pdf2md.exe` before falling back to PATH or `uv run pdf2md`.
- Modify `src/pdf2md_ui/app.py` only if the project root default must prefer the installed runtime folder.
- Modify `tests/test_ui_runner.py`.
- Create `tests/test_offline_packaging.py`.
- Modify `README.md`.
- Modify `docs/V1RELEASECHECKLIST.md`.
- Modify `PLAN.md`.
- Modify `PROGRESS.md`.
- Modify `docs/WORKARCHIVE.md` after implementation.

Not allowed:

- Do not change MinerU 3.1.0 as the fixed conversion engine.
- Do not add a second conversion engine.
- Do not add runtime network calls, `--api-url`, router mode, remote APIs, HTTP client backends, remote OpenAI-compatible backends, or hosted renderers.
- Do not copy the development `.venv` as the installed runtime.
- Do not make default tests depend on real MinerU, GPU, model files, network, Obsidian, MathJax, Inno Setup, or `samples/`.
- Do not commit generated installer payloads, model files, wheelhouse files, Python installers, `dist/`, `outputs/`, or `samples/`.

## Architecture Plan

### WP17.1: Offline Payload Builder

Add a build script that creates a clean staging folder under `dist/offline-installer/` with `app/`, `runtime/`, and `payload/` subfolders that mirror the final install layout.


Responsibilities:

- Rebuild `dist/pdf2md-ui.exe`.
- Build the project wheel into the staging wheelhouse.
- Download or collect Python wheels for the target runtime on a connected build PC.
- Collect the Windows Python runtime package and `uv.exe`.
- Copy project runtime files without `.git`, `.venv`, `outputs/`, `samples/`, and build trash.
- Copy local MinerU model files from a configured source path.
- Optionally copy portable Node.js and the locked `node_modules/`.
- Generate `payload-manifest.json` and `SHA256SUMS.txt`.
- Fail if any required file is missing or if any wheel dependency would require internet during installation.

The builder may use `python -m pip download` on the connected build PC. The target installer must use only local files, for example `uv pip install --no-index --find-links`.

### WP17.2: Target Runtime Installer

Add a PowerShell install script that runs from the installed payload and creates the real runtime on the target PC.

Responsibilities:

- Verify payload hashes before installing.
- Install or locate Python 3.12 x64.
- Create `runtime\.venv` on the target PC.
- Install packages from `payload\wheelhouse` with network disabled.
- Install the project wheel into the target `.venv`.
- Preserve the bundled wheelhouse for offline repair.
- Configure `MINERU_MODEL_SOURCE=local` for UI/CLI child processes.
- Configure local MinerU model paths without silently overwriting an unrelated user `mineru.json`.
- If `%USERPROFILE%\mineru.json` already exists and points elsewhere, prompt in interactive mode; in silent mode, fail clearly and leave `repair-runtime.ps1` instructions.
- Run `pdf2md doctor` and write the result to `logs\doctor-after-install.txt`.

### WP17.3: UI Runtime Resolution

Adjust the UI runner for an installed offline layout.

Resolution order:

1. Explicit configured `pdf2md` command.
2. Installed runtime `.venv\Scripts\pdf2md.exe` under the selected project root.
3. `pdf2md` on PATH.
4. Bundled `uv.exe` plus `uv run --offline pdf2md` under the selected project root.
5. Existing system `uv run pdf2md` fallback.

Child environment rules:

- Set `MINERU_MODEL_SOURCE=local` unless explicitly set.
- Add installed `.venv\Scripts` to PATH for runtime console scripts.
- Add installed portable Node.js path to PATH when bundled.
- Set `UV_OFFLINE=1` when using the installed offline runtime.
- Do not add remote endpoints or backend flags.

### WP17.4: Inno Setup Installer

Add an Inno Setup script that installs the payload and invokes the target runtime installer.

Installer behavior:

- Default to per-user install under `%LOCALAPPDATA%\Programs\ConvertPDFToMD`.
- Create Start Menu shortcuts for:
  - `ConvertPDFToMD` UI,
  - `PDF2MD Doctor`,
  - `Repair PDF2MD Runtime`.
- Run `install-runtime.ps1` after files are copied.
- Show the doctor log path if setup finishes with WARN.
- Fail the install on target runtime setup failure unless the user explicitly chooses to keep files for manual repair.

### WP17.5: License, Manifest, And Offline Verification

Add docs and checks for redistribution risk.

Required records:

- Python, uv, PyInstaller, PyTorch, MinerU, model files, Node.js, MathJax, and transitive Python/npm dependency notices.
- A manifest with file hashes and source URLs.
- A clear statement that runtime conversion remains local-only and that setup payload creation can use internet only on the build PC.

Verification tiers:

- Fast tests use fake staging folders and fake wheel/model files.
- Build-PC packaging smoke can create the staging folder without committing payload.
- Offline target smoke uses a clean Windows VM with networking disabled.

## Implementation Task Plan

### Task 1: Packaging Manifest And Ignore Policy

Files:

- Create `tests/test_offline_packaging.py`.
- Create `src/pdf2md/packaging_manifest.py` if needed.
- Modify `.gitignore`.

Steps:

- Add failing tests for manifest generation with SHA-256, file size, relative path, and source label.
- Add failing tests that payload paths under `dist/offline-installer/`, wheelhouse files, model files, and generated installer executables stay ignored.
- Implement the smallest manifest helper or PowerShell-compatible JSON format.
- Run `uv run pytest tests/test_offline_packaging.py`.
- Commit manifest and ignore-policy changes.

### Task 2: Offline Payload Builder

Files:

- Create `packaging/offline/build-offline-payload.ps1`.
- Create `packaging/offline/requirements-runtime-cu126.txt`.
- Create `packaging/offline/README.md`.
- Create `packaging/offline/verify-offline-payload.ps1`.
- Modify `tests/test_offline_packaging.py`.

Steps:

- Add tests that the builder rejects missing UI exe, missing model source, missing Python runtime package, missing `uv.exe`, and empty wheelhouse.
- Add tests that the builder excludes `.venv`, `.git`, `samples`, `outputs`, `node_modules` unless explicitly copied as the optional locked MathJax payload.
- Implement payload staging, manifest generation, and payload verification.
- Run `uv run pytest tests/test_offline_packaging.py`.
- Run a dry build command that uses fake payload inputs.
- Commit builder changes.

### Task 3: Target Runtime Install And Repair Scripts

Files:

- Create `packaging/offline/install-runtime.ps1`.
- Create `packaging/offline/repair-runtime.ps1`.
- Create `packaging/offline/run-doctor.ps1`.
- Modify `tests/test_offline_packaging.py`.

Steps:

- Add tests that scripts contain `--no-index`, `--find-links`, `UV_OFFLINE=1`, and no `http://` or `https://` target-install commands.
- Add tests that existing `mineru.json` handling is explicit and never silently overwritten.
- Implement target-local `.venv` creation, offline package install, model config handling, doctor logging, and repair flow.
- Run `uv run pytest tests/test_offline_packaging.py`.
- Commit install-script changes.

### Task 4: UI Installed Runtime Resolution

Files:

- Modify `src/pdf2md_ui/runner.py`.
- Modify `src/pdf2md_ui/app.py` only if needed.
- Modify `tests/test_ui_runner.py`.

Steps:

- Add failing tests for project-root `.venv\Scripts\pdf2md.exe` resolution before PATH.
- Add failing tests for bundled `uv.exe` plus `uv run --offline pdf2md` fallback.
- Add failing tests that the child environment prepends `.venv\Scripts` and bundled Node.js when present.
- Implement the minimal runner changes.
- Run `uv run pytest tests/test_ui_runner.py`.
- Commit UI resolution changes.

### Task 5: Inno Setup Script

Files:

- Create `packaging/offline/Pdf2MdOffline.iss`.
- Modify `tests/test_offline_packaging.py`.

Steps:

- Add tests that the Inno script references the expected payload directories, Start Menu shortcuts, and runtime install script.
- Add tests that the script does not reference `samples`, `outputs`, `.venv`, or remote URLs.
- Implement the Inno script.
- On a build PC with Inno Setup installed, run `ISCC.exe packaging\offline\Pdf2MdOffline.iss`.
- Commit installer-script changes without committing the generated installer.

### Task 6: Documentation And Release Gate

Files:

- Modify `README.md`.
- Modify `docs/V1RELEASECHECKLIST.md`.
- Modify `docs/Sprints/SPRINT17CONTRACT.md`.
- Modify `PLAN.md`.
- Modify `PROGRESS.md`.
- Modify `docs/WORKARCHIVE.md` after implementation.

Steps:

- Document build-PC prerequisites and target-PC prerequisites.
- Document the offline artifact layout, expected size risk, and repair flow.
- Document the clean offline VM smoke test.
- Record final verification outcomes and residual risks.
- Commit documentation and handoff updates.

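
The WP17.3 resolution order and child environment rules, which Task 4 is meant to pin down with tests, might look roughly like the sketch below. It is illustrative only: the function names, parameters, and the idea of passing the bundled `uv.exe` location in as an argument are assumptions, not the actual `src/pdf2md_ui/runner.py` API.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Mapping, Optional


def resolve_pdf2md_command(
    project_root: Path,
    configured: Optional[list[str]] = None,
    bundled_uv: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv prefix for pdf2md, following the five-step order."""
    # 1. Explicit configured command wins.
    if configured:
        return list(configured)
    # 2. Installed target-local virtual environment under the project root.
    venv_exe = project_root / ".venv" / "Scripts" / "pdf2md.exe"
    if venv_exe.is_file():
        return [str(venv_exe)]
    # 3. pdf2md on PATH (`which` is injectable so tests can fake PATH).
    on_path = which("pdf2md")
    if on_path:
        return [on_path]
    # 4. Bundled uv.exe in offline mode (its location is supplied by the caller).
    if bundled_uv is not None and bundled_uv.is_file():
        return [str(bundled_uv), "run", "--offline", "pdf2md"]
    # 5. Existing system uv fallback.
    return ["uv", "run", "pdf2md"]


def build_child_env(
    base_env: Mapping[str, str],
    venv_scripts: Optional[Path] = None,
    node_dir: Optional[Path] = None,
    offline: bool = False,
) -> dict[str, str]:
    """Copy base_env and apply the child environment rules."""
    env = dict(base_env)
    # Respect an explicit user value; default to local models otherwise.
    env.setdefault("MINERU_MODEL_SOURCE", "local")
    # Prepend the installed Scripts folder and bundled Node.js, when present.
    parts = [str(p) for p in (venv_scripts, node_dir) if p is not None]
    if env.get("PATH"):
        parts.append(env["PATH"])
    if parts:
        env["PATH"] = os.pathsep.join(parts)
    if offline:
        env["UV_OFFLINE"] = "1"
    return env
```

Injecting `which` and `bundled_uv` keeps the resolver testable with temporary folders, which fits the contract's rule that default tests must not depend on a real install.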
## Verification Commands

Default fast checks:

```powershell
uv run pytest tests/test_offline_packaging.py tests/test_ui_runner.py
uv run pytest
git diff --check
git status --short --untracked-files=all
```

Build-PC packaging checks:

```powershell
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
$pythonInstaller = "C:\BuildCache\python-3.12-amd64.exe"
$uvExe = "C:\BuildCache\uv.exe"
$mineruModels = "C:\BuildCache\mineru-models"
powershell -ExecutionPolicy Bypass -File packaging\offline\build-offline-payload.ps1 -Configuration Release -PythonInstaller $pythonInstaller -UvExe $uvExe -MinerUModelSource $mineruModels
powershell -ExecutionPolicy Bypass -File packaging\offline\verify-offline-payload.ps1 -PayloadRoot dist\offline-installer\payload
ISCC.exe packaging\offline\Pdf2MdOffline.iss
```

Offline target smoke:

```powershell
# Run on a clean Windows x64 VM with networking disabled after copying only the installer.
.\Pdf2MdOfflineSetup-*.exe
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\scripts\run-doctor.ps1"
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" --version
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" doctor
```

Optional conversion smoke on the offline target:

```powershell
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" convert C:\LocalTest\SolidElement.pdf --out C:\LocalTest\outputs --overwrite --chunk-pages --gpu auto --mineru-profile auto --strict-local
```

Expected optional output:

```text
C:\LocalTest\outputs\SolidElement\SolidElement_001.md
C:\LocalTest\outputs\SolidElement\SolidElement_report.md
C:\LocalTest\outputs\SolidElement\images\
```

## Acceptance Criteria

- The generated installer can install the runtime on a clean Windows x64 target without internet access.
- The target runtime has a newly created local `.venv`; it is not a copied development `.venv`.
- `pdf2md --version` runs from the installed `.venv`.
- `pdf2md doctor` runs without network access and reports all install-relevant failures or warnings clearly.
- The UI launches from the Start Menu and resolves the installed runtime without manual project-root configuration.
- MinerU uses local models through `MINERU_MODEL_SOURCE=local` and local model config.
- Python package installation uses only bundled local wheels.
- The wheelhouse and model payload are hash-verified before install.
- No generated payload, model file, wheel, installer exe, sample PDF, or conversion output is committed.
- Default tests remain fast and independent of real MinerU, GPU, model files, network, Inno Setup, MathJax, or `samples/`.

## Hard Failure Criteria

- The target installer downloads anything from the internet.
- The UI or CLI introduces a runtime document upload path.
- The installer silently overwrites an unrelated existing `mineru.json`.
- The installer copies the development `.venv` as the installed runtime.
- The installed UI cannot find `pdf2md` without manually editing settings on a clean install.
- `pdf2md doctor` is skipped or its failure is hidden.
- Payload hash verification is missing.
- License/model redistribution review is skipped before sharing the installer outside the current personal environment.
- NVIDIA drivers or CUDA Toolkit installers are redistributed in this sprint.

## Open Risks

- The final installer may be very large because CUDA PyTorch wheels, MinerU dependencies, model weights, and optional Node/MathJax assets are large.
- MinerU model redistribution terms and transitive package/model licenses must be reviewed before broader sharing.
- Target PCs still need compatible NVIDIA hardware and drivers. The installer can verify and report this, but it cannot guarantee GPU compatibility.
- Some conversions can still stall or run slowly on GTX 1070 Ti 8GB; packaging does not solve runtime performance.
- Inno Setup may need practical size and antivirus/SmartScreen validation once real model payloads are included.

## Sources

- PyInstaller usage: https://pyinstaller.org/en/stable/usage.html
- Inno Setup command-line compiler: https://documentation.help/Inno-Setup/topic_compilercmdline.htm
- uv CLI `--offline` behavior: https://docs.astral.sh/uv/reference/cli/
- uv cache behavior: https://docs.astral.sh/uv/concepts/cache/
- pip offline install/download behavior: https://pip.pypa.io/en/stable/cli/pip_install.html and https://pip.pypa.io/en/stable/cli/pip_download/
- PyTorch previous version wheel command for CUDA 12.6: https://pytorch.org/get-started/previous-versions/
- MinerU local model source behavior: https://opendatalab.github.io/MinerU/usage/model_source/

## Handoff Requirements

After implementation:

- Update this contract status to `Implemented` or record the failed gate.
- Record payload size and generated installer path in `PROGRESS.md`.
- Record verification commands and outcomes in `PROGRESS.md`.
- Archive implementation evidence and offline VM smoke results in `docs/WORKARCHIVE.md`.
- Keep generated offline payloads, wheels, model files, installer exe, `dist/`, `outputs/`, and `samples/` uncommitted.