Sprint 17 Contract: Offline Windows Installer
Status: Abandoned
Last updated: 2026-05-13
Abandonment Note
Sprint 17 was abandoned at the user's request on 2026-05-13 before implementation began. This document remains as a historical planning record only. Do not implement or extend this contract unless the user explicitly reopens offline installer work.
Objective
Create a large offline Windows installer that can install the existing local pdf2md runtime on another Windows PC without internet access.
The installer must install or stage all application-owned files needed after download time: the minimal UI executable, the project runtime, a target-local Python virtual environment created from bundled wheels, CUDA PyTorch wheels, MinerU 3.1.0 wheels and dependencies, local MinerU model files, optional local Node.js/MathJax assets, Start Menu shortcuts, setup logs, and a post-install pdf2md doctor verification path.
This sprint does not change conversion behavior. It packages the already implemented CLI/UI/runtime for offline use.
Product Decision
The offline package should create the target PC virtual environment during installation instead of copying the current development .venv.
Reasoning:
- Python virtual environments and console entry points often contain absolute paths and are not a reliable redistribution unit.
- A target-local .venv created from a bundled wheelhouse is more reproducible and easier to repair.
- The installer can keep the wheelhouse for offline repair, uninstall/reinstall, and audit.
Installer Shape
Recommended installer technology:
- Inno Setup for the Windows installer shell because it can compile scripts from the command line with ISCC.exe, returns deterministic exit codes, and is simple enough for a per-user installer.
- PowerShell scripts for payload build, target runtime install, and target verification.
- PyInstaller remains only the UI executable builder. It must not become the full MinerU/PyTorch/model bundler.
Default install root:
%LOCALAPPDATA%\Programs\ConvertPDFToMD\
Installed layout:
ConvertPDFToMD/
  app/
    pdf2md-ui.exe
  runtime/
    pyproject.toml
    uv.lock
    README.md
    src/
    tools/
    package.json
    package-lock.json
    .venv/
  payload/
    python/
    uv/
    wheelhouse/
    requirements-runtime-cu126.txt
    models/
    node/
    node_modules/
    payload-manifest.json
    SHA256SUMS.txt
    THIRD_PARTY_NOTICES.md
  scripts/
    install-runtime.ps1
    repair-runtime.ps1
    run-doctor.ps1
  logs/
Generated artifacts that must remain untracked:
dist/offline-installer/
dist/Pdf2MdOfflineSetup-*.exe
Payload Contents
The first offline payload targets Windows x64, Python 3.12, CUDA PyTorch 2.6.0+cu126, torchvision 0.21.0+cu126, and mineru[core]==3.1.0.
Required:
- dist/pdf2md-ui.exe from the existing PyInstaller build.
- Tracked project runtime files needed to run uv run pdf2md.
- A Windows x64 Python 3.12 installer or an equivalent approved Python runtime package.
- A Windows x64 uv.exe.
- A wheelhouse containing:
  - the current project wheel,
  - pypdf,
  - torch==2.6.0,
  - torchvision==0.21.0,
  - mineru[core]==3.1.0,
  - all transitive Python runtime dependencies.
- Local MinerU model files and the model config template needed for MINERU_MODEL_SOURCE=local.
- A manifest listing every payload file, size, SHA-256 hash, source URL or local source, and license family.
Optional but recommended:
- Portable local Node.js runtime.
- node_modules/ containing the locked MathJax checker dependencies from package-lock.json.
Explicitly excluded:
- samples/.
- outputs/.
- .git/.
- The development .venv/.
- Local generated PyInstaller build/ folders and .spec files unless the implementation deliberately adds a stable project-owned spec file.
- NVIDIA GPU drivers and CUDA Toolkit installers. The installer may check for a compatible NVIDIA driver through nvidia-smi, but it should not redistribute GPU drivers in this sprint.
Touched Surfaces
Allowed during implementation:
- Create packaging/offline/build-offline-payload.ps1.
- Create packaging/offline/verify-offline-payload.ps1.
- Create packaging/offline/install-runtime.ps1.
- Create packaging/offline/repair-runtime.ps1.
- Create packaging/offline/run-doctor.ps1.
- Create packaging/offline/Pdf2MdOffline.iss.
- Create packaging/offline/requirements-runtime-cu126.txt.
- Create packaging/offline/README.md.
- Create packaging/offline/THIRD_PARTY_NOTICES.md.
- Create src/pdf2md/packaging_manifest.py only if a Python helper is simpler than repeating manifest logic in PowerShell.
- Modify src/pdf2md_ui/runner.py so the UI can resolve an installed target-local .venv\Scripts\pdf2md.exe before falling back to PATH or uv run pdf2md.
- Modify src/pdf2md_ui/app.py only if the project root default must prefer the installed runtime folder.
- Modify tests/test_ui_runner.py.
- Create tests/test_offline_packaging.py.
- Modify README.md.
- Modify docs/V1RELEASECHECKLIST.md.
- Modify PLAN.md.
- Modify PROGRESS.md.
- Modify docs/WORKARCHIVE.md after implementation.
Not allowed:
- Do not change MinerU 3.1.0 as the fixed conversion engine.
- Do not add a second conversion engine.
- Do not add runtime network calls, --api-url, router mode, remote APIs, HTTP client backends, remote OpenAI-compatible backends, or hosted renderers.
- Do not copy the development .venv as the installed runtime.
- Do not make default tests depend on real MinerU, GPU, model files, network, Obsidian, MathJax, Inno Setup, or samples/.
- Do not commit generated installer payloads, model files, wheelhouse files, Python installers, dist/, outputs/, or samples/.
Architecture Plan
WP17.1: Offline Payload Builder
Add a build script that creates a clean staging folder under dist/offline-installer/ with app/, runtime/, and payload/ subfolders that mirror the final install layout.
Responsibilities:
- Rebuild dist/pdf2md-ui.exe.
- Build the project wheel into the staging wheelhouse.
- Download or collect Python wheels for the target runtime on a connected build PC.
- Collect the Windows Python runtime package and uv.exe.
- Copy project runtime files without .git, .venv, outputs/, samples/, and build trash.
- Copy local MinerU model files from a configured source path.
- Optionally copy portable Node.js and the locked node_modules/.
- Generate payload-manifest.json and SHA256SUMS.txt.
- Fail if any required file is missing or if any wheel dependency would require internet during installation.
The builder may use python -m pip download on the connected build PC. The target installer must use only local files, for example uv pip install --no-index --find-links.
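The split between the connected build PC and the offline target can be sketched as two command builders. This is a minimal illustration, not the project's implementation: the function names and argument layout are hypothetical, and a real build would also need to point pip at the index that hosts the CUDA PyTorch wheels.

```python
from pathlib import Path


def build_download_command(requirements: Path, wheelhouse: Path) -> list[str]:
    """Build-PC side: collect wheels into the wheelhouse (network is allowed here)."""
    return [
        "python", "-m", "pip", "download",
        "--requirement", str(requirements),
        "--dest", str(wheelhouse),
        # Refuse source distributions so the target never needs a compiler.
        "--only-binary", ":all:",
    ]


def build_offline_install_command(
    uv_exe: Path, venv_python: Path, requirements: Path, wheelhouse: Path
) -> list[str]:
    """Target side: install only from the local wheelhouse, never from an index."""
    return [
        str(uv_exe), "pip", "install",
        "--python", str(venv_python),
        "--no-index",                      # ignore every package index
        "--find-links", str(wheelhouse),   # resolve from bundled wheels only
        "--requirement", str(requirements),
    ]
```

Returning argument lists rather than shell strings keeps the commands easy to assert on in fast tests without running pip or uv.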
WP17.2: Target Runtime Installer
Add a PowerShell install script that runs from the installed payload and creates the real runtime on the target PC.
Responsibilities:
- Verify payload hashes before installing.
- Install or locate Python 3.12 x64.
- Create runtime\.venv on the target PC.
- Install packages from payload\wheelhouse with network disabled.
- Install the project wheel into the target .venv.
- Preserve the bundled wheelhouse for offline repair.
- Configure MINERU_MODEL_SOURCE=local for UI/CLI child processes.
- Configure local MinerU model paths without silently overwriting an unrelated user mineru.json.
- If %USERPROFILE%\mineru.json already exists and points elsewhere, prompt in interactive mode; in silent mode, fail clearly and leave repair-runtime.ps1 instructions.
- Run pdf2md doctor and write the result to logs\doctor-after-install.txt.
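The "verify payload hashes before installing" responsibility could look like the following sketch. It assumes SHA256SUMS.txt uses the common `<hex digest>  <relative path>` line format; the contract does not fix that format, and the function names are hypothetical. The contract plans this step in PowerShell, so the Python here only illustrates the logic.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large model files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_payload(payload_root: Path, sums_name: str = "SHA256SUMS.txt") -> list[str]:
    """Return a list of problems; an empty list means every listed file matched."""
    problems: list[str] = []
    sums_text = (payload_root / sums_name).read_text(encoding="utf-8")
    for line in sums_text.splitlines():
        if not line.strip():
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            problems.append(f"malformed line: {line!r}")
            continue
        expected, relative = parts[0].lower(), parts[1].strip().lstrip("*")
        target = payload_root / relative
        if not target.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(target) != expected:
            problems.append(f"hash mismatch: {relative}")
    return problems
```

Collecting every problem instead of stopping at the first one gives the setup log a complete picture when a payload is damaged.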
WP17.3: UI Runtime Resolution
Adjust the UI runner for an installed offline layout.
Resolution order:
- Explicit configured pdf2md command.
- Installed runtime .venv\Scripts\pdf2md.exe under the selected project root.
- pdf2md on PATH.
- Bundled uv.exe plus uv run --offline pdf2md under the selected project root.
- Existing system uv run pdf2md fallback.
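The resolution order above can be sketched as a single function. This is only an illustration under stated assumptions: the function name, its parameters, and the injectable `which` lookup are hypothetical and are not taken from the existing src/pdf2md_ui/runner.py.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional, Sequence


def resolve_pdf2md_command(
    project_root: Path,
    configured: Optional[Sequence[str]] = None,
    bundled_uv: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv prefix used to launch pdf2md, following the order above."""
    # 1. An explicit configured command always wins.
    if configured:
        return list(configured)
    # 2. The installed target-local virtual environment.
    installed = project_root / ".venv" / "Scripts" / "pdf2md.exe"
    if installed.is_file():
        return [str(installed)]
    # 3. A pdf2md executable found on PATH.
    on_path = which("pdf2md")
    if on_path:
        return [on_path]
    # 4. The bundled uv.exe, forced offline.
    if bundled_uv is not None and bundled_uv.is_file():
        return [str(bundled_uv), "run", "--offline", "pdf2md"]
    # 5. Last resort: whatever uv the system provides.
    return ["uv", "run", "pdf2md"]
```

Passing `which` as a parameter lets tests simulate an empty PATH without touching the real environment.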
Child environment rules:
- Set MINERU_MODEL_SOURCE=local unless explicitly set.
- Add installed .venv\Scripts to PATH for runtime console scripts.
- Add installed portable Node.js path to PATH when bundled.
- Set UV_OFFLINE=1 when using the installed offline runtime.
- Do not add remote endpoints or backend flags.
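A minimal sketch of these child environment rules follows. The function name and parameters are hypothetical, and the sketch assumes the PATH entries are prepended (as Task 4 describes) and that the variable is keyed as `PATH`, which is how `dict(os.environ)` exposes it on Windows.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def build_child_env(
    base_env: Mapping[str, str],
    runtime_root: Path,
    node_dir: Optional[Path] = None,
    offline_runtime: bool = True,
) -> dict[str, str]:
    """Copy the parent environment and apply the offline-runtime rules."""
    env = dict(base_env)
    # Respect an explicit user choice; only fill in the default.
    env.setdefault("MINERU_MODEL_SOURCE", "local")
    # Prepend the installed console scripts, then bundled Node.js if present.
    extra = [str(runtime_root / ".venv" / "Scripts")]
    if node_dir is not None and node_dir.is_dir():
        extra.append(str(node_dir))
    existing = env.get("PATH", "")
    env["PATH"] = os.pathsep.join(extra + ([existing] if existing else []))
    if offline_runtime:
        env["UV_OFFLINE"] = "1"
    return env
```

The function never adds endpoint or backend variables, which matches the last rule in the list.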
WP17.4: Inno Setup Installer
Add an Inno Setup script that installs the payload and invokes the target runtime installer.
Installer behavior:
- Default to per-user install under %LOCALAPPDATA%\Programs\ConvertPDFToMD.
- Create Start Menu shortcuts for:
  - ConvertPDFToMD UI,
  - PDF2MD Doctor,
  - Repair PDF2MD Runtime.
- Run install-runtime.ps1 after files are copied.
- Show the doctor log path if setup finishes with WARN.
- Fail the install on target runtime setup failure unless the user explicitly chooses to keep files for manual repair.
WP17.5: License, Manifest, And Offline Verification
Add docs and checks for redistribution risk.
Required records:
- Python, uv, PyInstaller, PyTorch, MinerU, model files, Node.js, MathJax, and transitive Python/npm dependency notices.
- A manifest with file hashes and source URLs.
- A clear statement that runtime conversion remains local-only and that setup payload creation can use internet only on the build PC.
Verification tiers:
- Fast tests use fake staging folders and fake wheel/model files.
- Build-PC packaging smoke can create the staging folder without committing payload.
- Offline target smoke uses a clean Windows VM with networking disabled.
Implementation Task Plan
Task 1: Packaging Manifest And Ignore Policy
Files:
- Create tests/test_offline_packaging.py.
- Create src/pdf2md/packaging_manifest.py if needed.
- Modify .gitignore.
Steps:
- Add failing tests for manifest generation with SHA-256, file size, relative path, and source label.
- Add failing tests that payload paths under dist/offline-installer/, wheelhouse files, model files, and generated installer executables stay ignored.
- Implement the smallest manifest helper or PowerShell-compatible JSON format.
- Run uv run pytest tests/test_offline_packaging.py.
- Commit manifest and ignore-policy changes.
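A manifest helper along the lines of Task 1 might look like this sketch. The entry keys (`path`, `size`, `sha256`, `source`), the `"unknown"` default label, and the function names are assumptions chosen for illustration; the contract only requires that each entry carry a relative path, size, SHA-256 hash, and source label.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Files the manifest step itself generates; listing them would be circular.
GENERATED_NAMES = {"payload-manifest.json", "SHA256SUMS.txt"}


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte wheels and models stay cheap."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(
    payload_root: Path, source_labels: Optional[dict[str, str]] = None
) -> list[dict]:
    """Describe every payload file with relative path, size, hash, and source."""
    labels = source_labels or {}
    entries = []
    for path in payload_root.rglob("*"):
        if not path.is_file() or path.name in GENERATED_NAMES:
            continue
        relative = path.relative_to(payload_root).as_posix()
        entries.append({
            "path": relative,
            "size": path.stat().st_size,
            "sha256": file_sha256(path),
            "source": labels.get(relative, "unknown"),
        })
    # Sort by relative path so repeated builds produce identical manifests.
    return sorted(entries, key=lambda entry: entry["path"])
```

The returned list is plain data, so `json.dumps(entries, indent=2)` yields a file that a PowerShell script can read back with `ConvertFrom-Json`.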
Task 2: Offline Payload Builder
Files:
- Create packaging/offline/build-offline-payload.ps1.
- Create packaging/offline/requirements-runtime-cu126.txt.
- Create packaging/offline/README.md.
- Create packaging/offline/verify-offline-payload.ps1.
- Modify tests/test_offline_packaging.py.
Steps:
- Add tests that the builder rejects missing UI exe, missing model source, missing Python runtime package, missing uv.exe, and empty wheelhouse.
- Add tests that the builder excludes .venv, .git, samples, outputs, node_modules unless explicitly copied as the optional locked MathJax payload.
- Implement payload staging, manifest generation, and payload verification.
- Run uv run pytest tests/test_offline_packaging.py.
- Run a dry build command that uses fake payload inputs.
- Commit builder changes.
Task 3: Target Runtime Install And Repair Scripts
Files:
- Create packaging/offline/install-runtime.ps1.
- Create packaging/offline/repair-runtime.ps1.
- Create packaging/offline/run-doctor.ps1.
- Modify tests/test_offline_packaging.py.
Steps:
- Add tests that scripts contain --no-index, --find-links, UV_OFFLINE=1, and no http:// or https:// target-install commands.
- Add tests that existing mineru.json handling is explicit and never silently overwritten.
- Implement target-local .venv creation, offline package install, model config handling, doctor logging, and repair flow.
- Run uv run pytest tests/test_offline_packaging.py.
- Commit install-script changes.
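The "never silently overwritten" rule for mineru.json can be expressed as a small decision function. This is a hedged sketch: the enum, the function name, and the idea of passing in an already-extracted models directory are all hypothetical, and the sketch deliberately does not assume any particular mineru.json schema.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class ConfigAction(Enum):
    WRITE_NEW = "write_new"                          # no config yet: safe to create
    KEEP_EXISTING = "keep_existing"                  # already points at bundled models
    PROMPT_USER = "prompt_user"                      # interactive: ask before changing
    FAIL_WITH_REPAIR_HINT = "fail_with_repair_hint"  # silent: stop and point to repair


def decide_config_action(
    config_exists: bool,
    existing_models_dir: Optional[Path],
    bundled_models_dir: Path,
    silent: bool,
) -> ConfigAction:
    """Choose what the installer may do with the user's mineru.json."""
    if not config_exists:
        return ConfigAction.WRITE_NEW
    if existing_models_dir is not None and existing_models_dir == bundled_models_dir:
        return ConfigAction.KEEP_EXISTING
    # The file exists and points elsewhere (or cannot be interpreted):
    # never overwrite without an explicit decision.
    return ConfigAction.FAIL_WITH_REPAIR_HINT if silent else ConfigAction.PROMPT_USER
```

Treating an unreadable or unrecognized config the same as one that points elsewhere keeps the installer on the cautious side.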
Task 4: UI Installed Runtime Resolution
Files:
- Modify src/pdf2md_ui/runner.py.
- Modify src/pdf2md_ui/app.py only if needed.
- Modify tests/test_ui_runner.py.
Steps:
- Add failing tests for project-root .venv\Scripts\pdf2md.exe resolution before PATH.
- Add failing tests for bundled uv.exe plus uv run --offline pdf2md fallback.
- Add failing tests that the child environment prepends .venv\Scripts and bundled Node.js when present.
- Implement the minimal runner changes.
- Run uv run pytest tests/test_ui_runner.py.
- Commit UI resolution changes.
Task 5: Inno Setup Script
Files:
- Create packaging/offline/Pdf2MdOffline.iss.
- Modify tests/test_offline_packaging.py.
Steps:
- Add tests that the Inno script references the expected payload directories, Start Menu shortcuts, and runtime install script.
- Add tests that the script does not reference samples, outputs, .venv, or remote URLs.
- Implement the Inno script.
- On a build PC with Inno Setup installed, run ISCC.exe packaging\offline\Pdf2MdOffline.iss.
- Commit installer-script changes without committing the generated installer.
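The "does not reference" test for the Inno script can run as a plain text scan, which keeps the default test suite independent of Inno Setup itself. The helper below is a sketch with a hypothetical name; it uses simple case-sensitive substring matching, which is crude and could flag a harmless identifier that happens to contain one of the tokens.

```python
import re

# Names the installer script must not mention, per the Task 5 steps.
FORBIDDEN_TOKENS = ("samples", "outputs", ".venv")
REMOTE_URL = re.compile(r"https?://", re.IGNORECASE)


def find_forbidden_references(script_text: str) -> list[str]:
    """Return one finding per offending line and token; empty means clean."""
    findings: list[str] = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        if REMOTE_URL.search(line):
            findings.append(f"line {lineno}: remote URL")
        for token in FORBIDDEN_TOKENS:
            if token in line:
                findings.append(f"line {lineno}: references {token}")
    return findings
```

In a test, the script would be read with `Path("packaging/offline/Pdf2MdOffline.iss").read_text()` and the result compared against an empty list.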
Task 6: Documentation And Release Gate
Files:
- Modify README.md.
- Modify docs/V1RELEASECHECKLIST.md.
- Modify docs/Sprints/SPRINT17CONTRACT.md.
- Modify PLAN.md.
- Modify PROGRESS.md.
- Modify docs/WORKARCHIVE.md after implementation.
Steps:
- Document build-PC prerequisites and target-PC prerequisites.
- Document the offline artifact layout, expected size risk, and repair flow.
- Document the clean offline VM smoke test.
- Record final verification outcomes and residual risks.
- Commit documentation and handoff updates.
Verification Commands
Default fast checks:
uv run pytest tests/test_offline_packaging.py tests/test_ui_runner.py
uv run pytest
git diff --check
git status --short --untracked-files=all
Build-PC packaging checks:
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
$pythonInstaller = "C:\BuildCache\python-3.12-amd64.exe"
$uvExe = "C:\BuildCache\uv.exe"
$mineruModels = "C:\BuildCache\mineru-models"
powershell -ExecutionPolicy Bypass -File packaging\offline\build-offline-payload.ps1 -Configuration Release -PythonInstaller $pythonInstaller -UvExe $uvExe -MinerUModelSource $mineruModels
powershell -ExecutionPolicy Bypass -File packaging\offline\verify-offline-payload.ps1 -PayloadRoot dist\offline-installer\payload
ISCC.exe packaging\offline\Pdf2MdOffline.iss
Offline target smoke:
# Run on a clean Windows x64 VM with networking disabled after copying only the installer.
.\Pdf2MdOfflineSetup-*.exe
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\scripts\run-doctor.ps1"
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" --version
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" doctor
Optional conversion smoke on the offline target:
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" convert C:\LocalTest\SolidElement.pdf --out C:\LocalTest\outputs --overwrite --chunk-pages --gpu auto --mineru-profile auto --strict-local
Expected optional output:
C:\LocalTest\outputs\SolidElement\SolidElement_001.md
C:\LocalTest\outputs\SolidElement\SolidElement_report.md
C:\LocalTest\outputs\SolidElement\images\
Acceptance Criteria
- The generated installer can install the runtime on a clean Windows x64 target without internet access.
- The target runtime has a newly created local .venv; it is not a copied development .venv.
- pdf2md --version runs from the installed .venv.
- pdf2md doctor runs without network access and reports all install-relevant failures or warnings clearly.
- The UI launches from the Start Menu and resolves the installed runtime without manual project-root configuration.
- MinerU uses local models through MINERU_MODEL_SOURCE=local and local model config.
- Python package installation uses only bundled local wheels.
- The wheelhouse and model payload are hash-verified before install.
- No generated payload, model file, wheel, installer exe, sample PDF, or conversion output is committed.
- Default tests remain fast and independent of real MinerU, GPU, model files, network, Inno Setup, MathJax, or samples/.
Hard Failure Criteria
- The target installer downloads anything from the internet.
- The UI or CLI introduces a runtime document upload path.
- The installer silently overwrites an unrelated existing mineru.json.
- The installer copies the development .venv as the installed runtime.
- The installed UI cannot find pdf2md without manually editing settings on a clean install.
- pdf2md doctor is skipped or its failure is hidden.
- Payload hash verification is missing.
- License/model redistribution review is skipped before sharing the installer outside the current personal environment.
- NVIDIA drivers or CUDA Toolkit installers are redistributed in this sprint.
Open Risks
- The final installer may be very large because CUDA PyTorch wheels, MinerU dependencies, model weights, and optional Node/MathJax assets are large.
- MinerU model redistribution terms and transitive package/model licenses must be reviewed before broader sharing.
- Target PCs still need compatible NVIDIA hardware and drivers. The installer can verify and report this, but it cannot guarantee GPU compatibility.
- Some conversions can still stall or run slowly on GTX 1070 Ti 8GB; packaging does not solve runtime performance.
- Inno Setup may need practical size and antivirus/SmartScreen validation once real model payloads are included.
Sources
- PyInstaller usage: https://pyinstaller.org/en/stable/usage.html
- Inno Setup command-line compiler: https://documentation.help/Inno-Setup/topic_compilercmdline.htm
- uv CLI --offline behavior: https://docs.astral.sh/uv/reference/cli/
- uv cache behavior: https://docs.astral.sh/uv/concepts/cache/
- pip offline install/download behavior: https://pip.pypa.io/en/stable/cli/pip_install.html and https://pip.pypa.io/en/stable/cli/pip_download/
- PyTorch previous version wheel command for CUDA 12.6: https://pytorch.org/get-started/previous-versions/
- MinerU local model source behavior: https://opendatalab.github.io/MinerU/usage/model_source/
Handoff Requirements
After implementation:
- Update this contract status to Implemented or record the failed gate.
- Record payload size and generated installer path in PROGRESS.md.
- Record verification commands and outcomes in PROGRESS.md.
- Archive implementation evidence and offline VM smoke results in docs/WORKARCHIVE.md.
- Keep generated offline payloads, wheels, model files, installer exe, dist/, outputs/, and samples/ uncommitted.