2026-05-14 10:16:59 +09:00

Sprint 17 Contract: Offline Windows Installer

Status: Abandoned
Last updated: 2026-05-13

Abandonment Note

Sprint 17 was abandoned at the user's request on 2026-05-13 before implementation began. This document remains as a historical planning record only. Do not implement or extend this contract unless the user explicitly reopens offline installer work.

Objective

Create a large offline Windows installer that can install the existing local pdf2md runtime on another Windows PC without internet access.

The installer must install or stage all application-owned files needed after download time: the minimal UI executable, the project runtime, a target-local Python virtual environment created from bundled wheels, CUDA PyTorch wheels, MinerU 3.1.0 wheels and dependencies, local MinerU model files, optional local Node.js/MathJax assets, Start Menu shortcuts, setup logs, and a post-install pdf2md doctor verification path.

This sprint does not change conversion behavior. It packages the already implemented CLI/UI/runtime for offline use.

Product Decision

The offline package should create the target PC virtual environment during installation instead of copying the current development .venv.

Reasoning:

  • Python virtual environments and console entry points often contain absolute paths and are not a reliable redistribution unit.
  • A target-local .venv created from a bundled wheelhouse is more reproducible and easier to repair.
  • The installer can keep the wheelhouse for offline repair, uninstall/reinstall, and audit.
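The first point can be checked directly. A venv created by Python's `venv` module records the directory of its base interpreter as an absolute path in the `home` key of `pyvenv.cfg`, so a copied development `.venv` breaks as soon as that directory is missing on the target PC. Below is a minimal sketch of a check that a repair or doctor step could use. The helper names are hypothetical and are not existing project code.

```python
from pathlib import Path


# Hypothetical helpers; not part of the current pdf2md code base.
def read_pyvenv_cfg(venv_dir: Path) -> dict[str, str]:
    """Parse pyvenv.cfg into {lowercase key: raw value}; lines without '=' are ignored."""
    values: dict[str, str] = {}
    for line in (venv_dir / "pyvenv.cfg").read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip().lower()] = value.strip()
    return values


def venv_base_interpreter_missing(venv_dir: Path) -> bool:
    """True when the absolute 'home' directory recorded at venv creation does not exist here."""
    home = read_pyvenv_cfg(venv_dir).get("home")
    return home is None or not Path(home).is_dir()
```

A venv created on the target PC from the bundled Python passes this check by construction. A copied one passes only if the target happens to have the same interpreter path.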

Installer Shape

Recommended installer technology:

  • Inno Setup for the Windows installer shell because it can compile scripts from the command line with ISCC.exe, returns deterministic exit codes, and is simple enough for a per-user installer.
  • PowerShell scripts for payload build, target runtime install, and target verification.
  • PyInstaller remains only the UI executable builder. It must not become the full MinerU/PyTorch/model bundler.

Default install root:

%LOCALAPPDATA%\Programs\ConvertPDFToMD\

Installed layout:

ConvertPDFToMD/
  app/
    pdf2md-ui.exe
  runtime/
    pyproject.toml
    uv.lock
    README.md
    src/
    tools/
    package.json
    package-lock.json
    .venv/
  payload/
    python/
    uv/
    wheelhouse/
    requirements-runtime-cu126.txt
    models/
    node/
    node_modules/
    payload-manifest.json
    SHA256SUMS.txt
    THIRD_PARTY_NOTICES.md
  scripts/
    install-runtime.ps1
    repair-runtime.ps1
    run-doctor.ps1
  logs/

Generated artifacts that must remain untracked:

dist/offline-installer/
dist/Pdf2MdOfflineSetup-*.exe

Payload Contents

The first offline payload targets Windows x64, Python 3.12, CUDA PyTorch 2.6.0+cu126, torchvision 0.21.0+cu126, and mineru[core]==3.1.0.

Required:

  • dist/pdf2md-ui.exe from the existing PyInstaller build.
  • Tracked project runtime files needed to run uv run pdf2md.
  • A Windows x64 Python 3.12 installer or an equivalent approved Python runtime package.
  • A Windows x64 uv.exe.
  • A wheelhouse containing:
    • the current project wheel,
    • pypdf,
    • torch==2.6.0,
    • torchvision==0.21.0,
    • mineru[core]==3.1.0,
    • all transitive Python runtime dependencies.
  • Local MinerU model files and the model config template needed for MINERU_MODEL_SOURCE=local.
  • A manifest listing every payload file, size, SHA-256 hash, source URL or local source, and license family.
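A manifest entry carries exactly the fields listed above. The sketch below shows one way to generate them in Python. It assumes the hypothetical helper names `build_manifest` and `manifest_json` and a caller-supplied mapping from payload-relative path to (source, license family). It is not the project's actual `packaging_manifest.py`.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path

# Files generated from the manifest itself are not listed inside it.
GENERATED = {"payload-manifest.json", "SHA256SUMS.txt"}


@dataclass(frozen=True)
class ManifestEntry:
    path: str     # payload-relative path with forward slashes
    size: int     # bytes
    sha256: str   # lowercase hex digest
    source: str   # source URL or local source label
    license: str  # license family


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash in chunks (1 MiB by default) so multi-GB wheels and model weights never load at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(payload_root: Path, sources: dict[str, tuple[str, str]]) -> list[ManifestEntry]:
    """Describe every payload file; raise if a file has no recorded (source, license family)."""
    entries = []
    for file in sorted(p for p in payload_root.rglob("*") if p.is_file() and p.name not in GENERATED):
        rel = file.relative_to(payload_root).as_posix()
        if rel not in sources:
            raise ValueError(f"no source/license recorded for {rel}")
        source, license_family = sources[rel]
        entries.append(ManifestEntry(rel, file.stat().st_size, sha256_file(file), source, license_family))
    return entries


def manifest_json(entries: list[ManifestEntry]) -> str:
    return json.dumps({"files": [asdict(entry) for entry in entries]}, indent=2)
```

Raising on an unmapped file keeps the manifest complete by construction. A wheel or model file that lands in the payload without a recorded source or license fails the build instead of shipping unaudited.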

Optional but recommended:

  • Portable local Node.js runtime.
  • node_modules/ containing the locked MathJax checker dependencies from package-lock.json.

Explicitly excluded:

  • samples/.
  • outputs/.
  • .git/.
  • The development .venv/.
  • Local generated PyInstaller build/ folders and .spec files unless the implementation deliberately adds a stable project-owned spec file.
  • NVIDIA GPU drivers and CUDA Toolkit installers. The installer may check for a compatible NVIDIA driver through nvidia-smi, but it should not redistribute GPU drivers in this sprint.
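The exclusion list could be encoded as a path filter for the copy step. This sketch assumes paths are relative to the repository root and checks only the top-level folder name. That is enough for the folders listed above, but it would not catch a nested `.venv`. `is_excluded` is a hypothetical name. The `node_modules` switch reflects the optional locked MathJax payload.

```python
from pathlib import PurePosixPath

# Top-level folders from the exclusion list; the development .venv is never copied.
EXCLUDED_TOP_LEVEL = {"samples", "outputs", ".git", ".venv", "build"}
# Generated PyInstaller spec files; a deliberately tracked, project-owned spec would need an exception.
EXCLUDED_SUFFIXES = {".spec"}


def is_excluded(rel_path: str, include_node_modules: bool = False) -> bool:
    """Decide whether a repository-relative path must stay out of the offline payload."""
    path = PurePosixPath(rel_path.replace("\\", "/"))
    if not path.parts:
        return True
    top = path.parts[0]
    if top == "node_modules":
        # Only the locked MathJax checker dependencies, and only when explicitly requested.
        return not include_node_modules
    return top in EXCLUDED_TOP_LEVEL or path.suffix in EXCLUDED_SUFFIXES
```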

Touched Surfaces

Allowed during implementation:

  • Create packaging/offline/build-offline-payload.ps1.
  • Create packaging/offline/verify-offline-payload.ps1.
  • Create packaging/offline/install-runtime.ps1.
  • Create packaging/offline/repair-runtime.ps1.
  • Create packaging/offline/run-doctor.ps1.
  • Create packaging/offline/Pdf2MdOffline.iss.
  • Create packaging/offline/requirements-runtime-cu126.txt.
  • Create packaging/offline/README.md.
  • Create packaging/offline/THIRD_PARTY_NOTICES.md.
  • Create src/pdf2md/packaging_manifest.py only if a Python helper is simpler than repeating manifest logic in PowerShell.
  • Modify src/pdf2md_ui/runner.py so the UI can resolve an installed target-local .venv\Scripts\pdf2md.exe before falling back to PATH or uv run pdf2md.
  • Modify src/pdf2md_ui/app.py only if the project root default must prefer the installed runtime folder.
  • Modify tests/test_ui_runner.py.
  • Create tests/test_offline_packaging.py.
  • Modify README.md.
  • Modify docs/V1RELEASECHECKLIST.md.
  • Modify PLAN.md.
  • Modify PROGRESS.md.
  • Modify docs/WORKARCHIVE.md after implementation.

Not allowed:

  • Do not change MinerU 3.1.0 as the fixed conversion engine.
  • Do not add a second conversion engine.
  • Do not add runtime network calls, --api-url, router mode, remote APIs, HTTP client backends, remote OpenAI-compatible backends, or hosted renderers.
  • Do not copy the development .venv as the installed runtime.
  • Do not make default tests depend on real MinerU, GPU, model files, network, Obsidian, MathJax, Inno Setup, or samples/.
  • Do not commit generated installer payloads, model files, wheelhouse files, Python installers, dist/, outputs/, or samples/.

Architecture Plan

WP17.1: Offline Payload Builder

Add a build script that creates a clean staging folder under dist/offline-installer/ with app/, runtime/, and payload/ subfolders that mirror the final install layout.

Responsibilities:

  • Rebuild dist/pdf2md-ui.exe.
  • Build the project wheel into the staging wheelhouse.
  • Download or collect Python wheels for the target runtime on a connected build PC.
  • Collect the Windows Python runtime package and uv.exe.
  • Copy project runtime files without .git, .venv, outputs/, samples/, and build trash.
  • Copy local MinerU model files from a configured source path.
  • Optionally copy portable Node.js and the locked node_modules/.
  • Generate payload-manifest.json and SHA256SUMS.txt.
  • Fail if any required file is missing or if any wheel dependency would require internet during installation.
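For `SHA256SUMS.txt`, the layout used by GNU `sha256sum` (lowercase hex digest, two spaces, relative path) is a reasonable choice because standard tools can check it. Below is a minimal Python sketch that writes the file on the build PC and verifies it before install. The format choice and helper names are assumptions. The real scripts are planned in PowerShell, where `Get-FileHash -Algorithm SHA256` returns uppercase hex, so digests should be compared case-insensitively.

```python
import hashlib
from pathlib import Path

SUMS_NAME = "SHA256SUMS.txt"


def _sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sha256sums(payload_root: Path) -> Path:
    """Build PC: write '<hex>  <relative/path>' lines for every payload file except the list itself."""
    files = sorted(p for p in payload_root.rglob("*") if p.is_file() and p.name != SUMS_NAME)
    lines = [f"{_sha256(p)}  {p.relative_to(payload_root).as_posix()}" for p in files]
    out = payload_root / SUMS_NAME
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out


def verify_sha256sums(payload_root: Path) -> list[str]:
    """Target PC: return missing or mismatched files; an empty list means the listed files are intact."""
    problems = []
    for line in (payload_root / SUMS_NAME).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = payload_root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif _sha256(target) != expected.lower():  # tolerate uppercase digests, e.g. from Get-FileHash
            problems.append(f"hash mismatch: {rel}")
    return problems
```

This verifier only checks files that appear in the list. A stricter version would also reject files that exist on disk but are not listed.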

The builder may use python -m pip download on the connected build PC. The target installer must use only local files, for example uv pip install --no-index --find-links.
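On the target, venv creation and package installation could be expressed as argument lists for `subprocess.run`. The uv options shown (`uv venv --python`, and `uv pip install` with `--python`, `--no-index`, `--find-links`, and `-r`) and the `UV_OFFLINE` variable are real uv features. The function names, and the assumption that the base interpreter comes from the bundled Python package, are hypothetical.

```python
from collections.abc import Mapping
from pathlib import Path


# Hypothetical helpers for install-runtime logic; only the uv flags themselves are real.
def venv_command(uv_exe: Path, base_python: Path, venv_dir: Path) -> list[str]:
    """argv that creates the target-local runtime .venv from the bundled Python 3.12."""
    return [str(uv_exe), "venv", "--python", str(base_python), str(venv_dir)]


def offline_install_command(uv_exe: Path, venv_python: Path,
                            wheelhouse: Path, requirements: Path) -> list[str]:
    """argv that installs pinned runtime packages into the target .venv from local wheels only."""
    return [
        str(uv_exe), "pip", "install",
        "--python", str(venv_python),        # install into the target-local .venv
        "--no-index",                        # never consult PyPI or any other index
        "--find-links", str(wheelhouse),     # resolve distributions from the bundled wheelhouse
        "-r", str(requirements),
    ]


def offline_env(base: Mapping[str, str]) -> dict[str, str]:
    env = dict(base)
    env["UV_OFFLINE"] = "1"  # make uv refuse network access even if a flag is forgotten
    return env
```

A caller would run, for example, `subprocess.run(offline_install_command(...), env=offline_env(os.environ), check=True)`. The project wheel can be installed the same way by passing it as a path argument instead of `-r`.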

WP17.2: Target Runtime Installer

Add a PowerShell install script that runs from the installed payload and creates the real runtime on the target PC.

Responsibilities:

  • Verify payload hashes before installing.
  • Install or locate Python 3.12 x64.
  • Create runtime\.venv on the target PC.
  • Install packages from payload\wheelhouse with network disabled.
  • Install the project wheel into the target .venv.
  • Preserve the bundled wheelhouse for offline repair.
  • Configure MINERU_MODEL_SOURCE=local for UI/CLI child processes.
  • Configure local MinerU model paths without silently overwriting an unrelated user mineru.json.
  • If %USERPROFILE%\mineru.json already exists and points elsewhere, prompt in interactive mode; in silent mode, fail clearly and leave repair-runtime.ps1 instructions.
  • Run pdf2md doctor and write the result to logs\doctor-after-install.txt.
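The `mineru.json` rule reduces to a four-way decision. The sketch below assumes the file stores model directories under a `models-dir` mapping. That key is an assumption and must be confirmed against the MinerU 3.1.0 config template. An unreadable file is treated as a conflict, so the helper never returns "write" for a file that already exists. `mineru_config_action` is a hypothetical name.

```python
import json
from pathlib import Path


def mineru_config_action(config_path: Path, bundled_models: Path, interactive: bool) -> str:
    """Return "write", "keep", "prompt", or "fail"; never chooses to overwrite an existing file."""
    if not config_path.exists():
        return "write"   # no user config yet: safe to create one for the bundled models
    try:
        data = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
        data = None      # unreadable config counts as a conflict, not as permission to replace it
    # ASSUMPTION: model paths live under a "models-dir" mapping; confirm against MinerU 3.1.0.
    models = data.get("models-dir") if isinstance(data, dict) else None
    root = bundled_models.resolve()
    if isinstance(models, dict) and models and all(
        isinstance(value, str) and Path(value).resolve().is_relative_to(root)
        for value in models.values()
    ):
        return "keep"    # already points at the installed payload models
    return "prompt" if interactive else "fail"
```

In silent mode, "fail" is where install-runtime.ps1 would stop and print the repair-runtime.ps1 instructions.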

WP17.3: UI Runtime Resolution

Adjust the UI runner for an installed offline layout.

Resolution order:

  1. Explicit configured pdf2md command.
  2. Installed runtime .venv\Scripts\pdf2md.exe under the selected project root.
  3. pdf2md on PATH.
  4. Bundled uv.exe plus uv run --offline pdf2md under the selected project root.
  5. Existing system uv run pdf2md fallback.

Child environment rules:

  • Set MINERU_MODEL_SOURCE=local unless explicitly set.
  • Add installed .venv\Scripts to PATH for runtime console scripts.
  • Add installed portable Node.js path to PATH when bundled.
  • Set UV_OFFLINE=1 when using the installed offline runtime.
  • Do not add remote endpoints or backend flags.

WP17.4: Inno Setup Installer

Add an Inno Setup script that installs the payload and invokes the target runtime installer.

Installer behavior:

  • Default to per-user install under %LOCALAPPDATA%\Programs\ConvertPDFToMD.
  • Create Start Menu shortcuts for:
    • ConvertPDFToMD UI,
    • PDF2MD Doctor,
    • Repair PDF2MD Runtime.
  • Run install-runtime.ps1 after files are copied.
  • Show the doctor log path if setup finishes with WARN.
  • Fail the install on target runtime setup failure unless the user explicitly chooses to keep files for manual repair.

WP17.5: License, Manifest, And Offline Verification

Add docs and checks for redistribution risk.

Required records:

  • Python, uv, PyInstaller, PyTorch, MinerU, model files, Node.js, MathJax, and transitive Python/npm dependency notices.
  • A manifest with file hashes and source URLs.
  • A clear statement that runtime conversion remains local-only and that setup payload creation can use internet only on the build PC.

Verification tiers:

  • Fast tests use fake staging folders and fake wheel/model files.
  • Build-PC packaging smoke can create the staging folder without committing payload.
  • Offline target smoke uses a clean Windows VM with networking disabled.

Implementation Task Plan

Task 1: Packaging Manifest And Ignore Policy

Files:

  • Create tests/test_offline_packaging.py.
  • Create src/pdf2md/packaging_manifest.py if needed.
  • Modify .gitignore.

Steps:

  • Add failing tests for manifest generation with SHA-256, file size, relative path, and source label.
  • Add failing tests that payload paths under dist/offline-installer/, wheelhouse files, model files, and generated installer executables stay ignored.
  • Implement the smallest manifest helper or PowerShell-compatible JSON format.
  • Run uv run pytest tests/test_offline_packaging.py.
  • Commit manifest and ignore-policy changes.

Task 2: Offline Payload Builder

Files:

  • Create packaging/offline/build-offline-payload.ps1.
  • Create packaging/offline/requirements-runtime-cu126.txt.
  • Create packaging/offline/README.md.
  • Create packaging/offline/verify-offline-payload.ps1.
  • Modify tests/test_offline_packaging.py.

Steps:

  • Add tests that the builder rejects missing UI exe, missing model source, missing Python runtime package, missing uv.exe, and empty wheelhouse.
  • Add tests that the builder excludes .venv, .git, samples, and outputs, and excludes node_modules unless it is explicitly copied as the optional locked MathJax payload.
  • Implement payload staging, manifest generation, and payload verification.
  • Run uv run pytest tests/test_offline_packaging.py.
  • Run a dry build command that uses fake payload inputs.
  • Commit builder changes.

Task 3: Target Runtime Install And Repair Scripts

Files:

  • Create packaging/offline/install-runtime.ps1.
  • Create packaging/offline/repair-runtime.ps1.
  • Create packaging/offline/run-doctor.ps1.
  • Modify tests/test_offline_packaging.py.

Steps:

  • Add tests that the scripts contain --no-index, --find-links, and UV_OFFLINE=1, and that no target-install command uses an http:// or https:// URL.
  • Add tests that existing mineru.json handling is explicit and never silently overwritten.
  • Implement target-local .venv creation, offline package install, model config handling, doctor logging, and repair flow.
  • Run uv run pytest tests/test_offline_packaging.py.
  • Commit install-script changes.
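The first test in this task could be a plain-text scan of the PowerShell scripts. The sketch makes two assumptions. First, the scripts set the variable as `$env:UV_OFFLINE = "1"`, so the check looks for the name `UV_OFFLINE` rather than the literal `UV_OFFLINE=1`. Second, URLs inside full-line `#` comments are acceptable because they are not commands. `offline_script_violations` is a hypothetical helper for tests/test_offline_packaging.py.

```python
import re
from pathlib import Path

URL = re.compile(r"https?://", re.IGNORECASE)
# UV_OFFLINE is matched by name because PowerShell sets it as $env:UV_OFFLINE = "1".
REQUIRED_MARKERS = ("--no-index", "--find-links", "UV_OFFLINE")


def offline_script_violations(script: Path, require_markers: bool = True) -> list[str]:
    """Flag URLs outside full-line '#' comments and, for install/repair scripts, missing offline markers."""
    text = script.read_text(encoding="utf-8")
    problems = [
        f"{script.name}:{number}: URL in target-install script"
        for number, line in enumerate(text.splitlines(), start=1)
        if URL.search(line) and not line.lstrip().startswith("#")
    ]
    if require_markers:
        problems += [f"{script.name}: missing {marker}" for marker in REQUIRED_MARKERS if marker not in text]
    return problems
```

run-doctor.ps1 installs nothing, so it would be scanned with `require_markers=False`.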

Task 4: UI Installed Runtime Resolution

Files:

  • Modify src/pdf2md_ui/runner.py.
  • Modify src/pdf2md_ui/app.py only if needed.
  • Modify tests/test_ui_runner.py.

Steps:

  • Add failing tests for project-root .venv\Scripts\pdf2md.exe resolution before PATH.
  • Add failing tests for bundled uv.exe plus uv run --offline pdf2md fallback.
  • Add failing tests that the child environment prepends .venv\Scripts and bundled Node.js when present.
  • Implement the minimal runner changes.
  • Run uv run pytest tests/test_ui_runner.py.
  • Commit UI resolution changes.

Task 5: Inno Setup Script

Files:

  • Create packaging/offline/Pdf2MdOffline.iss.
  • Modify tests/test_offline_packaging.py.

Steps:

  • Add tests that the Inno script references the expected payload directories, Start Menu shortcuts, and runtime install script.
  • Add tests that the script does not reference samples, outputs, .venv, or remote URLs.
  • Implement the Inno script.
  • On a build PC with Inno Setup installed, run ISCC.exe packaging\offline\Pdf2MdOffline.iss.
  • Commit installer-script changes without committing the generated installer.

Task 6: Documentation And Release Gate

Files:

  • Modify README.md.
  • Modify docs/V1RELEASECHECKLIST.md.
  • Modify docs/Sprints/SPRINT17CONTRACT.md.
  • Modify PLAN.md.
  • Modify PROGRESS.md.
  • Modify docs/WORKARCHIVE.md after implementation.

Steps:

  • Document build-PC prerequisites and target-PC prerequisites.
  • Document the offline artifact layout, expected size risk, and repair flow.
  • Document the clean offline VM smoke test.
  • Record final verification outcomes and residual risks.
  • Commit documentation and handoff updates.

Verification Commands

Default fast checks:

uv run pytest tests/test_offline_packaging.py tests/test_ui_runner.py
uv run pytest
git diff --check
git status --short --untracked-files=all

Build-PC packaging checks:

uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
$pythonInstaller = "C:\BuildCache\python-3.12-amd64.exe"
$uvExe = "C:\BuildCache\uv.exe"
$mineruModels = "C:\BuildCache\mineru-models"
powershell -ExecutionPolicy Bypass -File packaging\offline\build-offline-payload.ps1 -Configuration Release -PythonInstaller $pythonInstaller -UvExe $uvExe -MinerUModelSource $mineruModels
powershell -ExecutionPolicy Bypass -File packaging\offline\verify-offline-payload.ps1 -PayloadRoot dist\offline-installer\payload
ISCC.exe packaging\offline\Pdf2MdOffline.iss

Offline target smoke:

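# Run on a clean Windows x64 VM with networking disabled after copying only the installer.
# Resolve the versioned installer name, then wait for the GUI installer to exit before checking the runtime.
$setup = Get-ChildItem .\Pdf2MdOfflineSetup-*.exe | Select-Object -First 1
Start-Process -FilePath $setup.FullName -Wait
# Bypass matches the build commands; the default execution policy can block local .ps1 files.
powershell -ExecutionPolicy Bypass -File "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\scripts\run-doctor.ps1"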
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" --version
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" doctor

Optional conversion smoke on the offline target:

& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" convert C:\LocalTest\SolidElement.pdf --out C:\LocalTest\outputs --overwrite --chunk-pages --gpu auto --mineru-profile auto --strict-local

Expected optional output:

C:\LocalTest\outputs\SolidElement\SolidElement_001.md
C:\LocalTest\outputs\SolidElement\SolidElement_report.md
C:\LocalTest\outputs\SolidElement\images\
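A smoke script could confirm that those three entries exist. This sketch generalizes the `SolidElement` example to any input stem and assumes the first chunk is always numbered `_001`. `missing_smoke_outputs` is hypothetical.

```python
from pathlib import Path


def missing_smoke_outputs(out_dir: Path, stem: str) -> list[str]:
    """List which of the expected conversion outputs for <stem> are absent under out_dir."""
    doc_dir = out_dir / stem
    expected = {
        f"{stem}_001.md": (doc_dir / f"{stem}_001.md").is_file(),        # first chunk
        f"{stem}_report.md": (doc_dir / f"{stem}_report.md").is_file(),  # conversion report
        "images/": (doc_dir / "images").is_dir(),                        # extracted images
    }
    return [name for name, present in expected.items() if not present]
```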

Acceptance Criteria

  • The generated installer can install the runtime on a clean Windows x64 target without internet access.
  • The target runtime has a newly created local .venv; it is not a copied development .venv.
  • pdf2md --version runs from the installed .venv.
  • pdf2md doctor runs without network access and reports all install-relevant failures or warnings clearly.
  • The UI launches from the Start Menu and resolves the installed runtime without manual project-root configuration.
  • MinerU uses local models through MINERU_MODEL_SOURCE=local and local model config.
  • Python package installation uses only bundled local wheels.
  • The wheelhouse and model payload are hash-verified before install.
  • No generated payload, model file, wheel, installer exe, sample PDF, or conversion output is committed.
  • Default tests remain fast and independent of real MinerU, GPU, model files, network, Inno Setup, MathJax, or samples/.
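The "only bundled local wheels" criterion can be checked on the build PC before the installer is compiled. Every pinned requirement should have a matching wheel, and nothing in the wheelhouse should be a source distribution that uv would have to build. The sketch makes simplifying assumptions. It checks only exact `name==version` pins. Following PEP 440 local-version matching, a pin without a local label (such as `torch==2.6.0`) matches a wheel that has one (such as `2.6.0+cu126`). The helper name is hypothetical, and the sketch is not a full PEP 440/508 parser.

```python
import re
from pathlib import Path

# Only exact "name==version" pins (with optional extras) are understood; not a full PEP 508 parser.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?\s*==\s*([^\s;#]+)")


def normalize(name: str) -> str:
    """PEP 503-style name normalization, using '_' as wheel filenames do."""
    return re.sub(r"[-_.]+", "_", name).lower()


def wheelhouse_problems(wheelhouse: Path, requirements: Path) -> list[str]:
    problems = []
    wheels: dict[str, set[str]] = {}
    for item in sorted(wheelhouse.iterdir()):
        if item.suffix == ".whl":
            # Wheel filenames start with "{distribution}-{version}-..."; neither part contains "-".
            name, version = item.name.split("-")[:2]
            wheels.setdefault(normalize(name), set()).add(version)
        elif item.is_file():
            # sdists (.tar.gz/.zip) would need a build step that may fetch build dependencies online.
            problems.append(f"non-wheel distribution: {item.name}")
    if not wheels:
        problems.append("wheelhouse is empty")
    for line in requirements.read_text(encoding="utf-8").splitlines():
        match = PIN.match(line)
        if not match:
            continue  # comments, options, and unpinned lines are not checked in this sketch
        name, _extras, pinned = match.groups()
        available = wheels.get(normalize(name), set())
        # A pin without a local label (torch==2.6.0) also matches a local build (2.6.0+cu126).
        public = {version.split("+")[0] for version in available}
        if pinned not in (available if "+" in pinned else public):
            problems.append(f"no wheel for {name}=={pinned}")
    return problems
```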

Hard Failure Criteria

  • The target installer downloads anything from the internet.
  • The UI or CLI introduces a runtime document upload path.
  • The installer silently overwrites an unrelated existing mineru.json.
  • The installer copies the development .venv as the installed runtime.
  • The installed UI cannot find pdf2md without manually editing settings on a clean install.
  • pdf2md doctor is skipped or its failure is hidden.
  • Payload hash verification is missing.
  • License/model redistribution review is skipped before sharing the installer outside the current personal environment.
  • NVIDIA drivers or CUDA Toolkit installers are redistributed in this sprint.

Open Risks

  • The final installer may be very large because CUDA PyTorch wheels, MinerU dependencies, model weights, and optional Node/MathJax assets are large.
  • MinerU model redistribution terms and transitive package/model licenses must be reviewed before broader sharing.
  • Target PCs still need compatible NVIDIA hardware and drivers. The installer can verify and report this, but it cannot guarantee GPU compatibility.
  • Some conversions can still stall or run slowly on GTX 1070 Ti 8GB; packaging does not solve runtime performance.
  • Inno Setup may need practical size and antivirus/SmartScreen validation once real model payloads are included.

Handoff Requirements

After implementation:

  • Update this contract status to Implemented or record the failed gate.
  • Record payload size and generated installer path in PROGRESS.md.
  • Record verification commands and outcomes in PROGRESS.md.
  • Archive implementation evidence and offline VM smoke results in docs/WORKARCHIVE.md.
  • Keep generated offline payloads, wheels, model files, installer exe, dist/, outputs/, and samples/ uncommitted.