# Sprint 17 Contract: Offline Windows Installer

Status: Abandoned
Last updated: 2026-05-13

## Abandonment Note

Sprint 17 was abandoned at the user's request on 2026-05-13 before implementation began. This document remains as a historical planning record only. Do not implement or extend this contract unless the user explicitly reopens offline installer work.

## Objective

Create a large offline Windows installer that can install the existing local `pdf2md` runtime on another Windows PC without internet access.

The installer must install or stage all application-owned files needed after download time: the minimal UI executable, the project runtime, a target-local Python virtual environment created from bundled wheels, CUDA PyTorch wheels, MinerU 3.1.0 wheels and dependencies, local MinerU model files, optional local Node.js/MathJax assets, Start Menu shortcuts, setup logs, and a post-install `pdf2md doctor` verification path.

This sprint does not change conversion behavior. It packages the already implemented CLI/UI/runtime for offline use.

## Product Decision

The offline package should create the target PC virtual environment during installation instead of copying the current development `.venv`.

Reasoning:

- Python virtual environments and console entry points often contain absolute paths and are not a reliable redistribution unit.
- A target-local `.venv` created from a bundled wheelhouse is more reproducible and easier to repair.
- The installer can keep the wheelhouse for offline repair, uninstall/reinstall, and audit.

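
As an illustration of this decision, the following Python sketch builds the two commands that would create a target-local `.venv` from a bundled wheelhouse. The function name `offline_venv_commands` and its parameters are hypothetical and not part of the contract. The contract plans PowerShell for the real install script, so this only shows the shape of the commands. It only builds argument lists and does not execute anything.

```python
from pathlib import Path


def offline_venv_commands(
    uv_exe: Path,
    runtime_root: Path,
    wheelhouse: Path,
    requirements: Path,
) -> list[list[str]]:
    """Return uv commands that build runtime/.venv from local wheels only.

    Hypothetical helper: names and layout are assumptions for illustration.
    """
    venv_dir = runtime_root / ".venv"
    venv_python = venv_dir / "Scripts" / "python.exe"
    return [
        # Create a fresh environment on the target PC instead of copying one.
        [str(uv_exe), "venv", "--python", "3.12", str(venv_dir)],
        # Resolve every requirement from the bundled wheelhouse, never an index.
        [
            str(uv_exe), "pip", "install",
            "--python", str(venv_python),
            "--no-index",
            "--find-links", str(wheelhouse),
            "-r", str(requirements),
        ],
    ]
```

A caller would run each command in order with `UV_OFFLINE=1` set in the child environment, so that a missing wheel fails the install instead of triggering a download.
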
## Installer Shape

Recommended installer technology:

- Inno Setup for the Windows installer shell because it can compile scripts from the command line with `ISCC.exe`, returns deterministic exit codes, and is simple enough for a per-user installer.
- PowerShell scripts for payload build, target runtime install, and target verification.
- PyInstaller remains only the UI executable builder. It must not become the full MinerU/PyTorch/model bundler.

Default install root:

```text
%LOCALAPPDATA%\Programs\ConvertPDFToMD\
```

Installed layout:

```text
ConvertPDFToMD/
  app/
    pdf2md-ui.exe
  runtime/
    pyproject.toml
    uv.lock
    README.md
    src/
    tools/
    package.json
    package-lock.json
    .venv/
  payload/
    python/
    uv/
    wheelhouse/
    requirements-runtime-cu126.txt
    models/
    node/
    node_modules/
    payload-manifest.json
    SHA256SUMS.txt
    THIRD_PARTY_NOTICES.md
  scripts/
    install-runtime.ps1
    repair-runtime.ps1
    run-doctor.ps1
  logs/
```

Generated artifacts that must remain untracked:

```text
dist/offline-installer/
dist/Pdf2MdOfflineSetup-*.exe
```

## Payload Contents

The first offline payload targets Windows x64, Python 3.12, CUDA PyTorch `2.6.0+cu126`, `torchvision 0.21.0+cu126`, and `mineru[core]==3.1.0`.

Required:

- `dist/pdf2md-ui.exe` from the existing PyInstaller build.
- Tracked project runtime files needed to run `uv run pdf2md`.
- A Windows x64 Python 3.12 installer or an equivalent approved Python runtime package.
- A Windows x64 `uv.exe`.
- A wheelhouse containing:
  - the current project wheel,
  - `pypdf`,
  - `torch==2.6.0`,
  - `torchvision==0.21.0`,
  - `mineru[core]==3.1.0`,
  - all transitive Python runtime dependencies.
- Local MinerU model files and the model config template needed for `MINERU_MODEL_SOURCE=local`.
- A manifest listing every payload file, size, SHA-256 hash, source URL or local source, and license family.

Optional but recommended:

- Portable local Node.js runtime.
- `node_modules/` containing the locked MathJax checker dependencies from `package-lock.json`.

Explicitly excluded:

- `samples/`.
- `outputs/`.
- `.git/`.
- The development `.venv/`.
- Local generated PyInstaller `build/` folders and `.spec` files unless the implementation deliberately adds a stable project-owned spec file.
- NVIDIA GPU drivers and CUDA Toolkit installers. The installer may check for a compatible NVIDIA driver through `nvidia-smi`, but it should not redistribute GPU drivers in this sprint.

## Touched Surfaces

Allowed during implementation:

- Create `packaging/offline/build-offline-payload.ps1`.
- Create `packaging/offline/verify-offline-payload.ps1`.
- Create `packaging/offline/install-runtime.ps1`.
- Create `packaging/offline/repair-runtime.ps1`.
- Create `packaging/offline/run-doctor.ps1`.
- Create `packaging/offline/Pdf2MdOffline.iss`.
- Create `packaging/offline/requirements-runtime-cu126.txt`.
- Create `packaging/offline/README.md`.
- Create `packaging/offline/THIRD_PARTY_NOTICES.md`.
- Create `src/pdf2md/packaging_manifest.py` only if a Python helper is simpler than repeating manifest logic in PowerShell.
- Modify `src/pdf2md_ui/runner.py` so the UI can resolve an installed target-local `.venv\Scripts\pdf2md.exe` before falling back to PATH or `uv run pdf2md`.
- Modify `src/pdf2md_ui/app.py` only if the project root default must prefer the installed runtime folder.
- Modify `tests/test_ui_runner.py`.
- Create `tests/test_offline_packaging.py`.
- Modify `README.md`.
- Modify `docs/V1RELEASECHECKLIST.md`.
- Modify `PLAN.md`.
- Modify `PROGRESS.md`.
- Modify `docs/WORKARCHIVE.md` after implementation.

Not allowed:

- Do not change MinerU 3.1.0 as the fixed conversion engine.
- Do not add a second conversion engine.
- Do not add runtime network calls, `--api-url`, router mode, remote APIs, HTTP client backends, remote OpenAI-compatible backends, or hosted renderers.
- Do not copy the development `.venv` as the installed runtime.
- Do not make default tests depend on real MinerU, GPU, model files, network, Obsidian, MathJax, Inno Setup, or `samples/`.
- Do not commit generated installer payloads, model files, wheelhouse files, Python installers, `dist/`, `outputs/`, or `samples/`.

## Architecture Plan

### WP17.1: Offline Payload Builder

Add a build script that creates a clean staging folder under `dist/offline-installer/` with `app/`, `runtime/`, and `payload/` subfolders that mirror the final install layout.

Responsibilities:

- Rebuild `dist/pdf2md-ui.exe`.
- Build the project wheel into the staging wheelhouse.
- Download or collect Python wheels for the target runtime on a connected build PC.
- Collect the Windows Python runtime package and `uv.exe`.
- Copy project runtime files without `.git`, `.venv`, `outputs/`, `samples/`, and build trash.
- Copy local MinerU model files from a configured source path.
- Optionally copy portable Node.js and the locked `node_modules/`.
- Generate `payload-manifest.json` and `SHA256SUMS.txt`.
- Fail if any required file is missing or if any wheel dependency would require internet during installation.

The builder may use `python -m pip download` on the connected build PC. The target installer must use only local files, for example `uv pip install --no-index --find-links`.

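
The "copy project runtime files" responsibility could be sketched in Python as follows. This is a minimal sketch, not the planned PowerShell builder. The function name `stage_runtime_files` and the exact pattern list are assumptions. In particular, `build`, `dist`, `__pycache__`, and `*.spec` are one reading of "build trash", and `node_modules` is skipped here on the assumption that the optional MathJax payload is staged by a separate step.

```python
import shutil
from pathlib import Path

# Assumed exclusion list; the contract names .git, .venv, outputs, samples,
# and "build trash" without enumerating the latter.
EXCLUDED_PATTERNS = (
    ".git", ".venv", "outputs", "samples",
    "build", "dist", "__pycache__", "*.spec", "node_modules",
)


def stage_runtime_files(project_root: Path, staging_runtime: Path) -> None:
    """Copy the project tree into a clean staging folder, skipping exclusions."""
    if staging_runtime.exists():
        # Start from a clean folder so stale files never reach the payload.
        shutil.rmtree(staging_runtime)
    shutil.copytree(
        project_root,
        staging_runtime,
        ignore=shutil.ignore_patterns(*EXCLUDED_PATTERNS),
    )
```

`shutil.ignore_patterns` matches names at every directory level, so a nested `__pycache__` folder is skipped as well. A real builder would more likely copy from an allowlist of tracked files, which is stricter than an ignore list.
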
### WP17.2: Target Runtime Installer

Add a PowerShell install script that runs from the installed payload and creates the real runtime on the target PC.

Responsibilities:

- Verify payload hashes before installing.
- Install or locate Python 3.12 x64.
- Create `runtime\.venv` on the target PC.
- Install packages from `payload\wheelhouse` with network disabled.
- Install the project wheel into the target `.venv`.
- Preserve the bundled wheelhouse for offline repair.
- Configure `MINERU_MODEL_SOURCE=local` for UI/CLI child processes.
- Configure local MinerU model paths without silently overwriting an unrelated user `mineru.json`.
- If `%USERPROFILE%\mineru.json` already exists and points elsewhere, prompt in interactive mode; in silent mode, fail clearly and leave `repair-runtime.ps1` instructions.
- Run `pdf2md doctor` and write the result to `logs\doctor-after-install.txt`.

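
The `mineru.json` rule above is a small decision table, sketched here in Python. Everything in this block is hypothetical: the `ConfigAction` names, the function `decide_mineru_config_action`, and especially the test for "points elsewhere". The sketch deliberately avoids assuming a particular `mineru.json` schema and treats the file as pointing at the installed models if any string value in it is a path at or under the installed models folder.

```python
import json
from enum import Enum
from pathlib import Path


class ConfigAction(Enum):
    WRITE_NEW = "write_new"              # no config yet: safe to create one
    KEEP_EXISTING = "keep_existing"      # already points at installed models
    PROMPT_USER = "prompt_user"          # interactive: ask before touching it
    FAIL_SILENT_INSTALL = "fail_silent"  # silent: stop and point to repair


def _string_values(node):
    """Yield every string found anywhere in a parsed JSON value."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _string_values(value)
    elif isinstance(node, list):
        for value in node:
            yield from _string_values(value)


def _references_models(config, models_root: Path) -> bool:
    for value in _string_values(config):
        candidate = Path(value)
        if candidate == models_root or models_root in candidate.parents:
            return True
    return False


def decide_mineru_config_action(
    config_path: Path, models_root: Path, silent: bool
) -> ConfigAction:
    if not config_path.exists():
        return ConfigAction.WRITE_NEW
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        config = None  # unreadable file is treated as unrelated user data
    if _references_models(config, models_root):
        return ConfigAction.KEEP_EXISTING
    return ConfigAction.FAIL_SILENT_INSTALL if silent else ConfigAction.PROMPT_USER
```

No branch overwrites an existing file. An existing config is either kept, escalated to the user, or turned into a hard failure for silent installs.
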
### WP17.3: UI Runtime Resolution

Adjust the UI runner for an installed offline layout.

Resolution order:

1. Explicit configured `pdf2md` command.
2. Installed runtime `.venv\Scripts\pdf2md.exe` under the selected project root.
3. `pdf2md` on PATH.
4. Bundled `uv.exe` plus `uv run --offline pdf2md` under the selected project root.
5. Existing system `uv run pdf2md` fallback.

Child environment rules:

- Set `MINERU_MODEL_SOURCE=local` unless explicitly set.
- Add installed `.venv\Scripts` to PATH for runtime console scripts.
- Add installed portable Node.js path to PATH when bundled.
- Set `UV_OFFLINE=1` when using the installed offline runtime.
- Do not add remote endpoints or backend flags.

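
The resolution order and child environment rules above could look roughly like this in Python. This is a sketch under assumptions. The function names, parameters, and the injectable `which` argument are hypothetical and do not describe the existing `src/pdf2md_ui/runner.py`. The `uv run` commands are expected to be launched with the project root as the working directory.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Mapping, Optional, Sequence


def resolve_pdf2md_command(
    project_root: Path,
    configured: Optional[Sequence[str]] = None,
    bundled_uv: Optional[Path] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the command to launch pdf2md, following the five-step order."""
    if configured:                                   # 1. explicit configuration
        return list(configured)
    installed = project_root / ".venv" / "Scripts" / "pdf2md.exe"
    if installed.is_file():                          # 2. installed runtime
        return [str(installed)]
    on_path = which("pdf2md")
    if on_path:                                      # 3. pdf2md on PATH
        return [on_path]
    if bundled_uv is not None and bundled_uv.is_file():
        return [str(bundled_uv), "run", "--offline", "pdf2md"]  # 4. bundled uv
    return ["uv", "run", "pdf2md"]                   # 5. system uv fallback


def build_child_env(
    base_env: Mapping[str, str],
    project_root: Path,
    node_dir: Optional[Path] = None,
    offline_runtime: bool = True,
) -> dict[str, str]:
    """Return a copy of base_env adjusted for the installed offline runtime."""
    env = dict(base_env)
    env.setdefault("MINERU_MODEL_SOURCE", "local")   # keep an explicit value
    path_parts = [str(project_root / ".venv" / "Scripts")]
    if node_dir is not None and node_dir.is_dir():
        path_parts.append(str(node_dir))             # only when Node is bundled
    if env.get("PATH"):
        path_parts.append(env["PATH"])
    env["PATH"] = os.pathsep.join(path_parts)
    if offline_runtime:
        env["UV_OFFLINE"] = "1"
    return env
```

Passing `which` as a parameter keeps the resolver testable without touching the real PATH, which fits the contract's requirement that default tests stay independent of the local machine.
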
### WP17.4: Inno Setup Installer

Add an Inno Setup script that installs the payload and invokes the target runtime installer.

Installer behavior:

- Default to per-user install under `%LOCALAPPDATA%\Programs\ConvertPDFToMD`.
- Create Start Menu shortcuts for:
  - `ConvertPDFToMD` UI,
  - `PDF2MD Doctor`,
  - `Repair PDF2MD Runtime`.
- Run `install-runtime.ps1` after files are copied.
- Show the doctor log path if setup finishes with WARN.
- Fail the install on target runtime setup failure unless the user explicitly chooses to keep files for manual repair.

### WP17.5: License, Manifest, And Offline Verification

Add docs and checks for redistribution risk.

Required records:

- Python, uv, PyInstaller, PyTorch, MinerU, model files, Node.js, MathJax, and transitive Python/npm dependency notices.
- A manifest with file hashes and source URLs.
- A clear statement that runtime conversion remains local-only and that setup payload creation can use internet only on the build PC.

Verification tiers:

- Fast tests use fake staging folders and fake wheel/model files.
- Build-PC packaging smoke can create the staging folder without committing payload.
- Offline target smoke uses a clean Windows VM with networking disabled.

## Implementation Task Plan

### Task 1: Packaging Manifest And Ignore Policy

Files:

- Create `tests/test_offline_packaging.py`.
- Create `src/pdf2md/packaging_manifest.py` if needed.
- Modify `.gitignore`.

Steps:

- Add failing tests for manifest generation with SHA-256, file size, relative path, and source label.
- Add failing tests that payload paths under `dist/offline-installer/`, wheelhouse files, model files, and generated installer executables stay ignored.
- Implement the smallest manifest helper or PowerShell-compatible JSON format.
- Run `uv run pytest tests/test_offline_packaging.py`.
- Commit manifest and ignore-policy changes.

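
If the Python helper route is chosen, the manifest logic for this task might be as small as the sketch below. The function names, the entry keys, and the default `skip_names` are assumptions for illustration only. The license-family field required under Payload Contents is omitted here because Task 1 only lists SHA-256, size, relative path, and source label.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte wheels and models fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(
    payload_root: Path,
    source_label: str,
    skip_names: tuple[str, ...] = ("payload-manifest.json", "SHA256SUMS.txt"),
) -> list[dict]:
    """List every payload file with relative path, size, hash, and source."""
    entries = []
    for path in sorted(p for p in payload_root.rglob("*") if p.is_file()):
        if path.name in skip_names:
            continue  # the manifest cannot contain its own hash
        entries.append({
            "path": path.relative_to(payload_root).as_posix(),
            "size": path.stat().st_size,
            "sha256": sha256_of(path),
            "source": source_label,
        })
    return entries


def verify_manifest(payload_root: Path, entries: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the payload matches."""
    problems = []
    for entry in entries:
        target = payload_root / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
        elif sha256_of(target) != entry["sha256"]:
            problems.append(f"hash mismatch: {entry['path']}")
    return problems
```

Storing paths with forward slashes keeps the JSON stable across platforms, and a plain list of dictionaries is straightforward for PowerShell to read back with `ConvertFrom-Json`.
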
### Task 2: Offline Payload Builder

Files:

- Create `packaging/offline/build-offline-payload.ps1`.
- Create `packaging/offline/requirements-runtime-cu126.txt`.
- Create `packaging/offline/README.md`.
- Create `packaging/offline/verify-offline-payload.ps1`.
- Modify `tests/test_offline_packaging.py`.

Steps:

- Add tests that the builder rejects missing UI exe, missing model source, missing Python runtime package, missing `uv.exe`, and empty wheelhouse.
- Add tests that the builder excludes `.venv`, `.git`, `samples`, and `outputs`, and excludes `node_modules` unless it is explicitly copied as the optional locked MathJax payload.
- Implement payload staging, manifest generation, and payload verification.
- Run `uv run pytest tests/test_offline_packaging.py`.
- Run a dry build command that uses fake payload inputs.
- Commit builder changes.

### Task 3: Target Runtime Install And Repair Scripts

Files:

- Create `packaging/offline/install-runtime.ps1`.
- Create `packaging/offline/repair-runtime.ps1`.
- Create `packaging/offline/run-doctor.ps1`.
- Modify `tests/test_offline_packaging.py`.

Steps:

- Add tests that scripts contain `--no-index`, `--find-links`, `UV_OFFLINE=1`, and no `http://` or `https://` target-install commands.
- Add tests that existing `mineru.json` handling is explicit and never silently overwritten.
- Implement target-local `.venv` creation, offline package install, model config handling, doctor logging, and repair flow.
- Run `uv run pytest tests/test_offline_packaging.py`.
- Commit install-script changes.

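
The first test step above is a static text check, which a fast test could express along these lines. The helper name `offline_script_problems` is hypothetical. As an assumption, the sketch looks for the variable name `UV_OFFLINE` instead of the literal `UV_OFFLINE=1`, because a PowerShell script would normally assign it as `$env:UV_OFFLINE = "1"`. It also ignores full-line `#` comments so that scripts may cite documentation URLs.

```python
import re

# Markers every target-side install script is expected to contain.
REQUIRED_MARKERS = ("--no-index", "--find-links", "UV_OFFLINE")
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def offline_script_problems(script_text: str) -> list[str]:
    """Return reasons the script text does not look offline-only."""
    problems = [
        f"missing marker: {marker}"
        for marker in REQUIRED_MARKERS
        if marker not in script_text
    ]
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # full-line comments are not commands
        if URL_PATTERN.search(stripped):
            problems.append(f"line {number}: remote URL in command")
    return problems
```

In `tests/test_offline_packaging.py` this would be applied to the text of each script, for example `assert offline_script_problems(path.read_text(encoding="utf-8")) == []`. A text check like this only catches obvious regressions; the offline VM smoke test remains the real proof that nothing is downloaded.
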
### Task 4: UI Installed Runtime Resolution

Files:

- Modify `src/pdf2md_ui/runner.py`.
- Modify `src/pdf2md_ui/app.py` only if needed.
- Modify `tests/test_ui_runner.py`.

Steps:

- Add failing tests for project-root `.venv\Scripts\pdf2md.exe` resolution before PATH.
- Add failing tests for bundled `uv.exe` plus `uv run --offline pdf2md` fallback.
- Add failing tests that the child environment prepends `.venv\Scripts` and bundled Node.js when present.
- Implement the minimal runner changes.
- Run `uv run pytest tests/test_ui_runner.py`.
- Commit UI resolution changes.

### Task 5: Inno Setup Script

Files:

- Create `packaging/offline/Pdf2MdOffline.iss`.
- Modify `tests/test_offline_packaging.py`.

Steps:

- Add tests that the Inno script references the expected payload directories, Start Menu shortcuts, and runtime install script.
- Add tests that the script does not reference `samples`, `outputs`, `.venv`, or remote URLs.
- Implement the Inno script.
- On a build PC with Inno Setup installed, run `ISCC.exe packaging\offline\Pdf2MdOffline.iss`.
- Commit installer-script changes without committing the generated installer.

### Task 6: Documentation And Release Gate

Files:

- Modify `README.md`.
- Modify `docs/V1RELEASECHECKLIST.md`.
- Modify `docs/Sprints/SPRINT17CONTRACT.md`.
- Modify `PLAN.md`.
- Modify `PROGRESS.md`.
- Modify `docs/WORKARCHIVE.md` after implementation.

Steps:

- Document build-PC prerequisites and target-PC prerequisites.
- Document the offline artifact layout, expected size risk, and repair flow.
- Document the clean offline VM smoke test.
- Record final verification outcomes and residual risks.
- Commit documentation and handoff updates.

## Verification Commands

Default fast checks:

```powershell
uv run pytest tests/test_offline_packaging.py tests/test_ui_runner.py
uv run pytest
git diff --check
git status --short --untracked-files=all
```

Build-PC packaging checks:

```powershell
uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py
$pythonInstaller = "C:\BuildCache\python-3.12-amd64.exe"
$uvExe = "C:\BuildCache\uv.exe"
$mineruModels = "C:\BuildCache\mineru-models"
powershell -ExecutionPolicy Bypass -File packaging\offline\build-offline-payload.ps1 -Configuration Release -PythonInstaller $pythonInstaller -UvExe $uvExe -MinerUModelSource $mineruModels
powershell -ExecutionPolicy Bypass -File packaging\offline\verify-offline-payload.ps1 -PayloadRoot dist\offline-installer\payload
ISCC.exe packaging\offline\Pdf2MdOffline.iss
```

Offline target smoke:

```powershell
# Run on a clean Windows x64 VM with networking disabled after copying only the installer.
.\Pdf2MdOfflineSetup-*.exe
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\scripts\run-doctor.ps1"
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" --version
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" doctor
```

Optional conversion smoke on the offline target:

```powershell
& "$env:LOCALAPPDATA\Programs\ConvertPDFToMD\runtime\.venv\Scripts\pdf2md.exe" convert C:\LocalTest\SolidElement.pdf --out C:\LocalTest\outputs --overwrite --chunk-pages --gpu auto --mineru-profile auto --strict-local
```

Expected optional output:

```text
C:\LocalTest\outputs\SolidElement\SolidElement_001.md
C:\LocalTest\outputs\SolidElement\SolidElement_report.md
C:\LocalTest\outputs\SolidElement\images\
```

## Acceptance Criteria

- The generated installer can install the runtime on a clean Windows x64 target without internet access.
- The target runtime has a newly created local `.venv`; it is not a copied development `.venv`.
- `pdf2md --version` runs from the installed `.venv`.
- `pdf2md doctor` runs without network access and reports all install-relevant failures or warnings clearly.
- The UI launches from the Start Menu and resolves the installed runtime without manual project-root configuration.
- MinerU uses local models through `MINERU_MODEL_SOURCE=local` and local model config.
- Python package installation uses only bundled local wheels.
- The wheelhouse and model payload are hash-verified before install.
- No generated payload, model file, wheel, installer exe, sample PDF, or conversion output is committed.
- Default tests remain fast and independent of real MinerU, GPU, model files, network, Inno Setup, MathJax, or `samples/`.

## Hard Failure Criteria

- The target installer downloads anything from the internet.
- The UI or CLI introduces a runtime document upload path.
- The installer silently overwrites an unrelated existing `mineru.json`.
- The installer copies the development `.venv` as the installed runtime.
- The installed UI cannot find `pdf2md` without manually editing settings on a clean install.
- `pdf2md doctor` is skipped or its failure is hidden.
- Payload hash verification is missing.
- License/model redistribution review is skipped before sharing the installer outside the current personal environment.
- NVIDIA drivers or CUDA Toolkit installers are redistributed in this sprint.

## Open Risks

- The final installer may be very large because CUDA PyTorch wheels, MinerU dependencies, model weights, and optional Node/MathJax assets are large.
- MinerU model redistribution terms and transitive package/model licenses must be reviewed before broader sharing.
- Target PCs still need compatible NVIDIA hardware and drivers. The installer can verify and report this, but it cannot guarantee GPU compatibility.
- Some conversions can still stall or run slowly on GTX 1070 Ti 8GB; packaging does not solve runtime performance.
- Inno Setup may need practical size and antivirus/SmartScreen validation once real model payloads are included.

## Sources

- PyInstaller usage: https://pyinstaller.org/en/stable/usage.html
- Inno Setup command-line compiler: https://documentation.help/Inno-Setup/topic_compilercmdline.htm
- uv CLI `--offline` behavior: https://docs.astral.sh/uv/reference/cli/
- uv cache behavior: https://docs.astral.sh/uv/concepts/cache/
- pip offline install/download behavior: https://pip.pypa.io/en/stable/cli/pip_install.html and https://pip.pypa.io/en/stable/cli/pip_download/
- PyTorch previous version wheel command for CUDA 12.6: https://pytorch.org/get-started/previous-versions/
- MinerU local model source behavior: https://opendatalab.github.io/MinerU/usage/model_source/

## Handoff Requirements

After implementation:

- Update this contract status to `Implemented` or record the failed gate.
- Record payload size and generated installer path in `PROGRESS.md`.
- Record verification commands and outcomes in `PROGRESS.md`.
- Archive implementation evidence and offline VM smoke results in `docs/WORKARCHIVE.md`.
- Keep generated offline payloads, wheels, model files, installer exe, `dist/`, `outputs/`, and `samples/` uncommitted.