Toolchain Notes

This document summarizes the researched toolchain choices and local compatibility decisions.

Verified Environment

Use one repo-local venv and install from requirements.txt.

Key pins:

torch==2.11.0+cu128 imports on this machine but does not support the GTX 1070 Ti (compute capability sm_61) at runtime.
torch==2.7.1+cu126 satisfies Marker's torch>=2.7.0 requirement and successfully runs CUDA tensor operations on the GTX 1070 Ti.
Keep this pin unless a newer official PyTorch wheel is verified to support sm_61.
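A minimal sketch of how a candidate wheel could be screened before changing the pin. It compares the GPU's compute capability with the architectures the wheel reports through torch.cuda.get_arch_list(), using CUDA's general compatibility rule (a cubin runs on later minor revisions of the same major version; PTX can be JIT-compiled for the same or newer GPUs). It then launches a small tensor operation, because only an actual kernel launch proves runtime support. The function names arch_supported and verify_cuda_runtime are hypothetical, not part of the project.

```python
import re

# Matches entries such as "sm_61", "compute_61", "sm_100", "sm_90a".
ARCH_RE = re.compile(r"(sm|compute)_(\d+)(\d)([a-z]?)")


def arch_supported(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """Return True if any entry in arch_list can run on a GPU with this capability."""
    major, minor = capability
    for entry in arch_list:
        m = ARCH_RE.fullmatch(entry)
        if not m:
            continue
        kind, a_major, a_minor, suffix = m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
        if suffix:
            # Arch-specific builds (e.g. sm_90a) are conservatively treated as exact-match only.
            ok = (a_major, a_minor) == (major, minor)
        elif kind == "sm":
            # SASS is binary-compatible with later minor revisions of the same major version.
            ok = a_major == major and a_minor <= minor
        else:
            # PTX can be JIT-compiled for the same or any newer architecture.
            ok = (a_major, a_minor) <= (major, minor)
        if ok:
            return True
    return False


def verify_cuda_runtime() -> bool:
    """Hypothetical check: arch list first, then a real kernel launch on device 0."""
    import torch  # imported lazily so arch_supported() works without torch installed

    if not torch.cuda.is_available():
        return False
    if not arch_supported(torch.cuda.get_device_capability(0), torch.cuda.get_arch_list()):
        return False
    try:
        x = torch.ones(1024, device="cuda")
        return bool((x * 2).sum().item() == 2048)
    except RuntimeError:
        # e.g. "no kernel image is available for execution on the device"
        return False
```

For sm_61, an arch list such as ["sm_70", "sm_75", "sm_80"] fails the check, while one containing "sm_60" or "sm_61" passes.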

Marker is the primary document parser.
It handles layout detection, OCR, reading order, body text, headings, tables, figures, captions, and semantic block roles.
It should be consumed through structured output or adapter APIs where possible, not by scraping final Markdown text.
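A minimal sketch of consuming a structured block tree instead of scraping Markdown. It assumes a JSON-like tree in which each node is a dict with a block_type string and an optional children list, loosely modeled on Marker's JSON output; the exact schema and block type names should be checked against the installed Marker version. The helpers iter_blocks and blocks_by_role are hypothetical.

```python
from collections import defaultdict
from typing import Any, Iterator


def iter_blocks(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Depth-first walk, yielding nodes in the order children are stored (assumed reading order)."""
    yield node
    for child in node.get("children") or []:
        yield from iter_blocks(child)


def blocks_by_role(root: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Group every node by its block_type so later stages can route tables, equations, etc."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for block in iter_blocks(root):
        grouped[block.get("block_type", "Unknown")].append(block)
    return dict(grouped)


# Example tree with assumed block type names.
doc = {
    "block_type": "Document",
    "children": [
        {
            "block_type": "Page",
            "children": [
                {"block_type": "SectionHeader", "html": "<h1>Intro</h1>"},
                {"block_type": "Text", "html": "<p>Body</p>"},
                {"block_type": "Table", "html": "<table></table>"},
            ],
        }
    ],
}
```

Routing by role keeps decisions such as "send Equation blocks to Nougat" independent of how the final Markdown happens to be rendered.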

Nougat is used only for formulas and mathematical expressions.
nougat-ocr==0.1.17 has loose dependency bounds, so the project pins compatible versions.
transformers 5.x breaks Nougat imports.
albumentations 2.x breaks Nougat transform initialization.
If Nougat fails, the pipeline must fall back to Marker's source text.
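A minimal sketch of both rules above: a guard that flags the dependency majors these notes say break Nougat (transformers 5.x, albumentations 2.x), and a per-formula wrapper that falls back to Marker's text. It assumes run_nougat is a hypothetical callable that returns Markdown for one formula region or raises, and it treats empty output as a failure too; that last choice is an assumption, not something the notes specify.

```python
from importlib import metadata
from typing import Callable, Optional

# Major versions at or above these break Nougat, per these notes.
NOUGAT_MAJOR_CEILINGS = {"transformers": 5, "albumentations": 2}


def major_version(version: str) -> int:
    return int(version.split(".")[0])


def incompatible_pins(installed: dict[str, str]) -> list[str]:
    """Return the packages whose installed major version is at or above the ceiling."""
    return [
        name
        for name, ceiling in NOUGAT_MAJOR_CEILINGS.items()
        if name in installed and major_version(installed[name]) >= ceiling
    ]


def installed_versions() -> dict[str, str]:
    """Read installed versions from package metadata; missing packages are skipped."""
    found: dict[str, str] = {}
    for name in NOUGAT_MAJOR_CEILINGS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


def formula_markdown(
    region: object,
    run_nougat: Callable[[object], Optional[str]],  # hypothetical Nougat adapter
    marker_text: str,
) -> str:
    """Prefer Nougat output; on an exception or empty result, keep Marker's source text."""
    try:
        result = run_nougat(region)
    except Exception:
        result = None
    if result and result.strip():
        return result
    return marker_text
```

A startup check could call incompatible_pins(installed_versions()) and refuse to enable the Nougat path if the list is non-empty.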

PyMuPDF is used for lightweight page analysis, page counts, text-layer quality checks, OCR intervention planning, chunk planning, and low-level PDF/page operations.
It is not the primary document parser.
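A minimal sketch of a text-layer quality check feeding OCR intervention planning. It uses PyMuPDF only to read each page's text layer; the heuristic itself (too few characters, or too many control, private-use, unassigned, or U+FFFD replacement characters) and both thresholds are hypothetical and would need tuning on real documents.

```python
MIN_CHARS = 50         # hypothetical: fewer non-whitespace chars suggests a scanned page
MIN_CLEAN_RATIO = 0.9  # hypothetical: share of characters that must look like real text


def needs_ocr(text: str) -> bool:
    """Heuristically decide whether a page's text layer is missing or garbled."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < MIN_CHARS:
        return True
    # str.isprintable() is False for control, format, private-use, and unassigned
    # characters; U+FFFD is printable, so it is excluded explicitly.
    clean = sum(1 for c in chars if c.isprintable() and c != "\ufffd")
    return clean / len(chars) < MIN_CLEAN_RATIO


def plan_ocr_pages(pdf_path: str) -> list[int]:
    """Return 0-based numbers of pages whose text layer looks unusable."""
    import pymupdf  # newer PyMuPDF releases; older ones use `import fitz`

    with pymupdf.open(pdf_path) as doc:
        return [page.number for page in doc if needs_ocr(page.get_text())]
```

Keeping the heuristic in a pure function makes it testable without PDFs and leaves PyMuPDF as a thin page-access layer, consistent with its non-primary role.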

These tools are useful for research or quality comparison but are not the primary architecture:

Do not switch the primary parser without updating docs/ADR.md, docs/ARCHITECTURE.md, and docs/CONVERSION_POLICY.md.

Markdown table output should target GitHub Flavored Markdown where possible.
Complex tables may use limited HTML <table>.
Math output uses $ ... $ for inline formulas and $$ ... $$ for block formulas.
$...$ can conflict with ordinary dollar signs, so delimiter validation and repair are required.
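A minimal sketch of delimiter validation and one conservative repair. It counts $$ block delimiters across the whole document and unescaped inline $ per line, treating \$ as a literal dollar sign. The repair escapes a $ followed by a digit only when the line's inline count is odd; this is a hypothetical heuristic that can misfire on math like $5x$ and should flag rather than silently rewrite ambiguous lines.

```python
import re

# Order matters: an escaped "\$" is consumed first, then "$$", then a lone "$".
_TOKEN = re.compile(r"\\\$|\$\$|\$")
# A "$" directly before a digit, not escaped and not part of "$$".
_CURRENCY = re.compile(r"(?<![\\$])\$(?=\d)")


def delimiter_counts(line: str) -> tuple[int, int]:
    """Return (inline "$" count, block "$$" count) for one line."""
    inline = block = 0
    for m in _TOKEN.finditer(line):
        tok = m.group(0)
        if tok == "$$":
            block += 1
        elif tok == "$":
            inline += 1
    return inline, block


def find_delimiter_issues(markdown: str) -> list[str]:
    issues = []
    total_block = 0
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        inline, block = delimiter_counts(line)
        total_block += block
        if inline % 2:
            issues.append(f"line {lineno}: unbalanced inline $")
    if total_block % 2:
        issues.append("unbalanced $$ block delimiters")
    return issues


def repair_currency(line: str) -> str:
    """Escape currency-like "$" only on lines whose inline delimiters are unbalanced."""
    inline, _ = delimiter_counts(line)
    if inline % 2 == 0:
        return line
    return _CURRENCY.sub(r"\\$", line)
```

For example, a line reading "Price is $5 and $x$" has three inline delimiters; the repair escapes the first one and leaves the formula intact.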

Use explicit local cache paths for Marker/Nougat/Hugging Face model downloads.
The README should include model pre-download and offline execution instructions before the engine is released.
Default project-local model cache path is .models/.
PDFTOMD_MODEL_CACHE can override the default cache root.
The runtime cache policy derives the Hugging Face cache environment variables from that root and does not download models during validation.
Runtime logs and resume state are runtime artifacts under output/.pdftomd-runtime/<document-slug>/, not generated document sidecars.
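A minimal sketch of the cache and runtime-path policy above. PDFTOMD_MODEL_CACHE, .models/, and output/.pdftomd-runtime/ come from these notes; HF_HOME, HF_HUB_CACHE, and HF_HUB_OFFLINE are standard Hugging Face variables, but the huggingface/ subdirectory layout and the slug rule are hypothetical. Nothing here downloads anything; it only computes paths and variables.

```python
import os
import re
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CACHE_DIR = ".models"
CACHE_ENV_VAR = "PDFTOMD_MODEL_CACHE"


def model_cache_root(project_root: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """PDFTOMD_MODEL_CACHE wins if set and non-empty; otherwise <project>/.models."""
    env = env if env is not None else os.environ
    override = env.get(CACHE_ENV_VAR)
    return Path(override) if override else project_root / DEFAULT_CACHE_DIR


def huggingface_env(cache_root: Path, offline: bool = False) -> dict[str, str]:
    """Compute Hugging Face variables under the cache root; performs no I/O."""
    hf_home = cache_root / "huggingface"  # hypothetical subdirectory layout
    env = {"HF_HOME": str(hf_home), "HF_HUB_CACHE": str(hf_home / "hub")}
    if offline:
        env["HF_HUB_OFFLINE"] = "1"  # fail instead of reaching the network
    return env


def slugify(name: str) -> str:
    """Hypothetical slug rule: lowercase, non-alphanumeric runs become '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "document"


def runtime_dir(output_root: Path, document_name: str) -> Path:
    """Logs and resume state live here, separate from generated document sidecars."""
    return output_root / ".pdftomd-runtime" / slugify(document_name)
```

Applying the result with os.environ.update(huggingface_env(root)) before any model library is imported keeps downloads inside the project cache.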

Current user context is personal use.
Before redistribution or commercial use, revisit the implications of Marker's GPL license and of the model-weight licenses.
Process or API isolation can reduce coupling risk, but it is not a substitute for legal review.