Toolchain Notes
This document summarizes the researched toolchain choices and local compatibility decisions.
Verified Environment
- OS: Windows 10
- GPU: NVIDIA GeForce GTX 1070 Ti
- VRAM: 8 GB
- NVIDIA driver: 577.00
- nvidia-smi CUDA runtime capability: 12.9
- User-installed CUDA toolkit: 12.4
- Python: 3.11.15 in repo-local venv
- Environment manager: Conda / Miniforge
Python Dependencies
Use one repo-local venv and install from requirements.txt.
Key pins:
- torch==2.7.1+cu126
- torchvision==0.22.1+cu126
- marker-pdf==1.10.2
- nougat-ocr==0.1.17
- transformers==4.57.6
- albumentations==1.3.1
- pymupdf==1.27.2.3
- pandas==3.0.2
- pytest==9.0.3
- pypdfium2==4.30.0
- opencv-python-headless==4.11.0.86
- Pillow==10.4.0
- fsspec==2026.2.0
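The pins above could be checked against the active venv with something like the following. This is a minimal sketch, assuming requirements.txt uses plain name==version lines; the helper names parse_pins, installed_versions, and find_mismatches are hypothetical and not part of the project.

```python
from importlib import metadata


def parse_pins(lines):
    """Return {package: version} for plain 'name==version' lines."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blanks, options, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_versions(names):
    """Look up installed versions; packages that are missing map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pins, installed):
    """Return {package: (pinned, installed)} for every pin that differs."""
    return {
        name: (pinned, installed.get(name))
        for name, pinned in pins.items()
        if installed.get(name) != pinned
    }
```

A caller would read requirements.txt, pass its lines to parse_pins, and compare the result with installed_versions(pins). An empty result from find_mismatches means the venv matches the pins exactly.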
PyTorch / CUDA Decision
- torch==2.11.0+cu128 imports on this machine but does not support GTX 1070 Ti sm_61 at runtime.
- torch==2.7.1+cu126 satisfies Marker torch>=2.7.0 and successfully runs CUDA tensor operations on GTX 1070 Ti.
- Keep this pin unless a newer official PyTorch wheel is verified to support sm_61.
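Before changing the pin, a candidate wheel could be screened with a check along these lines. This is a sketch under assumptions: it compares the device's compute capability with the architectures reported by torch.cuda.get_arch_list() and treats only an exact match as supported, which is deliberately conservative. The helper names are hypothetical, and running a real CUDA tensor operation remains the actual verification.

```python
def arch_tag(capability):
    """Convert a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_supports_device(capability, arch_list):
    """True if the wheel's compiled arch list names the device's exact arch."""
    return arch_tag(capability) in arch_list


def probe_torch():
    """Return (device_tag, supported), or None without torch or a CUDA device."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    return arch_tag(capability), build_supports_device(capability, arch_list)
```

The pure helpers are kept separate from probe_torch so the comparison logic can be exercised on machines without a GPU.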
Marker
- Marker is the primary document parser.
- It handles layout analysis, OCR, reading order, body text, headings, tables, figures, captions, and semantic block roles.
- It should be consumed through structured output or adapter APIs where possible, not by scraping final Markdown text.
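To illustrate consuming structured output instead of scraping Markdown, the sketch below walks a nested block tree. It assumes each block is a dict with a block_type key and an optional children list; that shape resembles Marker's JSON output but should be verified against the installed version. The function names are hypothetical.

```python
def iter_blocks(node):
    """Yield every block dict in a nested tree, depth first."""
    yield node
    for child in node.get("children") or []:
        yield from iter_blocks(child)


def blocks_of_type(root, block_type):
    """Collect blocks whose 'block_type' matches, in document order."""
    return [b for b in iter_blocks(root) if b.get("block_type") == block_type]
```

Selecting blocks by role this way keeps table and equation handling independent of how the final Markdown happens to be formatted.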
Nougat
- Nougat is used only for formulas and mathematical expressions.
- nougat-ocr==0.1.17 has loose dependency bounds, so the project pins compatible versions.
- transformers 5.x breaks Nougat imports.
- albumentations 2.x breaks Nougat transform initialization.
- Nougat failure must fall back to Marker source text.
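The fallback rule could look roughly like this. It is a minimal sketch: run_nougat stands in for whatever callable wraps Nougat in the project, formula_text is a hypothetical name, and treating empty output as a failure is an assumption rather than documented policy.

```python
import logging

logger = logging.getLogger(__name__)


def formula_text(marker_text, run_nougat):
    """Return Nougat's output for a formula, or Marker's text on any failure."""
    try:
        result = run_nougat()
    except Exception as exc:  # import, model load, or inference may fail
        logger.warning("Nougat failed, using Marker source text: %s", exc)
        return marker_text
    if not result or not result.strip():
        return marker_text  # assumption: empty output counts as a failure
    return result
```

Catching a broad Exception is intentional here, since the documented breakages happen at import and transform initialization as well as at inference time.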
PyMuPDF
- PyMuPDF is used for lightweight page analysis, page counts, text-layer quality checks, OCR intervention planning, chunk planning, and low-level PDF/page operations.
- It is not the primary document parser.
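A text-layer quality check of the kind described above might be sketched as follows. The min_chars threshold of 50 and both function names are hypothetical; the project's real heuristics may weigh other signals. PyMuPDF is imported lazily so the planning logic can be tested without it.

```python
def pages_needing_ocr(page_texts, min_chars=50):
    """Return 0-based indexes of pages whose text layer looks too thin."""
    return [
        index
        for index, text in enumerate(page_texts)
        if len(text.strip()) < min_chars
    ]


def extract_page_texts(pdf_path):
    """Read each page's text layer with PyMuPDF (imported lazily)."""
    import pymupdf  # older releases expose the same API as `fitz`

    with pymupdf.open(pdf_path) as doc:
        return [page.get_text() for page in doc]
```

Calling pages_needing_ocr(extract_page_texts(path)) yields the pages that are candidates for OCR intervention.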
Comparison Baselines
These tools are useful for research or quality comparison but are not the primary architecture:
- PyMuPDF4LLM
- Docling
- MinerU
- MarkItDown
Do not switch the primary parser without updating docs/ADR.md, docs/ARCHITECTURE.md, and docs/CONVERSION_POLICY.md.
Reference Links
- Marker PyPI: https://pypi.org/project/marker-pdf/
- Nougat GitHub: https://github.com/facebookresearch/nougat
- PyMuPDF documentation: https://pymupdf.readthedocs.io/
- PyTorch previous versions: https://docs.pytorch.org/get-started/previous-versions/
- GitHub Flavored Markdown spec: https://github.github.io/gfm/
- MathJax TeX delimiters: https://docs.mathjax.org/en/latest/input/tex/delimiters.html
- Docling GitHub: https://github.com/docling-project/docling
- MinerU GitHub: https://github.com/opendatalab/MinerU
Markdown And Math Rendering
- Markdown table output should target GitHub Flavored Markdown where possible.
- Complex tables may use limited HTML <table>.
- Math output uses $ ... $ for inline formulas and $$ ... $$ for block formulas.
- $...$ can conflict with ordinary dollar signs, so delimiter validation and repair are required.
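Delimiter validation and repair could start from something like the sketch below. It is a heuristic only: it counts unescaped single dollar signs per line and escapes a dollar sign that is directly followed by a digit, which would mishandle a formula that begins with a digit. Both function names are hypothetical.

```python
import re

# A '$' that is not escaped with a backslash.
_UNESCAPED_DOLLAR = re.compile(r"(?<!\\)\$")


def inline_delimiters_balanced(line):
    """True if the unescaped single '$' delimiters on a line pair up."""
    without_blocks = line.replace("$$", "")  # ignore block delimiters
    return len(_UNESCAPED_DOLLAR.findall(without_blocks)) % 2 == 0


def escape_currency(line):
    """Prefix a backslash to any unescaped '$' directly followed by a digit."""
    return re.sub(r"(?<!\\)\$(?=\d)", r"\\$", line)
```

A repair pass might call escape_currency only on lines where inline_delimiters_balanced returns False, then validate again before accepting the change.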
Model Cache
- Use explicit local cache paths for Marker/Nougat/Hugging Face model downloads.
- README should include model pre-download and offline execution instructions before the engine is released.
- Default project-local model cache path is .models/.
- PDFTOMD_MODEL_CACHE can override the default cache root.
- The runtime cache policy exposes Hugging Face cache environment variables from that root without downloading models during validation.
- Runtime logs and resume state are runtime artifacts under output/.pdftomd-runtime/<document-slug>/, not generated document sidecars.
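The cache-root resolution could be sketched as below. PDFTOMD_MODEL_CACHE and .models/ come from the notes above; the huggingface subdirectory, the choice of HF_HOME and HF_HUB_CACHE, and the function names are assumptions for illustration. Nothing here creates directories or downloads models.

```python
from pathlib import Path

DEFAULT_CACHE_DIR = ".models"


def resolve_cache_root(environ, project_root):
    """Pick PDFTOMD_MODEL_CACHE if set and non-empty, else <project>/.models."""
    override = environ.get("PDFTOMD_MODEL_CACHE", "").strip()
    if override:
        return Path(override).expanduser()
    return Path(project_root) / DEFAULT_CACHE_DIR


def huggingface_cache_env(cache_root):
    """Build Hugging Face cache variables without touching disk or network."""
    hf_home = Path(cache_root) / "huggingface"  # hypothetical layout
    return {
        "HF_HOME": str(hf_home),
        "HF_HUB_CACHE": str(hf_home / "hub"),
    }
```

Passing os.environ as the first argument and merging the returned mapping into the process environment before any model library is imported would apply the policy at startup.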
Licensing Notes
- Current user context is personal use.
- Before redistribution or commercial use, revisit Marker GPL and model-weight license implications.
- Process or API isolation can reduce coupling risk, but it is not a substitute for legal review.