# Toolchain Notes

This document summarizes the researched toolchain choices and local compatibility decisions.

## Verified Environment

- OS: Windows 10
- GPU: NVIDIA GeForce GTX 1070 Ti (compute capability 6.1, `sm_61`)
- VRAM: 8 GB
- NVIDIA driver: 577.00
- Highest CUDA version supported by the driver (as reported by `nvidia-smi`): 12.9
- User-installed CUDA toolkit: 12.4
- Python: 3.11.15 in repo-local `venv`
- Environment manager: Conda / Miniforge

## Python Dependencies

Use one repo-local `venv` and install from `requirements.txt`.

Key pins:

- `torch==2.7.1+cu126`
- `torchvision==0.22.1+cu126`
- `marker-pdf==1.10.2`
- `nougat-ocr==0.1.17`
- `transformers==4.57.6`
- `albumentations==1.3.1`
- `pymupdf==1.27.2.3`
- `pandas==3.0.2`
- `pytest==9.0.3`
- `pypdfium2==4.30.0`
- `opencv-python-headless==4.11.0.86`
- `Pillow==10.4.0`
- `fsspec==2026.2.0`

## PyTorch / CUDA Decision

- `torch==2.11.0+cu128` imports on this machine but does not support the GTX 1070 Ti's `sm_61` architecture at runtime.
- `torch==2.7.1+cu126` satisfies Marker's `torch>=2.7.0` requirement and successfully runs CUDA tensor operations on the GTX 1070 Ti.
- Keep this pin unless a newer official PyTorch wheel is verified to support `sm_61`.

## Marker

- Marker is the primary document parser.
- It handles layout analysis, OCR, reading order, body text, headings, tables, figures, captions, and semantic block roles.
- Consume it through its structured output or adapter APIs where possible, not by scraping the final Markdown text.

## Nougat

- Nougat is used only for formulas and mathematical expressions.
- `nougat-ocr==0.1.17` declares loose dependency bounds, so the project pins compatible versions explicitly.
- `transformers` 5.x breaks Nougat imports.
- `albumentations` 2.x breaks Nougat transform initialization.
- If Nougat fails, the pipeline must fall back to Marker's source text.

## PyMuPDF

- PyMuPDF is used for lightweight page analysis, page counts, text-layer quality checks, OCR intervention planning, chunk planning, and low-level PDF/page operations.
- It is not the primary document parser.
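
As an illustration of the PyMuPDF role described above, the sketch below counts pages and flags pages whose embedded text layer looks too sparse or too garbled to trust, so they could be routed to OCR. It is a minimal sketch, not the project's implementation: the function names (`extract_page_texts`, `page_stats`, `pages_needing_ocr`) and the thresholds (`min_chars=200`, `min_printable_ratio=0.9`) are hypothetical. Only `pymupdf.open()` and `Page.get_text("text")` are real PyMuPDF calls; the quality heuristic itself is an assumption.

```python
"""Hypothetical text-layer quality check built on PyMuPDF page text."""
import sys
from dataclasses import dataclass


@dataclass
class PageTextStats:
    page_number: int       # 0-based page index
    char_count: int        # non-whitespace characters in the text layer
    printable_ratio: float  # share of those characters that look like real text


def page_stats(page_number: int, text: str) -> PageTextStats:
    """Summarize one page's extracted text (heuristic, assumed thresholds elsewhere)."""
    stripped = "".join(text.split())  # drop all whitespace
    if not stripped:
        return PageTextStats(page_number, 0, 0.0)
    # Control and private-use characters are not printable in Python;
    # U+FFFD (replacement character) is printable, so exclude it explicitly.
    good = sum(1 for ch in stripped if ch.isprintable() and ch != "\ufffd")
    return PageTextStats(page_number, len(stripped), good / len(stripped))


def pages_needing_ocr(page_texts, min_chars=200, min_printable_ratio=0.9):
    """Return 0-based indices of pages whose text layer looks unusable."""
    flagged = []
    for i, text in enumerate(page_texts):
        stats = page_stats(i, text)
        if stats.char_count < min_chars or stats.printable_ratio < min_printable_ratio:
            flagged.append(i)
    return flagged


def extract_page_texts(pdf_path):
    """Read the embedded text layer of every page with PyMuPDF."""
    import pymupdf  # imported lazily so the heuristics above work without PyMuPDF

    with pymupdf.open(pdf_path) as doc:
        return [page.get_text("text") for page in doc]


if __name__ == "__main__" and len(sys.argv) > 1:
    texts = extract_page_texts(sys.argv[1])
    print(f"{len(texts)} pages; OCR candidates: {pages_needing_ocr(texts)}")
```

Keeping the heuristic separate from the PyMuPDF call means it can be unit-tested with plain strings, and the thresholds can be tuned against real scans before they drive OCR planning.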

## Comparison Baselines

These tools are useful for research or quality comparison but are not the primary architecture:

- PyMuPDF4LLM
- Docling
- MinerU
- MarkItDown

Do not switch the primary parser without updating `docs/ADR.md`, `docs/ARCHITECTURE.md`, and `docs/CONVERSION_POLICY.md`.

## Reference Links

- Marker PyPI: https://pypi.org/project/marker-pdf/
- Nougat GitHub: https://github.com/facebookresearch/nougat
- PyMuPDF documentation: https://pymupdf.readthedocs.io/
- PyTorch previous versions: https://docs.pytorch.org/get-started/previous-versions/
- GitHub Flavored Markdown spec: https://github.github.io/gfm/
- MathJax TeX delimiters: https://docs.mathjax.org/en/latest/input/tex/delimiters.html
- Docling GitHub: https://github.com/docling-project/docling
- MinerU GitHub: https://github.com/opendatalab/MinerU

## Markdown And Math Rendering

- Markdown table output should target GitHub Flavored Markdown where possible.
- Complex tables may use limited HTML `