| name | description |
|---|---|
| pdf-toolchain | Research and maintain PDFtoMD toolchain compatibility for Marker, Nougat, PyMuPDF, PyTorch/CUDA, model cache, and licensing. Use when Codex needs dependency pins, runtime compatibility checks, official-source research, or updates to docs/TOOLCHAIN.md and related ADRs. |
# PDF Toolchain

## Workflow

- Read `AGENTS.md`, `PLAN.md`, `PROGRESS.md`, `docs/TOOLCHAIN.md`, `docs/ARCHITECTURE.md`, and `docs/ADR.md`.
- Prefer official or primary sources for current facts.
- Verify local facts with commands when relevant:
  - `.\venv\python.exe -m pip check`
  - `.\venv\python.exe -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"`
  - `.\venv\Scripts\nougat.exe --help`
- Preserve the verified GTX 1070 Ti baseline unless a replacement is tested.
- Update `docs/TOOLCHAIN.md` and `docs/ADR.md` when dependency decisions change.
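
The first two checks can be bundled into one script. What follows is a minimal sketch that assumes a recent Python 3. The file name `check_toolchain.py` is hypothetical, as are all of its functions; none of them are part of the repository. The script reuses the attributes from the torch one-liner. It also asks whether the installed wheel ships native kernels for the GPU. A GTX 1070 Ti reports compute capability (6, 1), so `arch_supported` looks for `sm_61` in `torch.cuda.get_arch_list()`. Run it with the project interpreter, e.g. `.\venv\python.exe check_toolchain.py`.

```python
"""Hypothetical helper (not in the repo): bundles the local toolchain checks."""
from __future__ import annotations

import subprocess
import sys


def pip_check(python: str = sys.executable) -> tuple[bool, str]:
    """Run `python -m pip check`; True means pip reported no broken requirements."""
    proc = subprocess.run(
        [python, "-m", "pip", "check"], capture_output=True, text=True
    )
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def arch_supported(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """True if the build lists native SASS for this GPU (PTX JIT fallback is ignored)."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def torch_report() -> dict | None:
    """Collect the same facts as the torch one-liner; None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        cap = torch.cuda.get_device_capability(0)
        report["device"] = torch.cuda.get_device_name(0)
        report["capability"] = cap
        report["arch_supported"] = arch_supported(cap, torch.cuda.get_arch_list())
    return report


if __name__ == "__main__":
    ok, output = pip_check()
    print("pip check:", "OK" if ok else "FAILED")
    if output:
        print(output)
    print("torch:", torch_report())
```

Ignoring `compute_XX` (PTX) entries is a deliberately conservative simplification. A build without `sm_61` might still run through JIT compilation, but it should be treated as untested against the baseline.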
## Guardrails

- Do not upgrade `torch`, `transformers`, `albumentations`, `pypdfium2`, `opencv-python-headless`, `Pillow`, or `fsspec` without re-running compatibility checks.
- Do not switch the primary parser away from Marker without an ADR update.
- Do not download model weights unless the user explicitly asks.
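
The upgrade guardrail implies a drift check: compare what is installed against what was recorded. What follows is a minimal sketch that assumes the exact pins are copied by hand from `docs/TOOLCHAIN.md`; they are not reproduced here. `installed_versions` and `drift` are hypothetical helpers that use only the standard-library `importlib.metadata`. The comparison is exact string equality. Record pins exactly as `importlib.metadata` reports them, including local labels such as the `+cuXXX` suffix on CUDA builds of torch.

```python
"""Hypothetical drift check for the guarded packages (not part of the repo)."""
from __future__ import annotations

from importlib import metadata

# Packages named in the Guardrails list; their exact pins live in docs/TOOLCHAIN.md.
GUARDED = (
    "torch", "transformers", "albumentations", "pypdfium2",
    "opencv-python-headless", "Pillow", "fsspec",
)


def installed_versions(names=GUARDED) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: dict[str, str | None] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def drift(
    pins: dict[str, str], installed: dict[str, str | None]
) -> dict[str, tuple[str, str | None]]:
    """Return {name: (pinned, installed)} for every pinned package that moved."""
    return {
        name: (pinned, installed.get(name))
        for name, pinned in pins.items()
        if installed.get(name) != pinned
    }


if __name__ == "__main__":
    # Hypothetical: fill with the exact versions recorded in docs/TOOLCHAIN.md.
    pins: dict[str, str] = {}
    for name, (want, have) in drift(pins, installed_versions()).items():
        print(f"{name}: pinned {want}, installed {have}")
```

An empty result means nothing has moved. Any entry means the compatibility checks must be re-run before the change is accepted.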