# AGENTS.md This file gives implementation instructions for coding agents working in this repository. ## Project Mission Build a local-only PDF-to-Markdown converter for math-heavy digital PDFs. The converter must produce Obsidian-friendly Markdown and preserve enough metadata to debug formulas, reading order, tables, figures, and assets. ## Project Guidelines Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed. **Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment. ### 1. Think Before Coding **Don't assume. Don't hide confusion. Surface tradeoffs.** Before implementing: - State your assumptions explicitly. If uncertain, ask. - If multiple interpretations exist, present them - don't pick silently. - If a simpler approach exists, say so. Push back when warranted. - If something is unclear, stop. Name what's confusing. Ask. ### 2. Simplicity First **Minimum code that solves the problem. Nothing speculative.** - No features beyond what was asked. - No abstractions for single-use code. - No "flexibility" or "configurability" that wasn't requested. - No error handling for impossible scenarios. - If you write 200 lines and it could be 50, rewrite it. Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify. ### 3. Surgical Changes **Touch only what you must. Clean up only your own mess.** When editing existing code: - Don't "improve" adjacent code, comments, or formatting. - Don't refactor things that aren't broken. - Match existing style, even if you'd do it differently. - If you notice unrelated dead code, mention it - don't delete it. When your changes create orphans: - Remove imports/variables/functions that YOUR changes made unused. - Don't remove pre-existing dead code unless asked. The test: Every changed line should trace directly to the user's request. ### 4. Goal-Driven Execution **Define success criteria. Loop until verified.** Transform tasks into verifiable goals: - "Add validation" -> "Write tests for invalid inputs, then make them pass" - "Fix the bug" -> "Write a test that reproduces it, then make it pass" - "Refactor X" -> "Ensure tests pass before and after" For multi-step tasks, state a brief plan: ``` 1. [Step] -> verify: [check] 2. [Step] -> verify: [check] 3. [Step] -> verify: [check] ``` Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification. --- **These guidelines are working if:** fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes. ## Source Documents - `PLAN.md`: shared plan, planned work, open questions, and ownership for agents. - `PROGRESS.md`: completed work, current status, blockers, and next actions for agents. - `PRD.md`: product requirements, user scope, CLI/API requirements, acceptance criteria. - `ARCHITECTURE.md`: system layers, MinerU adapter contract, intermediate representation, metadata schema, and local-only enforcement. - `docs/KNOWLEDGEBASE.md`: research basis and implementation background. - `docs/V1IMPLEMENTATIONPLAN.md`: v1 implementation sequence, sprint contracts, verification gates, and agent ownership. - `docs/WORKARCHIVE.md`: archived completed work, historical sprint outcomes, setup results, verification history, and sample conversion evidence. - `docs/Sprints/*.md`: active and historical sprint contracts. - `.codex/agents/*.toml`: project-scoped custom subagent roles. - `.codex/commands/*.md`: reusable project prompt commands. - `.codex/skills/*/SKILL.md`: project-specific Codex skills. - `.codex/hooks.json` and `.codex/hooks/*.py`: project hook configuration and deterministic hook scripts. ## Startup Workflow At the start of every task: - Read `PLAN.md` and `PROGRESS.md` before deciding what to do. - Read only the other source documents needed for the task. - Read `docs/WORKARCHIVE.md` when the task needs historical completed-work context, previous verification results, or sample conversion evidence. - Use `.codex/agents`, `.codex/commands`, and `.codex/skills` when the user explicitly asks for agent delegation, reusable workflows, or specialized project guidance. - State the relevant current goal, next action, and blocker if one exists. - If `PLAN.md` and `PROGRESS.md` conflict, trust `PROGRESS.md` for what has happened and update `PLAN.md` when making the next change. ## Progress Tracking Use `PLAN.md` and `PROGRESS.md` to coordinate work across agents. - Update `PLAN.md` when planned work, ownership, sequencing, open questions, or decisions change. - Update `PROGRESS.md` after meaningful work, verification, blockers, or next actions change. - Move completed work from `PROGRESS.md` to `docs/WORKARCHIVE.md` when it is no longer needed for current coordination. - Keep entries short and factual. - Do not use these files as scratchpads or long research notes. - Do not mark work complete until it has been verified. - When multiple agents work in parallel, each agent must leave enough context in `PROGRESS.md` for the next agent to resume without guessing. ## Long-Running Harness Workflow For substantial implementation work, follow the planner/generator/evaluator pattern from Anthropic's long-running harness design article: https://www.anthropic.com/engineering/harness-design-long-running-apps. Use the harness only when task complexity justifies the overhead. For small documentation edits or narrow fixes, a single agent with focused verification is preferred. Harness roles: - `harness-planner-agent`: expands a brief request into product context, high-level technical direction, non-goals, risks, and a sequence of small contracts. - `feature-generator-agent`: implements one agreed contract at a time after implementation has been explicitly requested. - `evaluation-agent`: independently reviews proposed contracts and completed work. It must be skeptical, specific, and willing to fail work that is incomplete, stubbed, unverified, or below threshold. Before each implementation chunk: - Write or update a concise sprint contract before code changes start. - Include objective, touched surfaces, expected outputs, non-goals, verification steps, hard failure criteria, and handoff fields. - Let `evaluation-agent` review the contract before `feature-generator-agent` implements it. - Avoid over-specifying low-level implementation before the responsible agent has inspected the code. After each implementation chunk: - `feature-generator-agent` runs a self-check but does not approve its own work. - `evaluation-agent` performs independent checks against the contract and reports actionable findings. - If the chunk fails, feed the evaluator's findings back into the next generator pass. - Update `PROGRESS.md` with completed work, checks run, residual risks, and the next concrete action. When context becomes too large or a task spans sessions, prefer a clean structured handoff over relying only on conversation history. The handoff must include current state, decisions made, files touched, checks run, known failures, and the next action. Periodically re-evaluate the harness itself. Remove roles, contracts, or checks that are not load-bearing, and add structure only when it improves correctness, scope control, or verification quality. ## Fixed Product Decisions - Language: Python. - Workflow: `uv`. - Interface: CLI plus Python library. - Default CLI name: `pdf2md`. - Runtime policy: local-only. Do not add cloud OCR, remote LLM, or external document upload paths. - Default output: Obsidian-friendly Markdown. - Inline math: `$...$`. - Display math: `$$...$$`. - Conversion engine: MinerU 3.1.0. - Hardware target: NVIDIA GPU. - Input priority: digital PDFs with text layers. - Quality workflow: fully automatic. Log warnings and continue when possible. - MinerU execution: direct local `mineru` CLI only. MinerU 3.1.0 may launch a temporary local `mineru-api` internally when CLI runs without `--api-url`. - Quality report: write both metadata JSON and `.report.md`. - v1 use case: personal/research. MinerU and transitive model/package licenses must be documented before redistribution. ## Architecture Guidance Follow `ARCHITECTURE.md` for implementation structure. Do not duplicate architecture decisions in code comments or docs unless the new text points back to that file. Key implementation constraints: - Keep MinerU-specific objects behind the MinerU adapter. - Keep public CLI/library contracts stable and project-owned. - Keep Obsidian Markdown normalization separate from MinerU execution. - Keep metadata and warning generation structured. - Keep quality report generation derived from metadata and local checks. - Do not add runtime engine selection in v1. ## Local-Only Requirements Never add runtime dependencies that upload PDFs, page images, or extracted text to remote services. Follow the strict-local enforcement rules in `ARCHITECTURE.md`. Allowed in v1: direct `mineru` CLI execution and the CLI-internal temporary local `mineru-api` process. Do not pass `--api-url`, use remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible inference endpoints in v1. ## CLI Behavior Follow the CLI requirements in `PRD.md`. Do not add commands, flags, config files, or runtime engine selection unless the user explicitly asks for them. Do not overwrite user files unless the requested behavior and `--overwrite` semantics allow it. ## Testing Guidance Add tests in proportion to behavior risk. Required early tests: - Math delimiter normalization. - Display math spacing. - Asset path normalization. - Metadata schema creation. - Warning aggregation. - CLI path planning and overwrite behavior. - MinerU adapter contract with mocked outputs. Fixture PDFs should cover: - Simple digital PDF. - Math-heavy academic PDF. - Multi-column paper. - Table with formulas. - Figure with caption. Any test that depends on large local models should be optional or marked separately so normal CI/dev checks can run quickly. ## Git Workflow After changing files: - Run the smallest useful verification for the change. - Check `git status --short`. - Commit the completed change unless the user explicitly asks not to. - Do not include unrelated user edits in the commit. ## Documentation Guidance Keep documentation explicit about: - Local-only privacy behavior. - NVIDIA GPU expectations. - MinerU installation and model downloads. - Known limitations of automatic formula reconstruction. - Dependency licenses. - Obsidian output assumptions. Do not imply perfect LaTeX conversion. The correct guarantee is best-effort automatic conversion with warnings and provenance.