AGENTS.md
This file gives implementation instructions for coding agents working in this repository.
Project Mission
Build a local-only PDF-to-Markdown converter for math-heavy digital PDFs. The converter must produce Obsidian-friendly Markdown and preserve enough metadata to debug formulas, reading order, tables, figures, and assets.
Project Guidelines
Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.
Tradeoff: These guidelines bias toward caution over speed. For trivial tasks, use judgment.
1. Think Before Coding
Don't assume. Don't hide confusion. Surface tradeoffs.
Before implementing:
- State your assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them - don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.
2. Simplicity First
Minimum code that solves the problem. Nothing speculative.
- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.
Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
3. Surgical Changes
Touch only what you must. Clean up only your own mess.
When editing existing code:
- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.
- If you notice unrelated dead code, mention it - don't delete it.
When your changes create orphans:
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.
The test: Every changed line should trace directly to the user's request.
4. Goal-Driven Execution
Define success criteria. Loop until verified.
Transform tasks into verifiable goals:
- "Add validation" -> "Write tests for invalid inputs, then make them pass"
- "Fix the bug" -> "Write a test that reproduces it, then make it pass"
- "Refactor X" -> "Ensure tests pass before and after"
For multi-step tasks, state a brief plan:
1. [Step] -> verify: [check]
2. [Step] -> verify: [check]
3. [Step] -> verify: [check]
Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
These guidelines are working if diffs contain fewer unnecessary changes, fewer rewrites are needed because of overcomplication, and clarifying questions come before implementation rather than after mistakes.
Source Documents
- PLAN.md: shared plan, planned work, open questions, and ownership for agents.
- PROGRESS.md: completed work, current status, blockers, and next actions for agents.
- PRD.md: product requirements, user scope, CLI/API requirements, acceptance criteria.
- ARCHITECTURE.md: system layers, MinerU adapter contract, intermediate representation, metadata schema, and local-only enforcement.
- docs/KNOWLEDGEBASE.md: research basis and implementation background.
- docs/V1IMPLEMENTATIONPLAN.md: v1 implementation sequence, sprint contracts, verification gates, and agent ownership.
- docs/Sprints/*.md: active and historical sprint contracts.
- .codex/agents/*.toml: project-scoped custom subagent roles.
- .codex/commands/*.md: reusable project prompt commands.
- .codex/skills/*/SKILL.md: project-specific Codex skills.
- .codex/hooks.json and .codex/hooks/*.py: project hook configuration and deterministic hook scripts.
Startup Workflow
At the start of every task:
- Read PLAN.md and PROGRESS.md before deciding what to do.
- Read only the other source documents needed for the task.
- Use .codex/agents, .codex/commands, and .codex/skills when the user explicitly asks for agent delegation, reusable workflows, or specialized project guidance.
- State the relevant current goal, next action, and blocker if one exists.
- If PLAN.md and PROGRESS.md conflict, trust PROGRESS.md for what has happened and update PLAN.md when making the next change.
Progress Tracking
Use PLAN.md and PROGRESS.md to coordinate work across agents.
- Update PLAN.md when planned work, ownership, sequencing, open questions, or decisions change.
- Update PROGRESS.md after meaningful work or verification, and whenever blockers or next actions change.
- Keep entries short and factual.
- Do not use these files as scratchpads or long research notes.
- Do not mark work complete until it has been verified.
- When multiple agents work in parallel, each agent must leave enough context in PROGRESS.md for the next agent to resume without guessing.
Long-Running Harness Workflow
For substantial implementation work, follow the planner/generator/evaluator pattern from Anthropic's long-running harness design article: https://www.anthropic.com/engineering/harness-design-long-running-apps.
Use the harness only when task complexity justifies the overhead. For small documentation edits or narrow fixes, a single agent with focused verification is preferred.
Harness roles:
- harness-planner-agent: expands a brief request into product context, high-level technical direction, non-goals, risks, and a sequence of small contracts.
- feature-generator-agent: implements one agreed contract at a time, and only after implementation has been explicitly requested.
- evaluation-agent: independently reviews proposed contracts and completed work. It must be skeptical, specific, and willing to fail work that is incomplete, stubbed, unverified, or below threshold.
Before each implementation chunk:
- Write or update a concise sprint contract before code changes start.
- Include objective, touched surfaces, expected outputs, non-goals, verification steps, hard failure criteria, and handoff fields.
- Let evaluation-agent review the contract before feature-generator-agent implements it.
- Avoid over-specifying low-level implementation before the responsible agent has inspected the code.
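The contract fields listed above can be checked mechanically before review. The sketch below is illustrative only. It assumes a hypothetical in-memory SprintContract type whose field names mirror the list; the real contracts remain Markdown files under docs/Sprints/, and nothing here is an existing project API.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SprintContract:
    """Hypothetical in-memory shape of a sprint contract (real contracts live in docs/Sprints/*.md)."""

    objective: str
    touched_surfaces: list[str] = field(default_factory=list)
    expected_outputs: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)
    verification_steps: list[str] = field(default_factory=list)
    hard_failure_criteria: list[str] = field(default_factory=list)
    handoff_fields: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

With a check like this, an evaluator could reject a contract early when missing_sections() is non-empty, before any code is written.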
After each implementation chunk:
- feature-generator-agent runs a self-check but does not approve its own work.
- evaluation-agent performs independent checks against the contract and reports actionable findings.
- If the chunk fails, feed the evaluator's findings back into the next generator pass.
- Update PROGRESS.md with completed work, checks run, residual risks, and the next concrete action.
When context becomes too large or a task spans sessions, prefer a clean structured handoff over relying only on conversation history. The handoff must include current state, decisions made, files touched, checks run, known failures, and the next action.
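One way to keep handoffs structured is to render them from a fixed set of fields. The following is a minimal sketch, assuming a hypothetical Handoff type whose fields match the required handoff contents above. The rendered bullet layout is an illustration, not a required PROGRESS.md format.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    """Hypothetical structured handoff with the required fields named in AGENTS.md."""

    current_state: str
    decisions_made: list[str]
    files_touched: list[str]
    checks_run: list[str]
    known_failures: list[str]
    next_action: str

    def to_markdown(self) -> str:
        """Render a short, factual entry that could be pasted into PROGRESS.md."""

        def bullets(items: list[str]) -> str:
            # An explicit "none" keeps empty sections visible instead of silently dropping them.
            return "\n".join(f"  - {item}" for item in items) if items else "  - none"

        return "\n".join(
            [
                f"- Current state: {self.current_state}",
                f"- Decisions made:\n{bullets(self.decisions_made)}",
                f"- Files touched:\n{bullets(self.files_touched)}",
                f"- Checks run:\n{bullets(self.checks_run)}",
                f"- Known failures:\n{bullets(self.known_failures)}",
                f"- Next action: {self.next_action}",
            ]
        )
```

Rendering "none" explicitly makes an empty known-failures list a visible claim rather than an omission, which is easier for the next agent to trust.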
Periodically re-evaluate the harness itself. Remove roles, contracts, or checks that are not load-bearing, and add structure only when it improves correctness, scope control, or verification quality.
Fixed Product Decisions
- Language: Python.
- Workflow: uv.
- Interface: CLI plus Python library.
- Default CLI name: pdf2md.
- Runtime policy: local-only. Do not add cloud OCR, remote LLM, or external document upload paths.
- Default output: Obsidian-friendly Markdown.
- Inline math: $...$.
- Display math: $$...$$.
- Conversion engine: MinerU 3.1.0.
- Hardware target: NVIDIA GPU.
- Input priority: digital PDFs with text layers.
- Quality workflow: fully automatic. Log warnings and continue when possible.
- MinerU execution: direct local mineru CLI only. MinerU 3.1.0 may launch a temporary local mineru-api internally when the CLI runs without --api-url.
- Quality report: write both metadata JSON and <stem>.report.md.
- v1 use case: personal/research. The licenses of MinerU and its transitive models and packages must be documented before redistribution.
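The inline and display math decisions above imply a normalization step. The sketch below assumes that some upstream step may emit LaTeX-style \(...\) and \[...\] delimiters; that assumption is not confirmed for MinerU 3.1.0. It also treats display math as $$ on its own lines surrounded by blank lines, which is one reasonable reading of "display math spacing". The function name normalize_math is hypothetical.

```python
import re

# Display math: \[ ... \] plus any spaces/tabs around it, rewritten as a standalone $$ block.
_DISPLAY = re.compile(r"[ \t]*\\\[(.+?)\\\][ \t]*", re.DOTALL)
# Inline math: \( ... \), rewritten as $...$ with inner whitespace trimmed.
_INLINE = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def normalize_math(markdown: str) -> str:
    """Rewrite LaTeX-style delimiters into the $...$ / $$...$$ forms chosen for v1."""

    def display(match: re.Match) -> str:
        # Blank lines before and after keep the block separate from surrounding paragraphs.
        return f"\n\n$$\n{match.group(1).strip()}\n$$\n\n"

    def inline(match: re.Match) -> str:
        return f"${match.group(1).strip()}$"

    text = _DISPLAY.sub(display, markdown)  # display first so \[ is never read as inline
    text = _INLINE.sub(inline, text)
    # Collapse runs of 3+ newlines that the display rewrite can create next to existing blank lines.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"
```

Handling display math before inline math avoids ordering ambiguity. The final newline collapse keeps the output stable when the input already had blank lines around a formula.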
Architecture Guidance
Follow ARCHITECTURE.md for implementation structure. Do not duplicate architecture decisions in code comments or docs unless the new text points back to that file.
Key implementation constraints:
- Keep MinerU-specific objects behind the MinerU adapter.
- Keep public CLI/library contracts stable and project-owned.
- Keep Obsidian Markdown normalization separate from MinerU execution.
- Keep metadata and warning generation structured.
- Keep quality report generation derived from metadata and local checks.
- Do not add runtime engine selection in v1.
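The adapter constraint can be illustrated as a single translation point from raw engine output to project-owned types. This is a sketch under loud assumptions. The raw page dict shape ("blocks", "type", "content") and the type names in the mapping are mocked stand-ins, not MinerU 3.1.0's actual output format. Block and MinerUAdapter are hypothetical names; ARCHITECTURE.md defines the real intermediate representation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Block:
    """Project-owned intermediate block; nothing MinerU-specific leaks past the adapter."""

    kind: str  # "text", "equation", "table", "figure", or "unknown"
    content: str
    page: int  # 1-based page number


class MinerUAdapter:
    """Hypothetical adapter: the only place that knows the raw engine output shape."""

    # Assumed raw type names (mocked for illustration, not MinerU's real schema).
    _KIND_MAP = {"text": "text", "equation": "equation", "table": "table", "image": "figure"}

    def to_blocks(self, raw_pages: list[dict[str, Any]]) -> list[Block]:
        blocks: list[Block] = []
        for page_index, page in enumerate(raw_pages):
            for raw in page.get("blocks", []):
                # Unrecognized types become "unknown" so later stages can log a warning.
                kind = self._KIND_MAP.get(raw.get("type", ""), "unknown")
                blocks.append(Block(kind=kind, content=raw.get("content", ""), page=page_index + 1))
        return blocks
```

Because to_blocks accepts plain dicts, the adapter contract tests can feed it mocked outputs without installing MinerU or loading models.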
Local-Only Requirements
Never add runtime dependencies that upload PDFs, page images, or extracted text to remote services. Follow the strict-local enforcement rules in ARCHITECTURE.md.
Allowed in v1: direct mineru CLI execution and the CLI-internal temporary local mineru-api process.
Do not pass --api-url, and do not use remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible inference endpoints in v1.
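These rules can be enforced with a guard that runs just before the mineru subprocess is launched. The following is a minimal sketch, assuming the adapter assembles the command as an argv list. Only --api-url comes from this document; the function name is hypothetical, and treating every http(s) argument as remote is a deliberately strict assumption.

```python
from urllib.parse import urlparse

# Flags v1 must never pass to the mineru CLI (from the local-only rules).
FORBIDDEN_FLAGS = {"--api-url"}


def assert_local_only(argv: list[str]) -> None:
    """Raise before launching mineru if argv could reach a remote service."""
    for arg in argv:
        # Handle both "--api-url URL" and "--api-url=URL" spellings.
        flag = arg.split("=", 1)[0]
        if flag in FORBIDDEN_FLAGS:
            raise ValueError(f"forbidden in v1 (local-only): {flag}")
        # v1 never needs to hand mineru an explicit URL, so any http(s) argument is rejected.
        if urlparse(arg).scheme in {"http", "https"}:
            raise ValueError(f"remote endpoint not allowed in v1: {arg}")
```

Failing closed on any URL is intentionally stricter than necessary. The CLI-internal temporary mineru-api is started by MinerU itself, so the guard never needs to allow one.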
CLI Behavior
Follow the CLI requirements in PRD.md. Do not add commands, flags, config files, or runtime engine selection unless the user explicitly asks for them.
Do not overwrite user files unless the requested behavior and --overwrite semantics allow it.
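A safe way to honor this is to plan every output path before converting anything and to fail if any path already exists. The sketch below is not the PRD's definition. The <stem>.md and <stem>.report.md names follow this document. The metadata file name <stem>.metadata.json and the function plan_outputs are hypothetical, since only "metadata JSON" is specified here.

```python
from pathlib import Path


def plan_outputs(pdf: Path, out_dir: Path, overwrite: bool = False) -> dict[str, Path]:
    """Compute all output paths up front and refuse to clobber existing files unless overwrite=True."""
    stem = pdf.stem
    planned = {
        "markdown": out_dir / f"{stem}.md",
        "report": out_dir / f"{stem}.report.md",
        # Hypothetical name: this document only says "metadata JSON"; PRD.md is authoritative.
        "metadata": out_dir / f"{stem}.metadata.json",
    }
    existing = [p for p in planned.values() if p.exists()]
    if existing and not overwrite:
        names = ", ".join(str(p) for p in existing)
        raise FileExistsError(f"refusing to overwrite without --overwrite: {names}")
    return planned
```

Checking all paths before conversion means a refused run leaves no partial output behind.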
Testing Guidance
Add tests in proportion to behavior risk.
Required early tests:
- Math delimiter normalization.
- Display math spacing.
- Asset path normalization.
- Metadata schema creation.
- Warning aggregation.
- CLI path planning and overwrite behavior.
- MinerU adapter contract with mocked outputs.
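The metadata, warning-aggregation, and report items fit together: structured warnings feed the metadata, and the report is rendered only from that metadata. The sketch below is illustrative. ConversionWarning, the warning codes, the metadata keys, and the report layout are all hypothetical; the real metadata schema lives in ARCHITECTURE.md.

```python
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ConversionWarning:
    """Hypothetical structured warning; codes and fields are illustrative, not a fixed schema."""

    code: str  # e.g. "formula_unparsed", "table_low_confidence"
    page: int
    message: str


def aggregate_warnings(warnings: list[ConversionWarning]) -> dict[str, int]:
    """Count warnings per code, sorted by code so metadata output is deterministic."""
    return dict(sorted(Counter(w.code for w in warnings).items()))


def build_metadata(source: str, warnings: list[ConversionWarning]) -> dict:
    """Assemble a minimal metadata dict (real schema: ARCHITECTURE.md)."""
    return {
        "source": source,
        "warning_counts": aggregate_warnings(warnings),
        "warnings": [asdict(w) for w in warnings],
    }


def render_report(metadata: dict) -> str:
    """Derive report text purely from metadata, never from a fresh conversion run."""
    lines = [f"# Conversion report: {metadata['source']}", ""]
    counts = metadata["warning_counts"]
    if not counts:
        lines.append("No warnings.")
    for code, n in counts.items():
        lines.append(f"- {code}: {n}")
    return "\n".join(lines) + "\n"
```

Deriving the report from metadata alone lets report tests run without any PDF or model. Sorted counts keep the JSON and Markdown outputs stable across runs.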
Fixture PDFs should cover:
- Simple digital PDF.
- Math-heavy academic PDF.
- Multi-column paper.
- Table with formulas.
- Figure with caption.
Any test that depends on large local models should be optional or marked separately so normal CI/dev checks can run quickly.
Git Workflow
After changing files:
- Run the smallest useful verification for the change.
- Check git status --short.
- Commit the completed change unless the user explicitly asks not to.
- Do not include unrelated user edits in the commit.
Documentation Guidance
Keep documentation explicit about:
- Local-only privacy behavior.
- NVIDIA GPU expectations.
- MinerU installation and model downloads.
- Known limitations of automatic formula reconstruction.
- Dependency licenses.
- Obsidian output assumptions.
Do not imply perfect LaTeX conversion. The correct guarantee is best-effort automatic conversion with warnings and provenance.