# AGENTS.md
This file gives implementation instructions for coding agents working in this repository.
## Project Mission
Build a local-only PDF-to-Markdown converter for math-heavy digital PDFs. The converter must produce Obsidian-friendly Markdown and preserve enough metadata to debug formulas, reading order, tables, figures, and assets.
## Project Guidelines
Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.
**Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.
### 1. Think Before Coding
**Don't assume. Don't hide confusion. Surface tradeoffs.**
Before implementing:
- State your assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them - don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.
### 2. Simplicity First
**Minimum code that solves the problem. Nothing speculative.**
- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.
Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
### 3. Surgical Changes
**Touch only what you must. Clean up only your own mess.**
When editing existing code:
- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.
- If you notice unrelated dead code, mention it - don't delete it.
When your changes create orphans:
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.
The test: Every changed line should trace directly to the user's request.
### 4. Goal-Driven Execution
**Define success criteria. Loop until verified.**
Transform tasks into verifiable goals:
- "Add validation" -> "Write tests for invalid inputs, then make them pass"
- "Fix the bug" -> "Write a test that reproduces it, then make it pass"
- "Refactor X" -> "Ensure tests pass before and after"
For multi-step tasks, state a brief plan:
```
1. [Step] -> verify: [check]
2. [Step] -> verify: [check]
3. [Step] -> verify: [check]
```
Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
---
**These guidelines are working if:** diffs contain fewer unnecessary changes, fewer rewrites are needed because of overcomplication, and clarifying questions come before implementation rather than after mistakes.
## Commands
| Command | Description |
| --- | --- |
| `uv run pytest` | Run the default fast test suite. |
| `uv run pdf2md doctor` | Check local Python, uv, MinerU, GPU/PyTorch, model/cache, MathJax, and strict-local setup. |
| `uv run pytest tests/test_ui_runner.py` | Run focused UI command-resolution and subprocess tests. |
| `uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py` | Rebuild the thin Windows UI executable. |
| `uv run pdf2md convert paper.pdf --out outputs --chunk-pages --gpu auto --mineru-profile auto --strict-local` | Optional local conversion smoke; keep generated output ignored. |
## Source Documents
- `PLAN.md`: shared plan, planned work, open questions, and ownership for agents.
- `PROGRESS.md`: completed work, current status, blockers, and next actions for agents.
- `PRD.md`: product requirements, user scope, CLI/API requirements, acceptance criteria.
- `ARCHITECTURE.md`: system layers, MinerU adapter contract, intermediate representation, metadata schema, and local-only enforcement.
- `docs/KNOWLEDGEBASE.md`: research basis and implementation background.
- `docs/V1IMPLEMENTATIONPLAN.md`: v1 implementation sequence, sprint contracts, verification gates, and agent ownership.
- `docs/UI_RESEARCH.md`: research basis for the implemented minimal Windows UI launcher.
- `docs/WORKARCHIVE.md`: archived completed work, historical sprint outcomes, setup results, verification history, and sample conversion evidence.
- `docs/Sprints/*.md`: active and historical sprint contracts.
- `docs/superpowers/specs/*.md`: design specs created for focused project workflows.
- `docs/superpowers/plans/*.md`: executable task plans created from specs, including completed UI folder batch work and abandoned historical plans.
- `.codex/agents/*.toml`: project-scoped custom subagent roles.
- `.codex/commands/*.md`: reusable project prompt commands.
- `.codex/skills/*/SKILL.md`: project-specific Codex skills.
- `.codex/hooks.json` and `.codex/hooks/*.py`: project hook configuration and deterministic hook scripts.
## Startup Workflow
At the start of every task:
- Read `PLAN.md` and `PROGRESS.md` before deciding what to do.
- Read only the other source documents needed for the task.
- Read `docs/WORKARCHIVE.md` when the task needs historical completed-work context, previous verification results, or sample conversion evidence.
- Use `.codex/agents`, `.codex/commands`, and `.codex/skills` when the user explicitly asks for agent delegation, reusable workflows, or specialized project guidance.
- State the relevant current goal, next action, and blocker if one exists.
- If `PLAN.md` and `PROGRESS.md` conflict, trust `PROGRESS.md` for what has happened and update `PLAN.md` when making the next change.
## Progress Tracking
Use `PLAN.md` and `PROGRESS.md` to coordinate work across agents.
- Update `PLAN.md` when planned work, ownership, sequencing, open questions, or decisions change.
- Update `PROGRESS.md` after meaningful work, verification, blockers, or next actions change.
- Move completed work from `PROGRESS.md` to `docs/WORKARCHIVE.md` when it is no longer needed for current coordination.
- Keep entries short and factual.
- Do not use these files as scratchpads or long research notes.
- Do not mark work complete until it has been verified.
- When multiple agents work in parallel, each agent must leave enough context in `PROGRESS.md` for the next agent to resume without guessing.
## Long-Running Harness Workflow
For substantial implementation work, follow the planner/generator/evaluator pattern from Anthropic's long-running harness design article: https://www.anthropic.com/engineering/harness-design-long-running-apps.
Use the harness only when task complexity justifies the overhead. For small documentation edits or narrow fixes, a single agent with focused verification is preferred.
Harness roles:
- `harness-planner-agent`: expands a brief request into product context, high-level technical direction, non-goals, risks, and a sequence of small contracts.
- `feature-generator-agent`: implements one agreed contract at a time after implementation has been explicitly requested.
- `evaluation-agent`: independently reviews proposed contracts and completed work. It must be skeptical, specific, and willing to fail work that is incomplete, stubbed, unverified, or below threshold.
Before each implementation chunk:
- Write or update a concise sprint contract before code changes start.
- Include objective, touched surfaces, expected outputs, non-goals, verification steps, hard failure criteria, and handoff fields.
- Let `evaluation-agent` review the contract before `feature-generator-agent` implements it.
- Avoid over-specifying low-level implementation before the responsible agent has inspected the code.
After each implementation chunk:
- `feature-generator-agent` runs a self-check but does not approve its own work.
- `evaluation-agent` performs independent checks against the contract and reports actionable findings.
- If the chunk fails, feed the evaluator's findings back into the next generator pass.
- Update `PROGRESS.md` with completed work, checks run, residual risks, and the next concrete action.
When context becomes too large or a task spans sessions, prefer a clean structured handoff over relying only on conversation history. The handoff must include current state, decisions made, files touched, checks run, known failures, and the next action.
Periodically re-evaluate the harness itself. Remove roles, contracts, or checks that are not load-bearing, and add structure only when it improves correctness, scope control, or verification quality.
## Fixed Product Decisions
- Language: Python.
- Workflow: `uv`.
- Interface: CLI plus Python library.
- Default CLI name: `pdf2md`.
- Runtime policy: local-only. Do not add cloud OCR, remote LLM, or external document upload paths.
- Default output: Obsidian-friendly Markdown.
- Inline math: `$...$`.
- Display math: `$$...$$`.
- Conversion engine: MinerU 3.1.0.
- Hardware target: NVIDIA GPU.
- Input priority: digital PDFs with text layers.
- Quality workflow: fully automatic. Log warnings and continue when possible.
- MinerU execution: direct local `mineru` CLI only. MinerU 3.1.0 may launch a temporary local `mineru-api` internally when CLI runs without `--api-url`.
- Output layout: write `<out>/<stem>/<stem>_001.md`, shared `<out>/<stem>/images/`, and `<out>/<stem>/<stem>_report.md`; new conversions do not persist public metadata JSON after Sprint 16.
- UI folder batch conversion: the UI may convert direct-child PDFs in a selected folder by sequentially invoking existing `pdf2md convert` commands.
- v1 use case: personal/research. MinerU and transitive model/package licenses must be documented before redistribution.
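The output layout decision above can be illustrated with a small path-planning sketch. This is only an illustration under stated assumptions: `OutputPlan` and `plan_output` are hypothetical names, not the project's API, and the zero-padded `chunk_index` assumes that `_001` is the first of a three-digit chunk sequence.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class OutputPlan:
    """Hypothetical container for the planned output locations."""

    markdown: Path
    images_dir: Path
    report: Path


def plan_output(pdf_path: Path, out_dir: Path, chunk_index: int = 1) -> OutputPlan:
    """Derive output paths from the PDF stem without touching the filesystem."""
    stem = pdf_path.stem
    base = out_dir / stem
    return OutputPlan(
        # Assumption: chunk files are numbered with three zero-padded digits.
        markdown=base / f"{stem}_{chunk_index:03d}.md",
        images_dir=base / "images",
        report=base / f"{stem}_report.md",
    )
```

Keeping the planning step pure (no directory creation, no writes) makes it easy to cover in the fast test suite; the actual layout contract remains the bullet above.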
## Architecture Guidance
Follow `ARCHITECTURE.md` for implementation structure. Do not duplicate architecture decisions in code comments or docs unless the new text points back to that file.
Key implementation constraints:
- Keep MinerU-specific objects behind the MinerU adapter.
- Keep public CLI/library contracts stable and project-owned.
- Keep Obsidian Markdown normalization separate from MinerU execution.
- Keep metadata and warning generation structured.
- Keep quality report generation derived from metadata and local checks.
- Do not add runtime engine selection in v1.
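As one possible reading of "structured" warning generation, the sketch below models a warning as a small immutable record and aggregates by code. The class name, field names, and warning codes are hypothetical; the metadata schema in `ARCHITECTURE.md` is authoritative.

```python
from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionWarning:
    """Hypothetical structured warning; not the project's real schema."""

    code: str  # stable identifier, e.g. "formula.unparsed" (illustrative)
    page: int  # 1-based page number where the issue was observed
    message: str  # human-readable detail for the report


def aggregate_warnings(warnings: Iterable[ConversionWarning]) -> dict[str, int]:
    """Count warnings per code so a report can summarize them."""
    return dict(Counter(w.code for w in warnings))
```

Because the records carry a code and a page rather than free text alone, a quality report can be derived from them without re-parsing log output.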
## Local-Only Requirements
Never add runtime dependencies that upload PDFs, page images, or extracted text to remote services. Follow the strict-local enforcement rules in `ARCHITECTURE.md`.
Allowed in v1: direct `mineru` CLI execution and the CLI-internal temporary local `mineru-api` process.
Do not pass `--api-url`, and do not use remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible inference endpoints in v1.
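A minimal sketch of how the `--api-url` ban might be checked before a `mineru` subprocess is launched. It covers only that one flag; `StrictLocalError` and `check_mineru_args` are hypothetical names, and the full strict-local rules in `ARCHITECTURE.md` take precedence.

```python
from collections.abc import Sequence


class StrictLocalError(ValueError):
    """Raised when a planned MinerU invocation would violate the local-only policy."""


def check_mineru_args(argv: Sequence[str]) -> None:
    """Reject `--api-url` in both `--api-url URL` and `--api-url=URL` forms."""
    for arg in argv:
        if arg == "--api-url" or arg.startswith("--api-url="):
            raise StrictLocalError("--api-url is not allowed in v1 (local-only policy)")
```

Failing before the subprocess starts keeps the violation visible as a project-owned error rather than relying on MinerU's own behavior.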
## CLI Behavior
Follow the CLI requirements in `PRD.md`. Do not add commands, flags, config files, or runtime engine selection unless the user explicitly asks for them.
Do not overwrite user files unless the requested behavior and `--overwrite` semantics allow it.
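The overwrite rule could be guarded with something like the following sketch. It assumes the simplest reading of `--overwrite` (existing targets are refused unless the flag is set); the exact semantics are defined in `PRD.md`, and `ensure_can_write` is a hypothetical helper name.

```python
from pathlib import Path


def ensure_can_write(target: Path, overwrite: bool = False) -> Path:
    """Return `target` if writing is allowed; refuse to clobber an existing path."""
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} exists; pass --overwrite to replace it")
    return target
```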
## Testing Guidance
Add tests in proportion to behavior risk.
Required early tests:
- Math delimiter normalization.
- Display math spacing.
- Asset path normalization.
- Metadata schema creation.
- Warning aggregation.
- CLI path planning and overwrite behavior.
- MinerU adapter contract with mocked outputs.
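As an illustration of the first item, a math delimiter normalization test might look like the sketch below. It assumes the upstream text uses LaTeX-style `\(...\)` and `\[...\]` delimiters, which may not match what MinerU actually emits, and `normalize_math_delimiters` is a hypothetical function, not the project's implementation.

```python
import re

# Non-greedy matches so that several formulas on one line stay separate.
_DISPLAY = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)
_INLINE = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def normalize_math_delimiters(text: str) -> str:
    r"""Rewrite \[...\] as $$...$$ and \(...\) as $...$, trimming inner whitespace."""
    # Callables are used as replacements so backslashes in the math are kept verbatim.
    text = _DISPLAY.sub(lambda m: f"$${m.group(1).strip()}$$", text)
    return _INLINE.sub(lambda m: f"${m.group(1).strip()}$", text)


def test_inline_and_display_delimiters():
    source = r"Let \( x^2 \) and \[ \int_0^1 f \]"
    assert normalize_math_delimiters(source) == r"Let $x^2$ and $$\int_0^1 f$$"
```

A test of this shape needs no PDF and no model, so it fits the default fast suite.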
Fixture PDFs should cover:
- Simple digital PDF.
- Math-heavy academic PDF.
- Multi-column paper.
- Table with formulas.
- Figure with caption.
Any test that depends on large local models should be optional or marked separately so normal CI/dev checks can run quickly.
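One way to keep model-dependent tests out of the default run is an explicit opt-in switch, sketched here. The environment variable name and the helper are hypothetical; the project may prefer a pytest marker or another mechanism.

```python
import os
from collections.abc import Mapping

# Hypothetical opt-in switch; this variable name is not defined by the project.
MODEL_TESTS_ENV = "PDF2MD_RUN_MODEL_TESTS"


def model_tests_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when the developer explicitly opted in to model-backed tests."""
    return environ.get(MODEL_TESTS_ENV) == "1"
```

A test module could then define `requires_models = pytest.mark.skipif(not model_tests_enabled(), reason="needs local MinerU models")` and decorate the slow tests with it, so a plain `uv run pytest` skips them.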
## Git Workflow
After changing files:
- Run the smallest useful verification for the change.
- Check `git status --short`.
- Commit the completed change unless the user explicitly asks not to.
- Do not include unrelated user edits in the commit.
Special cases:
- Commit rollback requests: verify the target commit and current status first, then use a direct non-interactive reset. Leave untracked generated/local artifacts such as `build/`, `dist/`, `samples/`, and `*.spec` files untouched unless deletion is explicitly requested.
- Installed-runtime doctor debugging: test both `uv run pdf2md doctor` and direct venv execution such as `.venv\Scripts\pdf2md.exe doctor`, because direct execution may not inherit the same PATH behavior as `uv run`.
## Documentation Guidance
Keep documentation explicit about:
- Local-only privacy behavior.
- NVIDIA GPU expectations.
- MinerU installation and model downloads.
- Known limitations of automatic formula reconstruction.
- Dependency licenses.
- Obsidian output assumptions.
Do not imply perfect LaTeX conversion. The correct guarantee is best-effort automatic conversion with warnings and provenance.