# AGENTS.md

This file gives implementation instructions for coding agents working in this repository.

## Project Mission

Build a local-only PDF-to-Markdown converter for math-heavy digital PDFs. The converter must produce Obsidian-friendly Markdown and preserve enough metadata to debug formulas, reading order, tables, figures, and assets.

## Project Guidelines

Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.

**Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.

### 1. Think Before Coding

**Don't assume. Don't hide confusion. Surface tradeoffs.**

Before implementing:
- State your assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them - don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.

### 2. Simplicity First

**Minimum code that solves the problem. Nothing speculative.**

- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.

Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.

### 3. Surgical Changes

**Touch only what you must. Clean up only your own mess.**

When editing existing code:
- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.
- If you notice unrelated dead code, mention it - don't delete it.

When your changes create orphans:
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.

The test: Every changed line should trace directly to the user's request.

### 4. Goal-Driven Execution

**Define success criteria. Loop until verified.**

Transform tasks into verifiable goals:
- "Add validation" -> "Write tests for invalid inputs, then make them pass"
- "Fix the bug" -> "Write a test that reproduces it, then make it pass"
- "Refactor X" -> "Ensure tests pass before and after"

For multi-step tasks, state a brief plan:
```
1. [Step] -> verify: [check]
2. [Step] -> verify: [check]
3. [Step] -> verify: [check]
```

Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.

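
As an illustration of turning "Fix the bug" into a verifiable goal, the sketch below assumes a hypothetical helper, `to_markdown_asset_path`, that once emitted Windows backslashes in image links. Neither the helper nor the bug is taken from this repository; the point is only the order of work: write the failing test first, then change the code until it passes.

```python
from pathlib import PureWindowsPath


def to_markdown_asset_path(path: str) -> str:
    """Return a forward-slash relative path suitable for a Markdown image link.

    Hypothetical helper: the name and behavior are assumptions for this example.
    """
    return PureWindowsPath(path).as_posix()


def test_backslashes_become_forward_slashes() -> None:
    # Step 1: this assertion reproduces the assumed bug and fails first.
    # Step 2: the implementation above is changed until it passes.
    assert to_markdown_asset_path("images\\fig_001.png") == "images/fig_001.png"


def test_forward_slashes_are_unchanged() -> None:
    # Guards against the fix breaking paths that were already correct.
    assert to_markdown_asset_path("images/fig_001.png") == "images/fig_001.png"
```

Both test functions follow pytest's default `test_*` naming, so `uv run pytest` would collect them if they lived in a test module.
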
---

**These guidelines are working if:** diffs contain fewer unnecessary changes, overcomplication causes fewer rewrites, and clarifying questions come before implementation rather than after mistakes.

## Commands

| Command | Description |
| --- | --- |
| `uv run pytest` | Run the default fast test suite. |
| `uv run pdf2md doctor` | Check local Python, uv, MinerU, GPU/PyTorch, model/cache, MathJax, and strict-local setup. |
| `uv run pytest tests/test_ui_runner.py` | Run focused UI command-resolution and subprocess tests. |
| `uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py` | Rebuild the thin Windows UI executable. |
| `uv run pdf2md convert paper.pdf --out outputs --chunk-pages --gpu auto --mineru-profile auto --strict-local` | Optional local conversion smoke; keep generated output ignored. |

## Source Documents

- `PLAN.md`: shared plan, planned work, open questions, and ownership for agents.
- `PROGRESS.md`: completed work, current status, blockers, and next actions for agents.
- `PRD.md`: product requirements, user scope, CLI/API requirements, acceptance criteria.
- `ARCHITECTURE.md`: system layers, MinerU adapter contract, intermediate representation, metadata schema, and local-only enforcement.
- `docs/KNOWLEDGEBASE.md`: research basis and implementation background.
- `docs/V1IMPLEMENTATIONPLAN.md`: v1 implementation sequence, sprint contracts, verification gates, and agent ownership.
- `docs/UI_RESEARCH.md`: research basis for the implemented minimal Windows UI launcher.
- `docs/WORKARCHIVE.md`: archived completed work, historical sprint outcomes, setup results, verification history, and sample conversion evidence.
- `docs/Sprints/*.md`: active and historical sprint contracts.
- `docs/superpowers/specs/*.md`: design specs created for focused project workflows.
- `docs/superpowers/plans/*.md`: executable task plans created from specs, including completed UI folder batch work and abandoned historical plans.
- `.codex/agents/*.toml`: project-scoped custom subagent roles.
- `.codex/commands/*.md`: reusable project prompt commands.
- `.codex/skills/*/SKILL.md`: project-specific Codex skills.
- `.codex/hooks.json` and `.codex/hooks/*.py`: project hook configuration and deterministic hook scripts.

## Startup Workflow

At the start of every task:

- Read `PLAN.md` and `PROGRESS.md` before deciding what to do.
- Read only the other source documents needed for the task.
- Read `docs/WORKARCHIVE.md` when the task needs historical completed-work context, previous verification results, or sample conversion evidence.
- Use `.codex/agents`, `.codex/commands`, and `.codex/skills` when the user explicitly asks for agent delegation, reusable workflows, or specialized project guidance.
- State the relevant current goal, next action, and blocker if one exists.
- If `PLAN.md` and `PROGRESS.md` conflict, trust `PROGRESS.md` for what has happened and update `PLAN.md` when making the next change.

## Progress Tracking

Use `PLAN.md` and `PROGRESS.md` to coordinate work across agents.

- Update `PLAN.md` when planned work, ownership, sequencing, open questions, or decisions change.
- Update `PROGRESS.md` after meaningful work or verification, and whenever blockers or next actions change.
- Move completed work from `PROGRESS.md` to `docs/WORKARCHIVE.md` when it is no longer needed for current coordination.
- Keep entries short and factual.
- Do not use these files as scratchpads or long research notes.
- Do not mark work complete until it has been verified.
- When multiple agents work in parallel, each agent must leave enough context in `PROGRESS.md` for the next agent to resume without guessing.

## Long-Running Harness Workflow

For substantial implementation work, follow the planner/generator/evaluator pattern from Anthropic's long-running harness design article: https://www.anthropic.com/engineering/harness-design-long-running-apps.

Use the harness only when task complexity justifies the overhead. For small documentation edits or narrow fixes, a single agent with focused verification is preferred.

Harness roles:

- `harness-planner-agent`: expands a brief request into product context, high-level technical direction, non-goals, risks, and a sequence of small contracts.
- `feature-generator-agent`: implements one agreed contract at a time after implementation has been explicitly requested.
- `evaluation-agent`: independently reviews proposed contracts and completed work. It must be skeptical, specific, and willing to fail work that is incomplete, stubbed, unverified, or below threshold.

Before each implementation chunk:

- Write or update a concise sprint contract before code changes start.
- Include objective, touched surfaces, expected outputs, non-goals, verification steps, hard failure criteria, and handoff fields.
- Let `evaluation-agent` review the contract before `feature-generator-agent` implements it.
- Avoid over-specifying low-level implementation before the responsible agent has inspected the code.

After each implementation chunk:

- `feature-generator-agent` runs a self-check but does not approve its own work.
- `evaluation-agent` performs independent checks against the contract and reports actionable findings.
- If the chunk fails, feed the evaluator's findings back into the next generator pass.
- Update `PROGRESS.md` with completed work, checks run, residual risks, and the next concrete action.

When context becomes too large or a task spans sessions, prefer a clean structured handoff over relying only on conversation history. The handoff must include current state, decisions made, files touched, checks run, known failures, and the next action.

Periodically re-evaluate the harness itself. Remove roles, contracts, or checks that are not load-bearing, and add structure only when it improves correctness, scope control, or verification quality.

## Fixed Product Decisions

- Language: Python.
- Workflow: `uv`.
- Interface: CLI plus Python library.
- Default CLI name: `pdf2md`.
- Runtime policy: local-only. Do not add cloud OCR, remote LLM, or external document upload paths.
- Default output: Obsidian-friendly Markdown.
- Inline math: `$...$`.
- Display math: `$$...$$`.
- Conversion engine: MinerU 3.1.0.
- Hardware target: NVIDIA GPU.
- Input priority: digital PDFs with text layers.
- Quality workflow: fully automatic. Log warnings and continue when possible.
- MinerU execution: direct local `mineru` CLI only. MinerU 3.1.0 may launch a temporary local `mineru-api` internally when the CLI runs without `--api-url`.
- Output layout: write `<out>/<stem>/<stem>_001.md`, shared `<out>/<stem>/images/`, and `<out>/<stem>/<stem>_report.md`; new conversions do not persist public metadata JSON after Sprint 16.
- UI folder batch conversion: the UI may convert direct-child PDFs in a selected folder by sequentially invoking existing `pdf2md convert` commands.
- v1 use case: personal/research. MinerU and transitive model/package licenses must be documented before redistribution.

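
The output layout above can be sketched as pure path planning. The names `OutputPlan` and `plan_output` are hypothetical, and numbering later chunks as `_002`, `_003`, and so on is an assumption; only the `_001` filename, the shared `images/` folder, and the `_report.md` file are stated in this document.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class OutputPlan:
    """Planned output locations for one PDF (hypothetical type for this sketch)."""

    doc_dir: Path
    images_dir: Path
    report_path: Path

    def chunk_path(self, index: int) -> Path:
        # Chunk files are numbered from 1 and zero-padded: <stem>_001.md
        return self.doc_dir / f"{self.doc_dir.name}_{index:03d}.md"


def plan_output(pdf_path: Path, out_dir: Path) -> OutputPlan:
    """Compute output paths without touching the filesystem."""
    stem = pdf_path.stem
    doc_dir = out_dir / stem
    return OutputPlan(
        doc_dir=doc_dir,
        images_dir=doc_dir / "images",
        report_path=doc_dir / f"{stem}_report.md",
    )


plan = plan_output(Path("paper.pdf"), Path("outputs"))
print(plan.chunk_path(1).as_posix())  # outputs/paper/paper_001.md
print(plan.images_dir.as_posix())     # outputs/paper/images
print(plan.report_path.as_posix())    # outputs/paper/paper_report.md
```

Keeping the planning step free of filesystem access makes it easy to cover with the fast test suite.
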
## Architecture Guidance

Follow `ARCHITECTURE.md` for implementation structure. Do not duplicate architecture decisions in code comments or docs unless the new text points back to that file.

Key implementation constraints:

- Keep MinerU-specific objects behind the MinerU adapter.
- Keep public CLI/library contracts stable and project-owned.
- Keep Obsidian Markdown normalization separate from MinerU execution.
- Keep metadata and warning generation structured.
- Keep quality report generation derived from metadata and local checks.
- Do not add runtime engine selection in v1.

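
One way to read "structured" warning generation is a small typed record plus a deterministic aggregation step that a report can be derived from. The sketch below is illustrative only: the `ConversionWarning` fields, the warning codes, and `summarize_warnings` are assumptions, not the schema defined in `ARCHITECTURE.md`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionWarning:
    """One structured warning (field names are assumptions for this sketch)."""

    code: str     # stable machine-readable identifier
    page: int     # 1-based page number the warning refers to
    message: str  # human-readable detail for the quality report


def summarize_warnings(warnings: list[ConversionWarning]) -> dict[str, int]:
    """Count warnings per code, sorted by code so reports are deterministic."""
    counts = Counter(w.code for w in warnings)
    return dict(sorted(counts.items()))


warnings = [
    ConversionWarning("formula_low_confidence", 3, "Inline formula may be incomplete."),
    ConversionWarning("table_structure_guess", 5, "Merged cells were inferred."),
    ConversionWarning("formula_low_confidence", 7, "Display formula has unbalanced braces."),
]
print(summarize_warnings(warnings))
# {'formula_low_confidence': 2, 'table_structure_guess': 1}
```
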
## Local-Only Requirements

Never add runtime dependencies that upload PDFs, page images, or extracted text to remote services. Follow the strict-local enforcement rules in `ARCHITECTURE.md`.

Allowed in v1: direct `mineru` CLI execution and the CLI-internal temporary local `mineru-api` process.

Do not pass `--api-url`, use remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible inference endpoints in v1.

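
A minimal sketch of one such guard: reject a `mineru` argument list that contains `--api-url` before any subprocess is started. The function name `assert_local_only` and the error wording are hypothetical, the other arguments in the sample lists are placeholders, and the authoritative enforcement rules remain those in `ARCHITECTURE.md`.

```python
# Flags that must never reach the mineru command line in v1.
FORBIDDEN_MINERU_FLAGS = ("--api-url",)


def assert_local_only(argv: list[str]) -> None:
    """Raise ValueError if the argument list contains a forbidden flag.

    Catches both the "--api-url VALUE" and "--api-url=VALUE" spellings.
    """
    for arg in argv:
        flag = arg.split("=", 1)[0]
        if flag in FORBIDDEN_MINERU_FLAGS:
            raise ValueError(f"strict-local violation: {flag} is not allowed in v1")


# A purely local invocation passes silently.
assert_local_only(["mineru", "-p", "paper.pdf", "-o", "outputs"])

try:
    assert_local_only(["mineru", "-p", "paper.pdf", "--api-url", "http://example.invalid"])
except ValueError as exc:
    print(exc)  # strict-local violation: --api-url is not allowed in v1
```

This only covers the flag check; it does not detect remote endpoints configured by other means.
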
## CLI Behavior

Follow the CLI requirements in `PRD.md`. Do not add commands, flags, config files, or runtime engine selection unless the user explicitly asks for them.

Do not overwrite user files unless the requested behavior and `--overwrite` semantics allow it.

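
The overwrite rule can be sketched as a single check that runs before any file is written. The helper name `check_can_write` is hypothetical, and the exact `--overwrite` semantics are whatever `PRD.md` specifies; this only shows the default-deny shape.

```python
import tempfile
from pathlib import Path


def check_can_write(target: Path, overwrite: bool) -> None:
    """Refuse to replace an existing file unless overwrite was requested."""
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass --overwrite to replace it")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "paper_001.md"
    check_can_write(target, overwrite=False)  # nothing there yet: allowed
    target.write_text("# first run\n", encoding="utf-8")
    try:
        check_can_write(target, overwrite=False)  # existing file: refused
    except FileExistsError:
        print("refused to overwrite")
    check_can_write(target, overwrite=True)  # explicit opt-in: allowed
```
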
## Testing Guidance

Add tests in proportion to behavior risk.

Required early tests:

- Math delimiter normalization.
- Display math spacing.
- Asset path normalization.
- Metadata schema creation.
- Warning aggregation.
- CLI path planning and overwrite behavior.
- MinerU adapter contract with mocked outputs.

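
To make the first two items concrete, here is a rough sketch of what a normalizer under test might look like. It assumes the upstream Markdown uses `\(...\)` and `\[...\]` delimiters and that display math should sit on its own lines with a blank line on each side; both are assumptions, as is the function name. A regex this simple will also misfire on LaTeX row breaks such as `\\[2pt]`, so treat it as a test target rather than a finished implementation.

```python
import re

_DISPLAY = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)
_INLINE = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def normalize_math_delimiters(text: str) -> str:
    """Rewrite \\[...\\] as $$...$$ blocks and \\(...\\) as $...$ spans."""
    # Callables are used as replacements so backslashes in the LaTeX body
    # are inserted literally instead of being read as regex escapes.
    text = _DISPLAY.sub(lambda m: f"\n\n$$\n{m.group(1).strip()}\n$$\n\n", text)
    text = _INLINE.sub(lambda m: f"${m.group(1).strip()}$", text)
    # Collapse the extra newlines introduced by the display-math padding.
    return re.sub(r"\n{3,}", "\n\n", text).strip() + "\n"


sample = "Let \\(x \\in \\mathbb{R}\\). Then\n\\[ x^2 \\ge 0 \\]\nholds."
print(normalize_math_delimiters(sample))
# Let $x \in \mathbb{R}$. Then
#
# $$
# x^2 \ge 0
# $$
#
# holds.
```
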

Fixture PDFs should cover:

- Simple digital PDF.
- Math-heavy academic PDF.
- Multi-column paper.
- Table with formulas.
- Figure with caption.

Any test that depends on large local models should be optional or marked separately so normal CI/dev checks can run quickly.

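
One possible way to keep model-backed tests out of the default run is an opt-in switch. The sketch below uses only the standard library so it stays self-contained; the environment variable `PDF2MD_RUN_MODEL_TESTS` and both class names are hypothetical. pytest collects `unittest.TestCase` classes, and `pytest.mark.skipif` offers the same gating for plain pytest tests.

```python
import os
import unittest

# Hypothetical opt-in switch; the variable name is an assumption for this sketch.
RUN_MODEL_TESTS = os.environ.get("PDF2MD_RUN_MODEL_TESTS") == "1"


class FastTests(unittest.TestCase):
    def test_stem_is_derived_from_filename(self) -> None:
        # Pure-Python check: always runs in the default suite.
        self.assertEqual(os.path.splitext("paper.pdf")[0], "paper")


@unittest.skipUnless(RUN_MODEL_TESTS, "set PDF2MD_RUN_MODEL_TESTS=1 to run model-backed tests")
class ModelBackedTests(unittest.TestCase):
    def test_real_conversion_smoke(self) -> None:
        # Placeholder body: a real test would invoke the local converter here.
        self.skipTest("conversion smoke test is not implemented in this sketch")
```

With the variable unset, the whole `ModelBackedTests` class is reported as skipped and the fast tests still run.
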
## Git Workflow

After changing files:

- Run the smallest useful verification for the change.
- Check `git status --short`.
- Commit the completed change unless the user explicitly asks not to.
- Do not include unrelated user edits in the commit.
- For commit rollback requests, verify the target commit and current status first, then use a direct non-interactive reset; leave untracked generated/local artifacts such as `build/`, `dist/`, `samples/`, and `*.spec` files untouched unless deletion is explicitly requested.
- When debugging `doctor` in an installed runtime, test both `uv run pdf2md doctor` and direct venv execution such as `.venv\Scripts\pdf2md.exe doctor`; direct execution may not inherit the same PATH behavior as `uv run`.

## Documentation Guidance

Keep documentation explicit about:

- Local-only privacy behavior.
- NVIDIA GPU expectations.
- MinerU installation and model downloads.
- Known limitations of automatic formula reconstruction.
- Dependency licenses.
- Obsidian output assumptions.

Do not imply perfect LaTeX conversion. The correct guarantee is best-effort automatic conversion with warnings and provenance.