add pdftomd
@@ -0,0 +1,229 @@
# AGENTS.md

This file gives implementation instructions for coding agents working in this repository.

## Project Mission

Build a local-only PDF-to-Markdown converter for math-heavy digital PDFs. The converter must produce Obsidian-friendly Markdown and preserve enough metadata to debug formulas, reading order, tables, figures, and assets.

## Project Guidelines

Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.

**Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.

### 1. Think Before Coding

**Don't assume. Don't hide confusion. Surface tradeoffs.**

Before implementing:
- State your assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them - don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.

### 2. Simplicity First

**Minimum code that solves the problem. Nothing speculative.**

- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.

Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.

### 3. Surgical Changes

**Touch only what you must. Clean up only your own mess.**

When editing existing code:
- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.
- If you notice unrelated dead code, mention it - don't delete it.

When your changes create orphans:
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.

The test: Every changed line should trace directly to the user's request.

### 4. Goal-Driven Execution

**Define success criteria. Loop until verified.**

Transform tasks into verifiable goals:
- "Add validation" -> "Write tests for invalid inputs, then make them pass"
- "Fix the bug" -> "Write a test that reproduces it, then make it pass"
- "Refactor X" -> "Ensure tests pass before and after"

For multi-step tasks, state a brief plan:
```
1. [Step] -> verify: [check]
2. [Step] -> verify: [check]
3. [Step] -> verify: [check]
```

Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.

---

**These guidelines are working if:** diffs contain fewer unnecessary changes, overcomplication causes fewer rewrites, and clarifying questions come before implementation rather than after mistakes.

## Source Documents

- `PLAN.md`: shared plan, planned work, open questions, and ownership for agents.
- `PROGRESS.md`: completed work, current status, blockers, and next actions for agents.
- `PRD.md`: product requirements, user scope, CLI/API requirements, acceptance criteria.
- `ARCHITECTURE.md`: system layers, MinerU adapter contract, intermediate representation, metadata schema, and local-only enforcement.
- `docs/KNOWLEDGEBASE.md`: research basis and implementation background.
- `docs/V1IMPLEMENTATIONPLAN.md`: v1 implementation sequence, sprint contracts, verification gates, and agent ownership.
- `docs/Sprints/*.md`: active and historical sprint contracts.
- `.codex/agents/*.toml`: project-scoped custom subagent roles.
- `.codex/commands/*.md`: reusable project prompt commands.
- `.codex/skills/*/SKILL.md`: project-specific Codex skills.
- `.codex/hooks.json` and `.codex/hooks/*.py`: project hook configuration and deterministic hook scripts.

## Startup Workflow

At the start of every task:

- Read `PLAN.md` and `PROGRESS.md` before deciding what to do.
- Read only the other source documents needed for the task.
- Use `.codex/agents`, `.codex/commands`, and `.codex/skills` when the user explicitly asks for agent delegation, reusable workflows, or specialized project guidance.
- State the relevant current goal, next action, and blocker if one exists.
- If `PLAN.md` and `PROGRESS.md` conflict, trust `PROGRESS.md` for what has happened and update `PLAN.md` when making the next change.

## Progress Tracking

Use `PLAN.md` and `PROGRESS.md` to coordinate work across agents.

- Update `PLAN.md` when planned work, ownership, sequencing, open questions, or decisions change.
- Update `PROGRESS.md` after meaningful work or verification, and whenever blockers or next actions change.
- Keep entries short and factual.
- Do not use these files as scratchpads or long research notes.
- Do not mark work complete until it has been verified.
- When multiple agents work in parallel, each agent must leave enough context in `PROGRESS.md` for the next agent to resume without guessing.

## Long-Running Harness Workflow

For substantial implementation work, follow the planner/generator/evaluator pattern from Anthropic's long-running harness design article: https://www.anthropic.com/engineering/harness-design-long-running-apps.

Use the harness only when task complexity justifies the overhead. For small documentation edits or narrow fixes, a single agent with focused verification is preferred.

Harness roles:

- `harness-planner-agent`: expands a brief request into product context, high-level technical direction, non-goals, risks, and a sequence of small contracts.
- `feature-generator-agent`: implements one agreed contract at a time after implementation has been explicitly requested.
- `evaluation-agent`: independently reviews proposed contracts and completed work. It must be skeptical, specific, and willing to fail work that is incomplete, stubbed, unverified, or below threshold.

Before each implementation chunk:

- Write or update a concise sprint contract before code changes start.
- Include objective, touched surfaces, expected outputs, non-goals, verification steps, hard failure criteria, and handoff fields.
- Let `evaluation-agent` review the contract before `feature-generator-agent` implements it.
- Avoid over-specifying low-level implementation before the responsible agent has inspected the code.

After each implementation chunk:

- `feature-generator-agent` runs a self-check but does not approve its own work.
- `evaluation-agent` performs independent checks against the contract and reports actionable findings.
- If the chunk fails, feed the evaluator's findings back into the next generator pass.
- Update `PROGRESS.md` with completed work, checks run, residual risks, and the next concrete action.

When context becomes too large or a task spans sessions, prefer a clean structured handoff over relying only on conversation history. The handoff must include current state, decisions made, files touched, checks run, known failures, and the next action.

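One way to keep a handoff structured is to give it a fixed shape. The sketch below is illustrative only: the `Handoff` class and its `to_markdown` method are hypothetical names, not existing project types. The fields simply mirror the six required items listed above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Handoff:
    # Field names mirror the required handoff contents; the class itself
    # is a hypothetical illustration, not an existing project type.
    current_state: str
    next_action: str
    decisions_made: list[str] = field(default_factory=list)
    files_touched: list[str] = field(default_factory=list)
    checks_run: list[str] = field(default_factory=list)
    known_failures: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the handoff as a short Markdown note."""
        lines = ["## Handoff"]
        for f in fields(self):
            label = f.name.replace("_", " ").capitalize()
            value = getattr(self, f.name)
            if isinstance(value, list):
                # Empty lists are rendered as an explicit "none" entry.
                lines.append(f"- {label}:")
                lines.extend(f"  - {item}" for item in value or ["none"])
            else:
                lines.append(f"- {label}: {value}")
        return "\n".join(lines)
```

Rendering empty lists as "none" makes it visible that an item was considered rather than forgotten.
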
Periodically re-evaluate the harness itself. Remove roles, contracts, or checks that are not load-bearing, and add structure only when it improves correctness, scope control, or verification quality.

## Fixed Product Decisions

- Language: Python.
- Workflow: `uv`.
- Interface: CLI plus Python library.
- Default CLI name: `pdf2md`.
- Runtime policy: local-only. Do not add cloud OCR, remote LLM, or external document upload paths.
- Default output: Obsidian-friendly Markdown.
- Inline math: `$...$`.
- Display math: `$$...$$`.
- Conversion engine: MinerU 3.1.0.
- Hardware target: NVIDIA GPU.
- Input priority: digital PDFs with text layers.
- Quality workflow: fully automatic. Log warnings and continue when possible.
- MinerU execution: direct local `mineru` CLI only. MinerU 3.1.0 may launch a temporary local `mineru-api` internally when the CLI runs without `--api-url`.
- Quality report: write both metadata JSON and `<stem>.report.md`.
- v1 use case: personal/research. MinerU and transitive model/package licenses must be documented before redistribution.

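The math delimiter decisions above can be illustrated with a small normalization pass. This is a minimal sketch, not the project's implementation. The function name `normalize_math_delimiters` is hypothetical, and it assumes the upstream Markdown may use `\(...\)` and `\[...\]` delimiters, which this document does not confirm. It also ignores code spans and fenced blocks, which a real pass would need to skip.

```python
import re

# Hypothetical helper; assumes input may contain \( ... \) and \[ ... \].
_INLINE = re.compile(r"\\\((.+?)\\\)", re.DOTALL)
_DISPLAY = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)


def normalize_math_delimiters(markdown: str) -> str:
    """Rewrite \\(...\\) as $...$ and \\[...\\] as $$...$$ blocks."""
    # Display math goes on its own lines, surrounded by blank lines.
    text = _DISPLAY.sub(
        lambda m: "\n\n$$\n" + m.group(1).strip() + "\n$$\n\n", markdown
    )
    # Inline math: trim inner whitespace so `$ x $` becomes `$x$`.
    text = _INLINE.sub(lambda m: "$" + m.group(1).strip() + "$", text)
    # Collapse runs of 3+ newlines introduced by the display rewrite.
    return re.sub(r"\n{3,}", "\n\n", text)
```

The replacements use functions rather than template strings so that backslashes inside the formula (for example `\frac`) are copied literally instead of being read as regex escapes.
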
## Architecture Guidance

Follow `ARCHITECTURE.md` for implementation structure. Do not duplicate architecture decisions in code comments or docs unless the new text points back to that file.

Key implementation constraints:

- Keep MinerU-specific objects behind the MinerU adapter.
- Keep public CLI/library contracts stable and project-owned.
- Keep Obsidian Markdown normalization separate from MinerU execution.
- Keep metadata and warning generation structured.
- Keep quality report generation derived from metadata and local checks.
- Do not add runtime engine selection in v1.

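As a rough illustration of "structured" warnings, the sketch below models a warning as a small record and aggregates a list of them into a JSON-serializable summary. The authoritative metadata schema lives in `ARCHITECTURE.md`. Every name here (`ConversionWarning`, `summarize_warnings`, the field names, and the example warning codes) is an assumption made for illustration.

```python
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ConversionWarning:
    # All field names here are illustrative assumptions, not the real schema.
    code: str            # e.g. "formula_low_confidence" (hypothetical code)
    message: str
    page: Optional[int] = None


def summarize_warnings(warnings: list[ConversionWarning]) -> dict:
    """Build a JSON-serializable summary: per-code counts plus full entries."""
    counts = Counter(w.code for w in warnings)
    return {
        "total": len(warnings),
        "by_code": dict(sorted(counts.items())),
        "items": [asdict(w) for w in warnings],
    }
```

Keeping warnings as data rather than free-form log lines is what would let a report be derived from metadata instead of from re-parsing text.
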
## Local-Only Requirements

Never add runtime dependencies that upload PDFs, page images, or extracted text to remote services. Follow the strict-local enforcement rules in `ARCHITECTURE.md`.

Allowed in v1: direct `mineru` CLI execution and the CLI-internal temporary local `mineru-api` process.

Do not pass `--api-url` or use remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible inference endpoints in v1.

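A guard like the following could run just before the `mineru` subprocess is launched. It is a minimal sketch under stated assumptions: `check_local_only`, `LocalOnlyViolation`, and `FORBIDDEN_FLAGS` are hypothetical names, and the only flag it knows about is `--api-url`, because that is the only one named in this file. The full enforcement rules are in `ARCHITECTURE.md` and are not reproduced here.

```python
from typing import Sequence

# Flags this sketch refuses to forward. Only `--api-url` is named in this
# document; extend the set if ARCHITECTURE.md lists more.
FORBIDDEN_FLAGS = frozenset({"--api-url"})


class LocalOnlyViolation(ValueError):
    """Raised when a planned command would leave the local machine."""


def check_local_only(argv: Sequence[str]) -> list[str]:
    """Return argv as a list, or raise if it carries a forbidden flag."""
    for arg in argv:
        # Catch both `--api-url URL` and `--api-url=URL` spellings.
        flag = arg.split("=", 1)[0]
        if flag in FORBIDDEN_FLAGS:
            raise LocalOnlyViolation(f"{flag} is not allowed in v1 (local-only)")
    return list(argv)
```

Checking the final argument list, rather than trusting each call site, gives one place where a violation can be caught and tested.
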
## CLI Behavior

Follow the CLI requirements in `PRD.md`. Do not add commands, flags, config files, or runtime engine selection unless the user explicitly asks for them.

Do not overwrite user files unless the requested behavior and `--overwrite` semantics allow it.

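The overwrite rule can be sketched as a path-planning step that runs before any file is written. This is an assumption-laden illustration, not the behavior specified in `PRD.md`: `plan_outputs` and `OutputPlan` are hypothetical names, and the `<stem>.md` and `<stem>.metadata.json` file names are guesses. Only `<stem>.report.md` and the `--overwrite` flag appear in this file.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class OutputPlan:
    markdown: Path
    metadata: Path   # file name pattern is an assumption, see lead-in
    report: Path


def plan_outputs(pdf: Path, out_dir: Path, overwrite: bool = False) -> OutputPlan:
    """Compute output paths and refuse to clobber files unless allowed."""
    stem = pdf.stem
    plan = OutputPlan(
        markdown=out_dir / f"{stem}.md",             # hypothetical name
        metadata=out_dir / f"{stem}.metadata.json",  # hypothetical name
        report=out_dir / f"{stem}.report.md",
    )
    existing = [p for p in (plan.markdown, plan.metadata, plan.report) if p.exists()]
    if existing and not overwrite:
        names = ", ".join(p.name for p in existing)
        raise FileExistsError(f"refusing to overwrite: {names}")
    return plan
```

Planning all paths up front means a conflict is reported before conversion starts, rather than after a long MinerU run.
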
## Testing Guidance

Add tests in proportion to behavior risk.

Required early tests:

- Math delimiter normalization.
- Display math spacing.
- Asset path normalization.
- Metadata schema creation.
- Warning aggregation.
- CLI path planning and overwrite behavior.
- MinerU adapter contract with mocked outputs.

Fixture PDFs should cover:

- Simple digital PDF.
- Math-heavy academic PDF.
- Multi-column paper.
- Table with formulas.
- Figure with caption.

Any test that depends on large local models should be optional or marked separately so normal CI/dev checks can run quickly.

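One way to keep model-dependent tests optional is an explicit opt-in switch. The sketch below uses the standard-library `unittest` module, because this file does not name a test framework. The environment variable `PDF2MD_RUN_MODEL_TESTS` and both test classes are hypothetical.

```python
import os
import shutil
import unittest

# Hypothetical opt-in switch; the variable name is an assumption.
RUN_MODEL_TESTS = os.environ.get("PDF2MD_RUN_MODEL_TESTS") == "1"
# Skip as well when no `mineru` executable is on PATH.
HAS_MINERU = shutil.which("mineru") is not None


class FastTests(unittest.TestCase):
    def test_always_runs(self):
        # Stand-in for a quick unit test with mocked MinerU output.
        self.assertEqual("$x$".count("$"), 2)


@unittest.skipUnless(RUN_MODEL_TESTS and HAS_MINERU, "needs local MinerU models")
class ModelTests(unittest.TestCase):
    def test_end_to_end_conversion(self):
        # Placeholder body: a real test would convert a fixture PDF here.
        self.skipTest("fixture conversion not sketched here")
```

With this layout, a plain test run reports the model tests as skipped instead of failing on machines without a GPU or downloaded models.
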
## Git Workflow

After changing files:

- Run the smallest useful verification for the change.
- Check `git status --short`.
- Commit the completed change unless the user explicitly asks not to.
- Do not include unrelated user edits in the commit.

## Documentation Guidance

Keep documentation explicit about:

- Local-only privacy behavior.
- NVIDIA GPU expectations.
- MinerU installation and model downloads.
- Known limitations of automatic formula reconstruction.
- Dependency licenses.
- Obsidian output assumptions.

Do not imply perfect LaTeX conversion. The correct guarantee is best-effort automatic conversion with warnings and provenance.