add pdftomd

2026-05-08 16:42:19 +09:00
parent 551ab50735
commit 88d6b92283
99 changed files with 47332 additions and 0 deletions
@@ -0,0 +1,229 @@
+# AGENTS.md
+
+This file gives implementation instructions for coding agents working in this repository.
+
+## Project Mission
+
+Build a local-only PDF-to-Markdown converter for math-heavy digital PDFs. The converter must produce Obsidian-friendly Markdown and preserve enough metadata to debug formulas, reading order, tables, figures, and assets.
+
+## Project Guidelines
+
+Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.
+
+**Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.
+
+### 1. Think Before Coding
+
+**Don't assume. Don't hide confusion. Surface tradeoffs.**
+
+Before implementing:
+- State your assumptions explicitly. If uncertain, ask.
+- If multiple interpretations exist, present them - don't pick silently.
+- If a simpler approach exists, say so. Push back when warranted.
+- If something is unclear, stop. Name what's confusing. Ask.
+
+### 2. Simplicity First
+
+**Minimum code that solves the problem. Nothing speculative.**
+
+- No features beyond what was asked.
+- No abstractions for single-use code.
+- No "flexibility" or "configurability" that wasn't requested.
+- No error handling for impossible scenarios.
+- If you write 200 lines and it could be 50, rewrite it.
+
+Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
+
+### 3. Surgical Changes
+
+**Touch only what you must. Clean up only your own mess.**
+
+When editing existing code:
+- Don't "improve" adjacent code, comments, or formatting.
+- Don't refactor things that aren't broken.
+- Match existing style, even if you'd do it differently.
+- If you notice unrelated dead code, mention it - don't delete it.
+
+When your changes create orphans:
+- Remove imports/variables/functions that YOUR changes made unused.
+- Don't remove pre-existing dead code unless asked.
+
+The test: Every changed line should trace directly to the user's request.
+
+### 4. Goal-Driven Execution
+
+**Define success criteria. Loop until verified.**
+
+Transform tasks into verifiable goals:
+- "Add validation" -> "Write tests for invalid inputs, then make them pass"
+- "Fix the bug" -> "Write a test that reproduces it, then make it pass"
+- "Refactor X" -> "Ensure tests pass before and after"
+
+For multi-step tasks, state a brief plan:
+```
+1. [Step] -> verify: [check]
+2. [Step] -> verify: [check]
+3. [Step] -> verify: [check]
+```
+
+Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
+
+---
+
+**These guidelines are working if:** fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.
+
+## Source Documents
+
+- `PLAN.md`: shared plan, planned work, open questions, and ownership for agents.
+- `PROGRESS.md`: completed work, current status, blockers, and next actions for agents.
+- `PRD.md`: product requirements, user scope, CLI/API requirements, acceptance criteria.
+- `ARCHITECTURE.md`: system layers, MinerU adapter contract, intermediate representation, metadata schema, and local-only enforcement.
+- `docs/KNOWLEDGEBASE.md`: research basis and implementation background.
+- `docs/V1IMPLEMENTATIONPLAN.md`: v1 implementation sequence, sprint contracts, verification gates, and agent ownership.
+- `docs/Sprints/*.md`: active and historical sprint contracts.
+- `.codex/agents/*.toml`: project-scoped custom subagent roles.
+- `.codex/commands/*.md`: reusable project prompt commands.
+- `.codex/skills/*/SKILL.md`: project-specific Codex skills.
+- `.codex/hooks.json` and `.codex/hooks/*.py`: project hook configuration and deterministic hook scripts.
+
+## Startup Workflow
+
+At the start of every task:
+
+- Read `PLAN.md` and `PROGRESS.md` before deciding what to do.
+- Read only the other source documents needed for the task.
+- Use `.codex/agents`, `.codex/commands`, and `.codex/skills` when the user explicitly asks for agent delegation, reusable workflows, or specialized project guidance.
+- State the relevant current goal, next action, and blocker if one exists.
+- If `PLAN.md` and `PROGRESS.md` conflict, trust `PROGRESS.md` for what has happened and update `PLAN.md` when making the next change.
+
+## Progress Tracking
+
+Use `PLAN.md` and `PROGRESS.md` to coordinate work across agents.
+
+- Update `PLAN.md` when planned work, ownership, sequencing, open questions, or decisions change.
+- Update `PROGRESS.md` after meaningful work, verification, blockers, or next actions change.
+- Keep entries short and factual.
+- Do not use these files as scratchpads or long research notes.
+- Do not mark work complete until it has been verified.
+- When multiple agents work in parallel, each agent must leave enough context in `PROGRESS.md` for the next agent to resume without guessing.
+
+## Long-Running Harness Workflow
+
+For substantial implementation work, follow the planner/generator/evaluator pattern from Anthropic's long-running harness design article: https://www.anthropic.com/engineering/harness-design-long-running-apps.
+
+Use the harness only when task complexity justifies the overhead. For small documentation edits or narrow fixes, a single agent with focused verification is preferred.
+
+Harness roles:
+
+- `harness-planner-agent`: expands a brief request into product context, high-level technical direction, non-goals, risks, and a sequence of small contracts.
+- `feature-generator-agent`: implements one agreed contract at a time after implementation has been explicitly requested.
+- `evaluation-agent`: independently reviews proposed contracts and completed work. It must be skeptical, specific, and willing to fail work that is incomplete, stubbed, unverified, or below threshold.
+
+Before each implementation chunk:
+
+- Write or update a concise sprint contract before code changes start.
+- Include objective, touched surfaces, expected outputs, non-goals, verification steps, hard failure criteria, and handoff fields.
+- Let `evaluation-agent` review the contract before `feature-generator-agent` implements it.
+- Avoid over-specifying low-level implementation before the responsible agent has inspected the code.
+
+After each implementation chunk:
+
+- `feature-generator-agent` runs a self-check but does not approve its own work.
+- `evaluation-agent` performs independent checks against the contract and reports actionable findings.
+- If the chunk fails, feed the evaluator's findings back into the next generator pass.
+- Update `PROGRESS.md` with completed work, checks run, residual risks, and the next concrete action.
+
+When context becomes too large or a task spans sessions, prefer a clean structured handoff over relying only on conversation history. The handoff must include current state, decisions made, files touched, checks run, known failures, and the next action.
+
+Periodically re-evaluate the harness itself. Remove roles, contracts, or checks that are not load-bearing, and add structure only when it improves correctness, scope control, or verification quality.
+
+## Fixed Product Decisions
+
+- Language: Python.
+- Workflow: `uv`.
+- Interface: CLI plus Python library.
+- Default CLI name: `pdf2md`.
+- Runtime policy: local-only. Do not add cloud OCR, remote LLM, or external document upload paths.
+- Default output: Obsidian-friendly Markdown.
+- Inline math: `$...$`.
+- Display math: `$$...$$`.
+- Conversion engine: MinerU 3.1.0.
+- Hardware target: NVIDIA GPU.
+- Input priority: digital PDFs with text layers.
+- Quality workflow: fully automatic. Log warnings and continue when possible.
+- MinerU execution: direct local `mineru` CLI only. MinerU 3.1.0 may launch a temporary local `mineru-api` internally when CLI runs without `--api-url`.
+- Quality report: write both metadata JSON and `<stem>.report.md`.
+- v1 use case: personal/research. MinerU and transitive model/package licenses must be documented before redistribution.
+
+## Architecture Guidance
+
+Follow `ARCHITECTURE.md` for implementation structure. Do not duplicate architecture decisions in code comments or docs unless the new text points back to that file.
+
+Key implementation constraints:
+
+- Keep MinerU-specific objects behind the MinerU adapter.
+- Keep public CLI/library contracts stable and project-owned.
+- Keep Obsidian Markdown normalization separate from MinerU execution.
+- Keep metadata and warning generation structured.
+- Keep quality report generation derived from metadata and local checks.
+- Do not add runtime engine selection in v1.
+
+## Local-Only Requirements
+
+Never add runtime dependencies that upload PDFs, page images, or extracted text to remote services. Follow the strict-local enforcement rules in `ARCHITECTURE.md`.
+
+Allowed in v1: direct `mineru` CLI execution and the CLI-internal temporary local `mineru-api` process.
+
+Do not pass `--api-url`, use remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible inference endpoints in v1.
+
+## CLI Behavior
+
+Follow the CLI requirements in `PRD.md`. Do not add commands, flags, config files, or runtime engine selection unless the user explicitly asks for them.
+
+Do not overwrite user files unless the requested behavior and `--overwrite` semantics allow it.
+
+## Testing Guidance
+
+Add tests in proportion to behavior risk.
+
+Required early tests:
+
+- Math delimiter normalization.
+- Display math spacing.
+- Asset path normalization.
+- Metadata schema creation.
+- Warning aggregation.
+- CLI path planning and overwrite behavior.
+- MinerU adapter contract with mocked outputs.
+
+Fixture PDFs should cover:
+
+- Simple digital PDF.
+- Math-heavy academic PDF.
+- Multi-column paper.
+- Table with formulas.
+- Figure with caption.
+
+Any test that depends on large local models should be optional or marked separately so normal CI/dev checks can run quickly.
+
+## Git Workflow
+
+After changing files:
+
+- Run the smallest useful verification for the change.
+- Check `git status --short`.
+- Commit the completed change unless the user explicitly asks not to.
+- Do not include unrelated user edits in the commit.
+
+## Documentation Guidance
+
+Keep documentation explicit about:
+
+- Local-only privacy behavior.
+- NVIDIA GPU expectations.
+- MinerU installation and model downloads.
+- Known limitations of automatic formula reconstruction.
+- Dependency licenses.
+- Obsidian output assumptions.
+
+Do not imply perfect LaTeX conversion. The correct guarantee is best-effort automatic conversion with warnings and provenance.