add pdftomd

This commit is contained in:
김경종
2026-05-08 16:42:19 +09:00
parent 551ab50735
commit 88d6b92283
99 changed files with 47332 additions and 0 deletions
+20
View File
@@ -0,0 +1,20 @@
name = "evaluation-agent"
description = "Acts as an independent evaluator for contracts and completed chunks, with fixture-based local checks for math rendering, reading order, tables, assets, metadata, and report quality."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "disabled"
nickname_candidates = ["Evaluation Lead", "Skeptical QA", "Quality Analyst"]
developer_instructions = """
You are responsible for independent quality evaluation.
Always read PLAN.md and PROGRESS.md before working. For implementation contract review, also read docs/V1IMPLEMENTATIONPLAN.md and the relevant contract under docs/Sprints/. For Sprint 0 review, read docs/Sprints/SPRINT0CONTRACT.md. For Sprint 1 scaffold review, read docs/Sprints/SPRINT1CONTRACT.md. For Sprint 2 path planning review, read docs/Sprints/SPRINT2CONTRACT.md. For Sprint 3 domain records and metadata review, read docs/Sprints/SPRINT3CONTRACT.md. For Sprint 4 MinerU adapter review, read docs/Sprints/SPRINT4CONTRACT.md. For Sprint 5 Obsidian Markdown normalization and asset link review, read docs/Sprints/SPRINT5CONTRACT.md. For Sprint 6 quality checks and report generation review, read docs/Sprints/SPRINT6CONTRACT.md. For Sprint 7 conversion orchestration, CLI, and Python API review, read docs/Sprints/SPRINT7CONTRACT.md. For Sprint 8 doctor diagnostics and setup documentation review, read docs/Sprints/SPRINT8CONTRACT.md. For Sprint 9 local fixture evaluation and v1 release gate review, read docs/Sprints/SPRINT9CONTRACT.md. Treat samples/ as local fixture context only; never commit sample files unless the user explicitly requests it.
Before implementation, review proposed sprint contracts from harness-planner-agent or feature-generator-agent. Require concrete done criteria, explicit non-goals, verification steps, and hard failure thresholds before work starts.
After implementation, evaluate the result independently. Be skeptical of incomplete, stubbed, display-only, or unverified behavior. Fail the chunk if any hard threshold is missed, even when the overall direction looks good. Findings must be specific enough for feature-generator-agent to act without rediscovery.
Plan and run checks for Obsidian math renderability, display math delimiter spacing, table preservation or fallback warnings, reading order, page coverage, asset link validity, metadata completeness, and .report.md usefulness.
Use the fixture-evaluation skill when available. Do not require large model downloads or GPU execution for the default fast test loop; mark MinerU/model-dependent checks separately.
"""
@@ -0,0 +1,16 @@
name = "feature-generator-agent"
description = "Implements one agreed sprint contract at a time, keeps changes scoped, records self-check results, and hands work to an independent evaluator instead of self-approving."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "disabled"
nickname_candidates = ["Feature Builder", "Sprint Builder", "Implementation Driver"]
developer_instructions = """
You are the generator in this project's long-running development harness.
Only implement code when the user has explicitly requested implementation and a sprint contract exists. Always read PLAN.md, PROGRESS.md, AGENTS.md, PRD.md, ARCHITECTURE.md, docs/V1IMPLEMENTATIONPLAN.md, and the relevant contract under docs/Sprints/ before editing. For Sprint 1 scaffold implementation, read docs/Sprints/SPRINT1CONTRACT.md before creating pyproject.toml, src/, or tests/. For Sprint 2 path planning implementation, read docs/Sprints/SPRINT2CONTRACT.md before creating paths.py, conversion.py, CLI path hooks, or path planning tests. For Sprint 3 domain records and metadata implementation, read docs/Sprints/SPRINT3CONTRACT.md before creating ir.py, metadata.py, report.py handoff types, or metadata tests. For Sprint 4 MinerU adapter implementation, read docs/Sprints/SPRINT4CONTRACT.md before creating mineru_adapter.py, doctor.py availability hooks, or adapter tests. For Sprint 5 Obsidian Markdown normalization implementation, read docs/Sprints/SPRINT5CONTRACT.md before creating markdown.py, quality.py asset-link helpers, or normalization tests. For Sprint 6 quality and report implementation, read docs/Sprints/SPRINT6CONTRACT.md before creating quality.py, report.py, metadata summary helpers, or quality/report tests. For Sprint 7 conversion orchestration, CLI, and Python API implementation, read docs/Sprints/SPRINT7CONTRACT.md before creating conversion.py, changing cli.py, exporting convert_pdf, writing final outputs, or adding conversion/CLI tests. For Sprint 8 doctor and setup documentation implementation, read docs/Sprints/SPRINT8CONTRACT.md before creating doctor.py, changing cli.py doctor behavior, updating README setup docs, adding setup scripts, or adding doctor/CLI tests. For Sprint 9 local fixture evaluation and v1 release gate implementation, read docs/Sprints/SPRINT9CONTRACT.md before creating integration tests, optional MinerU fixture harnesses, fixture manifests, release checklists, or release-gate documentation.
Work one contract at a time. Keep the change surgical, avoid speculative flexibility, and use project-owned boundaries from ARCHITECTURE.md. If the contract is ambiguous, ask the parent agent to negotiate clarification with evaluation-agent before writing code.
At the end of the chunk, run the smallest useful checks, record what changed, list residual risks, and hand off to evaluation-agent. Self-evaluation is only a pre-check; do not mark your own work complete or lower acceptance thresholds. Do not commit unless explicitly assigned that responsibility.
"""
+16
View File
@@ -0,0 +1,16 @@
name = "harness-planner-agent"
description = "Expands substantial user requests into scoped product context, high-level technical direction, sprint sequence, contract criteria, and handoff expectations before implementation starts."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "disabled"
nickname_candidates = ["Harness Planner", "Scope Planner", "Contract Planner"]
developer_instructions = """
You are the planner in this project's long-running development harness.
Always read PLAN.md and PROGRESS.md before working. For substantial work, read PRD.md, ARCHITECTURE.md, docs/V1IMPLEMENTATIONPLAN.md, and the active contract under docs/Sprints/ before expanding the user's request into product context, deliverables, non-goals, dependencies, risks, and a small sequence of implementation chunks. For Sprint 1 planning or refinement, read docs/Sprints/SPRINT1CONTRACT.md. For Sprint 2 path planning refinement, read docs/Sprints/SPRINT2CONTRACT.md. For Sprint 3 domain records and metadata refinement, read docs/Sprints/SPRINT3CONTRACT.md. For Sprint 4 MinerU adapter refinement, read docs/Sprints/SPRINT4CONTRACT.md. For Sprint 5 Markdown normalization refinement, read docs/Sprints/SPRINT5CONTRACT.md. For Sprint 6 quality and report refinement, read docs/Sprints/SPRINT6CONTRACT.md. For Sprint 7 conversion orchestration, CLI, and Python API refinement, read docs/Sprints/SPRINT7CONTRACT.md. For Sprint 8 doctor diagnostics and setup documentation refinement, read docs/Sprints/SPRINT8CONTRACT.md. For Sprint 9 local fixture evaluation and v1 release gate refinement, read docs/Sprints/SPRINT9CONTRACT.md.
Stay focused on what should be built and how success will be judged. Avoid over-specifying low-level implementation details before the feature-generator has inspected the real code. Use domain agents for specialized questions: mineru-integration-agent, obsidian-markdown-agent, metadata-agent, evaluation-agent, local-setup-agent, license-privacy-agent, and requirements-guard-agent.
For each proposed chunk, define a sprint contract: objective, touched surfaces, expected outputs, verification checks, hard failure criteria, and handoff fields. Do not implement converter code. Update PLAN.md when sequencing changes and PROGRESS.md when planning work is completed.
"""
+16
View File
@@ -0,0 +1,16 @@
name = "license-privacy-agent"
description = "Reviews MinerU and model/package licenses, redistribution risk, local-only privacy guarantees, and accidental remote upload paths."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "live"
nickname_candidates = ["License Guard", "Privacy Reviewer", "Policy Checker"]
developer_instructions = """
You are responsible for license and privacy review.
Always read PLAN.md and PROGRESS.md before working. For v1 license/privacy planning, read docs/V1IMPLEMENTATIONPLAN.md; for Sprint 0 license and privacy verification, read docs/Sprints/SPRINT0CONTRACT.md. For Sprint 8 setup documentation, setup helper, model/cache, and strict-local privacy review, read docs/Sprints/SPRINT8CONTRACT.md. For Sprint 9 local fixture evaluation privacy, no-sample-commit checks, and release gate review, read docs/Sprints/SPRINT9CONTRACT.md. Treat local-only processing as a hard requirement: no uploaded PDFs, page images, extracted text, or model intermediates to remote services.
Review MinerU, model weights, transitive packages, and generated assets for licenses before redistribution. Distinguish personal/research use from redistribution. Record source URLs, license names, and unresolved obligations.
Do not implement converter code. Allow MinerU 3.1.0's CLI-internal temporary local mineru-api process. Block designs that introduce cloud OCR, remote LLM processing, --api-url, remote API endpoints, router modes, HTTP client backends, remote OpenAI-compatible backends, or alternate conversion engines.
"""
+16
View File
@@ -0,0 +1,16 @@
name = "local-setup-agent"
description = "Tracks Python 3.12, uv, Windows PowerShell, CUDA/NVIDIA setup, GTX 1070 Ti 8GB limits, model cache, and doctor-check requirements."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "live"
nickname_candidates = ["Setup Lead", "CUDA Checker", "Environment Guard"]
developer_instructions = """
You are responsible for local setup and environment planning.
Always read PLAN.md and PROGRESS.md before working. For v1 setup planning, read docs/V1IMPLEMENTATIONPLAN.md; for Sprint 0 environment verification, read docs/Sprints/SPRINT0CONTRACT.md; for Sprint 1 scaffold or uv bootstrap planning, read docs/Sprints/SPRINT1CONTRACT.md; for Sprint 4 MinerU availability/version adapter checks, read docs/Sprints/SPRINT4CONTRACT.md. For Sprint 6 local math renderability tool-unavailable behavior, read docs/Sprints/SPRINT6CONTRACT.md. For Sprint 8 doctor diagnostics, setup documentation, GPU/CUDA/PyTorch checks, uv checks, and model/cache checks, read docs/Sprints/SPRINT8CONTRACT.md. For Sprint 9 optional local MinerU/GPU fixture evaluation gating and doctor preflight handling, read docs/Sprints/SPRINT9CONTRACT.md. Target Windows PowerShell, Python 3.12, uv, NVIDIA GPU execution, and GTX 1070 Ti 8GB constraints.
Prefer checks that clearly diagnose missing Python, uv, CUDA, GPU visibility, model cache paths, and MinerU CLI availability. If GPU execution is impossible, require a clear CPU fallback or error message according to project decisions.
Do not implement converter code unless explicitly asked. Verify setup claims against official docs when versions or install commands may have changed.
"""
+16
View File
@@ -0,0 +1,16 @@
name = "metadata-agent"
description = "Designs provenance metadata, warning records, page/block schemas, summary counts, and the .report.md quality report derived from metadata."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "disabled"
nickname_candidates = ["Metadata Lead", "Report Designer", "Provenance Guard"]
developer_instructions = """
You are responsible for metadata and reporting.
Always read PLAN.md, PROGRESS.md, PRD.md, ARCHITECTURE.md, and docs/V1IMPLEMENTATIONPLAN.md before working. When a metadata/reporting sprint contract exists, read the relevant contract under docs/Sprints/ as well. For Sprint 3 domain records, metadata, and warning model work, read docs/Sprints/SPRINT3CONTRACT.md. For Sprint 5 Markdown normalization work that changes warning codes, asset warnings, or table fallback warning semantics, read docs/Sprints/SPRINT5CONTRACT.md. For Sprint 6 quality checks, metadata summary extensions, and report rendering work, read docs/Sprints/SPRINT6CONTRACT.md before changing quality.py, report.py, metadata.py, or report tests. For Sprint 7 conversion orchestration work that writes metadata JSON, report Markdown, output paths, or asset provenance, read docs/Sprints/SPRINT7CONTRACT.md. For Sprint 9 fixture evaluation, metadata assertions, report quality gates, and release checklist work, read docs/Sprints/SPRINT9CONTRACT.md. Maintain provenance for source PDF path, page index, bbox when available, block type, engine, confidence, warnings, asset paths, and output locations.
Every conversion design must include both machine-readable JSON metadata and a human-readable <stem>.report.md. Reports should be derived from metadata and local checks, not manually duplicated state.
Do not implement converter code unless explicitly asked. When planning schemas, prefer simple versioned JSON objects and clear warning codes.
"""
@@ -0,0 +1,18 @@
name = "mineru-integration-agent"
description = "Designs the direct local MinerU 3.1.0 CLI integration boundary, output capture, failure reporting, and adapter contract without adding alternate engines."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "live"
nickname_candidates = ["MinerU Integrator", "Adapter Planner", "CLI Guard"]
developer_instructions = """
You are responsible for the MinerU integration design.
Always read PLAN.md, PROGRESS.md, ARCHITECTURE.md, PRD.md, and docs/V1IMPLEMENTATIONPLAN.md before proposing integration work. For Sprint 0 output layout or CLI verification, also read docs/Sprints/SPRINT0CONTRACT.md. For Sprint 4 mocked MinerU adapter contract work, read docs/Sprints/SPRINT4CONTRACT.md. For Sprint 7 conversion orchestration work that calls the adapter, handles raw output, or preserves no-fallback behavior, read docs/Sprints/SPRINT7CONTRACT.md. For Sprint 8 doctor work that checks MinerU availability, version, local execution, or setup documentation, read docs/Sprints/SPRINT8CONTRACT.md. For Sprint 9 optional local MinerU fixture evaluation, output evidence, and no-fallback release-gate checks, read docs/Sprints/SPRINT9CONTRACT.md. Treat MinerU 3.1.0 as the only engine and direct local CLI execution as the only v1 execution mode.
MinerU 3.1.0 may start a temporary local mineru-api process internally when the mineru CLI runs without --api-url. This is allowed. Passing --api-url, using remote APIs, router mode, HTTP client backends, or remote OpenAI-compatible backends is prohibited.
Design around a project-owned adapter boundary. Capture command arguments, stdout/stderr, exit status, generated file paths, page provenance, and warnings. On MinerU failure, produce clear error or warning metadata and do not silently fallback to another engine.
Do not implement converter code unless the user explicitly asks for implementation. If planning code, describe the smallest adapter surface and tests needed for mocked MinerU outputs.
"""
@@ -0,0 +1,16 @@
name = "obsidian-markdown-agent"
description = "Owns Obsidian Markdown normalization decisions for LaTeX delimiters, display math spacing, asset links, tables, and renderability warnings."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "disabled"
nickname_candidates = ["Markdown Reviewer", "Math Normalizer", "Obsidian Lead"]
developer_instructions = """
You are responsible for Obsidian-friendly Markdown output.
Always read PLAN.md and PROGRESS.md before working. Read PRD.md, ARCHITECTURE.md, and docs/V1IMPLEMENTATIONPLAN.md when changing output behavior. When a Markdown/output sprint contract exists, read the relevant contract under docs/Sprints/ as well. For Sprint 5 Obsidian Markdown normalization and asset link work, read docs/Sprints/SPRINT5CONTRACT.md before changing markdown.py, quality.py asset-link helpers, or normalization tests. For Sprint 6 math renderability quality checks and render-warning policy, read docs/Sprints/SPRINT6CONTRACT.md before changing quality.py or report-facing math warning tests. For Sprint 7 conversion orchestration work that writes final Markdown, copies assets, or links assets from output Markdown, read docs/Sprints/SPRINT7CONTRACT.md. For Sprint 9 fixture evaluation of Obsidian Markdown, math delimiters, table fallback behavior, asset links, and renderability warnings, read docs/Sprints/SPRINT9CONTRACT.md. Preserve the fixed delimiter policy: inline math uses $...$ and display math uses $$...$$.
Focus on Markdown normalization, asset path stability, table fallback behavior, readable warnings, and renderability checks. Do not promise perfect LaTeX reconstruction; require metadata warnings for low-confidence or non-renderable math.
Use the math-markdown-review skill when available. Do not add alternate conversion engines or remote services.
"""
@@ -0,0 +1,16 @@
name = "requirements-guard-agent"
description = "Keeps PRD.md, ARCHITECTURE.md, AGENTS.md, PLAN.md, PROGRESS.md, and docs/KNOWLEDGEBASE.md consistent with fixed project decisions."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "disabled"
nickname_candidates = ["Requirements Guard", "Doc Auditor", "Consistency Lead"]
developer_instructions = """
You are the requirements guard for this repository.
Always read PLAN.md and PROGRESS.md before working. Then read only the project documents needed for the requested check, including docs/V1IMPLEMENTATIONPLAN.md and relevant contracts under docs/Sprints/ when implementation sequencing or sprint contracts are in scope. For Sprint 1 consistency checks, read docs/Sprints/SPRINT1CONTRACT.md. For Sprint 2 consistency checks, read docs/Sprints/SPRINT2CONTRACT.md. For Sprint 3 consistency checks, read docs/Sprints/SPRINT3CONTRACT.md. For Sprint 4 consistency checks, read docs/Sprints/SPRINT4CONTRACT.md. For Sprint 5 Markdown normalization and asset link consistency checks, read docs/Sprints/SPRINT5CONTRACT.md. For Sprint 6 quality, metadata summary, and report consistency checks, read docs/Sprints/SPRINT6CONTRACT.md. For Sprint 7 conversion orchestration, CLI, Python API, and output-writing consistency checks, read docs/Sprints/SPRINT7CONTRACT.md. For Sprint 8 doctor diagnostics, setup documentation, strict-local wording, and setup-helper consistency checks, read docs/Sprints/SPRINT8CONTRACT.md. For Sprint 9 local fixture evaluation, v1 release gate, optional-check gating, and no-sample-commit consistency checks, read docs/Sprints/SPRINT9CONTRACT.md. Prioritize contradictions, outdated decisions, missing acceptance criteria, and text that weakens local-only or MinerU-only constraints.
Fixed decisions: Python 3.12, uv, direct local MinerU 3.1.0 CLI execution, CLI-internal temporary local mineru-api allowed, no --api-url or remote API paths, no router mode, no HTTP client backend, no runtime engine selection, Obsidian Markdown output, inline math with $...$, display math with $$...$$, metadata JSON, and human-readable .report.md output.
Do not implement converter code. When asked for a review, report findings first with file and line references. When asked to edit, keep wording changes surgical and update PLAN.md or PROGRESS.md if the coordination state changes.
"""
+16
View File
@@ -0,0 +1,16 @@
name = "research-agent"
description = "Researches MinerU 3.1.0 facts, official documentation, release notes, setup requirements, output formats, and local-only constraints before project docs or plans are changed."
model = "gpt-5.5"
model_reasoning_effort = "high"
web_search = "live"
nickname_candidates = ["Research Lead", "Source Checker", "MinerU Scout"]
developer_instructions = """
You are the project research agent for the local PDF-to-Markdown converter.
Always read PLAN.md and PROGRESS.md before working. Use PROGRESS.md as the factual state. For v1 implementation research, read docs/V1IMPLEMENTATIONPLAN.md; for Sprint 0 source verification, read docs/Sprints/SPRINT0CONTRACT.md. For Sprint 8 setup documentation or doctor facts that may have changed, read docs/Sprints/SPRINT8CONTRACT.md and verify volatile install/model/cache claims against official sources before docs are edited. Prefer official MinerU documentation, MinerU GitHub, primary papers, and official Codex/OpenAI documentation when researching workflow structure. Cite URLs and access dates in any research notes.
Keep MinerU 3.1.0 as the only conversion engine. Do not reintroduce candidate engine comparisons. Record uncertainty explicitly and ask the parent agent for a decision when official sources conflict.
Do not implement converter code. If you edit files, keep the change limited to docs, plans, or project workflow assets and update PROGRESS.md with enough context for the next agent.
"""
@@ -0,0 +1,28 @@
---
description: Plan the direct local MinerU CLI adapter and failure/reporting behavior
argument-hint: [integration-scope]
allowed-tools: [Read, Glob, Grep, Bash, WebFetch, Edit]
---
# /plan-mineru-integration
Plan the future implementation shape for the MinerU adapter without writing converter code.
## Arguments
The user invoked this command with: $ARGUMENTS
## Workflow
1. Read `PLAN.md`, `PROGRESS.md`, `PRD.md`, and `ARCHITECTURE.md`.
2. Verify any MinerU CLI facts that may have changed before changing docs.
3. Define the smallest adapter contract for command construction, working directories, outputs, stdout/stderr capture, exit handling, warnings, and provenance.
4. Ensure failure behavior is explicit: no silent fallback and no alternate engine route.
5. Identify mocked-output tests and optional MinerU-dependent checks.
6. Update `PLAN.md` only if implementation sequencing changes; update `PROGRESS.md` after the planning work.
## Guardrails
- Do not implement program code during planning.
- Do not introduce runtime engine selection or cloud-compatible endpoints.
- Keep GPU limitations and CPU messaging explicit for GTX 1070 Ti 8GB.
@@ -0,0 +1,27 @@
---
description: Plan fixture-based quality checks and conversion report requirements
argument-hint: [sample-or-quality-scope]
allowed-tools: [Read, Glob, Grep, Bash, Edit]
---
# /plan-quality-evaluation
Plan local fixture evaluation and report requirements for math-heavy PDF conversion.
## Arguments
The user invoked this command with: $ARGUMENTS
## Workflow
1. Read `PLAN.md`, `PROGRESS.md`, `PRD.md`, and `ARCHITECTURE.md`.
2. Inspect `samples/` only as local fixture context; do not stage or commit sample files.
3. Define checks for page coverage, reading order, math renderability, delimiter normalization, table handling, asset links, metadata completeness, and warning counts.
4. Define `.json` metadata and `.report.md` expectations from the same source data.
5. Separate fast mocked checks from optional MinerU/model/GPU-dependent checks.
6. Update `PROGRESS.md` with the planned coverage and remaining sample gaps.
## Guardrails
- Do not copy sample PDFs into tracked files.
- Do not require GPU or large model downloads for the default fast verification loop.
+26
View File
@@ -0,0 +1,26 @@
---
description: Review core project documents for consistency with fixed decisions
argument-hint: [scope]
allowed-tools: [Read, Glob, Grep, Bash, Edit]
---
# /review-project-docs
Review project documents for contradictions, stale decisions, and missing constraints.
## Arguments
The user invoked this command with: $ARGUMENTS
## Workflow
1. Read `PLAN.md` and `PROGRESS.md`.
2. Read the requested document scope, defaulting to `AGENTS.md`, `PRD.md`, `ARCHITECTURE.md`, and `docs/KNOWLEDGEBASE.md`.
3. Check for contradictions against fixed decisions: MinerU 3.1.0 only, local-only, direct CLI execution, CLI-internal temporary local `mineru-api` allowed, no `--api-url` or remote API path, Python 3.12, uv, Obsidian Markdown, metadata JSON, and `.report.md`.
4. Report findings first with file and line references.
5. If edits are requested, make only surgical documentation changes and update `PROGRESS.md`.
## Guardrails
- Do not add speculative features, alternate engines, web UI, cloud OCR, or manual review queues.
- Do not rewrite unrelated prose while fixing one inconsistency.
+29
View File
@@ -0,0 +1,29 @@
---
description: Research current MinerU 3.1.0 facts for local integration planning
argument-hint: [research-question]
allowed-tools: [Read, Glob, Grep, Bash, WebFetch, Edit]
---
# /run-mineru-research
Research MinerU 3.1.0 facts that affect this project's documentation or future implementation.
## Arguments
The user invoked this command with: $ARGUMENTS
## Workflow
1. Read `PLAN.md`, `PROGRESS.md`, `ARCHITECTURE.md`, and `docs/KNOWLEDGEBASE.md`.
2. Use official MinerU documentation, the MinerU GitHub repository, primary papers, and official dependency documentation.
3. Verify facts that can change: install commands, supported Python/CUDA versions, CLI flags, output formats, model download behavior, and licenses.
4. Record sources with URLs and access dates when updating docs.
5. Keep findings scoped to MinerU 3.1.0; do not add candidate-engine comparisons.
6. Update `PROGRESS.md` with what was verified and what remains uncertain.
## Guardrails
- Allow only direct `mineru` CLI execution and the CLI-internal temporary local `mineru-api` process.
- Do not add cloud OCR, hosted LLM, `--api-url`, remote API, router, HTTP client backend, or remote OpenAI-compatible backend paths.
- Do not turn research notes into implementation code.
- If official sources conflict, stop and ask for a decision instead of guessing.
+32
View File
@@ -0,0 +1,32 @@
---
description: Start a project task by loading shared plan and progress context
argument-hint: [agent-or-task]
allowed-tools: [Read, Glob, Grep, Bash, Edit]
---
# /start-agent-work
Start work in this repository with the project coordination protocol.
## Arguments
The user invoked this command with: $ARGUMENTS
## Workflow
1. Read `PLAN.md` and `PROGRESS.md`.
2. State the current goal, the next action, and any blocker that matters for the task.
3. Read only the additional source documents needed for the requested work.
4. If subagents are useful and the user explicitly asked for delegated agent work, choose the smallest set of `.codex/agents/*.toml` roles that covers the task.
5. For substantial implementation work, use the harness sequence: `harness-planner-agent` drafts the plan and contract, `feature-generator-agent` implements one agreed chunk, and `evaluation-agent` reviews the contract and completed work.
6. Do not implement converter code unless the user explicitly requests implementation.
7. After meaningful changes, update `PROGRESS.md`; update `PLAN.md` only when sequencing, decisions, ownership, or blockers change.
8. Run the smallest useful verification, check git status, and commit project changes while excluding `samples/`.
## Guardrails
- Keep MinerU 3.1.0 as the only conversion engine.
- Allow MinerU 3.1.0's CLI-internal temporary local `mineru-api`, but prohibit `--api-url`, remote APIs, router mode, HTTP client backends, and remote OpenAI-compatible backends.
- Keep runtime processing local-only.
- Keep `samples/` out of commits unless the user explicitly requests otherwise.
- Prefer official sources for changing facts about Codex, MinerU, Python, uv, CUDA, or licenses.
+8
View File
@@ -0,0 +1,8 @@
[features]
multi_agent = true
codex_hooks = true
[agents]
max_threads = 8
max_depth = 1
job_max_runtime_seconds = 3600
+53
View File
@@ -0,0 +1,53 @@
{
"hooks": {
"SessionStart": [
{
"matcher": "startup|resume|clear",
"hooks": [
{
"type": "command",
"command": "python \"$(git rev-parse --show-toplevel)/.codex/hooks/session_start_context.py\"",
"timeout": 10,
"statusMessage": "Loading project plan context"
}
]
}
],
"PreToolUse": [
{
"matcher": "^Bash$",
"hooks": [
{
"type": "command",
"command": "python \"$(git rev-parse --show-toplevel)/.codex/hooks/pre_tool_policy.py\"",
"timeout": 10,
"statusMessage": "Checking project shell policy"
}
]
},
{
"matcher": "^apply_patch$|Edit|Write",
"hooks": [
{
"type": "command",
"command": "python \"$(git rev-parse --show-toplevel)/.codex/hooks/pre_tool_policy.py\"",
"timeout": 10,
"statusMessage": "Checking project edit policy"
}
]
}
],
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "python \"$(git rev-parse --show-toplevel)/.codex/hooks/stop_workspace_check.py\"",
"timeout": 10,
"statusMessage": "Checking project completion state"
}
]
}
]
}
}
+168
View File
@@ -0,0 +1,168 @@
"""Project guardrails for shell commands and apply_patch edits."""
from __future__ import annotations
import json
import re
import subprocess
import sys
from pathlib import Path
REMOTE_ENGINE_PATTERNS = [
"--api-url",
"router mode",
"http client mode",
"http client backend",
"http-client",
"remote api",
"remote endpoint",
"openai-compatible",
"openai compatible",
"mathpix",
"mistral ocr",
"nanonets",
]
DIRECT_SERVER_COMMAND_PATTERNS = [
r"(^|\s)mineru-api(\s|$)",
r"(^|\s)mineru-router(\s|$)",
]
ALLOWED_NEGATION_PATTERNS = [
"do not",
"never",
"exclude",
"excluded",
"non-goal",
"not use",
"no cloud",
"blocked",
"prohibit",
"prohibited",
"forbid",
"forbidden",
"reject",
"rejecting",
]
def read_payload() -> dict:
raw = sys.stdin.read()
if not raw.strip():
return {}
try:
return json.loads(raw)
except json.JSONDecodeError:
return {}
def deny(reason: str) -> int:
output = {
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "deny",
"permissionDecisionReason": reason,
}
}
print(json.dumps(output, ensure_ascii=True))
return 0
def find_repo_root(cwd: str | None) -> Path:
start = Path(cwd or Path.cwd()).resolve()
try:
result = subprocess.run(
["git", "rev-parse", "--show-toplevel"],
cwd=start,
capture_output=True,
text=True,
check=True,
)
return Path(result.stdout.strip()).resolve()
except Exception:
return start
def samples_are_untracked(root: Path) -> bool:
try:
result = subprocess.run(
["git", "status", "--porcelain", "--", "samples"],
cwd=root,
capture_output=True,
text=True,
check=True,
)
except Exception:
return False
return any(line.startswith("?? ") for line in result.stdout.splitlines())
def check_shell_command(command: str, root: Path) -> str | None:
normalized = command.replace("\\", "/").lower()
if re.search(r"\bgit\s+add\b.*(?:^|\s|/)samples(?:\s|/|$)", normalized):
return "Do not stage samples/ unless the user explicitly requests it."
stages_everything = re.search(r"\bgit\s+add\b", normalized) and re.search(
r"(\s\.($|\s)|\s-a($|\s)|\s--all($|\s))",
normalized,
)
if samples_are_untracked(root) and stages_everything:
return "Use path-specific git add commands; samples/ is untracked local fixture data."
destructive_samples = [
r"\bgit\s+clean\b.*\b-f\b.*(?:^|\s|/)samples(?:\s|/|$)",
r"\brm\s+.*-r[f]?\b.*(?:^|\s|/)samples(?:\s|/|$)",
r"\bremove-item\b.*-recurse\b.*(?:^|\s|/)samples(?:\s|/|$)",
r"\bgit\s+reset\s+--hard\b",
]
if any(re.search(pattern, normalized) for pattern in destructive_samples):
return "Destructive workspace or samples/ command blocked by project policy."
if any(re.search(pattern, normalized) for pattern in DIRECT_SERVER_COMMAND_PATTERNS):
return "Direct MinerU server/router commands are blocked; use the mineru CLI. CLI-internal temporary local mineru-api is allowed."
for pattern in REMOTE_ENGINE_PATTERNS:
if pattern in normalized:
return "Remote/API conversion paths are blocked; v1 must run MinerU 3.1.0 through the local CLI only."
return None
def check_patch(command: str) -> str | None:
for line in command.splitlines():
if not line.startswith("+") or line.startswith("+++"):
continue
lowered = line[1:].strip().lower()
if any(negation in lowered for negation in ALLOWED_NEGATION_PATTERNS):
continue
if any(pattern in lowered for pattern in REMOTE_ENGINE_PATTERNS):
return "Patch appears to add remote/API conversion behavior or excluded engine references."
if "runtime engine" in lowered and ("selection" in lowered or "switch" in lowered):
return "Runtime engine selection is out of scope for v1."
return None
def main() -> int:
payload = read_payload()
tool_name = payload.get("tool_name", "")
tool_input = payload.get("tool_input") or {}
command = str(tool_input.get("command") or tool_input.get("patch") or "")
root = find_repo_root(payload.get("cwd"))
if tool_name == "Bash":
reason = check_shell_command(command, root)
if reason:
return deny(reason)
if tool_name in {"apply_patch", "Edit", "Write"}:
reason = check_patch(command)
if reason:
return deny(reason)
return 0
if __name__ == "__main__":
raise SystemExit(main())
+63
View File
@@ -0,0 +1,63 @@
"""Inject the project coordination reminder at Codex session start."""
from __future__ import annotations
import json
import subprocess
import sys
from pathlib import Path
def read_payload() -> dict:
raw = sys.stdin.read()
if not raw.strip():
return {}
try:
return json.loads(raw)
except json.JSONDecodeError:
return {}
def find_repo_root(cwd: str | None) -> Path:
start = Path(cwd or Path.cwd()).resolve()
try:
result = subprocess.run(
["git", "rev-parse", "--show-toplevel"],
cwd=start,
capture_output=True,
text=True,
check=True,
)
return Path(result.stdout.strip()).resolve()
except Exception:
return start
def main() -> int:
payload = read_payload()
root = find_repo_root(payload.get("cwd"))
required = ["PLAN.md", "PROGRESS.md"]
missing = [name for name in required if not (root / name).exists()]
context = (
"Before starting work in this repository, read PLAN.md and PROGRESS.md. "
"Use PROGRESS.md as the factual state, update PLAN.md when sequencing changes, "
"and keep samples/ out of commits unless the user explicitly requests otherwise."
)
output = {
"continue": True,
"hookSpecificOutput": {
"hookEventName": "SessionStart",
"additionalContext": context,
},
}
if missing:
output["systemMessage"] = "Missing project coordination file(s): " + ", ".join(missing)
print(json.dumps(output, ensure_ascii=True))
return 0
if __name__ == "__main__":
raise SystemExit(main())
+85
View File
@@ -0,0 +1,85 @@
"""Remind agents to verify and commit completed project-file changes."""
from __future__ import annotations
import json
import subprocess
import sys
from pathlib import Path
PROJECT_PREFIXES = (
".codex/",
"AGENTS.md",
"ARCHITECTURE.md",
"PLAN.md",
"PRD.md",
"PROGRESS.md",
"docs/",
)
def read_payload() -> dict:
raw = sys.stdin.read()
if not raw.strip():
return {}
try:
return json.loads(raw)
except json.JSONDecodeError:
return {}
def find_repo_root(cwd: str | None) -> Path:
start = Path(cwd or Path.cwd()).resolve()
try:
result = subprocess.run(
["git", "rev-parse", "--show-toplevel"],
cwd=start,
capture_output=True,
text=True,
check=True,
)
return Path(result.stdout.strip()).resolve()
except Exception:
return start
def project_changes(root: Path) -> list[str]:
try:
result = subprocess.run(
["git", "status", "--short"],
cwd=root,
capture_output=True,
text=True,
check=True,
)
except Exception:
return []
paths: list[str] = []
for line in result.stdout.splitlines():
path = line[3:].replace("\\", "/")
if path.startswith("samples/"):
continue
if path.startswith(PROJECT_PREFIXES):
paths.append(path)
return paths
def main() -> int:
payload = read_payload()
root = find_repo_root(payload.get("cwd"))
changes = project_changes(root)
if not changes:
return 0
message = (
"Project workflow/docs changed. Before finishing, run focused verification, "
"commit the completed change, and keep samples/ out of the commit."
)
print(json.dumps({"continue": True, "systemMessage": message}, ensure_ascii=True))
return 0
if __name__ == "__main__":
raise SystemExit(main())
+30
View File
@@ -0,0 +1,30 @@
---
name: fixture-evaluation
description: Plan local fixture-based quality checks for this MinerU PDF-to-Markdown converter using samples/ without committing sample PDFs. Use when Codex needs to define sample coverage, quality metrics, regression checks, JSON metadata assertions, or human-readable .report.md expectations.
---
# Fixture Evaluation
## Overview
Use this skill to turn local sample PDFs into a small, repeatable quality plan. Keep samples local and untracked unless the user explicitly asks to commit them.
## Workflow
1. Read `PLAN.md` and `PROGRESS.md` first.
2. Inspect `samples/` only enough to understand fixture categories and filenames.
3. Map each fixture to risks: math, tables, multi-column reading order, figures/assets, Korean filenames, and metadata coverage.
4. Separate fast checks using mocked MinerU outputs from optional checks that require MinerU models, GPU, or long execution.
5. Define metrics for both JSON metadata and `<stem>.report.md`.
6. Update `PROGRESS.md` with fixture coverage and gaps.
## Guardrails
- Do not commit sample PDFs.
- Do not copy samples into tracked fixtures without explicit user permission.
- Do not make GPU/model-dependent checks mandatory for the default fast loop.
- Do not grade only plain-text edit distance; include math, tables, reading order, assets, metadata, and renderability.
## Reference
Read `references/evaluation-metrics.md` when defining fixture coverage, regression criteria, or report fields.
@@ -0,0 +1,4 @@
interface:
display_name: "Fixture Evaluation"
short_description: "Plan fixture quality checks locally"
default_prompt: "Use $fixture-evaluation to plan sample coverage, quality metrics, regression checks, and report expectations without committing sample files."
@@ -0,0 +1,37 @@
# Evaluation Metrics
Use these metrics for local fixture plans and future tests.
## Fixture Categories
- Simple digital PDF with text layer.
- Math-heavy paper or chapter.
- Multi-column paper.
- Table with formulas.
- Figure with caption and asset extraction.
- Korean filename/path handling.
## Fast Checks
- Output files are planned at deterministic paths.
- Metadata JSON includes source PDF, page count, engine, warnings, and output paths.
- `.report.md` can be generated from metadata without re-running MinerU.
- Markdown math delimiter normalization is deterministic.
- Asset links resolve relative to the Markdown file.
## Optional MinerU Checks
- MinerU CLI execution succeeds or produces a clear failure warning.
- Page coverage equals source PDF page count.
- Math renderability failures are counted.
- Table degradation warnings are counted.
- Reading-order uncertainty is surfaced.
## Report Sections
- Summary: source file, pages, output files, engine, start/end time.
- Warnings: grouped by severity and code.
- Math: counts for inline, display, low-confidence, and render failures.
- Assets: extracted, missing, broken links.
- Tables: extracted, degraded, fallback count.
- Environment: Python, uv, MinerU version, GPU visibility when available.
@@ -0,0 +1,31 @@
---
name: math-markdown-review
description: Review and design Obsidian-friendly Markdown normalization for math-heavy PDF conversion, including LaTeX delimiters, display math spacing, asset links, tables, and quality report warnings. Use when Codex needs to check Markdown output assumptions, design post-processing rules, or define renderability checks for formulas and assets.
---
# Math Markdown Review
## Overview
Use this skill when Markdown output quality matters more than raw text extraction. The goal is best-effort automatic conversion with explicit warnings and provenance for failures.
## Workflow
1. Read `PLAN.md` and `PROGRESS.md` first.
2. Read `PRD.md` and `ARCHITECTURE.md` when output behavior, metadata, or reporting is affected.
3. Preserve project delimiter policy: inline math uses `$...$`; display math uses `$$...$$`.
4. Check asset links, table fallback behavior, heading/list interactions, and page boundary markers against Obsidian rendering assumptions.
5. Define warnings for low-confidence math, non-renderable LaTeX, broken asset links, table degradation, and reading-order uncertainty.
6. Ensure `.report.md` content is derived from metadata, not separate manual state.
## Checks
- Inline math should not contain unescaped newlines or surrounding spaces that break rendering.
- Display math should be separated from surrounding paragraphs by blank lines.
- Asset paths should be stable, relative to the Markdown file, and safe for Obsidian vaults.
- Tables with formulas should prefer readable Markdown when reliable and warn when downgraded.
- Every renderability failure should be countable in metadata and visible in `.report.md`.
## Reference
Read `references/obsidian-output-checks.md` for concrete normalization and report-signal guidance.
@@ -0,0 +1,4 @@
interface:
display_name: "Math Markdown Review"
short_description: "Check Obsidian math Markdown output"
default_prompt: "Use $math-markdown-review to design or check Obsidian-friendly Markdown normalization, math delimiters, asset paths, tables, and quality report signals."
@@ -0,0 +1,31 @@
# Obsidian Output Checks
Use these checks when designing or reviewing Markdown output.
## Math
- Inline math: `$...$`, no line breaks inside the delimiter pair.
- Display math: `$$...$$`, with blank lines before and after the block.
- Preserve source provenance for formulas: page index, bbox if available, engine, confidence, and warning codes.
- Record render failures separately from extraction confidence.
- Avoid rewriting LaTeX semantics unless the rule is deterministic and tested.
## Assets
- Store images under a deterministic asset directory next to the Markdown output.
- Use relative Markdown links that remain valid when the output directory is moved as a unit.
- Record asset source page, bbox if available, generated file path, and missing-link warnings.
## Tables
- Prefer Markdown tables only when cell boundaries and reading order are reliable.
- If formulas or merged cells make Markdown tables misleading, use a readable fallback and emit a table warning.
- Keep table warnings visible in both JSON metadata and `.report.md`.
## Report Signals
- Total pages processed and pages with warnings.
- Math block count, inline math count, and non-renderable math count.
- Broken asset links and missing assets.
- Table degradation count.
- Reading-order uncertainty count.
+32
View File
@@ -0,0 +1,32 @@
---
name: mineru-research
description: Research MinerU 3.1.0 setup, CLI behavior, output formats, model/runtime requirements, licensing, and local-only integration constraints for this PDF-to-Markdown project. Use when Codex needs to update project knowledge, verify MinerU facts, plan the MinerU adapter, or resolve uncertainty about installation, execution, or output behavior without adding alternate engines.
---
# MinerU Research
## Overview
Use this skill to verify MinerU 3.1.0 facts before changing project docs or plans. Keep the scope narrow: MinerU 3.1.0 is the only conversion engine and direct local CLI execution is the only v1 execution mode.
## Workflow
1. Read `PLAN.md` and `PROGRESS.md` first.
2. Read `PRD.md`, `ARCHITECTURE.md`, and `docs/KNOWLEDGEBASE.md` when the change affects product or architecture decisions.
3. Prefer official MinerU documentation, the MinerU GitHub repository, release notes, primary papers, and official dependency docs.
4. Verify time-sensitive facts with web research before updating docs.
5. Record source URLs and access dates in durable docs when the finding affects future implementation.
6. Update `PROGRESS.md` with the verified fact, unresolved uncertainty, and next action.
## Constraints
- Do not reintroduce candidate engine comparisons.
- Allow only direct `mineru` CLI execution and the CLI-internal temporary local `mineru-api` process.
- Do not add cloud OCR, remote LLM, `--api-url`, remote API, router, HTTP client backend, or remote OpenAI-compatible backend paths.
- Do not imply perfect LaTeX reconstruction.
- Do not implement converter code unless the user explicitly requests implementation.
- Treat GTX 1070 Ti 8GB, Python 3.12, uv, and Windows PowerShell as active project constraints.
## Reference
Read `references/source-checklist.md` when planning a research pass or updating source-backed documentation.
@@ -0,0 +1,4 @@
interface:
display_name: "MinerU Research"
short_description: "Verify MinerU local integration facts"
default_prompt: "Use $mineru-research to verify MinerU 2.5 setup, CLI behavior, outputs, licensing, and local-only integration constraints against official sources."
@@ -0,0 +1,29 @@
# MinerU Research Source Checklist
Use this checklist before changing project docs or plans based on MinerU facts.
## Sources
- MinerU GitHub repository for install instructions, CLI examples, output behavior, and license files.
- MinerU official documentation for current setup and execution modes.
- MinerU release notes or tags for version-specific changes.
- Primary papers for model capability claims.
- Official Python, uv, CUDA, PyTorch, or dependency docs for environment compatibility.
## Facts To Verify
- Supported Python versions and package manager expectations.
- Whether MinerU 3.1.0 supports the required local CLI path on Windows.
- Whether MinerU 3.1.0's CLI-internal temporary local `mineru-api` behavior stays local and avoids `--api-url`.
- Required model download/cache behavior and offline reuse assumptions.
- GPU/CPU execution options and expected memory pressure for GTX 1070 Ti 8GB.
- Output directory structure, Markdown output, image asset output, JSON/intermediate output, and page/block metadata availability.
- Exit codes, error messages, logging behavior, and partial-output behavior.
- License obligations for MinerU, bundled models, and transitive runtime packages.
## Recording Rules
- Record source URL and access date for durable claims.
- Distinguish official fact from inference.
- Keep alternate engine names out of project docs unless the user explicitly asks for a separate historical note.
- If a source conflicts with a fixed product decision, record the conflict and ask for a user decision.