modify pdftomd

김경종
2026-05-14 10:16:59 +09:00
parent 2232b51fc9
commit dc11880140
69 changed files with 7784 additions and 1150 deletions
+17 -1
@@ -72,6 +72,16 @@ Strong success criteria let you loop independently. Weak criteria ("make it work
**These guidelines are working if:** fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.
## Commands
| Command | Description |
| --- | --- |
| `uv run pytest` | Run the default fast test suite. |
| `uv run pdf2md doctor` | Check local Python, uv, MinerU, GPU/PyTorch, model/cache, MathJax, and strict-local setup. |
| `uv run pytest tests/test_ui_runner.py` | Run focused UI command-resolution and subprocess tests. |
| `uv run --group ui-build pyinstaller --clean --onefile --windowed --name pdf2md-ui src\pdf2md_ui\app.py` | Rebuild the thin Windows UI executable. |
| `uv run pdf2md convert paper.pdf --out outputs --chunk-pages --gpu auto --mineru-profile auto --strict-local` | Optional local conversion smoke; keep generated output ignored. |
## Source Documents
- `PLAN.md`: shared plan, planned work, open questions, and ownership for agents.
@@ -80,8 +90,11 @@ Strong success criteria let you loop independently. Weak criteria ("make it work
- `ARCHITECTURE.md`: system layers, MinerU adapter contract, intermediate representation, metadata schema, and local-only enforcement.
- `docs/KNOWLEDGEBASE.md`: research basis and implementation background.
- `docs/V1IMPLEMENTATIONPLAN.md`: v1 implementation sequence, sprint contracts, verification gates, and agent ownership.
- `docs/UI_RESEARCH.md`: research basis for the implemented minimal Windows UI launcher.
- `docs/WORKARCHIVE.md`: archived completed work, historical sprint outcomes, setup results, verification history, and sample conversion evidence.
- `docs/Sprints/*.md`: active and historical sprint contracts.
- `docs/superpowers/specs/*.md`: design specs created for focused project workflows.
- `docs/superpowers/plans/*.md`: executable task plans created from specs, including completed UI folder batch work and abandoned historical plans.
- `.codex/agents/*.toml`: project-scoped custom subagent roles.
- `.codex/commands/*.md`: reusable project prompt commands.
- `.codex/skills/*/SKILL.md`: project-specific Codex skills.
@@ -155,7 +168,8 @@ Periodically re-evaluate the harness itself. Remove roles, contracts, or checks
- Input priority: digital PDFs with text layers.
- Quality workflow: fully automatic. Log warnings and continue when possible.
- MinerU execution: direct local `mineru` CLI only. MinerU 3.1.0 may launch a temporary local `mineru-api` internally when the CLI runs without `--api-url`.
- Quality report: write both metadata JSON and `<stem>.report.md`.
- Output layout: write `<out>/<stem>/<stem>_001.md`, shared `<out>/<stem>/images/`, and `<out>/<stem>/<stem>_report.md`; new conversions do not persist public metadata JSON after Sprint 16.
- UI folder batch conversion: the UI may convert direct-child PDFs in a selected folder by sequentially invoking existing `pdf2md convert` commands.
- v1 use case: personal/research. MinerU and transitive model/package licenses must be documented before redistribution.
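
The output layout and folder batch rules above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the helper names `expected_outputs`, `direct_child_pdfs`, and `convert_command` are hypothetical, the case-insensitive `.pdf` match is an assumption, and the command uses only the `convert` arguments shown in the Commands table.

```python
from pathlib import Path


def expected_outputs(pdf: Path, out: Path) -> dict[str, Path]:
    """Paths a conversion is expected to write for one PDF (hypothetical helper)."""
    stem = pdf.stem
    base = out / stem
    return {
        "first_chunk": base / f"{stem}_001.md",   # <out>/<stem>/<stem>_001.md
        "images": base / "images",                # shared <out>/<stem>/images/
        "report": base / f"{stem}_report.md",     # <out>/<stem>/<stem>_report.md
    }


def direct_child_pdfs(folder: Path) -> list[Path]:
    """PDFs directly inside `folder`; subfolders are not searched.

    Matching the suffix case-insensitively is an assumption of this sketch.
    """
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def convert_command(pdf: Path, out: Path) -> list[str]:
    """Argument list for one sequential `pdf2md convert` invocation."""
    return ["uv", "run", "pdf2md", "convert", str(pdf), "--out", str(out)]


if __name__ == "__main__":
    # Print the planned commands for the current folder without running them.
    for pdf in direct_child_pdfs(Path(".")):
        print(" ".join(convert_command(pdf, Path("outputs"))))
```

Building the command as a list rather than a shell string avoids quoting problems with paths that contain spaces, which matters when the list is later passed to a subprocess call.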
## Architecture Guidance
@@ -217,6 +231,8 @@ After changing files:
- Check `git status --short`.
- Commit the completed change unless the user explicitly asks not to.
- Do not include unrelated user edits in the commit.
- Commit rollback requests: verify the target commit and current status first, then use a direct non-interactive reset. Leave untracked generated/local artifacts such as `build/`, `dist/`, `samples/`, and `*.spec` files untouched unless deletion is explicitly requested.
- Installed-runtime doctor debugging: test both `uv run pdf2md doctor` and direct venv execution such as `.venv\Scripts\pdf2md.exe doctor`. Direct execution may not inherit the same PATH behavior as `uv run`.
## Documentation Guidance