add pdftomd

김경종
2026-05-08 16:42:19 +09:00
parent 551ab50735
commit 88d6b92283
99 changed files with 47332 additions and 0 deletions
@@ -0,0 +1,32 @@
---
name: mineru-research
description: Research MinerU 3.1.0 setup, CLI behavior, output formats, model/runtime requirements, licensing, and local-only integration constraints for this PDF-to-Markdown project. Use when Codex needs to update project knowledge, verify MinerU facts, plan the MinerU adapter, or resolve uncertainty about installation, execution, or output behavior without adding alternate engines.
---
# MinerU Research
## Overview
Use this skill to verify MinerU 3.1.0 facts before changing project docs or plans. Keep the scope narrow: MinerU 3.1.0 is the only conversion engine, and direct local CLI execution is the only v1 execution mode.
## Workflow
1. Read `PLAN.md` and `PROGRESS.md` first.
2. Read `PRD.md`, `ARCHITECTURE.md`, and `docs/KNOWLEDGEBASE.md` when the change affects product or architecture decisions.
3. Prefer official MinerU documentation, the MinerU GitHub repository, release notes, primary papers, and official dependency docs.
4. Verify time-sensitive facts with web research before updating docs.
5. Record source URLs and access dates in durable docs when the finding affects future implementation.
6. Update `PROGRESS.md` with the verified fact, unresolved uncertainty, and next action.
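As an illustration of step 6, the three items that go into `PROGRESS.md` can be kept together in one small record. This is a minimal sketch, not a format the project prescribes: the `ProgressEntry` class, the `format_progress_entry` function, and the bullet layout are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgressEntry:
    """Hypothetical record holding the three items named in workflow step 6."""
    verified_fact: str
    unresolved_uncertainty: str
    next_action: str


def format_progress_entry(entry: ProgressEntry) -> str:
    """Render the entry as a Markdown bullet group (the layout is an assumption)."""
    return "\n".join([
        f"- Verified fact: {entry.verified_fact}",
        f"  - Unresolved uncertainty: {entry.unresolved_uncertainty}",
        f"  - Next action: {entry.next_action}",
    ])
```

Keeping the three fields mandatory makes it harder to log a finding without also stating what is still unknown and what happens next.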
## Constraints
- Do not reintroduce candidate engine comparisons.
- Allow only direct `mineru` CLI execution and the CLI-internal temporary local `mineru-api` process.
- Do not add cloud OCR, remote LLM, `--api-url`, remote API, router, HTTP client backend, or remote OpenAI-compatible backend paths.
- Do not imply perfect LaTeX reconstruction.
- Do not implement converter code unless the user explicitly requests implementation.
- Treat GTX 1070 Ti 8GB, Python 3.12, uv, and Windows PowerShell as active project constraints.
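The local-only constraints above could be checked mechanically before any planned command line is accepted into docs or plans. The following is a minimal sketch under stated assumptions: `FORBIDDEN_FLAGS`, `find_forbidden_flags`, and `assert_local_only` are hypothetical helpers, and the only flag listed is `--api-url` because it is the only one the constraints name. It inspects an argument list and does not run MinerU.

```python
from collections.abc import Sequence
from pathlib import PureWindowsPath

# Only the flag named in the constraints is listed here (assumption: the
# project may extend this tuple once other remote options are verified).
FORBIDDEN_FLAGS = ("--api-url",)


def find_forbidden_flags(args: Sequence[str]) -> list[str]:
    """Return arguments that use a forbidden flag, as `--flag value` or `--flag=value`."""
    hits = []
    for arg in args:
        name = arg.split("=", 1)[0]
        if name in FORBIDDEN_FLAGS:
            hits.append(arg)
    return hits


def assert_local_only(argv: Sequence[str]) -> None:
    """Raise ValueError unless argv is a direct `mineru` call without remote flags."""
    # PureWindowsPath accepts both "\\" and "/" separators and strips ".exe".
    if not argv or PureWindowsPath(argv[0]).stem != "mineru":
        raise ValueError("only direct `mineru` CLI execution is allowed")
    hits = find_forbidden_flags(argv[1:])
    if hits:
        raise ValueError(f"remote execution flags are not allowed: {hits}")
```

Under this sketch, a command whose executable is `mineru-api` is rejected as well, since the constraints allow that process only as something the CLI starts internally.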
## Reference
Read `references/source-checklist.md` when planning a research pass or updating source-backed documentation.
@@ -0,0 +1,4 @@
interface:
display_name: "MinerU Research"
short_description: "Verify MinerU local integration facts"
  default_prompt: "Use $mineru-research to verify MinerU 3.1.0 setup, CLI behavior, outputs, licensing, and local-only integration constraints against official sources."
@@ -0,0 +1,29 @@
# MinerU Research Source Checklist
Use this checklist before changing project docs or plans based on MinerU facts.
## Sources
- MinerU GitHub repository for install instructions, CLI examples, output behavior, and license files.
- MinerU official documentation for current setup and execution modes.
- MinerU release notes or tags for version-specific changes.
- Primary papers for model capability claims.
- Official Python, uv, CUDA, PyTorch, or dependency docs for environment compatibility.
## Facts To Verify
- Supported Python versions and package manager expectations.
- Whether MinerU 3.1.0 supports the required local CLI path on Windows.
- Whether MinerU 3.1.0's CLI-internal temporary local `mineru-api` behavior stays local and avoids `--api-url`.
- Required model download/cache behavior and offline reuse assumptions.
- GPU/CPU execution options and expected memory pressure for GTX 1070 Ti 8GB.
- Output directory structure, Markdown output, image asset output, JSON/intermediate output, and page/block metadata availability.
- Exit codes, error messages, logging behavior, and partial-output behavior.
- License obligations for MinerU, bundled models, and transitive runtime packages.
## Recording Rules
- Record source URL and access date for durable claims.
- Distinguish official fact from inference.
- Keep alternate engine names out of project docs unless the user explicitly asks for a separate historical note.
- If a source conflicts with a fixed product decision, record the conflict and ask for a user decision.
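The recording rules above can be sketched as a small validation step run before a claim is written into durable docs. Everything named here (`ClaimKind`, `SourcedClaim`, `validation_problems`, and the message wording) is hypothetical and only illustrates one way to encode the rules.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class ClaimKind(Enum):
    OFFICIAL_FACT = "official fact"
    INFERENCE = "inference"


@dataclass(frozen=True)
class SourcedClaim:
    """Hypothetical record for one durable claim."""
    statement: str
    kind: ClaimKind
    source_url: Optional[str] = None
    accessed_on: Optional[date] = None
    conflicts_with_product_decision: bool = False


def validation_problems(claim: SourcedClaim) -> list[str]:
    """List which recording rules the claim does not yet satisfy."""
    problems = []
    if claim.kind is ClaimKind.OFFICIAL_FACT and not claim.source_url:
        problems.append("official fact needs a source URL")
    if claim.source_url and claim.accessed_on is None:
        problems.append("source URL needs an access date")
    if claim.conflicts_with_product_decision:
        problems.append("conflict with a fixed product decision: ask the user")
    return problems
```

An empty list means the claim can be recorded as-is. A conflict is reported rather than resolved, which matches the rule that the user makes that decision.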