remove files

2026-05-08 16:31:17 +09:00
parent 7e985ae94a
commit 551ab50735
135 changed files with 0 additions and 41205 deletions
@@ -1,63 +0,0 @@
-# PDFtoMD
-
-PDFtoMD is a local-first conversion engine that turns math-, engineering-, and mechanics-focused PDFs into bundles of Markdown documents that AI agents can read easily.
-
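-The goal is not plain text extraction but a structured conversion that preserves the source document's reading order, paragraph flow, formulas, tables, figures, captions, and in-text references.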
-
-## Status
- Current phase: Harness foundation planning.
- Implementation: not started.
- Primary target: Windows 10 native CLI/library engine.
- UI: future PyQt thin client.
-
-## Core Direction
- Marker handles document structure, reading order, OCR/layout, body text, tables, figures, headings, and captions.
- Nougat handles only mathematical expressions and formula blocks.
- PyMuPDF handles lightweight page analysis, text-layer quality checks, page counts, chunk planning, and low-level PDF operations.
- Mixed text/scanned PDFs are in scope.
- Output is chunked Markdown plus image/table assets under a document slug directory.
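A minimal sketch of how the PyMuPDF pre-pass could flag likely-scanned pages and plan page chunks before the heavier Marker pass runs. Everything here is a hypothetical illustration. The names (`plan_chunks`, `is_scanned`, `Chunk`), the 50-character threshold, the 20-page chunk size, and the `<slug>/chunk_NNN.md` layout are all assumptions, not the project's actual policy. The per-page text would come from PyMuPDF, for example `[page.get_text() for page in fitz.open(path)]`.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Chunk:
    index: int                      # 1-based chunk number
    first_page: int                 # 0-based, inclusive
    last_page: int                  # 0-based, inclusive
    scanned_pages: tuple[int, ...]  # pages whose text layer looks unusable

    def markdown_path(self, slug: str) -> PurePosixPath:
        # Hypothetical layout: one Markdown file per chunk, e.g. <slug>/chunk_001.md
        return PurePosixPath(slug) / f"chunk_{self.index:03d}.md"


def is_scanned(page_text: str, min_chars: int = 50) -> bool:
    """Treat a page as scanned when its text layer has too few non-whitespace characters.

    The 50-character default is an illustrative assumption.
    """
    return sum(1 for ch in page_text if not ch.isspace()) < min_chars


def plan_chunks(page_texts: list[str], pages_per_chunk: int = 20) -> list[Chunk]:
    """Split a document into fixed-size page ranges and record scanned pages per range."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be >= 1")
    chunks: list[Chunk] = []
    for start in range(0, len(page_texts), pages_per_chunk):
        end = min(start + pages_per_chunk, len(page_texts)) - 1
        scanned = tuple(p for p in range(start, end + 1) if is_scanned(page_texts[p]))
        chunks.append(Chunk(len(chunks) + 1, start, end, scanned))
    return chunks
```

With 20-page chunks, a 76-page document splits into four ranges (pages 0-19, 20-39, 40-59, 60-75). Each chunk's `scanned_pages` tells the pipeline which pages need OCR rather than the existing text layer.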
-
-## Environment
-Use one repo-local Python 3.11 environment.
-
-```powershell
-conda create -p .\venv python=3.11 -y
-.\venv\python.exe -m pip install -r requirements.txt
-```
-
-Verified local baseline:
- Windows 10
- NVIDIA GeForce GTX 1070 Ti, 8 GB VRAM
- NVIDIA driver 577.00
- PyTorch `2.7.1+cu126`
- Marker `1.10.2`
- Nougat OCR `0.1.17`
-
-## Verification
-```powershell
-python scripts\validate_workspace.py
-.\venv\python.exe -m pip check
-.\venv\python.exe -c "import torch; x=torch.ones((1,), device='cuda'); print(torch.__version__, torch.version.cuda, x.item())"
-.\venv\Scripts\nougat.exe --help
-```
-
-By default, `scripts/validate_workspace.py` discovers repo-local Python validation: it prefers `.\venv\python.exe`, compiles the Harness scripts, and runs `scripts/test_*.py` with pytest. When `HARNESS_VALIDATION_COMMANDS` is set or npm scripts are present, they override this discovery.
-
-## Important Documents
- `AGENTS.md`: persistent repository instructions.
- `PLAN.md`: multi-agent planning state.
- `PROGRESS.md`: multi-agent progress state.
- `phases/`: executable Harness phase tickets.
- `docs/PRD.md`: product requirements.
- `docs/ARCHITECTURE.md`: engine architecture.
- `docs/CONVERSION_POLICY.md`: detailed conversion decisions.
- `docs/HARNESS.md`: planner/generator/evaluator Harness workflow.
- `docs/IMPLEMENTATION_PLAN.md`: full phase-by-phase implementation roadmap.
- `docs/ADR.md`: architecture decision records.
- `docs/TOOLCHAIN.md`: toolchain and dependency notes.
- `docs/UI_GUIDE.md`: future PyQt UI guidance.
-
-## Sample Corpus
-The `samples/` directory is used for quality evaluation and regression tests. The current sample PDFs cover Korean filenames, engineering and mechanics documents, formulas, figures, and one long 76-page document.
-
-Before implementation, create a sample metadata mapping file that tags each PDF by text-layer quality, scanned pages, multi-column layout, formula density, table density, figure density, and Korean filename coverage.
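Below is one possible shape for that mapping file, offered as a hypothetical sketch. The field names, the `none`/`low`/`medium`/`high` scale, and the JSON layout keyed by filename are assumptions derived from the tag list above. The Korean-filename tag is computed from the filename rather than entered by hand, and the filenames used in the test are placeholders, not real corpus entries.

```python
import json
from dataclasses import asdict, dataclass, field

LEVELS = ("none", "low", "medium", "high")  # assumed quality/density scale


@dataclass
class SampleMetadata:
    filename: str
    text_layer_quality: str                                  # one of LEVELS
    scanned_pages: list[int] = field(default_factory=list)   # 1-based page numbers
    multi_column: bool = False
    formula_density: str = "none"
    table_density: str = "none"
    figure_density: str = "none"

    def __post_init__(self) -> None:
        for name in ("text_layer_quality", "formula_density", "table_density", "figure_density"):
            if getattr(self, name) not in LEVELS:
                raise ValueError(f"{name} must be one of {LEVELS}")

    @property
    def korean_filename(self) -> bool:
        # True if the filename contains at least one Hangul syllable (U+AC00..U+D7A3).
        return any("\uac00" <= ch <= "\ud7a3" for ch in self.filename)


def dump_mapping(samples: list[SampleMetadata]) -> str:
    """Serialize samples to JSON keyed by filename, adding the derived Korean-filename tag."""
    mapping = {s.filename: {**asdict(s), "korean_filename": s.korean_filename} for s in samples}
    # ensure_ascii=False keeps Korean filenames human-readable in the file.
    return json.dumps(mapping, ensure_ascii=False, indent=2, sort_keys=True)


def load_mapping(text: str) -> dict[str, SampleMetadata]:
    """Parse the JSON mapping back into objects, dropping the derived tag."""
    raw = json.loads(text)
    return {
        name: SampleMetadata(**{k: v for k, v in rec.items() if k != "korean_filename"})
        for name, rec in raw.items()
    }
```

Validating the scale in `__post_init__` means a typo in a hand-edited mapping fails at load time instead of silently skewing regression statistics.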