remove files
@@ -1,23 +0,0 @@
---
name: conversion-architecture
description: Design PDFtoMD conversion architecture, parser boundaries, internal block models, chunk policy, renderer contracts, output structure, logging, and resume behavior. Use when planning or reviewing conversion engine design.
---

# Conversion Architecture

## Workflow

1. Read `AGENTS.md`, `PLAN.md`, `PROGRESS.md`, `docs/ARCHITECTURE.md`, `docs/CONVERSION_POLICY.md`, and `docs/ADR.md`.
2. Keep responsibilities stable:
   - Marker: layout, OCR, reading order, body, headings, tables, figures, captions
   - Nougat: formula-only LaTeX parsing
   - PyMuPDF: page pre-analysis, text-layer quality, page counts, chunk planning
3. Define interfaces and invariants before implementation.
4. Keep output deterministic and chunked under the documented output contract.
5. Record architecture changes in `docs/ADR.md` when decisions change.

## Guardrails

- Do not place conversion logic in a future PyQt UI.
- Do not add document sidecars unless explicitly requested.
- Do not let chunking split a paragraph, table, figure, or formula without a fallback plan.

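The chunking guardrail above can be illustrated with a minimal sketch. The `Block` type, the `plan_chunks` function, and the character budget are hypothetical names introduced here for illustration; the actual block model and chunk policy are defined in the project docs, not in this file. The fallback shown (an oversized block gets a chunk of its own rather than being split) is one possible plan, not necessarily the project's.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical internal block: one paragraph, table, figure, or formula."""
    kind: str
    text: str


def plan_chunks(blocks: List[Block], max_chars: int) -> List[List[Block]]:
    """Group blocks into chunks without ever splitting a single block.

    Assumed fallback: a block larger than max_chars is emitted alone in
    its own chunk instead of being cut in the middle.
    """
    chunks: List[List[Block]] = []
    current: List[Block] = []
    size = 0
    for block in blocks:
        block_len = len(block.text)
        # Close the current chunk before it would exceed the budget.
        if current and size + block_len > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(block)
        size += block_len
    if current:
        chunks.append(current)
    return chunks
```

Because the input order is preserved and no randomness is involved, the same block list always yields the same chunk boundaries, which is in line with the deterministic-output requirement in the workflow.
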
@@ -1,4 +0,0 @@
interface:
  display_name: "Conversion Architecture"
  short_description: "Plan parser and renderer boundaries"
  default_prompt: "Use $conversion-architecture to design the next PDFtoMD engine phase."
@@ -1,24 +0,0 @@
---
name: formula-quality
description: Plan and review formula extraction quality for PDFtoMD. Use when Codex needs Nougat handoff rules, inline/block formula classification, LaTeX delimiter checks, equation numbering, reference anchors, or Marker fallback behavior.
---

# Formula Quality

## Workflow

1. Read `AGENTS.md`, `docs/CONVERSION_POLICY.md`, `docs/TOOLCHAIN.md`, and `docs/ADR.md`.
2. Identify formula candidates from Marker equation blocks or mathematical text patterns.
3. Classify formulas as inline or block based on layout context.
4. Validate:
   - `$ ... $` and `$$ ... $$` balance
   - `\begin{...}` / `\end{...}` pairs
   - formula numbering
   - body references such as `Eq. (3)` or Korean equation references
5. Use Marker source text as fallback when Nougat fails.

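The delimiter, environment, and reference checks in step 4 could be sketched roughly as follows. All function names are hypothetical, the checks are deliberately simple (no handling of `$` inside code spans or verbatim environments), and only the English `Eq. (n)` reference form is covered; Korean reference patterns would need their own expressions.

```python
import re
from typing import List

_ENV = re.compile(r"\\(begin|end)\{([^}]*)\}")
_EQ_REF = re.compile(r"\bEq\.\s*\((\d+)\)")


def dollar_delimiters_balanced(text: str) -> bool:
    """True if unescaped `$` and `$$` delimiters open and close in pairs."""
    state = None  # None, "$", or "$$"
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \$
            continue
        if ch == "$":
            if state == "$":
                token = "$"  # an open inline formula closes on a single $
            else:
                token = "$$" if text.startswith("$$", i) else "$"
            if state is None:
                state = token
            elif state == token:
                state = None
            else:
                return False  # e.g. `$$ ... $`
            i += len(token)
            continue
        i += 1
    return state is None


def environments_balanced(text: str) -> bool:
    """True if every \\begin{name} is closed by a matching \\end{name}."""
    stack: List[str] = []
    for match in _ENV.finditer(text):
        kind, name = match.group(1), match.group(2)
        if kind == "begin":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


def equation_references(text: str) -> List[int]:
    """Collect equation numbers cited in body text as `Eq. (n)`."""
    return [int(n) for n in _EQ_REF.findall(text)]
```

A formula that fails either balance check would be a candidate for the Marker source-text fallback in step 5 rather than being dropped.
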
## Guardrails

- Do not pass whole documents through Nougat as the primary parser.
- Do not discard formula text on parse failure.
- Do not rewrite references as links unless the target confidence is sufficient.
@@ -1,4 +0,0 @@
interface:
  display_name: "Formula Quality"
  short_description: "Validate equations and LaTeX output"
  default_prompt: "Use $formula-quality to design formula parsing tests and fallback behavior."
@@ -1,27 +0,0 @@
---
name: markdown-quality
description: Plan and review Markdown output quality for PDFtoMD. Use when Codex needs tests or policies for headings, tables, HTML fallback, image links, captions, frontmatter, chunk integrity, and deterministic output.
---

# Markdown Quality

## Workflow

1. Read `AGENTS.md`, `docs/PRD.md`, `docs/ARCHITECTURE.md`, and `docs/CONVERSION_POLICY.md`.
2. Prefer focused assertions over full snapshots.
3. Validate:
   - heading hierarchy
   - table parseability
   - limited HTML table fallback
   - image link existence
   - figure/table captions
   - internal references
   - chunk frontmatter
   - deterministic filenames and anchors
4. Use Markdown or HTML parsers when practical.

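Two of the focused checks in step 3 (heading hierarchy and image link existence) might look like the sketch below. It uses regular expressions only to stay dependency-free, which is an assumption of this example; a real Markdown parser, as step 4 suggests, would avoid false positives such as `#` lines inside fenced code blocks. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List

_HEADING = re.compile(r"^(#{1,6})[ \t]+\S", re.MULTILINE)
_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def heading_levels(markdown: str) -> List[int]:
    """Return the level (1-6) of each ATX heading, in document order."""
    return [len(m.group(1)) for m in _HEADING.finditer(markdown)]


def heading_jumps(markdown: str) -> List[int]:
    """Return indexes of headings that skip a level, e.g. `#` then `###`."""
    levels = heading_levels(markdown)
    return [i for i in range(1, len(levels)) if levels[i] > levels[i - 1] + 1]


def missing_images(markdown: str, base_dir: Path) -> List[str]:
    """Return relative image targets that do not exist under base_dir."""
    targets = [
        t for t in _IMAGE.findall(markdown)
        if not t.startswith(("http://", "https://"))
    ]
    return [t for t in targets if not (base_dir / t).exists()]
```

A test can then assert `heading_jumps(chunk_text) == []` and `missing_images(chunk_text, chunk_dir) == []` per chunk, which stays stable when unrelated text changes, unlike a whole-file snapshot.
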
## Guardrails

- Do not inject runtime warnings into generated Markdown.
- Do not rely only on brittle whole-file snapshots.
- Do not lose complex table content without linking a fallback asset.
@@ -1,4 +0,0 @@
interface:
  display_name: "Markdown Quality"
  short_description: "Check chunk Markdown and assets"
  default_prompt: "Use $markdown-quality to plan focused Markdown output validation."
@@ -1,23 +0,0 @@
---
name: pdf-toolchain
description: Research and maintain PDFtoMD toolchain compatibility for Marker, Nougat, PyMuPDF, PyTorch/CUDA, model cache, and licensing. Use when Codex needs dependency pins, runtime compatibility checks, official-source research, or updates to docs/TOOLCHAIN.md and related ADRs.
---

# PDF Toolchain

## Workflow

1. Read `AGENTS.md`, `PLAN.md`, `PROGRESS.md`, `docs/TOOLCHAIN.md`, `docs/ARCHITECTURE.md`, and `docs/ADR.md`.
2. Prefer official or primary sources for current facts.
3. Verify local facts with commands when relevant:
   - `.\venv\python.exe -m pip check`
   - `.\venv\python.exe -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"`
   - `.\venv\Scripts\nougat.exe --help`
4. Preserve the verified GTX 1070 Ti baseline unless a replacement is tested.
5. Update `docs/TOOLCHAIN.md` and `docs/ADR.md` when dependency decisions change.

## Guardrails

- Do not upgrade `torch`, `transformers`, `albumentations`, `pypdfium2`, `opencv-python-headless`, `Pillow`, or `fsspec` without re-running compatibility checks.
- Do not switch the primary parser away from Marker without an ADR update.
- Do not download model weights unless the user explicitly asks.

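One way to notice an unplanned upgrade of the pinned packages listed above is to compare installed versions against the recorded pins. The sketch below uses the standard-library `importlib.metadata`; the `find_drift` helper and the shape of the `pins` mapping are assumptions of this example, and the real pins live in `docs/TOOLCHAIN.md`, not here.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_drift(
    pins: Dict[str, str],
    installed: Optional[Dict[str, Optional[str]]] = None,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each drifted package to (pinned, found).

    `installed` can be injected for tests; by default the current
    environment is queried.
    """
    drift: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in pins.items():
        found = installed.get(name) if installed is not None else installed_version(name)
        if found != wanted:
            drift[name] = (wanted, found)
    return drift
```

A non-empty result would be a signal to re-run the compatibility commands from step 3 before accepting the change.
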
@@ -1,4 +0,0 @@
interface:
  display_name: "PDF Toolchain"
  short_description: "PDF parser and CUDA dependency guidance"
  default_prompt: "Use $pdf-toolchain to verify PDFtoMD dependency compatibility and update toolchain notes."
@@ -1,27 +0,0 @@
---
name: sample-corpus
description: Analyze and maintain the PDFtoMD samples corpus. Use when Codex needs to classify samples/ PDFs, design samples/metadata.json, identify OCR candidates, or connect corpus traits to focused regression tests.
---

# Sample Corpus

## Workflow

1. Read `AGENTS.md`, `PLAN.md`, `PROGRESS.md`, `docs/PRD.md`, and `docs/CONVERSION_POLICY.md`.
2. Inspect PDFs with PyMuPDF before proposing tests.
3. Track these traits per PDF:
   - page count
   - text-layer quality
   - scanned or mixed pages
   - multi-column layout
   - formula density
   - table density
   - figure density
   - Korean filename/path coverage
4. If writing metadata, use `samples/metadata.json` and update `PROGRESS.md`.

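The page-count and text-layer traits in step 3 could be gathered with a sketch like the one below. It looks at every page rather than only the first. The 50-character threshold, the label names, and both function names are assumptions of this example, and character count alone is a coarse signal that would be combined with other evidence in practice.

```python
from typing import List


def page_text_lengths(pdf_path: str) -> List[int]:
    """Return the stripped text length of each page, read with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the classifier works without it

    with fitz.open(pdf_path) as doc:
        return [len(page.get_text().strip()) for page in doc]


def classify_text_layer(lengths: List[int], min_chars: int = 50) -> str:
    """Label a PDF from per-page text lengths.

    min_chars is an assumed cut-off below which a page is treated as
    having no usable text layer (an OCR candidate).
    """
    if not lengths:
        return "empty"
    sparse = sum(1 for n in lengths if n < min_chars)
    if sparse == 0:
        return "text"
    if sparse == len(lengths):
        return "scanned"
    return "mixed"
```

`len(lengths)` doubles as the page count, so one pass over the document covers the first three traits in the list.
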
## Guardrails

- Preserve original sample PDFs.
- Do not rename Korean sample files unless the user explicitly asks.
- Do not treat first-page text length as the only OCR signal.
@@ -1,4 +0,0 @@
interface:
  display_name: "Sample Corpus"
  short_description: "Classify PDF samples for quality tests"
  default_prompt: "Use $sample-corpus to audit samples/ PDFs and propose regression metadata."
@@ -1,23 +0,0 @@
---
name: windows-runtime
description: Maintain Windows-native PDFtoMD runtime behavior. Use when Codex needs guidance for repo-local venv, CUDA/OOM handling, Korean paths, long paths, model cache, offline operation, stderr logs, or resume cache behavior.
---

# Windows Runtime

## Workflow

1. Read `AGENTS.md`, `docs/TOOLCHAIN.md`, `docs/ARCHITECTURE.md`, and `docs/CONVERSION_POLICY.md`.
2. Verify environment health with:
   - `.\venv\python.exe -m pip check`
   - CUDA smoke test
   - `.\venv\Scripts\nougat.exe --help`
3. Use `pathlib` for path design and tests.
4. Include Korean filenames, spaces, and long Windows paths in test plans.
5. Keep model cache and offline behavior explicit.

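Steps 3 and 4 can be exercised with a small `pathlib` sketch. The sample filenames and helper names are hypothetical, and the 260-character constant reflects the classic Windows `MAX_PATH` limit that applies when long-path support is not enabled.

```python
import tempfile
from pathlib import Path

WINDOWS_MAX_PATH = 260  # classic MAX_PATH, including the terminating NUL


def needs_long_path_support(path: Path) -> bool:
    """True if the path string is too long for the classic MAX_PATH limit."""
    return len(str(path)) >= WINDOWS_MAX_PATH


def roundtrip(directory: Path, name: str, payload: bytes = b"%PDF-") -> bool:
    """Write and re-read a file to confirm the name survives the filesystem."""
    target = directory / name
    target.write_bytes(payload)
    return target.exists() and target.read_bytes() == payload


def check_sample_names() -> bool:
    """Round-trip hypothetical Korean and space-containing filenames."""
    names = ["한글 문서 샘플.pdf", "report with spaces.pdf"]
    with tempfile.TemporaryDirectory() as tmp:
        return all(roundtrip(Path(tmp), name) for name in names)
```

Keeping the length check separate from the round trip lets a test plan flag long output paths before any conversion work starts.
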
## Guardrails

- Do not silently fall back to CPU when the user explicitly requested CUDA.
- Do not choose batch sizes that assume more than 8 GB VRAM.
- Do not delete local environments or sample PDFs without explicit approval.

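The first guardrail above amounts to failing loudly instead of switching devices. A minimal sketch follows, assuming a hypothetical `select_device` helper and an `"auto"` mode that this file does not itself define. The availability flag is passed in so the logic can be tested without a GPU.

```python
def select_device(requested: str, cuda_available: bool) -> str:
    """Resolve the device to use without hiding a missing GPU.

    requested: "cuda", "cpu", or "auto" ("auto" is an assumption of
    this sketch and is the only mode allowed to pick CPU on its own).
    """
    if requested == "cuda":
        if not cuda_available:
            raise RuntimeError(
                "CUDA was requested but is not available; "
                "refusing to fall back to CPU."
            )
        return "cuda"
    if requested == "cpu":
        return "cpu"
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    raise ValueError(f"Unknown device request: {requested!r}")
```

In a real run the flag would come from `torch.cuda.is_available()`, the same call used in the environment checks.
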
@@ -1,4 +0,0 @@
interface:
  display_name: "Windows Runtime"
  short_description: "Windows, CUDA, paths, and offline checks"
  default_prompt: "Use $windows-runtime to verify PDFtoMD local runtime assumptions."