---
name: conversion-architecture
description: Design the PDFtoMD conversion architecture, including parser boundaries, internal block models, chunk policy, renderer contracts, output structure, logging, and resume behavior. Use when planning or reviewing the conversion engine design.
---

# Conversion Architecture

## Workflow

1. Read `AGENTS.md`, `PLAN.md`, `PROGRESS.md`, `docs/ARCHITECTURE.md`, `docs/CONVERSION_POLICY.md`, and `docs/ADR.md`.
2. Keep responsibilities stable:
   - Marker: layout, OCR, reading order, body text, headings, tables, figures, captions
   - Nougat: formula-only LaTeX parsing
   - PyMuPDF: page pre-analysis, text-layer quality, page counts, chunk planning
3. Define interfaces and invariants before implementation.
4. Keep output deterministic and chunked under the documented output contract.
5. When an architecture decision changes, record it in `docs/ADR.md`.
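
The responsibility split in step 2 and the invariants in step 3 can be pinned down in code before any parser is wired in. What follows is a minimal, hypothetical sketch: `BlockKind`, `Block`, `OWNER`, and `check_invariants` are illustrative names, not part of Marker, Nougat, or PyMuPDF. It assumes 0-based page indices and assumes that formula blocks from Nougat are slotted into Marker's per-page reading order.

```python
from dataclasses import dataclass
from enum import Enum


class BlockKind(Enum):
    HEADING = "heading"
    PARAGRAPH = "paragraph"
    TABLE = "table"
    FIGURE = "figure"
    CAPTION = "caption"
    FORMULA = "formula"


# Hypothetical ownership map mirroring the parser split:
# Marker owns every layout-derived block; Nougat owns formulas only.
OWNER = {
    BlockKind.HEADING: "marker",
    BlockKind.PARAGRAPH: "marker",
    BlockKind.TABLE: "marker",
    BlockKind.FIGURE: "marker",
    BlockKind.CAPTION: "marker",
    BlockKind.FORMULA: "nougat",
}


@dataclass(frozen=True)
class Block:
    kind: BlockKind
    page: int    # 0-based page index (assumption)
    order: int   # reading-order slot within the page (assumed to come from Marker)
    source: str  # parser that produced the block: "marker" or "nougat"
    text: str    # Markdown, or LaTeX for formulas


def check_invariants(blocks: list[Block]) -> list[str]:
    """Return readable violations; an empty list means the blocks pass."""
    errors = []
    seen = set()
    for b in blocks:
        # Invariant 1: each block kind comes only from its owning parser.
        expected = OWNER[b.kind]
        if b.source != expected:
            errors.append(
                f"{b.kind.value} on page {b.page} came from {b.source}, expected {expected}"
            )
        # Invariant 2: no two blocks share a (page, order) slot,
        # so the rendered order is unambiguous.
        key = (b.page, b.order)
        if key in seen:
            errors.append(f"duplicate reading-order position {key}")
        seen.add(key)
    return errors
```

A check like this could run after parser outputs are merged and before rendering, so a formula that arrives from Marker, or two blocks claiming the same reading-order slot, fails fast instead of silently producing ambiguous Markdown.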
## Guardrails

- Do not place conversion logic in a future PyQt UI.
- Do not add document sidecars unless explicitly requested.
- Do not let chunking split a paragraph, table, figure, or formula without a fallback plan.
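
One possible fallback plan for the chunking guardrail is to plan chunks as page ranges and push each cut past any page boundary where a block continues. The sketch assumes PyMuPDF pre-analysis has already produced `unsafe`, a set of boundary indices `b` meaning "a paragraph, table, figure, or formula runs from page `b-1` onto page `b`". `Chunk`, `plan_chunks`, `target`, and `max_extra` are hypothetical names. When no safe boundary lies within `max_extra` extra pages, the sketch cuts at the target and flags the chunk for a later stitching pass, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    start: int          # first page, 0-based, inclusive
    end: int            # end page, exclusive
    forced_split: bool  # True if the cut lands on an unsafe boundary


def plan_chunks(page_count: int, target: int, max_extra: int,
                unsafe: set[int]) -> list[Chunk]:
    """Split pages [0, page_count) into page-range chunks.

    Each chunk aims for `target` pages and may grow by up to `max_extra`
    pages to reach a boundary not listed in `unsafe`. If none exists, the
    cut is made at the target and the chunk is flagged as a forced split.
    """
    if target < 1 or max_extra < 0:
        raise ValueError("target must be >= 1 and max_extra >= 0")
    chunks = []
    start = 0
    while start < page_count:
        ideal = min(start + target, page_count)
        cut = None
        # Scan forward from the ideal cut; the document end is always safe.
        for candidate in range(ideal, min(ideal + max_extra, page_count) + 1):
            if candidate == page_count or candidate not in unsafe:
                cut = candidate
                break
        forced = cut is None
        if forced:
            cut = ideal  # fallback: split here and let stitching rejoin the block
        chunks.append(Chunk(start, cut, forced))
        start = cut
    return chunks
```

For example, `plan_chunks(10, 4, 1, {4})` moves the first cut from page 4 to page 5 and yields the ranges [0, 5), [5, 9), and [9, 10), none of them flagged. Because the function reads only its arguments and uses `unsafe` purely for membership tests, the same inputs always produce the same plan, which keeps chunked output deterministic.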