PDFToMD/phases/5-markdown-rendering-assets/step3.md
김경종 7e985ae94a add files
2026-04-30 17:05:19 +09:00

Step 3: chunk-renderer

Read First

  • /AGENTS.md
  • /PLAN.md
  • /PROGRESS.md
  • /docs/HARNESS.md
  • /docs/IMPLEMENTATION_PLAN.md
  • /docs/ARCHITECTURE.md
  • /docs/CONVERSION_POLICY.md
  • /phases/1-core-runtime-contracts/step2.md
  • /phases/5-markdown-rendering-assets/step2.md

Task

Implement chunk planning and the writing of per-chunk Markdown bundles from enriched blocks.

Chunk boundaries should target 20 pages per chunk while preserving logical block integrity: paragraphs, tables, figures, and formulas must each stay whole within a single chunk.
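
The boundary rule can be illustrated with a minimal sketch. It is not the project's implementation: the Block type, its fields, and the plan_chunks name are hypothetical, and the real enriched-block model comes from the contracts listed under Read First. The sketch assumes each block knows the inclusive page range it covers and only ever closes a chunk between blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical enriched block: a kind plus the inclusive page range it spans."""
    kind: str        # e.g. "paragraph", "table", "figure", "formula"
    first_page: int
    last_page: int


def plan_chunks(blocks, target_pages=20):
    """Group blocks into chunks that span roughly target_pages pages.

    A block is never split. A chunk is closed only between two blocks, and
    only when adding the next block would make the chunk span more than
    target_pages pages. A single block longer than the target therefore
    ends up alone in an oversized chunk.
    """
    chunks = []
    current = []
    for block in blocks:
        if current:
            # Page span of the chunk if this block were appended to it.
            span = block.last_page - current[0].first_page + 1
            if span > target_pages:
                chunks.append(current)
                current = []
        current.append(block)
    if current:
        chunks.append(current)
    return chunks
```

With this greedy rule a chunk may come out shorter than 20 pages when the next block would overshoot the target, and longer than 20 pages only when one block alone exceeds it. Other policies (for example, allowing a small overshoot to avoid very short chunks) are equally compatible with the integrity requirement.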

Sprint Contract

  • Done means: chunk files with frontmatter can be written deterministically from internal document fixtures.
  • Hard thresholds: block integrity is preserved at chunk boundaries; chunk frontmatter includes minimum context; quality gates run on rendered chunks.
  • Files owned: src/pdftomd/chunking.py, src/pdftomd/renderer.py, tests, PROGRESS.md, phase index.
  • Dependencies: Renderer, assets, and output bundle contracts.
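
As a rough illustration of the "written deterministically" requirement, the sketch below renders one chunk with a frontmatter header. The function name and the frontmatter fields (source, chunk_index, first_page, last_page) are assumptions made for this example; the actual minimum context is defined by the output bundle contracts, not here.

```python
import json


def render_chunk(chunk_index, first_page, last_page, source_name, body):
    """Render one chunk as Markdown text with a YAML-style frontmatter header.

    The field names are illustrative assumptions. Fields are emitted in a
    fixed order, and no timestamps or random identifiers are included, so
    the same input always produces byte-identical output.
    """
    fields = [
        ("source", source_name),
        ("chunk_index", chunk_index),
        ("first_page", first_page),
        ("last_page", last_page),
    ]
    # json.dumps quotes and escapes strings and leaves integers bare.
    header = "\n".join(f"{key}: {json.dumps(value)}" for key, value in fields)
    # Normalize trailing whitespace so the file always ends with one newline.
    return f"---\n{header}\n---\n\n{body.rstrip()}\n"
```

Keeping the rendering a pure function of its arguments makes it straightforward to compare output against stored fixtures in tests.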

Acceptance Criteria

python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests

Verification

  1. Run the acceptance commands.
  2. Confirm long-document chunk fixtures cover boundary behavior.
  3. Update PROGRESS.md and this phase index.
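
One possible shape for a boundary check used by such fixtures is sketched below. The helper name and the use of plain block ids are hypothetical; it only verifies that a chunk plan keeps every block whole, exactly once, and in document order.

```python
def check_block_integrity(block_ids, chunks):
    """Return a list of problems; an empty list means the plan looks sound.

    block_ids: ids of the source blocks in document order.
    chunks: list of chunks, each a list of block ids.
    """
    problems = []
    for index, chunk in enumerate(chunks):
        if not chunk:
            problems.append(f"chunk {index} is empty")
    flattened = [block_id for chunk in chunks for block_id in chunk]
    if len(set(flattened)) != len(flattened):
        problems.append("a block appears in more than one place")
    if flattened != list(block_ids):
        problems.append("blocks are missing, added, or out of order")
    return problems
```

A fixture test could then assert that check_block_integrity returns an empty list for each long-document plan.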

Do Not

  • Do not split blocks in the middle to satisfy exact 20-page counts.
  • Do not create document sidecar metadata files.
  • Do not implement CLI orchestration here.