Step 3: chunk-renderer
Read First
- /AGENTS.md
- /PLAN.md
- /PROGRESS.md
- /docs/HARNESS.md
- /docs/IMPLEMENTATION_PLAN.md
- /docs/ARCHITECTURE.md
- /docs/CONVERSION_POLICY.md
- /phases/1-core-runtime-contracts/step2.md
- /phases/5-markdown-rendering-assets/step2.md
Task
Implement chunk planning and chunk Markdown bundle writing over enriched blocks.
Chunk boundaries should target about 20 pages per chunk, but must keep logical blocks (paragraphs, tables, figures, and formulas) intact.
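A minimal sketch of how the planning rule above could work, assuming each enriched block carries an inclusive page range. `Block`, `ChunkPlan`, `plan_chunks`, and `TARGET_PAGES` are hypothetical names for illustration, not the project's actual `chunking.py` API. Whole blocks are packed greedily, and a chunk is closed early rather than splitting a block.

```python
from dataclasses import dataclass, field

TARGET_PAGES = 20  # soft target from this step; never enforced by splitting a block


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for an enriched block (not the real project type)."""
    block_id: str
    kind: str        # e.g. "paragraph", "table", "figure", "formula"
    page_start: int  # 1-based, inclusive
    page_end: int    # 1-based, inclusive


@dataclass
class ChunkPlan:
    index: int
    blocks: list[Block] = field(default_factory=list)

    @property
    def page_start(self) -> int:
        return self.blocks[0].page_start

    @property
    def page_end(self) -> int:
        return max(b.page_end for b in self.blocks)


def plan_chunks(blocks: list[Block], target_pages: int = TARGET_PAGES) -> list[ChunkPlan]:
    """Greedily pack whole blocks into chunks of about `target_pages` pages.

    If adding a block would push the current chunk past the target, the chunk
    is closed and the block starts a new one. A single block longer than the
    target becomes its own oversized chunk instead of being split.
    """
    chunks: list[ChunkPlan] = []
    current = ChunkPlan(index=0)
    for block in blocks:
        if current.blocks:
            span = max(current.page_end, block.page_end) - current.page_start + 1
            if span > target_pages:
                chunks.append(current)
                current = ChunkPlan(index=len(chunks))
        current.blocks.append(block)
    if current.blocks:
        chunks.append(current)
    return chunks
```

With 45 one-page blocks this yields chunks covering pages 1-20, 21-40, and 41-45; a table spanning pages 19-22 is moved whole into the next chunk rather than cut at page 20.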
Sprint Contract
- Done means: chunk files with frontmatter can be written deterministically from internal document fixtures.
- Hard thresholds: block integrity is preserved at chunk boundaries; chunk frontmatter includes minimum context; quality gates run on rendered chunks.
- Files owned: src/pdftomd/chunking.py, src/pdftomd/renderer.py, tests, PROGRESS.md, phase index.
- Dependencies: Renderer, assets, and output bundle contracts.
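One way to make chunk writing deterministic is to fix every source of variation: key order, newline style, encoding, and file naming. The sketch below assumes frontmatter is YAML and that the "minimum context" is something like source, chunk index/count, and page range; these field names, the `chunk-001.md` naming, and the helpers `render_frontmatter`, `chunk_filename`, `render_chunk`, and `write_chunk` are all hypothetical, since this step does not define them. It avoids a YAML library so the output format stays fully under the renderer's control.

```python
import json
from pathlib import Path


def render_frontmatter(meta: dict[str, object]) -> str:
    """Render YAML frontmatter with keys in sorted order.

    Strings are JSON-quoted (valid YAML double-quoted scalars); ints, bools,
    and None use JSON literals (5, true, null), which YAML also accepts.
    """
    lines = ["---"]
    for key in sorted(meta):
        value = meta[key]
        if value is None or isinstance(value, (bool, int)):
            rendered = json.dumps(value)
        else:
            rendered = json.dumps(str(value))
        lines.append(f"{key}: {rendered}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def chunk_filename(index: int) -> str:
    # Zero-padded so lexical order matches chunk order (hypothetical scheme).
    return f"chunk-{index + 1:03d}.md"


def render_chunk(meta: dict[str, object], body: str) -> str:
    # One blank line after the frontmatter, exactly one trailing newline.
    return render_frontmatter(meta) + "\n" + body.strip("\n") + "\n"


def write_chunk(out_dir: Path, index: int, meta: dict[str, object], body: str) -> Path:
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / chunk_filename(index)
    # newline="\n" keeps bytes identical on Windows and POSIX.
    with path.open("w", encoding="utf-8", newline="\n") as fh:
        fh.write(render_chunk(meta, body))
    return path
```

Sorting keys is the simplest deterministic order; if a particular reading order is preferred, an explicit key list would work equally well as long as it never depends on dict insertion order.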
Acceptance Criteria
python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests
Verification
- Run the acceptance commands.
- Confirm long-document chunk fixtures cover boundary behavior.
- Update PROGRESS.md and this phase index.
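To check boundary behavior in long-document fixtures, a test could compare the planned chunks against the source block order. The sketch below assumes chunks can be reduced to lists of block ids; `find_integrity_errors` is a hypothetical helper, not an existing project function. It reports a block that appears in two chunks (a split), a block that is missing, an unknown block, a reordered plan, or an empty chunk.

```python
def find_integrity_errors(block_ids: list[str], chunks: list[list[str]]) -> list[str]:
    """Return readable problems with a chunk plan; an empty list means valid."""
    errors: list[str] = []
    seen: dict[str, int] = {}  # block id -> first chunk it appeared in
    for chunk_no, chunk in enumerate(chunks):
        if not chunk:
            errors.append(f"chunk {chunk_no} is empty")
        for block_id in chunk:
            if block_id in seen:
                # The same block in two chunks means it was split at a boundary.
                errors.append(f"block {block_id} appears in chunks {seen[block_id]} and {chunk_no}")
            else:
                seen[block_id] = chunk_no
    expected = set(block_ids)
    errors.extend(f"block {b} missing from every chunk" for b in block_ids if b not in seen)
    errors.extend(f"unknown block {b} in chunk {seen[b]}" for b in seen if b not in expected)
    # Compare first-appearance order against document order for shared blocks.
    if [b for b in seen if b in expected] != [b for b in block_ids if b in seen]:
        errors.append("chunk order does not follow document order")
    return errors
```

In a pytest test, `assert find_integrity_errors(ids, planned) == []` fails with the full list of problems, which is easier to debug than a bare boolean check.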
Do Not
- Do not split blocks in the middle to satisfy exact 20-page counts.
- Do not create document sidecar metadata files.
- Do not implement CLI orchestration here.