Step 2: header-footer-filtering

Read First

Detect repeated page headers, footers, and page numbers and separate them from the main Markdown body flow.

The implementation should either mark or remove repeated boilerplate, depending on the configured policy, and keep enough diagnostics for a reviewer to audit each decision.
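One possible shape for the detection described above, as a minimal sketch rather than the project's actual design. It assumes each page is already available as a list of text lines, which is a simplification of the paragraph and block model. The names FilterResult, filter_repeated_edges, edge_lines, and min_fraction are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FilterResult:
    # Hypothetical result type: filtered pages plus a diagnostics log.
    body_pages: list
    removed: list = field(default_factory=list)  # (page index, line, reason)


def _normalize(line):
    # Collapse digit runs so "Page 3" and "Page 4" compare as equal.
    return re.sub(r"\d+", "#", line.strip().lower())


def _edge_indices(n, edge_lines):
    # Indices of the first and last `edge_lines` lines on a page of n lines.
    top = range(min(edge_lines, n))
    bottom = range(max(n - edge_lines, 0), n)
    return set(top) | set(bottom)


def filter_repeated_edges(pages, edge_lines=1, min_fraction=0.6):
    # Count, per normalized edge line, how many pages it appears on.
    counts = Counter()
    for page in pages:
        idx = _edge_indices(len(page), edge_lines)
        counts.update({_normalize(page[i]) for i in idx if page[i].strip()})

    # Assumed policy: text is boilerplate if it recurs on enough pages.
    threshold = max(2, math.ceil(min_fraction * len(pages)))
    repeated = {key for key, count in counts.items() if count >= threshold}

    result = FilterResult(body_pages=[])
    for page_no, page in enumerate(pages):
        idx = _edge_indices(len(page), edge_lines)
        kept = []
        for i, line in enumerate(page):
            # Only edge-position lines are candidates, so body text that
            # happens to match a header elsewhere on the page is kept.
            if i in idx and _normalize(line) in repeated:
                result.removed.append((page_no, line, "repeated-edge"))
            else:
                kept.append(line)
        result.body_pages.append(kept)
    return result
```

The removed list keeps every dropped line with its page index and reason, so a reviewer can check each decision. Output order follows input order and no randomness is involved, so the same input yields the same result.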

Done means: tests show that repeated text in the top and bottom page regions is identified and excluded from the main content.
Hard thresholds: unique body text is not removed; page number patterns are tested; removal decisions are deterministic.
Files owned: src/pdftomd/enrichment.py, tests, PROGRESS.md, phase index.
Dependencies: Paragraph and block model from earlier steps.
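The page number threshold could be covered by a small pattern check like the one below. This is a sketch under assumptions: the pattern list and the name is_page_number are hypothetical, and the formats shown (bare number, "Page N", "Page N of M", "- N -", "N / M") are examples, not a list taken from the project.

```python
import re

# Hypothetical set of page number formats; extend as real documents require.
PAGE_NUMBER_PATTERNS = [
    re.compile(r"^\d{1,4}$"),                                          # "12"
    re.compile(r"^-\s*\d{1,4}\s*-$"),                                  # "- 12 -"
    re.compile(r"^page\s+\d{1,4}(\s+of\s+\d{1,4})?$", re.IGNORECASE),  # "Page 3 of 10"
    re.compile(r"^\d{1,4}\s*/\s*\d{1,4}$"),                            # "3 / 10"
]


def is_page_number(line):
    # A line counts as a page number only if the whole line matches a pattern,
    # so body sentences that merely contain digits are left alone.
    text = line.strip()
    return any(pattern.match(text) for pattern in PAGE_NUMBER_PATTERNS)
```

Anchoring every pattern to the full line is what protects unique body text: a sentence such as "In 2020 revenue grew" contains digits but does not match.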

python scripts\validate_workspace.py
.\venv\python.exe -m pytest tests