---
description: Audit samples/ PDFs for page counts, text-layer quality, images, and OCR candidates.
argument-hint: [pdf-glob-or-empty]
allowed-tools: [Read, Glob, Bash, Write, Edit]
---

# /sample-audit

## Arguments

The user invoked this command with: $ARGUMENTS
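
As a rough illustration of how the argument might be interpreted, the sketch below resolves the raw argument string to a list of files. It assumes that an empty argument means every PDF under `samples/` and that a non-empty argument is a glob relative to that directory. `resolve_targets` and `DEFAULT_PATTERN` are hypothetical names, not part of this command.

```python
from pathlib import Path

# Assumption: an empty argument falls back to every PDF in samples/.
DEFAULT_PATTERN = "*.pdf"


def resolve_targets(arguments: str, samples_dir: Path = Path("samples")) -> list[Path]:
    """Turn the raw argument string into a sorted list of PDF paths."""
    pattern = arguments.strip() or DEFAULT_PATTERN
    # Keep only PDFs so a broad glob such as "*" does not pick up metadata.json.
    return sorted(p for p in samples_dir.glob(pattern) if p.suffix.lower() == ".pdf")
```

Filtering by suffix after globbing is a defensive choice: it keeps a loose pattern from pulling non-PDF files into the audit.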

## Workflow

1. Read `AGENTS.md`, `PLAN.md`, `PROGRESS.md`, and `docs/CONVERSION_POLICY.md`.
2. Use PyMuPDF from the `.\venv` virtual environment to inspect the matching `samples/*.pdf` files.
3. Report page count, first-page text length, image counts, suspected scan/OCR pages, Korean filename coverage, and obvious layout risks.
4. If the user asks to write metadata, create or update `samples/metadata.json`; otherwise only report.
5. Update `PROGRESS.md` whenever this command changes any files.
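
The inspection in steps 2 and 3 could look roughly like the sketch below, run with the interpreter inside `.\venv`. It is a minimal sketch, not a prescribed implementation: the helper names, the `MIN_TEXT_CHARS` threshold, and the rule "almost no text plus at least one image means a likely scan" are all assumptions. Layout risks are not covered here and still need manual judgment.

```python
import unicodedata
from pathlib import Path

# Hypothetical threshold: a page with fewer extracted characters than this
# and at least one image is flagged as a likely scan.
MIN_TEXT_CHARS = 20


def is_ocr_candidate(text_len: int, image_count: int, min_chars: int = MIN_TEXT_CHARS) -> bool:
    """Heuristic: almost no text layer plus at least one image suggests a scanned page."""
    return text_len < min_chars and image_count > 0


def has_hangul(name: str) -> bool:
    """True if the name contains a Hangul syllable (NFC-normalized first,
    because some filesystems store Korean names in decomposed form)."""
    normalized = unicodedata.normalize("NFC", name)
    return any("\uac00" <= ch <= "\ud7a3" for ch in normalized)


def audit_pdf(path: Path) -> dict:
    """Collect per-PDF traits for one file."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    with fitz.open(path) as doc:
        pages = []
        for page in doc:
            text_len = len(page.get_text().strip())
            image_count = len(page.get_images(full=True))
            pages.append((text_len, image_count))

    return {
        "file": path.name,
        "page_count": len(pages),
        "first_page_text_len": pages[0][0] if pages else 0,
        "image_count": sum(count for _, count in pages),
        # 1-based page numbers, to match what a PDF viewer shows.
        "ocr_candidate_pages": [
            i + 1 for i, (t, n) in enumerate(pages) if is_ocr_candidate(t, n)
        ],
        "korean_filename": has_hangul(path.stem),
    }
```

Calling `audit_pdf` on each matching file yields one record per PDF, which maps directly onto the per-PDF and OCR-candidate parts of the report.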

## Output

- **Corpus Summary**
- **Per-PDF Traits**
- **OCR Candidates**
- **Test Implications**
- **Recommended Metadata Changes**
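
When the user does ask for metadata to be written, the recommended changes could be applied along the lines of the sketch below. The layout of `samples/metadata.json` is not specified here, so the code assumes a JSON object keyed by filename. `merge_metadata` and the `file` key are hypothetical.

```python
import json
from pathlib import Path


def merge_metadata(metadata_path: Path, records: list[dict]) -> dict:
    """Create or update the metadata file, keeping entries for PDFs not re-audited."""
    existing = {}
    if metadata_path.exists():
        existing = json.loads(metadata_path.read_text(encoding="utf-8"))
    for record in records:
        # Assumed schema: one object per PDF, keyed by its filename.
        entry = existing.setdefault(record["file"], {})
        entry.update({k: v for k, v in record.items() if k != "file"})
    # ensure_ascii=False keeps Korean filenames readable in the file.
    metadata_path.write_text(
        json.dumps(existing, ensure_ascii=False, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return existing
```

Merging rather than overwriting means an audit of a single PDF does not discard entries recorded for the rest of the corpus.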