modify pdftomd

김경종
2026-05-14 10:16:59 +09:00
parent 2232b51fc9
commit dc11880140
69 changed files with 7784 additions and 1150 deletions
+8 -9
@@ -76,13 +76,13 @@ This optional pytest path runs `pdf2md doctor` first. If doctor has a hard failu
 A sample conversion is successful only when all of these are true:
 - The command exits 0.
-- The planned Markdown file exists: `<output>\<stem>.md`.
-- The planned metadata JSON exists: `<output>\<stem>.metadata.json`.
-- The planned quality report exists: `<output>\<stem>.report.md`.
-- Metadata and report warning counts are consistent enough to explain math, table, reading-order, asset, MinerU, and checker-unavailable risks.
+- The planned Markdown part exists: `<output>\<stem>\<stem>_001.md`.
+- The planned quality report exists: `<output>\<stem>\<stem>_report.md`.
+- No public `.metadata.json` sidecar is written for new conversions.
+- The report warning counts are consistent enough to explain math, table, reading-order, asset, MinerU, and checker-unavailable risks.
 - Any Markdown image links resolve relative to the Markdown file, or missing/broken links are reported as warnings.
-Missing Markdown, metadata JSON, or `.report.md` means the sample failed or is blocked. Do not count it as a partial success for release gating.
+Missing Markdown part or `_report.md` means the sample failed or is blocked. Do not count it as a partial success for release gating.
 For each attempted sample, record at least:
@@ -90,8 +90,7 @@ For each attempted sample, record at least:
 - Command run.
 - Exit code.
 - Generated Markdown path.
-- Generated metadata JSON path.
-- Generated `.report.md` path.
+- Generated `_report.md` path.
 - Warning count and final status.
 - Math renderability failures or checker-unavailable count.
 - Table fallback or degradation count when available.
@@ -110,7 +109,7 @@ Local fixture coverage should include these risk categories where samples are av
 - Figure, caption, or extracted asset links.
 - Korean or non-ASCII filename/path handling.
-Observed local fixture map as of 2026-05-08:
+Observed local fixture map as of 2026-05-11:
 | Local sample | Fixture risks covered | Notes |
 | --- | --- | --- |
@@ -126,7 +125,7 @@ Coverage gaps to keep visible:
 - A table-dominant sample with known formula cells would make table degradation easier to judge.
 - A figure-heavy sample with expected extracted assets would make asset link validation easier to judge.
-Do not score fixture quality only by plain-text edit distance. Include math delimiter/renderability behavior, tables, reading order, assets, metadata fields, warning usefulness, and `.report.md` usefulness.
+Do not score fixture quality only by plain-text edit distance. Include math delimiter/renderability behavior, tables, reading order, assets, report provenance, warning usefulness, and `_report.md` usefulness.
 ## No-Sample-Commit Check
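The post-change success criteria and per-sample record in the diff above could be checked with a minimal Python sketch like the one below, assuming the `<output>\<stem>\<stem>_001.md` / `<output>\<stem>\<stem>_report.md` layout. The function name `check_sample_outputs`, its returned field names, and the image-link regex are hypothetical and not part of pdf2md. The sketch inspects only the first Markdown part, checks only the pre-change sidecar location, and does not parse warning counts out of the report.

```python
import re
import tempfile
from pathlib import Path

# Inline Markdown image links: ![alt](target). Deliberately simplified: it does
# not handle link titles containing ")", angle-bracket targets, or reference-style images.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def check_sample_outputs(output_dir: Path, stem: str, command: str, exit_code: int) -> dict:
    """Hypothetical release-gating check for one converted sample (sketch only)."""
    part = output_dir / stem / f"{stem}_001.md"         # planned first Markdown part
    report = output_dir / stem / f"{stem}_report.md"    # planned quality report
    old_sidecar = output_dir / f"{stem}.metadata.json"  # pre-change sidecar location

    problems = []
    if exit_code != 0:
        problems.append(f"exit code {exit_code}")
    if not part.is_file():
        problems.append(f"missing Markdown part: {part}")
    if not report.is_file():
        problems.append(f"missing report: {report}")
    if old_sidecar.exists():
        problems.append(f"unexpected metadata sidecar: {old_sidecar}")

    # Unresolved local image links are not failures by themselves; the report is
    # expected to list them as warnings, so collect them for cross-checking.
    broken_links = []
    if part.is_file():
        for target in IMAGE_LINK.findall(part.read_text(encoding="utf-8")):
            if "://" in target or target.startswith("data:"):
                continue  # remote and inline-data images are out of scope here
            if not (part.parent / target).exists():
                broken_links.append(target)

    return {
        "command": command,
        "exit_code": exit_code,
        "markdown_path": str(part),
        "report_path": str(report),
        "broken_image_links": broken_links,
        "problems": problems,
        # Any missing planned output means failed or blocked, never partial success.
        "status": "pass" if not problems else "failed_or_blocked",
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        (out / "sample").mkdir()
        (out / "sample" / "sample_001.md").write_text("![fig](assets/fig1.png)\n", encoding="utf-8")
        (out / "sample" / "sample_report.md").write_text("# Report\n", encoding="utf-8")
        print(check_sample_outputs(out, "sample", "(command as run)", 0))
```

`pathlib` joins paths with the native separator, so the same check works for the Windows-style paths used in the doc. Broken image links are returned separately rather than folded into `status` because the criteria allow them as long as they are reported as warnings.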