modify pdftomd
@@ -76,13 +76,13 @@ This optional pytest path runs `pdf2md doctor` first. If doctor has a hard failu
 A sample conversion is successful only when all of these are true:
 
 - The command exits 0.
-- The planned Markdown file exists: `<output>\<stem>.md`.
-- The planned metadata JSON exists: `<output>\<stem>.metadata.json`.
-- The planned quality report exists: `<output>\<stem>.report.md`.
-- Metadata and report warning counts are consistent enough to explain math, table, reading-order, asset, MinerU, and checker-unavailable risks.
+- The planned Markdown part exists: `<output>\<stem>\<stem>_001.md`.
+- The planned quality report exists: `<output>\<stem>\<stem>_report.md`.
+- No public `.metadata.json` sidecar is written for new conversions.
+- The report warning counts are consistent enough to explain math, table, reading-order, asset, MinerU, and checker-unavailable risks.
 - Any Markdown image links resolve relative to the Markdown file, or missing/broken links are reported as warnings.
 
-Missing Markdown, metadata JSON, or `.report.md` means the sample failed or is blocked. Do not count it as a partial success for release gating.
+Missing Markdown part or `_report.md` means the sample failed or is blocked. Do not count it as a partial success for release gating.
 
 For each attempted sample, record at least:
 
@@ -90,8 +90,7 @@ For each attempted sample, record at least:
 - Command run.
 - Exit code.
 - Generated Markdown path.
-- Generated metadata JSON path.
-- Generated `.report.md` path.
+- Generated `_report.md` path.
 - Warning count and final status.
 - Math renderability failures or checker-unavailable count.
 - Table fallback or degradation count when available.
@@ -110,7 +109,7 @@ Local fixture coverage should include these risk categories where samples are av
 - Figure, caption, or extracted asset links.
 - Korean or non-ASCII filename/path handling.
 
-Observed local fixture map as of 2026-05-08:
+Observed local fixture map as of 2026-05-11:
 
 | Local sample | Fixture risks covered | Notes |
 | --- | --- | --- |
@@ -126,7 +125,7 @@ Coverage gaps to keep visible:
 - A table-dominant sample with known formula cells would make table degradation easier to judge.
 - A figure-heavy sample with expected extracted assets would make asset link validation easier to judge.
 
-Do not score fixture quality only by plain-text edit distance. Include math delimiter/renderability behavior, tables, reading order, assets, metadata fields, warning usefulness, and `.report.md` usefulness.
+Do not score fixture quality only by plain-text edit distance. Include math delimiter/renderability behavior, tables, reading order, assets, report provenance, warning usefulness, and `_report.md` usefulness.
 
 ## No-Sample-Commit Check
 
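The updated success criteria in the first hunk (exit code 0, Markdown part and `_report.md` present, no `.metadata.json` sidecar, image links resolving relative to the Markdown file) could be checked mechanically along these lines. This is only a sketch and is not part of the commit: `check_sample` and `IMAGE_LINK` are hypothetical names, the reading of "no public sidecar" as "no `*.metadata.json` anywhere under the output directory" is an assumption, and the warning-count consistency criterion is not covered because the report format is not shown in the diff.

```python
import re
from pathlib import Path

# Matches inline Markdown images such as ![alt](target) or ![alt](target "title").
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)")


def check_sample(output_dir, stem, exit_code):
    """Return (failures, broken_links) for one attempted sample.

    Hypothetical helper: the file layout follows the criteria in the diff,
    but nothing here is part of pdf2md itself.
    """
    out = Path(output_dir)
    part = out / stem / f"{stem}_001.md"
    report = out / stem / f"{stem}_report.md"
    failures = []

    if exit_code != 0:
        failures.append(f"exit code {exit_code}")
    if not part.is_file():
        failures.append(f"missing Markdown part: {part}")
    if not report.is_file():
        failures.append(f"missing quality report: {report}")
    # Assumption: any *.metadata.json under the output directory counts as a sidecar.
    for sidecar in sorted(out.rglob("*.metadata.json")):
        failures.append(f"unexpected metadata sidecar: {sidecar}")

    broken_links = []
    if part.is_file():
        text = part.read_text(encoding="utf-8")
        for target in IMAGE_LINK.findall(text):
            # Skip anything with a scheme prefix (http:, https:, data:, ...).
            if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*:", target):
                continue
            # Links are resolved relative to the Markdown file's directory.
            if not (part.parent / target).exists():
                broken_links.append(target)
    return failures, broken_links
```

Under the rule in the diff, a non-empty `failures` list would mean the sample failed or is blocked, not a partial success. Entries in `broken_links` are not failures on their own; they would still need to be matched by hand against the warnings in `_report.md`. Percent-encoded link targets are not decoded in this sketch.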