# CSV Schema/Tolerance Comparison Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add a no-Abaqus CSV comparison script that validates externally generated ODB-extracted actual CSV files against approved reference CSV artifacts by schema, row identity, units/coordinate metadata, and tolerance.

**Architecture:** Keep `scripts/validate_reference_artifacts.py` responsible for artifact completeness only. Add `scripts/compare_extracted_csv.py` as an explicit CLI tool that reads `references///metadata.json`, validates the reference bundle, loads actual CSVs from a user-provided external result bundle, compares rows using declared schema/tolerance rules, and emits pass/fail plus optional JSON evidence. Do not integrate this into default `scripts/validate_workspace.py` because actual CSVs are generated outside this project and may not exist on every machine.

**Tech Stack:** Python standard library only (`argparse`, `csv`, `json`, `math`, `dataclasses`, `pathlib`, `statistics` or direct RMS math), existing `unittest` test style, existing reference artifact metadata contract.

---

## File Structure

- Create: `scripts/compare_extracted_csv.py`
  - CLI and importable functions for loading metadata, resolving CSV paths, validating CSV schema, matching rows, computing tolerance metrics, classifying failures, and emitting text/JSON reports.
- Create: `scripts/test_compare_extracted_csv.py`
  - TDD coverage for pass, schema mismatch, missing actual output, ID mismatch, unit/coordinate mismatch, nonfinite values, and tolerance failure.
- Modify: `docs/reference-verifications/README.md`
  - Document CLI usage, metadata comparison contract, failure classifications, and expected report fields.
- Modify: `docs/reference-models/README.md`
  - Add optional `comparisons` metadata block to the artifact bundle example.
- Do not modify: `scripts/validate_workspace.py`
  - CSV comparison needs explicit actual output paths, so it stays outside default workspace validation.

## Metadata Contract

Add an optional `comparisons` block to `metadata.json`. The comparison script requires this block for quantities it compares, but `validate_reference_artifacts.py` does not need to require it for all `ready-for-comparison` bundles.

```json
{
  "comparisons": {
    "stresses": {
      "reference_csv": "extracted/stresses.csv",
      "actual_csv": "extracted/stresses.csv",
      "required_columns": [
        "step",
        "frame",
        "instance",
        "element_label",
        "integration_point",
        "section_point",
        "output_position",
        "component",
        "coordinate_system",
        "unit",
        "value"
      ],
      "key_columns": [
        "step",
        "frame",
        "instance",
        "element_label",
        "integration_point",
        "section_point",
        "output_position",
        "component"
      ],
      "value_column": "value",
      "unit_column": "unit",
      "coordinate_system_column": "coordinate_system",
      "tolerance": {
        "absolute": 1.0e-8,
        "relative": 1.0e-6,
        "relative_floor": 1.0e-12
      }
    }
  }
}
```

Tolerance rule:

```text
absolute_error = abs(actual - reference)
relative_error = absolute_error / max(abs(reference), relative_floor)
allowed_error = absolute + relative * max(abs(reference), relative_floor)
row_pass = absolute_error <= allowed_error
quantity_pass = all rows pass and no schema/id/unit/coordinate/nonfinite errors exist
```

## CLI Contract

Primary command:

```bash
python scripts/compare_extracted_csv.py --metadata references///metadata.json --actual-root external-results//
```

Optional filters and report output:

```bash
python scripts/compare_extracted_csv.py --metadata references/umat/single-element/metadata.json --actual-root external-results/umat/single-element --quantity stresses --report-json build/reference-verification/umat-single-element.json
```

Exit codes:

- `0`: every requested quantity passed.
- `1`: comparison completed and one or more quantities failed.
- `2`: invalid CLI arguments, invalid metadata, missing files, or unreadable CSV.

## Failure Classification

The script should produce one primary classification per failed quantity:

- `missing-reference-artifact`: declared reference CSV is absent after metadata validation.
- `missing-generated-output`: actual CSV under `--actual-root` is absent.
- `schema-mismatch`: required columns are missing, duplicate headers exist, or duplicate key rows exist.
- `id-mismatch`: missing or extra key rows exist.
- `unit-or-coordinate-mismatch`: matched rows disagree on unit or coordinate system.
- `nonfinite-result`: reference or actual `value` is NaN or infinite.
- `tolerance-failure`: schema, IDs, unit, and coordinate checks pass, but numeric error exceeds tolerance.
- `upstream-contract`: requested quantity has no `comparisons.` contract.
- `environment`: file cannot be read due to encoding or OS errors.

## Report Contract

Text output should be concise and machine-adjacent:

```text
PASS stresses rows=8 max_abs_error=1.2e-10 max_rel_error=3.0e-9 rms_error=8.1e-11 worst_key=Step-1|1|PART-1-1|1|1||INTEGRATION_POINT|S11
```

Failed quantity example:

```text
FAIL stresses classification=tolerance-failure rows=8 max_abs_error=2.4e-4 max_rel_error=1.2e-2 rms_error=8.5e-5 worst_key=Step-1|1|PART-1-1|1|1||INTEGRATION_POINT|S11
```

JSON report should contain:

```json
{
  "metadata": "references/umat/single-element/metadata.json",
  "actual_root": "external-results/umat/single-element",
  "overall_result": "pass",
  "quantities": [
    {
      "quantity": "stresses",
      "result": "pass",
      "classification": "N/A",
      "compared_rows": 8,
      "missing_rows": 0,
      "extra_rows": 0,
      "max_abs_error": 1.2e-10,
      "max_rel_error": 3.0e-9,
      "rms_error": 8.1e-11,
      "worst_key": "Step-1|1|PART-1-1|1|1||INTEGRATION_POINT|S11",
      "worst_component": "S11"
    }
  ]
}
```

---

### Task 1: Write Pass-Case Test Fixture

**Files:**

- Create: `scripts/test_compare_extracted_csv.py`

- [ ] **Step 1: Write dynamic import and fixture helpers**

```python
import csv
import importlib.util
import json
import tempfile
import unittest
from pathlib import Path


def load_compare_extracted_csv():
    module_path = Path(__file__).resolve().parent / "compare_extracted_csv.py"
    spec = importlib.util.spec_from_file_location("compare_extracted_csv", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def write_json(path: Path, payload: dict):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def write_csv(path: Path, rows: list[dict[str, str]]):
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def metadata_payload() -> dict:
    return {
        "schema_version": "abaqus-user-subroutine-artifact-v1",
        "feature_id": "umat",
        "model_id": "single-element",
        "artifact_status": "ready-for-comparison",
        "abaqus": {"version": "2024", "precision": "double"},
        "compiler": {"vendor": "Intel oneAPI", "name": "ifx", "version": "2024"},
        "subroutine": {"entry_points": ["UMAT"], "source_files": []},
        "input_file": "model.inp",
        "outputs": {
            "tails": {
                "msg": "job.msg.tail.txt",
                "dat": "job.dat.tail.txt",
                "log": "job.log.tail.txt",
                "sta": "job.sta.tail.txt"
            },
            "csv": {"stresses": "extracted/stresses.csv"}
        },
        "extraction": {
            "source_odb": "job.odb",
            "tool": "Abaqus Python",
            "extracted_at": "2026-06-10T00:00:00+09:00",
            "csv_directory": "extracted"
        },
        "comparisons": {
            "stresses": {
                "reference_csv": "extracted/stresses.csv",
                "actual_csv": "extracted/stresses.csv",
                "required_columns": [
                    "step",
                    "frame",
                    "instance",
                    "element_label",
                    "integration_point",
                    "section_point",
                    "output_position",
                    "component",
                    "coordinate_system",
                    "unit",
                    "value"
                ],
                "key_columns": [
                    "step",
                    "frame",
                    "instance",
                    "element_label",
                    "integration_point",
                    "section_point",
                    "output_position",
                    "component"
                ],
                "value_column": "value",
                "unit_column": "unit",
                "coordinate_system_column": "coordinate_system",
                "tolerance": {"absolute": 1.0e-8, "relative": 1.0e-6, "relative_floor": 1.0e-12}
            }
        }
    }


def stress_rows(value: str = "100.0") -> list[dict[str, str]]:
    return [
        {
            "step": "Step-1",
            "frame": "1",
            "instance": "PART-1-1",
            "element_label": "1",
            "integration_point": "1",
            "section_point": "",
            "output_position": "INTEGRATION_POINT",
            "component": "S11",
            "coordinate_system": "GLOBAL",
            "unit": "MPa",
            "value": value
        }
    ]
```

- [ ] **Step 2: Write passing comparison test**

```python
class CompareExtractedCsvTests(unittest.TestCase):
    def test_quantity_passes_when_schema_keys_units_and_values_match_within_tolerance(self):
        compare = load_compare_extracted_csv()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            reference = root / "references" / "umat" / "single-element"
            actual = root / "external-results" / "umat" / "single-element"
            write_json(reference / "metadata.json", metadata_payload())
            write_csv(reference / "extracted" / "stresses.csv", stress_rows("100.0"))
            write_csv(actual / "extracted" / "stresses.csv", stress_rows("100.00000001"))

            report = compare.compare_metadata(reference / "metadata.json", actual, quantities=["stresses"], validate_artifacts=False)

            self.assertEqual(report["overall_result"], "pass")
            self.assertEqual(report["quantities"][0]["result"], "pass")
            self.assertEqual(report["quantities"][0]["classification"], "N/A")
            self.assertEqual(report["quantities"][0]["compared_rows"], 1)
```

- [ ] **Step 3: Run test to verify RED**

Run:

```bash
python -m unittest scripts.test_compare_extracted_csv
```

Expected: FAIL because `scripts/compare_extracted_csv.py` does not exist.

### Task 2: Implement Minimal Pass-Case Comparison

**Files:**

- Create: `scripts/compare_extracted_csv.py`

- [ ] **Step 1: Add importable API skeleton and minimal comparison**

Implement these functions:

```python
def compare_metadata(metadata_path: Path, actual_root: Path, *, quantities: list[str] | None = None, validate_artifacts: bool = True) -> dict:
    ...


def load_csv_rows(path: Path) -> tuple[list[str], list[dict[str, str]]]:
    ...


def compare_quantity(quantity: str, contract: dict, reference_root: Path, actual_root: Path) -> dict:
    ...
```

Minimum behavior for GREEN:

- Load metadata JSON.
- Resolve `comparisons..reference_csv` under `metadata_path.parent`.
- Resolve `comparisons..actual_csv` under `actual_root`.
- Load both CSV files with `csv.DictReader`.
- Check required columns are present.
- Match rows by `key_columns`.
- Parse `value_column` as finite float.
- Compute `max_abs_error`, `max_rel_error`, `rms_error`, `worst_key`.
- Return `overall_result=pass` if no errors exceed tolerance.

- [ ] **Step 2: Run pass-case test**

Run:

```bash
python -m unittest scripts.test_compare_extracted_csv
```

Expected: PASS.

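
The row matching and metric steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the required implementation. `row_errors` and `compare_rows` are hypothetical helper names, and the plan does not define `rms_error` or `worst_key` precisely, so the sketch assumes RMS over absolute errors and the key with the largest absolute error, joined with `|` as in the Report Contract examples. Duplicate-key detection and the nonfinite check are left out for brevity.

```python
import math


def make_key(row: dict[str, str], key_columns: list[str]) -> tuple[str, ...]:
    # Same key construction the plan specifies for row matching.
    return tuple(row.get(column, "") for column in key_columns)


def row_errors(reference: float, actual: float, tolerance: dict) -> tuple[float, float, bool]:
    # Tolerance rule from the Metadata Contract, applied to one matched row.
    scale = max(abs(reference), tolerance["relative_floor"])
    absolute_error = abs(actual - reference)
    allowed_error = tolerance["absolute"] + tolerance["relative"] * scale
    return absolute_error, absolute_error / scale, absolute_error <= allowed_error


def compare_rows(reference_rows: list[dict], actual_rows: list[dict], contract: dict) -> dict:
    keys = contract["key_columns"]
    value_column = contract["value_column"]
    reference_by_key = {make_key(row, keys): row for row in reference_rows}
    actual_by_key = {make_key(row, keys): row for row in actual_rows}
    matched = sorted(reference_by_key.keys() & actual_by_key.keys())
    if not matched:
        # The plan does not define metrics for zero matched rows.
        raise ValueError("no matched rows to compare")

    abs_errors, rel_errors, rows_pass = [], [], True
    worst_key, worst_error = matched[0], -1.0
    for key in matched:
        abs_error, rel_error, row_pass = row_errors(
            float(reference_by_key[key][value_column]),
            float(actual_by_key[key][value_column]),
            contract["tolerance"],
        )
        abs_errors.append(abs_error)
        rel_errors.append(rel_error)
        rows_pass = rows_pass and row_pass
        if abs_error > worst_error:  # assumption: worst row = largest absolute error
            worst_key, worst_error = key, abs_error

    return {
        "compared_rows": len(matched),
        "missing_rows": len(reference_by_key.keys() - actual_by_key.keys()),
        "extra_rows": len(actual_by_key.keys() - reference_by_key.keys()),
        "max_abs_error": max(abs_errors),
        "max_rel_error": max(rel_errors),
        "rms_error": math.sqrt(sum(e * e for e in abs_errors) / len(abs_errors)),
        "worst_key": "|".join(worst_key),
        "rows_pass": rows_pass,
    }


# Example values mirroring the Task 1 fixture.
CONTRACT = {
    "key_columns": ["step", "frame", "instance", "element_label", "integration_point",
                    "section_point", "output_position", "component"],
    "value_column": "value",
    "tolerance": {"absolute": 1.0e-8, "relative": 1.0e-6, "relative_floor": 1.0e-12},
}
ROW = {
    "step": "Step-1", "frame": "1", "instance": "PART-1-1", "element_label": "1",
    "integration_point": "1", "section_point": "", "output_position": "INTEGRATION_POINT",
    "component": "S11", "coordinate_system": "GLOBAL", "unit": "MPa", "value": "100.0",
}
print(compare_rows([ROW], [dict(ROW, value="100.00000001")], CONTRACT))
```

Keeping the per-row rule in its own function makes the later tolerance-failure test in Task 4 a matter of changing one input value, without touching the matching logic.
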
### Task 3: Add Schema and Contract Failure Tests

**Files:**

- Modify: `scripts/test_compare_extracted_csv.py`
- Modify: `scripts/compare_extracted_csv.py`

- [ ] **Step 1: Add missing actual output test**

```python
    def test_missing_actual_csv_is_missing_generated_output(self):
        compare = load_compare_extracted_csv()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            reference = root / "references" / "umat" / "single-element"
            actual = root / "external-results" / "umat" / "single-element"
            write_json(reference / "metadata.json", metadata_payload())
            write_csv(reference / "extracted" / "stresses.csv", stress_rows("100.0"))

            report = compare.compare_metadata(reference / "metadata.json", actual, quantities=["stresses"], validate_artifacts=False)

            self.assertEqual(report["overall_result"], "fail")
            self.assertEqual(report["quantities"][0]["classification"], "missing-generated-output")
```

- [ ] **Step 2: Add missing required column test**

```python
    def test_missing_required_column_is_schema_mismatch(self):
        compare = load_compare_extracted_csv()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            reference = root / "references" / "umat" / "single-element"
            actual = root / "external-results" / "umat" / "single-element"
            write_json(reference / "metadata.json", metadata_payload())
            row = stress_rows("100.0")[0]
            write_csv(reference / "extracted" / "stresses.csv", [row])
            actual_row = dict(row)
            actual_row.pop("coordinate_system")
            write_csv(actual / "extracted" / "stresses.csv", [actual_row])

            report = compare.compare_metadata(reference / "metadata.json", actual, quantities=["stresses"], validate_artifacts=False)

            self.assertEqual(report["quantities"][0]["classification"], "schema-mismatch")
```

- [ ] **Step 3: Add missing comparison contract test**

```python
    def test_missing_quantity_contract_is_upstream_contract(self):
        compare = load_compare_extracted_csv()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            reference = root / "references" / "umat" / "single-element"
            actual = root / "external-results" / "umat" / "single-element"
            payload = metadata_payload()
            payload["comparisons"].pop("stresses")
            write_json(reference / "metadata.json", payload)

            report = compare.compare_metadata(reference / "metadata.json", actual, quantities=["stresses"], validate_artifacts=False)

            self.assertEqual(report["quantities"][0]["classification"], "upstream-contract")
```

- [ ] **Step 4: Run tests to verify RED**

Run:

```bash
python -m unittest scripts.test_compare_extracted_csv
```

Expected: FAIL on the new failure classifications.

- [ ] **Step 5: Implement missing file, schema, and contract classification**

Add helper functions:

```python
def failed_quantity(quantity: str, classification: str, message: str) -> dict:
    ...


def validate_columns(headers: list[str], required_columns: list[str]) -> list[str]:
    ...
```

Return stable fields even on failure:

```python
{
    "quantity": quantity,
    "result": "fail",
    "classification": classification,
    "message": message,
    "compared_rows": 0,
    "missing_rows": 0,
    "extra_rows": 0,
    "max_abs_error": None,
    "max_rel_error": None,
    "rms_error": None,
    "worst_key": None,
    "worst_component": None
}
```

- [ ] **Step 6: Run tests to verify GREEN**

Run:

```bash
python -m unittest scripts.test_compare_extracted_csv
```

Expected: PASS.

### Task 4: Add Row Matching, Unit, Coordinate, Nonfinite, and Tolerance Tests

**Files:**

- Modify: `scripts/test_compare_extracted_csv.py`
- Modify: `scripts/compare_extracted_csv.py`

- [ ] **Step 1: Add ID mismatch test**

Change actual `element_label` from `1` to `2`.

Expected classification: `id-mismatch`, with `missing_rows=1` and `extra_rows=1`.

- [ ] **Step 2: Add unit mismatch test**

Change actual `unit` from `MPa` to `Pa`.

Expected classification: `unit-or-coordinate-mismatch`.

- [ ] **Step 3: Add coordinate mismatch test**

Change actual `coordinate_system` from `GLOBAL` to `LOCAL-1`.

Expected classification: `unit-or-coordinate-mismatch`.

- [ ] **Step 4: Add nonfinite test**

Set actual `value` to `nan`.

Expected classification: `nonfinite-result`.

- [ ] **Step 5: Add tolerance failure test**

Set actual `value` to `101.0` for reference `100.0`.

Expected classification: `tolerance-failure`, `max_abs_error=1.0`, and `result=fail`.

- [ ] **Step 6: Run tests to verify RED**

Run:

```bash
python -m unittest scripts.test_compare_extracted_csv
```

Expected: FAIL until these classifications are implemented.

- [ ] **Step 7: Implement row matching and classification precedence**

Use this precedence:

```text
missing-reference-artifact
missing-generated-output
upstream-contract
schema-mismatch
id-mismatch
nonfinite-result
unit-or-coordinate-mismatch
tolerance-failure
N/A
```

Implement row keys as:

```python
def make_key(row: dict[str, str], key_columns: list[str]) -> tuple[str, ...]:
    return tuple(row.get(column, "") for column in key_columns)
```

Detect duplicate keys in either CSV as `schema-mismatch`.

- [ ] **Step 8: Run tests to verify GREEN**

Run:

```bash
python -m unittest scripts.test_compare_extracted_csv
```

Expected: PASS.

### Task 5: Add CLI and JSON Report Tests

**Files:**

- Modify: `scripts/test_compare_extracted_csv.py`
- Modify: `scripts/compare_extracted_csv.py`

- [ ] **Step 1: Add CLI pass test using `main(argv)`**

Test:

```python
exit_code = compare.main([
    "--metadata", str(reference / "metadata.json"),
    "--actual-root", str(actual),
    "--quantity", "stresses",
    "--report-json", str(report_json)
])
self.assertEqual(exit_code, 0)
self.assertEqual(json.loads(report_json.read_text(encoding="utf-8"))["overall_result"], "pass")
```

- [ ] **Step 2: Add CLI fail test**

Use a tolerance failure fixture.

Expected `main(...) == 1` and JSON `overall_result == "fail"`.

- [ ] **Step 3: Add CLI invalid argument test**

Call without `--metadata` or `--actual-root`.

Expected `main(...) == 2`.

- [ ] **Step 4: Run tests to verify RED**

Run:

```bash
python -m unittest scripts.test_compare_extracted_csv
```

Expected: FAIL until CLI exists.

- [ ] **Step 5: Implement CLI**

Implement:

```python
def build_arg_parser() -> argparse.ArgumentParser:
    ...


def main(argv: list[str] | None = None) -> int:
    ...
```

CLI behavior:

- `--quantity` may be repeated.
- If no `--quantity` is supplied, compare all keys under `metadata["comparisons"]`.
- `--report-json` creates parent directories and writes UTF-8 JSON.
- Print one summary line per quantity.
- Return `0`, `1`, or `2` according to the CLI contract.

- [ ] **Step 6: Run tests to verify GREEN**

Run:

```bash
python -m unittest scripts.test_compare_extracted_csv
```

Expected: PASS.

### Task 6: Optional Artifact Validator Integration

**Files:**

- Modify: `scripts/compare_extracted_csv.py`
- Modify: `scripts/test_compare_extracted_csv.py`

- [ ] **Step 1: Add test that default comparison calls metadata validation**

Use a metadata file missing required ready-for-comparison fields and call `compare_metadata(..., validate_artifacts=True)`.

Expected classification: `missing-reference-artifact` or exit code `2` with validation errors.

- [ ] **Step 2: Implement reuse of `validate_reference_artifacts.validate_metadata`**

Import safely:

```python
try:
    from validate_reference_artifacts import validate_metadata
except ImportError:
    from scripts.validate_reference_artifacts import validate_metadata
```

Run validation before comparison when `validate_artifacts=True`.

- [ ] **Step 3: Keep tests able to bypass validation**

Continue supporting `validate_artifacts=False` in unit tests that only exercise comparison logic.

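
Tasks 5 and 6 both expect `main(argv)` to return an exit code rather than terminate the process. By default, `argparse` reports usage errors by raising `SystemExit` with status `2`, so a test that asserts `main(...) == 2` only works if `main` converts that exception into a return value. The sketch below shows one way to do this. It is illustrative only: the `compare` parameter is a hypothetical stand-in for the real comparison call, and mapping `OSError`/`ValueError` to exit code `2` is an assumption about how unreadable or invalid metadata surfaces.

```python
from __future__ import annotations

import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Compare extracted CSVs against reference CSVs.")
    parser.add_argument("--metadata", required=True)
    parser.add_argument("--actual-root", required=True)
    # action="append" lets --quantity be repeated; None means "all quantities".
    parser.add_argument("--quantity", action="append", default=None)
    parser.add_argument("--report-json", default=None)
    return parser


def main(argv: list[str] | None = None, compare=lambda args: {"overall_result": "pass"}) -> int:
    # `compare` is a hypothetical hook used only to keep this sketch self-contained.
    try:
        args = build_arg_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse exits with 2 on usage errors and 0 for --help.
        return exc.code if isinstance(exc.code, int) else 2
    try:
        report = compare(args)
    except (OSError, ValueError):
        # Assumption: unreadable files and invalid JSON map to exit code 2.
        return 2
    return 0 if report["overall_result"] == "pass" else 1
```

With this shape, the CLI tests can call `main([...])` directly and compare the returned integer, and the script's entry point can pass that integer to `sys.exit`.
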
### Task 7: Documentation Updates

**Files:**

- Modify: `docs/reference-verifications/README.md`
- Modify: `docs/reference-models/README.md`

- [ ] **Step 1: Update reference verification README**

Add a section:

````markdown
## CSV Comparison Command

Run explicit external-result comparison with:

```bash
python scripts/compare_extracted_csv.py --metadata references///metadata.json --actual-root external-results//
```

The command does not run Abaqus and does not parse ODB files. It compares approved `references/.../extracted/*.csv` files with externally generated actual CSV files under `--actual-root`.
````

- [ ] **Step 2: Update reference model README metadata example**

Add the `comparisons` JSON block shown in this plan.

- [ ] **Step 3: Run documentation-sensitive search**

Run:

```bash
rg -n "compare_extracted_csv|comparisons|extracted/.*\\.csv" docs scripts
```

Expected: The new script, tests, and docs mention the comparison contract.

### Task 8: Full Verification

**Files:**

- No new edits unless failures reveal a bug in this task's changes.

- [ ] **Step 1: Run targeted tests**

```bash
python -m unittest scripts.test_compare_extracted_csv
```

Expected: PASS.

- [ ] **Step 2: Run full script tests**

```bash
python -m unittest discover -s scripts -p "test_*.py"
```

Expected: PASS.

- [ ] **Step 3: Run reference artifact validation**

```bash
python scripts/validate_reference_artifacts.py
```

Expected: `Reference artifact metadata validation succeeded.`

- [ ] **Step 4: Run Fortran validation**

```bash
python scripts/validate_fortran.py
```

Expected: PASS, or `No Fortran validation commands configured.` when no manifest exists.

- [ ] **Step 5: Run workspace validation**

```bash
python scripts/validate_workspace.py
```

Expected: PASS. It should not require actual CSV outputs because `compare_extracted_csv.py` is explicit-use only.

## Acceptance Criteria

- The project never runs Abaqus and never opens ODB files during CSV comparison.
- Reference bundle completeness remains checked by `scripts/validate_reference_artifacts.py`.
- CSV numeric validation is performed by explicit command only.
- Actual generated CSVs are read from a user-supplied `--actual-root`.
- Comparison requires declared schema, key columns, value column, unit/coordinate columns, and absolute/relative tolerance.
- Missing files, schema mismatch, ID mismatch, unit/coordinate mismatch, nonfinite results, and tolerance failures have distinct classifications.
- JSON report includes enough metrics for Reference Verification Agent handoff: compared rows, missing rows, extra rows, max absolute error, max relative error, RMS error, worst key, and pass/fail.

## Open Decisions

- Whether actual result bundles should live under a conventional ignored path such as `external-results/` or `runs/`. The script should accept any `--actual-root`, so this can remain a documentation convention.
- Whether comparison metadata should later move from `metadata.json` into feature-specific I/O definition documents. For the first implementation, keep executable comparison rules in `metadata.json` so the script has one deterministic contract source.