modify documents

2026-06-11 11:08:27 +09:00
parent 98eba54a12
commit 986cc9888e
35 changed files with 1984 additions and 169 deletions
@@ -0,0 +1,706 @@
+# CSV Schema/Tolerance Comparison Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add a no-Abaqus CSV comparison script that validates externally generated ODB-extracted actual CSV files against approved reference CSV artifacts by schema, row identity, units/coordinate metadata, and tolerance.
+
+**Architecture:** Keep `scripts/validate_reference_artifacts.py` responsible for artifact completeness only. Add `scripts/compare_extracted_csv.py` as an explicit CLI tool that reads `references/<feature-id>/<model-id>/metadata.json`, validates the reference bundle, loads actual CSVs from a user-provided external result bundle, compares rows using declared schema/tolerance rules, and emits pass/fail plus optional JSON evidence. Do not integrate this into default `scripts/validate_workspace.py` because actual CSVs are generated outside this project and may not exist on every machine.
+
+**Tech Stack:** Python standard library only (`argparse`, `csv`, `json`, `math`, `dataclasses`, `pathlib`, `statistics` or direct RMS math), existing `unittest` test style, existing reference artifact metadata contract.
+
+---
+
+## File Structure
+
+- Create: `scripts/compare_extracted_csv.py`
+  - CLI and importable functions for loading metadata, resolving CSV paths, validating CSV schema, matching rows, computing tolerance metrics, classifying failures, and emitting text/JSON reports.
+- Create: `scripts/test_compare_extracted_csv.py`
+  - TDD coverage for pass, schema mismatch, missing actual output, ID mismatch, unit/coordinate mismatch, nonfinite values, and tolerance failure.
+- Modify: `docs/reference-verifications/README.md`
+  - Document CLI usage, metadata comparison contract, failure classifications, and expected report fields.
+- Modify: `docs/reference-models/README.md`
+  - Add optional `comparisons` metadata block to the artifact bundle example.
+- Do not modify: `scripts/validate_workspace.py`
+  - CSV comparison needs explicit actual output paths, so it stays outside default workspace validation.
+
+## Metadata Contract
+
+Add an optional `comparisons` block to `metadata.json`. The comparison script requires this block for quantities it compares, but `validate_reference_artifacts.py` does not need to require it for all `ready-for-comparison` bundles.
+
+```json
+{
+  "comparisons": {
+    "stresses": {
+      "reference_csv": "extracted/stresses.csv",
+      "actual_csv": "extracted/stresses.csv",
+      "required_columns": [
+        "step",
+        "frame",
+        "instance",
+        "element_label",
+        "integration_point",
+        "section_point",
+        "output_position",
+        "component",
+        "coordinate_system",
+        "unit",
+        "value"
+      ],
+      "key_columns": [
+        "step",
+        "frame",
+        "instance",
+        "element_label",
+        "integration_point",
+        "section_point",
+        "output_position",
+        "component"
+      ],
+      "value_column": "value",
+      "unit_column": "unit",
+      "coordinate_system_column": "coordinate_system",
+      "tolerance": {
+        "absolute": 1.0e-8,
+        "relative": 1.0e-6,
+        "relative_floor": 1.0e-12
+      }
+    }
+  }
+}
+```
+
+Tolerance rule:
+
+```text
+absolute_error = abs(actual - reference)
+relative_error = absolute_error / max(abs(reference), relative_floor)
+allowed_error = absolute + relative * max(abs(reference), relative_floor)
+row_pass = absolute_error <= allowed_error
+quantity_pass = all rows pass and no schema/id/unit/coordinate/nonfinite errors exist
+```
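The rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the final implementation: the function name `row_within_tolerance` and its tuple return shape are hypothetical and do not appear elsewhere in this plan.

```python
import math


def row_within_tolerance(reference: float, actual: float, *,
                         absolute: float, relative: float,
                         relative_floor: float) -> tuple[bool, float, float]:
    """Apply the plan's tolerance rule to one matched row.

    Returns (row_pass, absolute_error, relative_error). The name and the
    return shape are illustrative assumptions.
    """
    # Nonfinite values have their own classification, so reject them here.
    if not (math.isfinite(reference) and math.isfinite(actual)):
        raise ValueError("classify nonfinite values before the tolerance check")
    # The floor keeps the relative terms defined when reference is zero.
    scale = max(abs(reference), relative_floor)
    absolute_error = abs(actual - reference)
    relative_error = absolute_error / scale
    allowed_error = absolute + relative * scale
    return absolute_error <= allowed_error, absolute_error, relative_error
```

With the example contract (`absolute=1.0e-8`, `relative=1.0e-6`), a reference value of `100.0` allows an error of about `1.0001e-4`. Note that `relative_error` is reported but does not decide pass or fail on its own.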
+
+## CLI Contract
+
+Primary command:
+
+```bash
+python scripts/compare_extracted_csv.py --metadata references/<feature-id>/<model-id>/metadata.json --actual-root external-results/<feature-id>/<model-id>
+```
+
+Optional filters and report output:
+
+```bash
+python scripts/compare_extracted_csv.py --metadata references/umat/single-element/metadata.json --actual-root external-results/umat/single-element --quantity stresses --report-json build/reference-verification/umat-single-element.json
+```
+
+Exit codes:
+
+- `0`: every requested quantity passed.
+- `1`: comparison completed and one or more quantities failed.
+- `2`: invalid CLI arguments, invalid metadata, missing files, or unreadable CSV.
+
+## Failure Classification
+
+The script should produce one primary classification per failed quantity:
+
+- `missing-reference-artifact`: declared reference CSV is absent after metadata validation.
+- `missing-generated-output`: actual CSV under `--actual-root` is absent.
+- `schema-mismatch`: required columns are missing, duplicate headers exist, or duplicate key rows exist.
+- `id-mismatch`: missing or extra key rows exist.
+- `unit-or-coordinate-mismatch`: matched rows disagree on unit or coordinate system.
+- `nonfinite-result`: reference or actual `value` is NaN or infinite.
+- `tolerance-failure`: schema, IDs, unit, and coordinate checks pass, but numeric error exceeds tolerance.
+- `upstream-contract`: requested quantity has no `comparisons.<quantity>` contract.
+- `environment`: file cannot be read due to encoding or OS errors.
+
+## Report Contract
+
+Text output should be concise and easy to parse, with one line per quantity:
+
+```text
+PASS stresses rows=8 max_abs_error=1.2e-10 max_rel_error=3.0e-9 rms_error=8.1e-11 worst_key=Step-1|1|PART-1-1|1|1||INTEGRATION_POINT|S11
+```
+
+Failed quantity example:
+
+```text
+FAIL stresses classification=tolerance-failure rows=8 max_abs_error=2.4e-4 max_rel_error=1.2e-2 rms_error=8.5e-5 worst_key=Step-1|1|PART-1-1|1|1||INTEGRATION_POINT|S11
+```
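One possible way to render these lines from a report entry is sketched below. The helper name `format_summary_line` and the `n/a` placeholder for absent metrics are assumptions for illustration; the plan itself only fixes the field names.

```python
def format_summary_line(entry: dict) -> str:
    """Render one quantity report entry as a single text line.

    `entry` is assumed to carry the fields from this plan's report
    contract; the helper name is hypothetical.
    """
    passed = entry["result"] == "pass"
    parts = ["PASS" if passed else "FAIL", entry["quantity"]]
    if not passed:
        parts.append(f"classification={entry['classification']}")
    parts.append(f"rows={entry['compared_rows']}")
    for field in ("max_abs_error", "max_rel_error", "rms_error"):
        value = entry.get(field)
        # Early failures (for example a missing file) carry no metrics.
        text = "n/a" if value is None else format(value, ".1e")
        parts.append(f"{field}={text}")
    parts.append(f"worst_key={entry.get('worst_key') or 'n/a'}")
    return " ".join(parts)
```

Python's `.1e` format always writes at least two exponent digits (`3.0e-09` rather than `3.0e-9`), so the exact digits can differ slightly from the examples above.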
+
+JSON report should contain:
+
+```json
+{
+  "metadata": "references/umat/single-element/metadata.json",
+  "actual_root": "external-results/umat/single-element",
+  "overall_result": "pass",
+  "quantities": [
+    {
+      "quantity": "stresses",
+      "result": "pass",
+      "classification": "N/A",
+      "compared_rows": 8,
+      "missing_rows": 0,
+      "extra_rows": 0,
+      "max_abs_error": 1.2e-10,
+      "max_rel_error": 3.0e-9,
+      "rms_error": 8.1e-11,
+      "worst_key": "Step-1|1|PART-1-1|1|1||INTEGRATION_POINT|S11",
+      "worst_component": "S11"
+    }
+  ]
+}
+```
+
+---
+
+### Task 1: Write Pass-Case Test Fixture
+
+**Files:**
+- Create: `scripts/test_compare_extracted_csv.py`
+
+- [ ] **Step 1: Write dynamic import and fixture helpers**
+
+```python
+import csv
+import importlib.util
+import json
+import tempfile
+import unittest
+from pathlib import Path
+
+
+def load_compare_extracted_csv():
+    module_path = Path(__file__).resolve().parent / "compare_extracted_csv.py"
+    spec = importlib.util.spec_from_file_location("compare_extracted_csv", module_path)
+    module = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(module)
+    return module
+
+
+def write_json(path: Path, payload: dict):
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
+
+
+def write_csv(path: Path, rows: list[dict[str, str]]):
+    path.parent.mkdir(parents=True, exist_ok=True)
+    with path.open("w", newline="", encoding="utf-8") as handle:
+        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
+        writer.writeheader()
+        writer.writerows(rows)
+
+
+def metadata_payload() -> dict:
+    return {
+        "schema_version": "abaqus-user-subroutine-artifact-v1",
+        "feature_id": "umat",
+        "model_id": "single-element",
+        "artifact_status": "ready-for-comparison",
+        "abaqus": {"version": "2024", "precision": "double"},
+        "compiler": {"vendor": "Intel oneAPI", "name": "ifx", "version": "2024"},
+        "subroutine": {"entry_points": ["UMAT"], "source_files": []},
+        "input_file": "model.inp",
+        "outputs": {
+            "tails": {
+                "msg": "job.msg.tail.txt",
+                "dat": "job.dat.tail.txt",
+                "log": "job.log.tail.txt",
+                "sta": "job.sta.tail.txt"
+            },
+            "csv": {"stresses": "extracted/stresses.csv"}
+        },
+        "extraction": {
+            "source_odb": "job.odb",
+            "tool": "Abaqus Python",
+            "extracted_at": "2026-06-10T00:00:00+09:00",
+            "csv_directory": "extracted"
+        },
+        "comparisons": {
+            "stresses": {
+                "reference_csv": "extracted/stresses.csv",
+                "actual_csv": "extracted/stresses.csv",
+                "required_columns": [
+                    "step", "frame", "instance", "element_label", "integration_point",
+                    "section_point", "output_position", "component",
+                    "coordinate_system", "unit", "value"
+                ],
+                "key_columns": [
+                    "step", "frame", "instance", "element_label", "integration_point",
+                    "section_point", "output_position", "component"
+                ],
+                "value_column": "value",
+                "unit_column": "unit",
+                "coordinate_system_column": "coordinate_system",
+                "tolerance": {"absolute": 1.0e-8, "relative": 1.0e-6, "relative_floor": 1.0e-12}
+            }
+        }
+    }
+
+
+def stress_rows(value: str = "100.0") -> list[dict[str, str]]:
+    return [
+        {
+            "step": "Step-1",
+            "frame": "1",
+            "instance": "PART-1-1",
+            "element_label": "1",
+            "integration_point": "1",
+            "section_point": "",
+            "output_position": "INTEGRATION_POINT",
+            "component": "S11",
+            "coordinate_system": "GLOBAL",
+            "unit": "MPa",
+            "value": value
+        }
+    ]
+```
+
+- [ ] **Step 2: Write passing comparison test**
+
+```python
+class CompareExtractedCsvTests(unittest.TestCase):
+    def test_quantity_passes_when_schema_keys_units_and_values_match_within_tolerance(self):
+        compare = load_compare_extracted_csv()
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            reference = root / "references" / "umat" / "single-element"
+            actual = root / "external-results" / "umat" / "single-element"
+            write_json(reference / "metadata.json", metadata_payload())
+            write_csv(reference / "extracted" / "stresses.csv", stress_rows("100.0"))
+            write_csv(actual / "extracted" / "stresses.csv", stress_rows("100.00000001"))
+
+            report = compare.compare_metadata(reference / "metadata.json", actual, quantities=["stresses"], validate_artifacts=False)
+
+        self.assertEqual(report["overall_result"], "pass")
+        self.assertEqual(report["quantities"][0]["result"], "pass")
+        self.assertEqual(report["quantities"][0]["classification"], "N/A")
+        self.assertEqual(report["quantities"][0]["compared_rows"], 1)
+```
+
+- [ ] **Step 3: Run test to verify RED**
+
+Run:
+
+```bash
+python -m unittest scripts.test_compare_extracted_csv
+```
+
+Expected: the run fails (unittest reports an ERROR rather than an assertion failure) because `scripts/compare_extracted_csv.py` does not exist yet.
+
+### Task 2: Implement Minimal Pass-Case Comparison
+
+**Files:**
+- Create: `scripts/compare_extracted_csv.py`
+
+- [ ] **Step 1: Add importable API skeleton and minimal comparison**
+
+Implement these functions:
+
+```python
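+from __future__ import annotations  # keeps `list[str] | None` from being evaluated before Python 3.10
+
+from pathlib import Path
+
+
+def compare_metadata(metadata_path: Path, actual_root: Path, *, quantities: list[str] | None = None, validate_artifacts: bool = True) -> dict: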
+    ...
+
+def load_csv_rows(path: Path) -> tuple[list[str], list[dict[str, str]]]:
+    ...
+
+def compare_quantity(quantity: str, contract: dict, reference_root: Path, actual_root: Path) -> dict:
+    ...
+```
+
+Minimum behavior for GREEN:
+- Load metadata JSON.
+- Resolve `comparisons.<quantity>.reference_csv` under `metadata_path.parent`.
+- Resolve `comparisons.<quantity>.actual_csv` under `actual_root`.
+- Load both CSV files with `csv.DictReader`.
+- Check required columns are present.
+- Match rows by `key_columns`.
+- Parse `value_column` as finite float.
+- Compute `max_abs_error`, `max_rel_error`, `rms_error`, `worst_key`.
+- Return `overall_result=pass` if no errors exceed tolerance.
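The metrics named in the last bullets can be accumulated in a single pass over the matched rows. The sketch below is illustrative only: `summarize_errors` is a hypothetical helper, and it assumes that the "worst" row is the one with the largest absolute error, which the plan does not state explicitly. Keys are joined with `|` to match the `worst_key` examples in the report contract.

```python
import math


def summarize_errors(pairs, *, relative_floor: float) -> dict:
    """Aggregate error metrics over matched rows.

    `pairs` yields (key, reference, actual), where key is a tuple of
    strings and both values are already parsed, finite floats.
    """
    max_abs = 0.0
    max_rel = 0.0
    sum_squares = 0.0
    worst_key = None
    count = 0
    for key, reference, actual in pairs:
        abs_err = abs(actual - reference)
        rel_err = abs_err / max(abs(reference), relative_floor)
        # Assumption: the worst row is the one with the largest absolute error.
        if worst_key is None or abs_err > max_abs:
            worst_key = key
        max_abs = max(max_abs, abs_err)
        max_rel = max(max_rel, rel_err)
        sum_squares += abs_err * abs_err
        count += 1
    if count == 0:
        # Nothing was compared, so no metric is meaningful.
        return {"compared_rows": 0, "max_abs_error": None, "max_rel_error": None,
                "rms_error": None, "worst_key": None}
    return {
        "compared_rows": count,
        "max_abs_error": max_abs,
        "max_rel_error": max_rel,
        "rms_error": math.sqrt(sum_squares / count),
        "worst_key": "|".join(worst_key),
    }
```

Computing the RMS directly keeps the script within the standard library and avoids a second pass over the rows.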
+
+- [ ] **Step 2: Run pass-case test**
+
+Run:
+
+```bash
+python -m unittest scripts.test_compare_extracted_csv
+```
+
+Expected: PASS.
+
+### Task 3: Add Schema and Contract Failure Tests
+
+**Files:**
+- Modify: `scripts/test_compare_extracted_csv.py`
+- Modify: `scripts/compare_extracted_csv.py`
+
+- [ ] **Step 1: Add missing actual output test**
+
+```python
+def test_missing_actual_csv_is_missing_generated_output(self):
+    compare = load_compare_extracted_csv()
+    with tempfile.TemporaryDirectory() as tmp:
+        root = Path(tmp)
+        reference = root / "references" / "umat" / "single-element"
+        actual = root / "external-results" / "umat" / "single-element"
+        write_json(reference / "metadata.json", metadata_payload())
+        write_csv(reference / "extracted" / "stresses.csv", stress_rows("100.0"))
+
+        report = compare.compare_metadata(reference / "metadata.json", actual, quantities=["stresses"], validate_artifacts=False)
+
+    self.assertEqual(report["overall_result"], "fail")
+    self.assertEqual(report["quantities"][0]["classification"], "missing-generated-output")
+```
+
+- [ ] **Step 2: Add missing required column test**
+
+```python
+def test_missing_required_column_is_schema_mismatch(self):
+    compare = load_compare_extracted_csv()
+    with tempfile.TemporaryDirectory() as tmp:
+        root = Path(tmp)
+        reference = root / "references" / "umat" / "single-element"
+        actual = root / "external-results" / "umat" / "single-element"
+        write_json(reference / "metadata.json", metadata_payload())
+        row = stress_rows("100.0")[0]
+        write_csv(reference / "extracted" / "stresses.csv", [row])
+        actual_row = dict(row)
+        actual_row.pop("coordinate_system")
+        write_csv(actual / "extracted" / "stresses.csv", [actual_row])
+
+        report = compare.compare_metadata(reference / "metadata.json", actual, quantities=["stresses"], validate_artifacts=False)
+
+    self.assertEqual(report["quantities"][0]["classification"], "schema-mismatch")
+```
+
+- [ ] **Step 3: Add missing comparison contract test**
+
+```python
+def test_missing_quantity_contract_is_upstream_contract(self):
+    compare = load_compare_extracted_csv()
+    with tempfile.TemporaryDirectory() as tmp:
+        root = Path(tmp)
+        reference = root / "references" / "umat" / "single-element"
+        actual = root / "external-results" / "umat" / "single-element"
+        payload = metadata_payload()
+        payload["comparisons"].pop("stresses")
+        write_json(reference / "metadata.json", payload)
+
+        report = compare.compare_metadata(reference / "metadata.json", actual, quantities=["stresses"], validate_artifacts=False)
+
+    self.assertEqual(report["quantities"][0]["classification"], "upstream-contract")
+```
+
+- [ ] **Step 4: Run tests to verify RED**
+
+Run:
+
+```bash
+python -m unittest scripts.test_compare_extracted_csv
+```
+
+Expected: FAIL on the new failure classifications.
+
+- [ ] **Step 5: Implement missing file, schema, and contract classification**
+
+Add helper functions:
+
+```python
+def failed_quantity(quantity: str, classification: str, message: str) -> dict:
+    ...
+
+def validate_columns(headers: list[str], required_columns: list[str]) -> list[str]:
+    ...
+```
+
+Return stable fields even on failure:
+
+```python
+{
+    "quantity": quantity,
+    "result": "fail",
+    "classification": classification,
+    "message": message,
+    "compared_rows": 0,
+    "missing_rows": 0,
+    "extra_rows": 0,
+    "max_abs_error": None,
+    "max_rel_error": None,
+    "rms_error": None,
+    "worst_key": None,
+    "worst_component": None
+}
+```
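A possible shape for the loading and column-validation helpers is sketched below. It assumes that `validate_columns` returns a list of human-readable error strings (empty when the schema is fine); the plan fixes only the signatures, so the message wording is hypothetical.

```python
import csv
from pathlib import Path


def load_csv_rows(path: Path) -> tuple[list[str], list[dict[str, str]]]:
    """Read a CSV file and return (headers, rows).

    Headers are returned separately because csv.DictReader keeps only one
    value per duplicated column name in each row dict, while its
    `fieldnames` list still shows the duplicates.
    """
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        headers = list(reader.fieldnames or [])
        rows = [dict(row) for row in reader]
    return headers, rows


def validate_columns(headers: list[str], required_columns: list[str]) -> list[str]:
    """Return schema error messages; an empty list means the header is valid."""
    errors = []
    duplicates = sorted({name for name in headers if headers.count(name) > 1})
    if duplicates:
        errors.append(f"duplicate headers: {', '.join(duplicates)}")
    missing = [name for name in required_columns if name not in headers]
    if missing:
        errors.append(f"missing required columns: {', '.join(missing)}")
    return errors
```

Any non-empty result would map to the `schema-mismatch` classification, with the joined messages used as the `message` field.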
+
+- [ ] **Step 6: Run tests to verify GREEN**
+
+Run:
+
+```bash
+python -m unittest scripts.test_compare_extracted_csv
+```
+
+Expected: PASS.
+
+### Task 4: Add Row Matching, Unit, Coordinate, Nonfinite, and Tolerance Tests
+
+**Files:**
+- Modify: `scripts/test_compare_extracted_csv.py`
+- Modify: `scripts/compare_extracted_csv.py`
+
+- [ ] **Step 1: Add ID mismatch test**
+
+Change actual `element_label` from `1` to `2`. Expected classification: `id-mismatch`, with `missing_rows=1` and `extra_rows=1`.
+
+- [ ] **Step 2: Add unit mismatch test**
+
+Change actual `unit` from `MPa` to `Pa`. Expected classification: `unit-or-coordinate-mismatch`.
+
+- [ ] **Step 3: Add coordinate mismatch test**
+
+Change actual `coordinate_system` from `GLOBAL` to `LOCAL-1`. Expected classification: `unit-or-coordinate-mismatch`.
+
+- [ ] **Step 4: Add nonfinite test**
+
+Set actual `value` to `nan`. Expected classification: `nonfinite-result`.
+
+- [ ] **Step 5: Add tolerance failure test**
+
+Set actual `value` to `101.0` for reference `100.0`. Expected classification: `tolerance-failure`, `max_abs_error=1.0`, and `result=fail`.
+
+- [ ] **Step 6: Run tests to verify RED**
+
+Run:
+
+```bash
+python -m unittest scripts.test_compare_extracted_csv
+```
+
+Expected: FAIL until these classifications are implemented.
+
+- [ ] **Step 7: Implement row matching and classification precedence**
+
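+Use this precedence, highest first. `upstream-contract` leads because both CSV paths are read from `comparisons.<quantity>`, so no file check can run without the contract:
+
+```text
+upstream-contract
+missing-reference-artifact
+missing-generated-output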
+schema-mismatch
+id-mismatch
+nonfinite-result
+unit-or-coordinate-mismatch
+tolerance-failure
+N/A
+```
+
+Implement row keys as:
+
+```python
+def make_key(row: dict[str, str], key_columns: list[str]) -> tuple[str, ...]:
+    return tuple(row.get(column, "") for column in key_columns)
+```
+
+Detect duplicate keys in either CSV as `schema-mismatch`.
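Row matching and duplicate detection can be built on `make_key` roughly as follows. The helper names `index_rows` and `diff_keys` are assumptions made for this sketch; only `make_key` comes from the plan.

```python
def make_key(row: dict[str, str], key_columns: list[str]) -> tuple[str, ...]:
    return tuple(row.get(column, "") for column in key_columns)


def index_rows(rows: list[dict[str, str]], key_columns: list[str]):
    """Index rows by key and collect any keys that occur more than once."""
    index = {}
    duplicates = []
    for row in rows:
        key = make_key(row, key_columns)
        if key in index:
            duplicates.append(key)  # would be reported as schema-mismatch
        else:
            index[key] = row
    return index, duplicates


def diff_keys(reference_index: dict, actual_index: dict):
    """Return (missing, extra) keys relative to the reference CSV."""
    missing = sorted(reference_index.keys() - actual_index.keys())
    extra = sorted(actual_index.keys() - reference_index.keys())
    return missing, extra
```

In this sketch, `missing_rows` and `extra_rows` in the report would be the lengths of the two lists, and a non-empty list in either position would yield `id-mismatch`.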
+
+- [ ] **Step 8: Run tests to verify GREEN**
+
+Run:
+
+```bash
+python -m unittest scripts.test_compare_extracted_csv
+```
+
+Expected: PASS.
+
+### Task 5: Add CLI and JSON Report Tests
+
+**Files:**
+- Modify: `scripts/test_compare_extracted_csv.py`
+- Modify: `scripts/compare_extracted_csv.py`
+
+- [ ] **Step 1: Add CLI pass test using `main(argv)`**
+
+Test:
+
+```python
+exit_code = compare.main([
+    "--metadata", str(reference / "metadata.json"),
+    "--actual-root", str(actual),
+    "--quantity", "stresses",
+    "--report-json", str(report_json)
+])
+self.assertEqual(exit_code, 0)
+self.assertEqual(json.loads(report_json.read_text(encoding="utf-8"))["overall_result"], "pass")
+```
+
+- [ ] **Step 2: Add CLI fail test**
+
+Use a tolerance failure fixture. Expected `main(...) == 1` and JSON `overall_result == "fail"`.
+
+- [ ] **Step 3: Add CLI invalid argument test**
+
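+Call without `--metadata` or `--actual-root`. Expected `main(...) == 2`. `argparse` reports missing required arguments by raising `SystemExit(2)`, so `main` must catch that exception and return its code for this assertion to hold.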
+
+- [ ] **Step 4: Run tests to verify RED**
+
+Run:
+
+```bash
+python -m unittest scripts.test_compare_extracted_csv
+```
+
+Expected: FAIL until CLI exists.
+
+- [ ] **Step 5: Implement CLI**
+
+Implement:
+
+```python
+import argparse
+
+
+def build_arg_parser() -> argparse.ArgumentParser:
+    ...
+
+def main(argv: list[str] | None = None) -> int:
+    ...
+```
+
+CLI behavior:
+- `--quantity` may be repeated.
+- If no `--quantity` is supplied, compare all keys under `metadata["comparisons"]`.
+- `--report-json` creates parent directories and writes UTF-8 JSON.
+- Print one summary line per quantity.
+- Return `0`, `1`, or `2` according to the CLI contract.
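The argument handling and exit-code mapping described above might look like the sketch below. `parse_cli`, `write_report`, and `exit_code_for` are hypothetical helper names; the actual `main` would call `compare_metadata` between parsing and reporting.

```python
import argparse
import json
from pathlib import Path


def build_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Compare extracted actual CSV files against reference CSV artifacts."
    )
    parser.add_argument("--metadata", type=Path, required=True)
    parser.add_argument("--actual-root", type=Path, required=True)
    # action="append" lets --quantity repeat; None means "compare everything".
    parser.add_argument("--quantity", action="append", default=None)
    parser.add_argument("--report-json", type=Path, default=None)
    return parser


def parse_cli(argv):
    """Return (args, None) on success or (None, exit_code) on a parse error.

    argparse raises SystemExit(2) for invalid arguments, so the exception
    is converted into a return value that main() can hand back.
    """
    try:
        return build_arg_parser().parse_args(argv), None
    except SystemExit as exc:
        return None, (exc.code if isinstance(exc.code, int) else 2)


def write_report(path: Path, report: dict) -> None:
    """Write the JSON report as UTF-8, creating parent directories first."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")


def exit_code_for(report: dict) -> int:
    """Map a completed comparison to exit code 0 (pass) or 1 (fail)."""
    return 0 if report["overall_result"] == "pass" else 1
```

Converting `SystemExit` into a return value keeps `main(argv)` testable without `assertRaises`.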
+
+- [ ] **Step 6: Run tests to verify GREEN**
+
+Run:
+
+```bash
+python -m unittest scripts.test_compare_extracted_csv
+```
+
+Expected: PASS.
+
+### Task 6: Optional Artifact Validator Integration
+
+**Files:**
+- Modify: `scripts/compare_extracted_csv.py`
+- Modify: `scripts/test_compare_extracted_csv.py`
+
+- [ ] **Step 1: Add test that default comparison calls metadata validation**
+
+Use a metadata file missing required ready-for-comparison fields and call `compare_metadata(..., validate_artifacts=True)`. Expected classification: `missing-reference-artifact` or exit code `2` with validation errors.
+
+- [ ] **Step 2: Implement reuse of `validate_reference_artifacts.validate_metadata`**
+
+Import safely:
+
+```python
+try:
+    from validate_reference_artifacts import validate_metadata
+except ImportError:
+    from scripts.validate_reference_artifacts import validate_metadata
+```
+
+Run validation before comparison when `validate_artifacts=True`.
+
+- [ ] **Step 3: Keep tests able to bypass validation**
+
+Continue supporting `validate_artifacts=False` in unit tests that only exercise comparison logic.
+
+### Task 7: Documentation Updates
+
+**Files:**
+- Modify: `docs/reference-verifications/README.md`
+- Modify: `docs/reference-models/README.md`
+
+- [ ] **Step 1: Update reference verification README**
+
+Add a section:
+
+````markdown
+## CSV Comparison Command
+
+Run explicit external-result comparison with:
+
+```bash
+python scripts/compare_extracted_csv.py --metadata references/<feature-id>/<model-id>/metadata.json --actual-root external-results/<feature-id>/<model-id>
+```
+
+The command does not run Abaqus and does not parse ODB files. It compares approved `references/.../extracted/*.csv` files with externally generated actual CSV files under `--actual-root`.
+````
+
+- [ ] **Step 2: Update reference model README metadata example**
+
+Add the `comparisons` JSON block shown in this plan.
+
+- [ ] **Step 3: Run documentation-sensitive search**
+
+Run:
+
+```bash
+rg -n "compare_extracted_csv|comparisons|extracted/.*\\.csv" docs scripts
+```
+
+Expected: matches appear in the new script, its tests, and both READMEs, showing that each one mentions the comparison contract.
+
+### Task 8: Full Verification
+
+**Files:**
+- No new edits unless a failure reveals a bug in the changes made by this plan.
+
+- [ ] **Step 1: Run targeted tests**
+
+```bash
+python -m unittest scripts.test_compare_extracted_csv
+```
+
+Expected: PASS.
+
+- [ ] **Step 2: Run full script tests**
+
+```bash
+python -m unittest discover -s scripts -p "test_*.py"
+```
+
+Expected: PASS.
+
+- [ ] **Step 3: Run reference artifact validation**
+
+```bash
+python scripts/validate_reference_artifacts.py
+```
+
+Expected: `Reference artifact metadata validation succeeded.`
+
+- [ ] **Step 4: Run Fortran validation**
+
+```bash
+python scripts/validate_fortran.py
+```
+
+Expected: PASS, or `No Fortran validation commands configured.` when no manifest exists.
+
+- [ ] **Step 5: Run workspace validation**
+
+```bash
+python scripts/validate_workspace.py
+```
+
+Expected: PASS. It should not require actual CSV outputs because `compare_extracted_csv.py` is explicit-use only.
+
+## Acceptance Criteria
+
+- The project never runs Abaqus and never opens ODB files during CSV comparison.
+- Reference bundle completeness remains checked by `scripts/validate_reference_artifacts.py`.
+- CSV numeric validation is performed by explicit command only.
+- Actual generated CSVs are read from a user-supplied `--actual-root`.
+- Comparison requires declared schema, key columns, value column, unit/coordinate columns, and absolute/relative tolerance.
+- Missing files, schema mismatch, ID mismatch, unit/coordinate mismatch, nonfinite results, and tolerance failures have distinct classifications.
+- JSON report includes enough metrics for Reference Verification Agent handoff: compared rows, missing rows, extra rows, max absolute error, max relative error, RMS error, worst key, and pass/fail.
+
+## Open Decisions
+
+- Whether actual result bundles should live under a conventional ignored path such as `external-results/` or `runs/`. The script should accept any `--actual-root`, so this can remain a documentation convention.
+- Whether comparison metadata should later move from `metadata.json` into feature-specific I/O definition documents. For the first implementation, keep executable comparison rules in `metadata.json` so the script has one deterministic contract source.