# Harness Engineering

## Purpose

This document defines how FESA uses long-running agent harnesses for planning, implementation, and evaluation.

The goal is not to maximize agent count. The goal is to keep long solver work coherent, testable, and reference-verified across context resets and independent sessions.

## Default Harness Shape

Use the smallest harness that can safely handle the task.

For meaningful solver implementation or phase execution, use:

```text
Planner -> Generator -> Evaluator
```

Roles:

- `Planner`: turns project docs and `PLAN.md` tasks into a testable sprint contract or phase step.
- `Generator`: implements exactly one accepted contract using TDD.
- `Evaluator`: independently checks the result against the contract, docs, tests, reference artifacts, and validation commands.

Do not use multi-agent ceremony for tiny documentation edits or obvious mechanical changes. Do use the full harness when a task touches solver behavior, numerical conventions, reference comparison, parser compatibility, result schema, or phase execution.

## Sprint Contract

Every implementation sprint must have a contract before code changes begin.

Recommended location:

- `phases/{phase}/stepN.md` for phase execution.
- `phases/{phase}/contracts/stepN-contract.md` only when a separate negotiation artifact is useful.

Required sections:

````markdown
# Sprint Contract: {name}

## Objective

{one concise outcome}

## Required Reading

- /AGENTS.md
- /PROGRESS.md
- /PLAN.md
- /docs/README.md
- /docs/HARNESS_ENGINEERING.md
- {topic docs}

## Scope

- {what may be changed}

## Allowed Files

- {paths or modules}

## Explicit Non-Goals

- {what must not be done}

## Tests To Write First

- {test files or test cases}

## Reference Artifacts

- {references/*.inp or references/*_displacements.csv, or "none"}

## Acceptance Commands

```bash
python scripts/validate_workspace.py
```

## Evaluator Checklist

- {contract-specific checks}

## Handoff Requirements

- Update PROGRESS.md for completed work.
- Update PLAN.md for future work or changed blockers.
````

Contract quality rules:

- The contract must be testable.
- The contract must identify unsupported Abaqus features rather than expanding support implicitly.
- The contract must state whether reference data is used.
- The contract must name file ownership boundaries to reduce conflicts.
- The contract must not prescribe formulas that are not present in `docs/MITC4_FORMULATION.md` or a cited source.

## Generator Rules

The Generator implements one contract at a time.

Required behavior:

- Read the contract and required docs before editing.
- Write or update tests before implementation.
- Keep changes inside allowed files unless the contract is updated first.
- Preserve architecture boundaries from `docs/ARCHITECTURE.md` and `docs/ADR.md`.
- Preserve numerical conventions from `docs/NUMERICAL_CONVENTIONS.md`.
- Run acceptance commands.
- Update `PROGRESS.md` and `PLAN.md` only for factual state changes.

Generator failure modes to avoid:

- Broad refactors outside the contract.
- Implementing parser support because a stored reference `.inp` contains unsupported Abaqus features.
- Comparing only reduced vectors when full-vector reaction recovery is required.
- Treating a passing compile as sufficient without tests or reference checks.

## Evaluator Rules

The Evaluator is independent from the Generator.

Evaluation order:

1. Read the sprint contract.
2. Read `AGENTS.md`, `PROGRESS.md`, `PLAN.md`, and the topic docs.
3. Inspect the changed files.
4. Run or review the acceptance commands.
5. Check tests, reference artifacts, and documented conventions.
6. Return pass/fail findings with concrete file references.

The Evaluator must fail the sprint if any of these are true:

- Required tests were not written first or are missing.
- `python scripts/validate_workspace.py` fails without explanation.
- A CRITICAL rule in `AGENTS.md` is violated.
- A change drifts from `docs/ARCHITECTURE.md`, `docs/ADR.md`, or `docs/NUMERICAL_CONVENTIONS.md`.
- `references/*_displacements.csv` comparison is required but not implemented or not checked.
- `RF` is computed from reduced quantities when full-vector recovery is required.
- Unsupported Abaqus features are silently accepted.
- Completed work is not recorded in `PROGRESS.md`, or future tasks are not recorded in `PLAN.md`.

If the sprint fails, the Evaluator should produce a concise feedback artifact:

```markdown
# Evaluation Feedback: {contract}

## Verdict

fail

## Findings

- {severity}: {file} - {risk}

## Required Fixes

- {minimal fix}

## Verification To Rerun

- {commands}
```

## FESA Evaluation Rubric

Use this rubric for implementation review.

| Criterion | Pass Condition |
|---|---|
| Contract compliance | Changes stay within scope and allowed files |
| Architecture | Domain, AnalysisModel, AnalysisState, DofManager, adapters, and factories follow documented ownership |
| Numerical conventions | DOF order, units, signs, double precision, int64 ids, constrained/free mapping, and full-vector reactions are preserved |
| Reference verification | Stored `references/` artifacts are used when required; CSV column mapping is correct |
| Tests | Tests exist before implementation and cover failure modes, not only happy paths |
| Diagnostics | Unsupported input and singular systems produce actionable diagnostics |
| Results schema | Outputs follow step/frame/field/history and HDF5 schema rules |
| Handoff | `PLAN.md` and `PROGRESS.md` reflect the new state |

## Harness Complexity Policy

Add harness complexity only when it catches real risk.

Use a single agent for:

- small wording changes.
- mechanical docs updates.
- metadata-only corrections.

Use Planner -> Generator -> Evaluator for:

- C++ solver implementation.
- parser behavior changes.
- result schema or HDF5 writer changes.
- reference comparator changes.
- MITC4 formulation-dependent work.
- phase generation or execution.

Review the harness periodically. If an agent role no longer adds value, simplify it.
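
The two lists in this policy can be read as a simple lookup: a task gets a single agent only if every category it touches is on the single-agent list. The sketch below illustrates that reading. It is not part of FESA's tooling; the function name `choose_harness` and the category strings are hypothetical labels made up for this example, and rejecting unknown categories is an assumption of the sketch, not a rule stated above.

```python
"""Sketch of the harness complexity policy as a lookup (hypothetical helper)."""

# Category labels are illustrative names for the bullets in this section,
# not identifiers defined anywhere in FESA.
SINGLE_AGENT_TASKS = {
    "small-wording-change",
    "mechanical-docs-update",
    "metadata-only-correction",
}

FULL_HARNESS_TASKS = {
    "cpp-solver-implementation",
    "parser-behavior-change",
    "result-schema-or-hdf5-writer-change",
    "reference-comparator-change",
    "mitc4-formulation-dependent-work",
    "phase-generation-or-execution",
}

SINGLE_AGENT = "single agent"
FULL_HARNESS = "Planner -> Generator -> Evaluator"


def choose_harness(task_categories):
    """Return the smallest harness shape that covers every given category.

    Any full-harness category escalates the whole task. Unknown categories
    raise ValueError (an assumption of this sketch) so that a new kind of
    task is classified deliberately rather than by default.
    """
    categories = set(task_categories)
    if not categories:
        raise ValueError("at least one task category is required")

    unknown = categories - (SINGLE_AGENT_TASKS | FULL_HARNESS_TASKS)
    if unknown:
        raise ValueError(f"unclassified task categories: {sorted(unknown)}")

    # Single agent only when nothing in the task needs the full harness.
    if categories <= SINGLE_AGENT_TASKS:
        return SINGLE_AGENT
    return FULL_HARNESS
```

For example, a task tagged with both `"mechanical-docs-update"` and `"parser-behavior-change"` resolves to the full harness, because one risky category is enough to escalate.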