# Harness Engineering
## Purpose
This document defines how FESA uses long-running agent harnesses for planning, implementation, and evaluation.
The goal is not to maximize agent count. The goal is to keep long solver work coherent, testable, and reference-verified across context resets and independent sessions.
## Default Harness Shape
Use the smallest harness that can safely handle the task.
For meaningful solver implementation or phase execution, use:
Planner -> Generator -> Evaluator
Roles:
- Planner: turns project docs and `PLAN.md` tasks into a testable sprint contract or phase step.
- Generator: implements exactly one accepted contract using test-driven development (TDD).
- Evaluator: independently checks the result against the contract, docs, tests, reference artifacts, and validation commands.
Do not use multi-agent ceremony for tiny documentation edits or obvious mechanical changes. Do use the full harness when a task touches solver behavior, numerical conventions, reference comparison, parser compatibility, result schema, or phase execution.
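
The role split can be pictured as a small driver loop. This is a hypothetical sketch, not FESA tooling: `run_harness`, `Verdict`, the three role callables, and the `max_rounds` retry limit are assumed names. A real harness would run each role as an independent agent session rather than as in-process functions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    findings: list[str] = field(default_factory=list)


def run_harness(
    plan: Callable[[str], str],
    generate: Callable[[str, list[str]], str],
    evaluate: Callable[[str, str], Verdict],
    task: str,
    max_rounds: int = 3,
) -> Verdict:
    """Planner -> Generator -> Evaluator, retrying the Generator on evaluator findings."""
    contract = plan(task)  # Planner: turn the task into one testable contract
    feedback: list[str] = []
    verdict = Verdict(passed=False, findings=["not evaluated"])
    for _ in range(max_rounds):
        change = generate(contract, feedback)  # Generator: implement exactly this contract
        verdict = evaluate(contract, change)  # Evaluator: judge the change against the contract
        if verdict.passed:
            return verdict
        feedback = verdict.findings  # only findings flow back to the Generator
    return verdict  # still failing after max_rounds: escalate instead of looping forever
```

Passing only the Evaluator's findings back to the Generator keeps the Evaluator independent. It sees the contract and the change, never the Generator's reasoning.
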
## Sprint Contract
Every implementation sprint must have a contract before code changes begin.
Recommended location:
- `phases/{phase}/stepN.md` for phase execution.
- `phases/{phase}/contracts/stepN-contract.md` only when a separate negotiation artifact is useful.
Required sections:
````markdown
# Sprint Contract: {name}
## Objective
{one concise outcome}
## Required Reading
- /AGENTS.md
- /PROGRESS.md
- /PLAN.md
- /docs/README.md
- /docs/HARNESS_ENGINEERING.md
- {topic docs}
## Scope
- {what may be changed}
## Allowed Files
- {paths or modules}
## Explicit Non-Goals
- {what must not be done}
## Tests To Write First
- {test files or test cases}
## Reference Artifacts
- {references/*.inp or references/*_displacements.csv, or "none"}
## Acceptance Commands
```bash
python scripts/validate_workspace.py
```
## Evaluator Checklist
- {contract-specific checks}
## Handoff Requirements
- Update PROGRESS.md for completed work.
- Update PLAN.md for future work or changed blockers.
````
Contract quality rules:
- The contract must be testable.
- The contract must identify unsupported Abaqus features rather than expanding support implicitly.
- The contract must state whether reference data is used.
- The contract must name file ownership boundaries to reduce conflicts.
- The contract must not prescribe formulas that are not present in `docs/MITC4_FORMULATION.md` or a cited source.
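
A Planner or Evaluator could screen the structural rules mechanically before reading a contract closely. The sketch below is hypothetical and is not described as part of `scripts/validate_workspace.py`. It assumes contracts use the `## ` headings of the template, and it only approximates the "state whether reference data is used" rule by looking for a `references/` path or the word `none`. Testability and ownership boundaries still need an agent's judgment.

```python
import re

# Section titles from the sprint contract template.
REQUIRED_SECTIONS = (
    "Objective",
    "Required Reading",
    "Scope",
    "Allowed Files",
    "Explicit Non-Goals",
    "Tests To Write First",
    "Reference Artifacts",
    "Acceptance Commands",
    "Evaluator Checklist",
    "Handoff Requirements",
)


def parse_sections(text: str) -> dict[str, str]:
    """Split a contract into {section title: body} using its '## ' headings."""
    sections: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"^## (.+?)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def check_contract(text: str) -> list[str]:
    """Return a list of problems; an empty list means these structural checks pass."""
    sections = parse_sections(text)
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in sections]
    for name in REQUIRED_SECTIONS:
        body = sections.get(name)
        if body is not None and not body.strip():
            problems.append(f"empty section: {name}")
    # Heuristic: reference usage must be explicit, either a references/ path or "none".
    refs = sections.get("Reference Artifacts", "")
    if refs.strip() and "references/" not in refs and "none" not in refs.lower():
        problems.append("Reference Artifacts must name references/ files or say none")
    return problems
```
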
## Generator Rules
The Generator implements one contract at a time.
Required behavior:
- Read the contract and required docs before editing.
- Write or update tests before implementation.
- Keep changes inside allowed files unless the contract is updated first.
- Preserve architecture boundaries from `docs/ARCHITECTURE.md` and `docs/ADR.md`.
- Preserve numerical conventions from `docs/NUMERICAL_CONVENTIONS.md`.
- Run acceptance commands.
- Update `PROGRESS.md` and `PLAN.md` only for factual state changes.
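
The allowed-files rule can be checked mechanically. This is a minimal sketch that assumes the contract's Allowed Files entries are shell-style glob patterns and that the changed paths come from something like `git diff --name-only`. `files_outside_contract` and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase


def files_outside_contract(changed: list[str], allowed: list[str]) -> list[str]:
    """Return changed paths that match none of the contract's allowed-file patterns."""
    # fnmatchcase is case-sensitive on every OS; note that its "*" also matches "/",
    # so "src/parser/*" covers nested files under src/parser/ as well.
    return [
        path for path in changed
        if not any(fnmatchcase(path, pattern) for pattern in allowed)
    ]
```

A non-empty result means either the Generator drifted or the contract needs to be updated first.
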
Generator failure modes to avoid:
- Broad refactors outside the contract.
- Implementing parser support because a stored reference `.inp` contains unsupported Abaqus features.
- Comparing only reduced vectors when full-vector reaction recovery is required.
- Treating a passing compile as sufficient without tests or reference checks.
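
The reduced-versus-full distinction is easiest to see on a toy problem. This sketch makes several assumptions: a 1D chain of two springs with three DOFs, DOF 0 fixed at zero displacement, and dense Python lists in place of FESA's C++ data structures. The names `solve` and `solve_with_reactions` are hypothetical. Solving the free block yields the displacements, but the reaction at the constrained DOF appears only when the residual is formed over the full vector. The sign convention `RF = K u - f` used here is one common choice; FESA's convention is the one in `docs/NUMERICAL_CONVENTIONS.md`.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Dense Gaussian elimination with partial pivoting (adequate for a toy system)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs are untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular reduced stiffness matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def solve_with_reactions(
    k_full: list[list[float]], f_full: list[float], fixed: set[int]
) -> tuple[list[float], list[float]]:
    """Solve with zero prescribed displacement on `fixed` DOFs; return full u and RF."""
    n = len(f_full)
    free = [i for i in range(n) if i not in fixed]
    k_ff = [[k_full[i][j] for j in free] for i in free]
    u_free = solve(k_ff, [f_full[i] for i in free])
    u_full = [0.0] * n  # scatter the reduced solution back; constrained DOFs stay 0
    for i, value in zip(free, u_free):
        u_full[i] = value
    # Residual over ALL DOFs: about zero on free DOFs, the reaction on constrained ones.
    rf = [sum(k_full[i][j] * u_full[j] for j in range(n)) - f_full[i] for i in range(n)]
    return u_full, rf


k1 = k2 = 2.0
K = [[k1, -k1, 0.0], [-k1, k1 + k2, -k2], [0.0, -k2, k2]]
u, rf = solve_with_reactions(K, [0.0, 0.0, 4.0], fixed={0})
print(u, rf)  # [0.0, 2.0, 4.0] [-4.0, 0.0, 0.0]
```

The reduced system alone never produces the `-4.0` at DOF 0. That value balances the applied load of `4.0`, which is exactly the equilibrium check a full-vector recovery makes possible.
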
## Evaluator Rules
The Evaluator is independent from the Generator.
Evaluation order:
1. Read the sprint contract.
2. Read `AGENTS.md`, `PROGRESS.md`, `PLAN.md`, and the topic docs.
3. Inspect the changed files.
4. Run or review the acceptance commands.
5. Check tests, reference artifacts, and documented conventions.
6. Return pass/fail findings with concrete file references.
The Evaluator must fail the sprint if any of these are true:
- Required tests were not written first or are missing.
- `python scripts/validate_workspace.py` fails without explanation.
- A CRITICAL rule in `AGENTS.md` is violated.
- A change drifts from `docs/ARCHITECTURE.md`, `docs/ADR.md`, or `docs/NUMERICAL_CONVENTIONS.md`.
- `references/*_displacements.csv` comparison is required but not implemented or not checked.
- `RF` is computed from reduced quantities when full-vector recovery is required.
- Unsupported Abaqus features are silently accepted.
- Completed work is not recorded in `PROGRESS.md`, or future tasks are not recorded in `PLAN.md`.
If the sprint fails, the Evaluator should produce a concise feedback artifact:
```markdown
# Evaluation Feedback: {contract}
## Verdict
fail
## Findings
- {severity}: {file} - {risk}
## Required Fixes
- {minimal fix}
## Verification To Rerun
- {commands}
```
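
An Evaluator that collects findings as structured data can render the feedback artifact directly from them. This is a hypothetical helper: `Finding`, `render_feedback`, the severity labels, and the one-fix-per-finding pairing are assumptions rather than a documented FESA format beyond the template itself.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    severity: str  # assumed labels, e.g. "critical", "major", "minor"
    file: str
    risk: str
    fix: str  # the minimal fix that resolves this finding


def render_feedback(contract: str, findings: list[Finding], rerun: list[str]) -> str:
    """Render the Evaluation Feedback artifact; any finding makes the verdict 'fail'."""
    verdict = "fail" if findings else "pass"
    lines = [f"# Evaluation Feedback: {contract}", "", "## Verdict", verdict, "", "## Findings"]
    lines += [f"- {f.severity}: {f.file} - {f.risk}" for f in findings]
    lines += ["", "## Required Fixes"]
    lines += [f"- {f.fix}" for f in findings]
    lines += ["", "## Verification To Rerun"]
    lines += [f"- {cmd}" for cmd in rerun]
    return "\n".join(lines) + "\n"
```

Pairing each finding with its own minimal fix keeps the Required Fixes list traceable back to a concrete file.
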
## FESA Evaluation Rubric
Use this rubric for implementation review.
| Criterion | Pass Condition |
|---|---|
| Contract compliance | Changes stay within scope and allowed files |
| Architecture | Domain, AnalysisModel, AnalysisState, DofManager, adapters, and factories follow documented ownership |
| Numerical conventions | DOF order, units, signs, double precision, int64 ids, constrained/free mapping, and full-vector reactions are preserved |
| Reference verification | Stored references/ artifacts are used when required; CSV column mapping is correct |
| Tests | Tests exist before implementation and cover failure modes, not only happy paths |
| Diagnostics | Unsupported input and singular systems produce actionable diagnostics |
| Results schema | Outputs follow step/frame/field/history and HDF5 schema rules |
| Handoff | PLAN.md and PROGRESS.md reflect the new state |
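
The "CSV column mapping is correct" criterion targets a specific failure: a swapped or misaligned column that still produces plausible-looking numbers. The sketch below is illustrative only. The `Node,U1,U2,U3,UR1,UR2,UR3` header (Abaqus-style names), the six-DOF-per-node layout, the tolerances, and `compare_displacements` itself are all assumptions. The real headers and DOF order are defined by the stored `references/*_displacements.csv` files and `docs/NUMERICAL_CONVENTIONS.md`.

```python
import csv
import io
import math

# ASSUMED mapping from reference CSV column to the solver's per-node DOF index.
COLUMN_TO_DOF = {"U1": 0, "U2": 1, "U3": 2, "UR1": 3, "UR2": 4, "UR3": 5}


def compare_displacements(
    csv_text: str,
    solved: dict[int, list[float]],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-12,
) -> list[str]:
    """Compare solver displacements {node_id: per-DOF values} with a reference CSV."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        node = int(row["Node"])
        if node not in solved:
            mismatches.append(f"node {node}: missing from solver output")
            continue
        for column, dof in COLUMN_TO_DOF.items():
            expected = float(row[column])
            actual = solved[node][dof]
            if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches.append(f"node {node} {column}: expected {expected}, got {actual}")
    return mismatches
```

A swapped `U1`/`U2` mapping shows up as paired mismatches on the same node, which is a quick signal that the column mapping, not the solver, is at fault.
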
## Harness Complexity Policy
Add harness complexity only when it catches real risk.
Use a single agent for:
- small wording changes.
- mechanical docs updates.
- metadata-only corrections.
Use Planner -> Generator -> Evaluator for:
- C++ solver implementation.
- parser behavior changes.
- result schema or HDF5 writer changes.
- reference comparator changes.
- MITC4 formulation-dependent work.
- phase generation or execution.
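
The routing above reduces to a simple rule: a task that touches any high-risk area gets the full harness. In this sketch, the task-kind labels are hypothetical. They mirror the bullets above and the risk areas named under Default Harness Shape, and they are not an existing FESA enum.

```python
# Hypothetical labels for areas that require Planner -> Generator -> Evaluator.
FULL_HARNESS_KINDS = {
    "cpp_solver",
    "numerical_conventions",
    "parser_behavior",
    "result_schema_or_hdf5",
    "reference_comparator",
    "mitc4_formulation",
    "phase_generation_or_execution",
}


def choose_harness(task_kinds: set[str]) -> str:
    """Escalate to the full harness if any touched area is high-risk."""
    if task_kinds & FULL_HARNESS_KINDS:
        return "planner-generator-evaluator"
    return "single-agent"  # e.g. wording, mechanical docs, metadata-only fixes
```

Using set intersection means a mostly cosmetic task still escalates if it touches even one risky area, which matches the intent of adding complexity only where it catches real risk.
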
Review the harness periodically. If an agent role no longer adds value, simplify it.