FESADev/docs/HARNESS_ENGINEERING.md
2026-05-01 02:29:30 +09:00

Harness Engineering

Purpose

This document defines how FESA uses long-running agent harnesses for planning, implementation, and evaluation.

The goal is not to maximize agent count. The goal is to keep long solver work coherent, testable, and reference-verified across context resets and independent sessions.

Default Harness Shape

Use the smallest harness that can safely handle the task.

For meaningful solver implementation or phase execution, use:

Planner -> Generator -> Evaluator

Roles:

  • Planner: turns project docs and PLAN.md tasks into a testable sprint contract or phase step.
  • Generator: implements exactly one accepted contract using TDD.
  • Evaluator: independently checks the result against the contract, docs, tests, reference artifacts, and validation commands.

Do not use multi-agent ceremony for tiny documentation edits or obvious mechanical changes. Do use the full harness when a task touches solver behavior, numerical conventions, reference comparison, parser compatibility, result schema, or phase execution.
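
The three roles can be pictured as a single pass over one task. The following is a minimal sketch, not FESA's actual harness code: `Contract`, `Verdict`, `run_harness`, the stand-in role functions, and the file paths are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Contract:
    """Hypothetical stand-in for a sprint contract produced by the Planner."""
    name: str
    allowed_files: List[str]


@dataclass
class Verdict:
    """Hypothetical Evaluator result: pass/fail plus concrete findings."""
    passed: bool
    findings: List[str] = field(default_factory=list)


def run_harness(
    task: str,
    planner: Callable[[str], Contract],
    generator: Callable[[Contract], List[str]],
    evaluator: Callable[[Contract, List[str]], Verdict],
) -> Verdict:
    """Run one Planner -> Generator -> Evaluator pass for a single task."""
    contract = planner(task)        # a testable contract comes first
    changed = generator(contract)   # exactly one contract is implemented
    return evaluator(contract, changed)  # independent check of the result


# Trivial stand-in roles (illustrative paths, not real FESA files).
def plan(task: str) -> Contract:
    return Contract(name=task, allowed_files=["src/solver/"])


def generate(contract: Contract) -> List[str]:
    # Pretend the Generator touched one allowed and one disallowed file.
    return ["src/solver/assembly.cpp", "src/parser/reader.cpp"]


def evaluate(contract: Contract, changed: List[str]) -> Verdict:
    outside = [
        path for path in changed
        if not any(path.startswith(prefix) for prefix in contract.allowed_files)
    ]
    return Verdict(
        passed=not outside,
        findings=[f"outside allowed files: {path}" for path in outside],
    )


harness_verdict = run_harness("assemble stiffness", plan, generate, evaluate)
print(harness_verdict)
```

Keeping each role a plain callable makes it easy to collapse the harness to a single agent for small tasks without changing the calling code.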

Sprint Contract

Every implementation sprint must have a contract before code changes begin.

Recommended location:

  • phases/{phase}/stepN.md for phase execution.
  • phases/{phase}/contracts/stepN-contract.md only when a separate negotiation artifact is useful.

Required sections:

# Sprint Contract: {name}

## Objective
{one concise outcome}

## Required Reading
- /AGENTS.md
- /PROGRESS.md
- /PLAN.md
- /docs/README.md
- /docs/HARNESS_ENGINEERING.md
- {topic docs}

## Scope
- {what may be changed}

## Allowed Files
- {paths or modules}

## Explicit Non-Goals
- {what must not be done}

## Tests To Write First
- {test files or test cases}

## Reference Artifacts
- {references/*.inp or references/*_displacements.csv, or "none"}

## Acceptance Commands
```bash
python scripts/validate_workspace.py
```

## Evaluator Checklist
- {contract-specific checks}

## Handoff Requirements
- Update PROGRESS.md for completed work.
- Update PLAN.md for future work or changed blockers.

Contract quality rules:

  • The contract must be testable.
  • The contract must identify unsupported Abaqus features rather than expanding support implicitly.
  • The contract must state whether reference data is used.
  • The contract must name file ownership boundaries to reduce conflicts.
  • The contract must not prescribe formulas that are not present in docs/MITC4_FORMULATION.md or a cited source.
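
A structural check can catch incomplete contracts before a sprint starts. The sketch below assumes contracts are Markdown files that use the `##` section headings from the template above; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, and the check covers structure only, not whether a contract is actually testable.

```python
import re
from typing import List

# Section headings taken from the contract template in this document.
REQUIRED_SECTIONS = [
    "Objective",
    "Required Reading",
    "Scope",
    "Allowed Files",
    "Explicit Non-Goals",
    "Tests To Write First",
    "Reference Artifacts",
    "Acceptance Commands",
    "Evaluator Checklist",
    "Handoff Requirements",
]


def missing_sections(contract_text: str) -> List[str]:
    """Return the required '## ' sections that the contract does not contain."""
    found = {
        match.group(1).strip()
        for match in re.finditer(r"^##\s+(.+?)\s*$", contract_text, re.MULTILINE)
    }
    return [section for section in REQUIRED_SECTIONS if section not in found]


draft = """# Sprint Contract: example

## Objective
One concise outcome.

## Scope
- what may be changed
"""

print(missing_sections(draft))
```

An empty list means every required heading is present; anything else can be reported back to the Planner before code changes begin.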

Generator Rules

The Generator implements one contract at a time.

Required behavior:

  • Read the contract and required docs before editing.
  • Write or update tests before implementation.
  • Keep changes inside allowed files unless the contract is updated first.
  • Preserve architecture boundaries from docs/ARCHITECTURE.md and docs/ADR.md.
  • Preserve numerical conventions from docs/NUMERICAL_CONVENTIONS.md.
  • Run acceptance commands.
  • Update PROGRESS.md and PLAN.md only for factual state changes.
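
The allowed-files rule is easy to check mechanically. This sketch assumes the contract's Allowed Files entries can be expressed as glob patterns; the function name and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def files_outside_allowed(changed: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Return changed paths that match none of the allowed glob patterns."""
    patterns = list(allowed)
    return [
        path for path in changed
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Illustrative paths only; real entries come from the contract's Allowed Files.
changed_paths = [
    "src/solver/mitc4.cpp",
    "tests/test_mitc4.cpp",
    "src/parser/inp.cpp",
]
allowed_patterns = ["src/solver/*", "tests/*"]

print(files_outside_allowed(changed_paths, allowed_patterns))
```

Note that `fnmatchcase` lets `*` match path separators, so `src/solver/*` also covers nested directories. A non-empty result means the contract should be updated first, or the change reverted.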

Generator failure modes to avoid:

  • Broad refactors outside the contract.
  • Implementing parser support because a stored reference .inp contains unsupported Abaqus features.
  • Comparing only reduced vectors when full-vector reaction recovery is required.
  • Treating a passing compile as sufficient without tests or reference checks.
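
To illustrate the reduced-versus-full-vector distinction, here is a small sketch in plain Python. It assumes the common relation R = K u - f evaluated on the full system at constrained DOFs; FESA's actual sign and ordering conventions are defined in docs/NUMERICAL_CONVENTIONS.md, and the function names and the two-spring example are hypothetical.

```python
from typing import Dict, List, Sequence


def scatter_to_full(
    u_free: Sequence[float],
    free_dofs: Sequence[int],
    n_dofs: int,
    prescribed: Dict[int, float],
) -> List[float]:
    """Expand a reduced (free-DOF) solution into the full displacement vector."""
    u_full = [0.0] * n_dofs
    for dof, value in prescribed.items():
        u_full[dof] = value
    for dof, value in zip(free_dofs, u_free):
        u_full[dof] = value
    return u_full


def recover_reactions(
    K_full: Sequence[Sequence[float]],
    u_full: Sequence[float],
    f_full: Sequence[float],
    constrained_dofs: Sequence[int],
) -> Dict[int, float]:
    """Assumed convention: reaction = (K_full @ u_full - f_full) at constrained DOFs."""
    reactions = {}
    for i in constrained_dofs:
        internal = sum(K_full[i][j] * u_full[j] for j in range(len(u_full)))
        reactions[i] = internal - f_full[i]
    return reactions


# Two springs in series (k = 1000), DOF 0 fixed, load of 10 at DOF 2.
k = 1000.0
K = [
    [k, -k, 0.0],
    [-k, 2 * k, -k],
    [0.0, -k, k],
]
f = [0.0, 0.0, 10.0]
u_free = [0.01, 0.02]  # as returned by a reduced solve on DOFs 1 and 2

u = scatter_to_full(u_free, free_dofs=[1, 2], n_dofs=3, prescribed={0: 0.0})
reactions = recover_reactions(K, u, f, constrained_dofs=[0])
print(reactions)
```

The reduced system never contains row 0 of K, so the support reaction cannot be read from reduced quantities alone; it only appears once the full vectors are rebuilt.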

Evaluator Rules

The Evaluator is independent from the Generator.

Evaluation order:

  1. Read the sprint contract.
  2. Read AGENTS.md, PROGRESS.md, PLAN.md, and the topic docs.
  3. Inspect the changed files.
  4. Run or review the acceptance commands.
  5. Check tests, reference artifacts, and documented conventions.
  6. Return pass/fail findings with concrete file references.
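
Step 4 can be scripted so the Evaluator records exit codes and output instead of relying on a summary. This is a minimal sketch using the standard `subprocess` module; `run_acceptance` and `all_passed` are hypothetical helpers, and the demo commands stand in for a contract's real Acceptance Commands.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_acceptance(commands: Sequence[Sequence[str]]) -> List[Tuple[str, int, str]]:
    """Run each command and return (command line, exit code, combined output)."""
    results = []
    for argv in commands:
        proc = subprocess.run(list(argv), capture_output=True, text=True)
        results.append((" ".join(argv), proc.returncode, proc.stdout + proc.stderr))
    return results


def all_passed(results: Sequence[Tuple[str, int, str]]) -> bool:
    """True only if every command exited with code 0."""
    return all(code == 0 for _, code, _ in results)


# Stand-in commands; a real run would use something like
# [["python", "scripts/validate_workspace.py"]] from the contract.
demo_results = run_acceptance([
    [sys.executable, "-c", "print('ok')"],
    [sys.executable, "-c", "raise SystemExit(3)"],
])

for command, code, _output in demo_results:
    print(code, command)
print("all passed:", all_passed(demo_results))
```

Keeping the captured output alongside the exit code gives the Evaluator concrete evidence to cite in its findings.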

The Evaluator must fail the sprint if any of these are true:

  • Required tests were not written first or are missing.
  • python scripts/validate_workspace.py fails without explanation.
  • A CRITICAL rule in AGENTS.md is violated.
  • A change drifts from docs/ARCHITECTURE.md, docs/ADR.md, or docs/NUMERICAL_CONVENTIONS.md.
  • references/*_displacements.csv comparison is required but not implemented or not checked.
  • Reaction forces (RF) are computed from reduced quantities when full-vector recovery is required.
  • Unsupported Abaqus features are silently accepted.
  • Completed work is not recorded in PROGRESS.md, or future tasks are not recorded in PLAN.md.
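
The fail conditions above can be encoded as a checklist that yields a verdict. In this sketch the dictionary keys and the `evaluate_sprint` function are hypothetical labels. Treating an unchecked condition as a failure is a conservative assumption, not a rule stated in this document.

```python
from typing import Dict, List, Tuple

# Keys are hypothetical labels; descriptions follow the list above.
FAIL_CONDITIONS = {
    "tests_missing": "Required tests were not written first or are missing",
    "validation_failed": "python scripts/validate_workspace.py fails without explanation",
    "critical_rule_violated": "A CRITICAL rule in AGENTS.md is violated",
    "docs_drift": "A change drifts from ARCHITECTURE.md, ADR.md, or NUMERICAL_CONVENTIONS.md",
    "reference_unchecked": "A required displacement CSV comparison is not implemented or not checked",
    "rf_from_reduced": "RF is computed from reduced quantities when full-vector recovery is required",
    "unsupported_accepted": "Unsupported Abaqus features are silently accepted",
    "handoff_missing": "PROGRESS.md or PLAN.md does not record the new state",
}


def evaluate_sprint(observed: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return ("pass" | "fail", triggered descriptions).

    observed maps a condition key to True when the condition was triggered.
    Assumption: a condition that was never checked counts as triggered.
    """
    triggered = [
        description
        for key, description in FAIL_CONDITIONS.items()
        if observed.get(key, True)
    ]
    return ("fail" if triggered else "pass", triggered)


clean_run = {key: False for key in FAIL_CONDITIONS}
print(evaluate_sprint(clean_run))
print(evaluate_sprint({**clean_run, "rf_from_reduced": True}))
```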

If the sprint fails, the Evaluator should produce a concise feedback artifact:

# Evaluation Feedback: {contract}

## Verdict
fail

## Findings
- {severity}: {file} - {risk}

## Required Fixes
- {minimal fix}

## Verification To Rerun
- {commands}
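
A small helper can fill in this template so every failed sprint produces the same artifact shape. This is a sketch only: `render_feedback` and the example finding are hypothetical, and the layout simply mirrors the template above.

```python
from typing import List, Sequence, Tuple


def render_feedback(
    contract: str,
    findings: Sequence[Tuple[str, str, str]],
    required_fixes: Sequence[str],
    commands: Sequence[str],
) -> str:
    """Render a failing evaluation as Markdown following the feedback template.

    Each finding is a (severity, file, risk) tuple.
    """
    lines: List[str] = [
        f"# Evaluation Feedback: {contract}",
        "",
        "## Verdict",
        "fail",
        "",
        "## Findings",
    ]
    lines += [f"- {severity}: {path} - {risk}" for severity, path, risk in findings]
    lines += ["", "## Required Fixes"]
    lines += [f"- {fix}" for fix in required_fixes]
    lines += ["", "## Verification To Rerun"]
    lines += [f"- {command}" for command in commands]
    return "\n".join(lines) + "\n"


# Illustrative values; the path and wording are not taken from a real sprint.
feedback_text = render_feedback(
    contract="step3",
    findings=[("high", "src/solver/reactions.cpp", "RF computed from reduced vector")],
    required_fixes=["Recover reactions from the full vectors"],
    commands=["python scripts/validate_workspace.py"],
)
print(feedback_text)
```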

FESA Evaluation Rubric

Use this rubric for implementation review.

| Criterion | Pass Condition |
| --- | --- |
| Contract compliance | Changes stay within scope and allowed files |
| Architecture | Domain, AnalysisModel, AnalysisState, DofManager, adapters, and factories follow documented ownership |
| Numerical conventions | DOF order, units, signs, double precision, int64 ids, constrained/free mapping, and full-vector reactions are preserved |
| Reference verification | Stored references/ artifacts are used when required; CSV column mapping is correct |
| Tests | Tests exist before implementation and cover failure modes, not only happy paths |
| Diagnostics | Unsupported input and singular systems produce actionable diagnostics |
| Results schema | Outputs follow step/frame/field/history and HDF5 schema rules |
| Handoff | PLAN.md and PROGRESS.md reflect the new state |
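
For the reference verification criterion, a comparison against a stored displacement CSV might look like the sketch below. The column names (`node`, `U1`, `U2`, `U3`), the tolerances, and the function names are assumptions for illustration; the real column mapping must come from the project docs and the stored reference files.

```python
import csv
import io
import math
from typing import Dict, List, Sequence, Tuple


def load_reference(
    csv_text: str, columns: Sequence[str] = ("U1", "U2", "U3")
) -> Dict[int, Tuple[float, ...]]:
    """Parse reference displacements keyed by node id (assumed 'node' column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        int(row["node"]): tuple(float(row[name]) for name in columns)
        for row in reader
    }


def compare_displacements(
    computed: Dict[int, Tuple[float, ...]],
    reference: Dict[int, Tuple[float, ...]],
    rel_tol: float = 1e-6,   # hypothetical tolerance
    abs_tol: float = 1e-12,  # hypothetical tolerance
) -> List[str]:
    """Return one message per missing node or out-of-tolerance component."""
    mismatches = []
    for node, expected in sorted(reference.items()):
        got = computed.get(node)
        if got is None:
            mismatches.append(f"node {node}: missing from computed results")
            continue
        for index, (g, r) in enumerate(zip(got, expected), start=1):
            if not math.isclose(g, r, rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches.append(f"node {node} U{index}: got {g}, expected {r}")
    return mismatches


reference_csv = "node,U1,U2,U3\n1,0.0,0.0,0.0\n2,0.002,-0.001,0.0\n"
computed_values = {1: (0.0, 0.0, 0.0), 2: (0.0021, -0.001, 0.0)}

csv_mismatches = compare_displacements(computed_values, load_reference(reference_csv))
print(csv_mismatches)
```

Reporting missing nodes explicitly keeps a partial result set from passing silently.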

Harness Complexity Policy

Add harness complexity only when it catches real risk.

Use a single agent for:

  • Small wording changes.
  • Mechanical docs updates.
  • Metadata-only corrections.

Use Planner -> Generator -> Evaluator for:

  • C++ solver implementation.
  • Parser behavior changes.
  • Result schema or HDF5 writer changes.
  • Reference comparator changes.
  • MITC4 formulation-dependent work.
  • Phase generation or execution.
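
The policy above reduces to a simple lookup once a task is tagged with the areas it touches. The tag names and `choose_harness` below are hypothetical; the sketch only shows that any single risky area is enough to require the full harness.

```python
from typing import Iterable

# Hypothetical tags for the areas that require the full harness.
FULL_HARNESS_AREAS = frozenset({
    "solver",
    "parser",
    "result_schema",
    "hdf5_writer",
    "reference_comparator",
    "mitc4",
    "phase",
})


def choose_harness(touched_areas: Iterable[str]) -> str:
    """Pick the smallest harness that covers the areas a task touches."""
    if FULL_HARNESS_AREAS & set(touched_areas):
        return "Planner -> Generator -> Evaluator"
    return "single agent"


print(choose_harness(["docs_wording"]))
print(choose_harness(["docs_wording", "parser"]))
```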

Review the harness periodically. If an agent role no longer adds value, simplify it.