Files
FESADev/docs/superpowers/plans/2026-06-11-harden-execute-runner.md
T
2026-06-11 17:18:03 +09:00

1089 lines
32 KiB
Markdown

# Harden Execute Runner Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Make `scripts/execute.py` safe enough to use by enforcing `codex/` branch names, clean starting state, explicit staging, per-step file allowlists, and validation before every runner-created commit.
**Architecture:** Keep the runner as a single Python module and add small helper methods to `StepExecutor` instead of introducing a new framework. Add focused `unittest` coverage in `scripts/test_execute.py` using temporary phase directories and mocked subprocess/git calls, then update stale docs only after behavior is covered.
**Tech Stack:** Python 3 standard library, `unittest`, `unittest.mock`, Git command-line interface, existing `python scripts/validate_workspace.py`.
---
## Scope
This plan does not use the `harness-workflow` skill. It only prepares `scripts/execute.py` so that using it later is less likely to damage unrelated work.
Required behavior:
- Branches created or used by the runner must use `codex/<phase-name>`.
- Runner startup must abort on a dirty worktree before branch checkout or Codex execution.
- Runner commits must never use broad git staging.
- Each phase step must define an allowlist of paths it may modify.
- Runner must refuse to commit disallowed file changes.
- Runner must run Python self-tests and workspace validation before every commit it creates.
- Commit failure and validation failure must stop the runner instead of only warning.
## File Map
- Modify `scripts/execute.py`: branch naming, dirty worktree check, allowlist parsing, explicit staging, validation-before-commit, fatal commit failures.
- Create `scripts/test_execute.py`: unit tests for runner safety behavior.
- Modify `AGENTS.md`: document `codex/` branch prefix and safer runner requirements.
- Modify `docs/ARCHITECTURE.md`: update Harness execution layer notes if they mention unsafe staging or old branch behavior.
- Modify `.codex/skills/harness-workflow/SKILL.md`: update stale documentation that says the runner creates the legacy feat task branch. Do not invoke the skill while making this edit.
## Data Contract
Extend each step object in `phases/<phase-dir>/index.json` with `allowed_paths`.
```json
{
"step": 1,
"name": "Update docs",
"status": "pending",
"summary": "",
"allowed_paths": [
"docs/*.md",
"docs/**/*.md",
"AGENTS.md"
]
}
```
Allowlist semantics:
- Paths are repository-relative and normalized to forward slashes.
- Exact file path matches are allowed.
- Directory prefix entries ending with `/` allow every path under that directory.
- Glob entries using `*`, `?`, or `[` are matched with `fnmatch.fnmatchcase`.
- Runner-managed housekeeping files are always allowed separately:
- `phases/<phase-dir>/index.json`
- `phases/<phase-dir>/step<step-num>-output.json`
- `phases/index.json`
## Task 1: Test Harness for `scripts/execute.py`
**Files:**
- Create: `scripts/test_execute.py`
- Read: `scripts/execute.py`
- [ ] **Step 1: Add loader and temporary phase helpers**
Create `scripts/test_execute.py` with this initial content:
```python
import importlib.util
import json
import subprocess
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch
def load_execute():
module_path = Path(__file__).resolve().parent / "execute.py"
spec = importlib.util.spec_from_file_location("execute", module_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
def write_phase(root: Path, phase_dir: str = "0-mvp", phase_name: str = "0-mvp", steps=None):
phase_path = root / "phases" / phase_dir
phase_path.mkdir(parents=True)
if steps is None:
steps = [
{
"step": 1,
"name": "Docs",
"status": "pending",
"summary": "",
"allowed_paths": ["docs/*.md"],
}
]
(phase_path / "index.json").write_text(
json.dumps({"project": "FESA", "phase": phase_name, "steps": steps}, indent=2),
encoding="utf-8",
)
(phase_path / "step1.md").write_text("# Step 1\n", encoding="utf-8")
return phase_path
def make_executor(execute, root: Path, phase_dir: str = "0-mvp"):
with patch.object(execute, "ROOT", root):
return execute.StepExecutor(phase_dir)
class ExecuteRunnerSafetyTests(unittest.TestCase):
pass
if __name__ == "__main__":
unittest.main()
```
- [ ] **Step 2: Run the new empty test module**
Run:
```powershell
python -m unittest scripts.test_execute
```
Expected: pass with one empty test class loaded and no failures.
- [ ] **Step 3: Commit the test scaffold**
Run:
```powershell
git add scripts/test_execute.py
git commit -m "test: add execute runner safety test scaffold"
```
## Task 2: `codex/` Branch Prefix
**Files:**
- Modify: `scripts/test_execute.py`
- Modify: `scripts/execute.py`
- [ ] **Step 1: Add failing tests for branch naming and push branch**
Add these methods to `ExecuteRunnerSafetyTests`:
```python
def test_branch_name_uses_codex_prefix_and_sanitized_phase(self):
execute = load_execute()
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root, phase_name="linear truss/1d")
executor = make_executor(execute, root)
self.assertEqual(executor._branch_name(), "codex/linear-truss-1d")
def test_finalize_push_uses_codex_branch_name(self):
execute = load_execute()
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root, phase_name="0-mvp")
executor = make_executor(execute, root)
executor._auto_push = True
calls = []
def fake_git(*args):
calls.append(args)
if args == ("diff", "--cached", "--quiet"):
return subprocess.CompletedProcess(args, 0, "", "")
return subprocess.CompletedProcess(args, 0, "", "")
with patch.object(executor, "_run_git", side_effect=fake_git):
with patch.object(executor, "_validate_before_commit", create=True):
executor._finalize()
self.assertIn(("push", "-u", "origin", "codex/0-mvp"), calls)
```
- [ ] **Step 2: Run tests and verify failure**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: fail because `StepExecutor` has no `_branch_name()` and `_finalize()` still uses `feat-`.
- [ ] **Step 3: Implement branch naming**
In `scripts/execute.py`, import `re` near the other imports:
```python
import re
```
Add this method to `StepExecutor` under the `# --- git ---` section:
```python
def _branch_name(self) -> str:
slug = re.sub(r"[^A-Za-z0-9._-]+", "-", self._phase_name.strip())
slug = slug.strip("/.-")
if not slug:
slug = self._phase_dir_name
return f"codex/{slug}"
```
Replace both direct branch constructions:
```python
branch = self._branch_name()
```
with:
```python
branch = self._branch_name()
```
- [ ] **Step 4: Run targeted tests**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: pass.
- [ ] **Step 5: Commit**
Run:
```powershell
git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: use codex branch prefix in execute runner"
```
## Task 3: Dirty Worktree Protection
**Files:**
- Modify: `scripts/test_execute.py`
- Modify: `scripts/execute.py`
- [ ] **Step 1: Add failing tests for dirty worktree guard**
Add these methods:
```python
def test_assert_clean_worktree_exits_when_git_status_has_changes(self):
execute = load_execute()
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root)
executor = make_executor(execute, root)
with patch.object(
executor,
"_run_git",
return_value=subprocess.CompletedProcess([], 0, " M AGENTS.md\n?? scratch.txt\n", ""),
):
with self.assertRaises(SystemExit) as cm:
executor._assert_clean_worktree("before checkout")
self.assertEqual(cm.exception.code, 1)
def test_run_checks_clean_worktree_before_checkout(self):
execute = load_execute()
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root)
executor = make_executor(execute, root)
calls = []
def record(name):
def inner(*args, **kwargs):
calls.append(name)
return inner
with patch.object(executor, "_assert_clean_worktree", side_effect=record("clean")):
with patch.object(executor, "_checkout_branch", side_effect=record("checkout")):
with patch.object(executor, "_print_header"):
with patch.object(executor, "_check_blockers"):
with patch.object(executor, "_load_guardrails", return_value=""):
with patch.object(executor, "_ensure_created_at"):
with patch.object(executor, "_execute_all_steps"):
with patch.object(executor, "_finalize"):
executor.run()
self.assertLess(calls.index("clean"), calls.index("checkout"))
```
- [ ] **Step 2: Run tests and verify failure**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: fail because `_assert_clean_worktree()` does not exist and `run()` does not call it.
- [ ] **Step 3: Implement dirty worktree guard**
Add this method under `# --- git ---`:
```python
def _assert_clean_worktree(self, context: str):
r = self._run_git("status", "--porcelain")
if r.returncode != 0:
print(" ERROR: git status failed.")
print(f" {r.stderr.strip()}")
sys.exit(1)
dirty = r.stdout.strip()
if dirty:
print(f" ERROR: dirty worktree detected {context}.")
print(" Commit, stash, or remove these changes before running scripts/execute.py:")
for line in dirty.splitlines():
print(f" {line}")
sys.exit(1)
```
Change `run()` so the guard runs before branch checkout:
```python
self._print_header()
self._check_blockers()
self._assert_clean_worktree("before branch checkout")
self._checkout_branch()
```
- [ ] **Step 4: Run targeted tests**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: pass.
- [ ] **Step 5: Commit**
Run:
```powershell
git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: block execute runner on dirty worktree"
```
## Task 4: Per-Step File Allowlist
**Files:**
- Modify: `scripts/test_execute.py`
- Modify: `scripts/execute.py`
- [ ] **Step 1: Add failing tests for allowlist matching and missing allowlist**
Add these methods:
```python
def test_step_allowlist_accepts_exact_prefix_and_glob_paths(self):
execute = load_execute()
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root)
executor = make_executor(execute, root)
patterns = ["AGENTS.md", "docs/", "scripts/*.py"]
self.assertTrue(executor._path_allowed("AGENTS.md", patterns))
self.assertTrue(executor._path_allowed("docs/PRD.md", patterns))
self.assertTrue(executor._path_allowed("scripts/execute.py", patterns))
self.assertFalse(executor._path_allowed(".codex/hooks.json", patterns))
def test_step_without_allowed_paths_is_rejected_before_codex_invocation(self):
execute = load_execute()
steps = [{"step": 1, "name": "Unsafe", "status": "pending", "summary": ""}]
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root, steps=steps)
executor = make_executor(execute, root)
with self.assertRaises(SystemExit) as cm:
executor._validate_step_allowlist(steps[0])
self.assertEqual(cm.exception.code, 1)
```
- [ ] **Step 2: Run tests and verify failure**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: fail because allowlist helpers do not exist.
- [ ] **Step 3: Implement allowlist helpers**
Add `fnmatch` import:
```python
import fnmatch
```
Add these methods under `# --- git ---` or a new `# --- file allowlist ---` section:
```python
@staticmethod
def _normalize_rel_path(path: str) -> str:
return path.replace("\\", "/").lstrip("./")
def _path_allowed(self, path: str, patterns: list[str]) -> bool:
rel = self._normalize_rel_path(path)
for raw in patterns:
pattern = self._normalize_rel_path(str(raw))
if not pattern:
continue
if pattern.endswith("/") and rel.startswith(pattern):
return True
if any(ch in pattern for ch in "*?[") and fnmatch.fnmatchcase(rel, pattern):
return True
if rel == pattern:
return True
return False
def _validate_step_allowlist(self, step: dict):
allowed = step.get("allowed_paths")
if not isinstance(allowed, list) or not allowed or not all(isinstance(p, str) and p.strip() for p in allowed):
print(f" ERROR: Step {step.get('step')} must define non-empty allowed_paths.")
sys.exit(1)
```
Call `_validate_step_allowlist(pending)` before `_execute_single_step(pending, guardrails)` in `_execute_all_steps()`.
- [ ] **Step 4: Add allowlist to prompt context**
In `_build_preamble`, add a parameter:
```python
def _build_preamble(self, guardrails: str, step_context: str,
allowed_paths: list[str],
prev_error: Optional[str] = None) -> str:
```
Add this block before `## 작업 규칙`:
```python
f"## Step 파일 allowlist\n\n"
f"이 step은 아래 repository-relative path만 수정할 수 있습니다.\n"
f"{chr(10).join(f'- {p}' for p in allowed_paths)}\n\n"
```
Update the caller in `_execute_single_step()`:
```python
preamble = self._build_preamble(guardrails, step_context, step.get("allowed_paths", []), prev_error)
```
- [ ] **Step 5: Run targeted tests**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: pass.
- [ ] **Step 6: Commit**
Run:
```powershell
git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: require execute step file allowlists"
```
## Task 5: Detect and Block Disallowed Step Changes
**Files:**
- Modify: `scripts/test_execute.py`
- Modify: `scripts/execute.py`
- [ ] **Step 1: Add failing tests for changed path classification**
Add this method:
```python
def test_classify_step_changes_splits_allowed_housekeeping_and_disallowed_paths(self):
execute = load_execute()
step = {
"step": 1,
"name": "Docs",
"status": "completed",
"summary": "",
"allowed_paths": ["docs/*.md"],
}
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root)
executor = make_executor(execute, root)
changed = [
"docs/PRD.md",
"phases/0-mvp/index.json",
"phases/0-mvp/step1-output.json",
"scripts/execute.py",
]
allowed, housekeeping, disallowed = executor._classify_step_changes(1, step, changed)
self.assertEqual(allowed, ["docs/PRD.md"])
self.assertEqual(housekeeping, ["phases/0-mvp/index.json", "phases/0-mvp/step1-output.json"])
self.assertEqual(disallowed, ["scripts/execute.py"])
```
- [ ] **Step 2: Run tests and verify failure**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: fail because `_classify_step_changes()` does not exist.
- [ ] **Step 3: Implement changed path collection and classification**
Add these methods:
```python
def _changed_paths(self) -> list[str]:
paths: list[str] = []
tracked = self._run_git("diff", "--name-only")
if tracked.returncode != 0:
print(" ERROR: git diff --name-only failed.")
print(f" {tracked.stderr.strip()}")
sys.exit(1)
paths.extend(tracked.stdout.splitlines())
staged = self._run_git("diff", "--cached", "--name-only")
if staged.returncode != 0:
print(" ERROR: git diff --cached --name-only failed.")
print(f" {staged.stderr.strip()}")
sys.exit(1)
paths.extend(staged.stdout.splitlines())
untracked = self._run_git("ls-files", "--others", "--exclude-standard")
if untracked.returncode != 0:
print(" ERROR: git ls-files --others failed.")
print(f" {untracked.stderr.strip()}")
sys.exit(1)
paths.extend(untracked.stdout.splitlines())
return sorted({self._normalize_rel_path(p) for p in paths if p.strip()})
def _housekeeping_paths(self, step_num: int) -> set[str]:
return {
f"phases/{self._phase_dir_name}/index.json",
f"phases/{self._phase_dir_name}/step{step_num}-output.json",
"phases/index.json",
}
def _classify_step_changes(self, step_num: int, step: dict, changed_paths: list[str]) -> tuple[list[str], list[str], list[str]]:
allowed_patterns = step.get("allowed_paths", [])
housekeeping_set = self._housekeeping_paths(step_num)
allowed: list[str] = []
housekeeping: list[str] = []
disallowed: list[str] = []
for path in changed_paths:
rel = self._normalize_rel_path(path)
if rel in housekeeping_set:
housekeeping.append(rel)
elif self._path_allowed(rel, allowed_patterns):
allowed.append(rel)
else:
disallowed.append(rel)
return allowed, housekeeping, disallowed
```
- [ ] **Step 4: Run targeted tests**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: pass.
- [ ] **Step 5: Commit**
Run:
```powershell
git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: classify execute step file changes"
```
## Task 6: Remove Broad Staging and Stage Explicit Paths Only
**Files:**
- Modify: `scripts/test_execute.py`
- Modify: `scripts/execute.py`
- [ ] **Step 1: Add failing test that commit path never uses broad staging**
Add this method:
```python
def test_commit_step_stages_only_explicit_allowed_and_housekeeping_paths(self):
execute = load_execute()
step = {
"step": 1,
"name": "Docs",
"status": "completed",
"summary": "",
"allowed_paths": ["docs/*.md"],
}
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root)
executor = make_executor(execute, root)
calls = []
def fake_git(*args):
calls.append(args)
if args in {
("diff", "--quiet", "--cached", "--"),
("diff", "--cached", "--quiet"),
}:
return subprocess.CompletedProcess(args, 1, "", "")
return subprocess.CompletedProcess(args, 0, "", "")
with patch.object(executor, "_changed_paths", return_value=["docs/PRD.md", "phases/0-mvp/index.json", "phases/0-mvp/step1-output.json"]):
with patch.object(executor, "_run_git", side_effect=fake_git):
with patch.object(executor, "_validate_before_commit"):
executor._commit_step(step, "Docs")
self.assertNotIn(("add", "-A"), calls)
self.assertIn(("add", "--", "docs/PRD.md"), calls)
self.assertIn(("add", "--", "phases/0-mvp/index.json", "phases/0-mvp/step1-output.json"), calls)
```
- [ ] **Step 2: Run tests and verify failure**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: fail because `_commit_step()` still accepts `(step_num, step_name)` and uses broad staging.
- [ ] **Step 3: Implement explicit staging helper**
Add this method:
```python
def _stage_paths(self, paths: list[str]):
if not paths:
return
r = self._run_git("add", "--", *paths)
if r.returncode != 0:
print(" ERROR: git add failed.")
print(f" {r.stderr.strip()}")
sys.exit(1)
```
Change `_commit_step` signature:
```python
def _commit_step(self, step: dict, step_name: str):
step_num = step["step"]
```
Replace the old broad staging logic with:
```python
changed = self._changed_paths()
allowed, housekeeping, disallowed = self._classify_step_changes(step_num, step, changed)
if disallowed:
print(f" ERROR: Step {step_num} modified files outside allowed_paths:")
for path in disallowed:
print(f" {path}")
sys.exit(1)
if allowed:
msg = self.FEAT_MSG.format(phase=self._phase_name, num=step_num, name=step_name)
self._validate_before_commit(msg)
self._stage_paths(allowed)
if self._run_git("diff", "--cached", "--quiet").returncode != 0:
r = self._run_git("commit", "-m", msg)
if r.returncode != 0:
print(f" ERROR: 코드 커밋 실패: {r.stderr.strip()}")
sys.exit(1)
print(f" Commit: {msg}")
if housekeeping:
msg = self.CHORE_MSG.format(phase=self._phase_name, num=step_num)
self._validate_before_commit(msg)
self._stage_paths(housekeeping)
if self._run_git("diff", "--cached", "--quiet").returncode != 0:
r = self._run_git("commit", "-m", msg)
if r.returncode != 0:
print(f" ERROR: housekeeping 커밋 실패: {r.stderr.strip()}")
sys.exit(1)
```
Update caller:
```python
self._commit_step(step, step_name)
```
- [ ] **Step 4: Run targeted tests**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: fail only if `_validate_before_commit()` is not implemented yet. If so, continue to Task 7 before expecting full pass.
## Task 7: Validation Before Commit
**Files:**
- Modify: `scripts/test_execute.py`
- Modify: `scripts/execute.py`
- [ ] **Step 1: Add failing tests for validation command behavior**
Add these methods:
```python
def test_validate_before_commit_runs_python_selftest_then_workspace_validation(self):
execute = load_execute()
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root)
executor = make_executor(execute, root)
commands = []
def fake_run(cmd, **kwargs):
commands.append(cmd)
return subprocess.CompletedProcess(cmd, 0, "ok", "")
with patch.object(execute.subprocess, "run", side_effect=fake_run):
executor._validate_before_commit("feat(0-mvp): step 1")
self.assertEqual(
commands,
[
[sys.executable, "-m", "unittest", "discover", "-s", "scripts", "-p", "test_*.py"],
[sys.executable, "scripts/validate_workspace.py"],
],
)
def test_validate_before_commit_exits_before_commit_when_validation_fails(self):
execute = load_execute()
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root)
executor = make_executor(execute, root)
def fake_run(cmd, **kwargs):
return subprocess.CompletedProcess(cmd, 1, "bad", "failed")
with patch.object(execute.subprocess, "run", side_effect=fake_run):
with self.assertRaises(SystemExit) as cm:
executor._validate_before_commit("feat(0-mvp): step 1")
self.assertEqual(cm.exception.code, 1)
```
Add `import sys` to `scripts/test_execute.py`.
- [ ] **Step 2: Run tests and verify failure**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: fail because `_validate_before_commit()` does not exist.
- [ ] **Step 3: Implement validation-before-commit**
Add this class constant near `MAX_RETRIES`:
```python
VALIDATION_COMMANDS = (
[sys.executable, "-m", "unittest", "discover", "-s", "scripts", "-p", "test_*.py"],
[sys.executable, "scripts/validate_workspace.py"],
)
```
Add this method:
```python
def _validate_before_commit(self, commit_message: str):
print(f" Validation before commit: {commit_message}")
for cmd in self.VALIDATION_COMMANDS:
r = subprocess.run(cmd, cwd=self._root, capture_output=True, text=True)
if r.returncode != 0:
print(f" ERROR: validation failed before commit: {' '.join(cmd)}")
if r.stdout:
print(r.stdout[-2000:])
if r.stderr:
print(r.stderr[-2000:])
sys.exit(1)
```
- [ ] **Step 4: Run targeted tests**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: pass.
- [ ] **Step 5: Commit**
Run:
```powershell
git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: validate execute runner before commits"
```
## Task 8: Finalize Without Broad Staging
**Files:**
- Modify: `scripts/test_execute.py`
- Modify: `scripts/execute.py`
- [ ] **Step 1: Add failing test for final commit explicit staging**
Add this method:
```python
def test_finalize_stages_only_phase_indexes(self):
execute = load_execute()
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
write_phase(root)
(root / "phases" / "index.json").write_text('{"phases":[]}', encoding="utf-8")
executor = make_executor(execute, root)
calls = []
def fake_git(*args):
calls.append(args)
if args == ("diff", "--cached", "--quiet"):
return subprocess.CompletedProcess(args, 1, "", "")
return subprocess.CompletedProcess(args, 0, "", "")
with patch.object(executor, "_run_git", side_effect=fake_git):
with patch.object(executor, "_validate_before_commit"):
executor._finalize()
self.assertNotIn(("add", "-A"), calls)
self.assertIn(("add", "--", "phases/0-mvp/index.json", "phases/index.json"), calls)
```
- [ ] **Step 2: Run tests and verify failure**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: fail because `_finalize()` still uses broad staging.
- [ ] **Step 3: Replace final staging**
In `_finalize()`, replace:
```python
self._run_git("add", "-A")
```
with:
```python
final_paths = [f"phases/{self._phase_dir_name}/index.json"]
if self._top_index_file.exists():
final_paths.append("phases/index.json")
self._validate_before_commit(f"chore({self._phase_name}): mark phase completed")
self._stage_paths(final_paths)
```
Keep the existing cached diff check and commit message, but change commit failure from silent no-op to fatal:
```python
if r.returncode == 0:
print(f"{msg}")
else:
print(f" ERROR: phase completion commit failed: {r.stderr.strip()}")
sys.exit(1)
```
- [ ] **Step 4: Run targeted tests**
Run:
```powershell
python -m unittest scripts.test_execute -v
```
Expected: pass.
- [ ] **Step 5: Commit**
Run:
```powershell
git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: stage execute finalization explicitly"
```
## Task 9: Documentation and Existing Contract Tests
**Files:**
- Modify: `AGENTS.md`
- Modify: `docs/ARCHITECTURE.md`
- Modify: `.codex/skills/harness-workflow/SKILL.md` only to remove stale legacy feat branch wording; do not use the skill.
- Modify: existing `scripts/test_*` files only if they assert stale branch or staging behavior.
- [ ] **Step 1: Update `AGENTS.md` runner rules**
Add these bullets under the Harness runner rules:
```markdown
- `scripts/execute.py``codex/<phase-name>` branch prefix만 사용한다.
- `scripts/execute.py` 실행 전 worktree는 clean 상태여야 한다.
- phase step은 `allowed_paths`를 선언해야 하며 runner는 allowlist 밖 변경을 커밋하지 않는다.
- runner는 broad staging을 사용하지 않고 explicit path만 stage한다.
- runner가 만드는 모든 commit 전에는 Python self-test와 `python scripts/validate_workspace.py`가 통과해야 한다.
```
- [ ] **Step 2: Update architecture docs**
In `docs/ARCHITECTURE.md`, update the Harness execution layer section so it states:
```markdown
scripts/execute.py:
- creates/checks out `codex/<phase-name>`
- refuses to run on a dirty worktree
- requires per-step `allowed_paths`
- stages only explicit allowed paths and runner housekeeping files
- runs Python self-test and workspace validation before every runner-created commit
```
- [ ] **Step 3: Update stale harness workflow description without invoking the skill**
In `.codex/skills/harness-workflow/SKILL.md`, replace the sentence that says:
```markdown
`scripts/execute.py` creates or checks out the legacy feat task branch
```
with:
```markdown
`scripts/execute.py` creates or checks out `codex/{task-name}`
```
Also add one sentence that phase steps must provide `allowed_paths`.
- [ ] **Step 4: Run docs-related searches**
Run:
```powershell
rg -n "legacy-feat-branch|broad staging|allowed_paths|codex/" AGENTS.md docs .codex scripts
```
Expected:
- No stale `scripts/execute.py` branch docs saying the legacy feat task branch.
- No broad staging in `scripts/execute.py`.
- `allowed_paths` appears in runner docs and tests.
- [ ] **Step 5: Commit**
Run:
```powershell
git add AGENTS.md docs/ARCHITECTURE.md .codex/skills/harness-workflow/SKILL.md
git commit -m "docs: document safe execute runner contract"
```
## Task 10: Full Verification
**Files:**
- Read: all changed files
- [ ] **Step 1: Run Python self-tests**
Run:
```powershell
python -m unittest discover -s scripts -p "test_*.py"
```
Expected:
```text
OK
```
- [ ] **Step 2: Run workspace validation**
Run:
```powershell
python scripts/validate_workspace.py
```
Expected in current no-CMake workspace:
```text
No C++ validation commands configured.
Add CMakeLists.txt or set HARNESS_VALIDATION_COMMANDS.
```
Exit code must be `0`.
- [ ] **Step 3: Run whitespace check**
Run:
```powershell
git diff --check
```
Expected: exit code `0`. LF-to-CRLF warnings are acceptable if there are no whitespace error lines.
- [ ] **Step 4: Run safety searches**
Run:
```powershell
rg -n "legacy-feat-branch|broad staging" scripts/execute.py AGENTS.md docs .codex
```
Expected: no matches.
Run:
```powershell
rg -n "allowed_paths|codex/" scripts/execute.py scripts/test_execute.py AGENTS.md docs/ARCHITECTURE.md .codex/skills/harness-workflow/SKILL.md
```
Expected: matches showing the new branch prefix and allowlist contract.
- [ ] **Step 5: Final commit if verification-only edits were needed**
Run only if Task 10 caused additional edits:
```powershell
git add scripts/execute.py scripts/test_execute.py AGENTS.md docs/ARCHITECTURE.md .codex/skills/harness-workflow/SKILL.md
git commit -m "fix: harden execute runner safety checks"
```
## Acceptance Checklist
- `scripts/execute.py` contains no broad staging.
- `_branch_name()` returns a `codex/` branch name and `_checkout_branch()`/`_finalize()` both use it.
- `run()` calls `_assert_clean_worktree()` before `_checkout_branch()`.
- Every pending step must have a non-empty `allowed_paths`.
- Runner-managed phase index/output files are allowed as housekeeping, not mixed with code/doc step commits.
- Any path outside `allowed_paths` and housekeeping paths blocks commit and exits non-zero.
- `_validate_before_commit()` runs Python self-tests and `scripts/validate_workspace.py` before every runner-created commit.
- Validation failure exits before staging/committing.
- Commit failure exits non-zero.
- Documentation no longer describes `feat-` branches for `scripts/execute.py`.