Files
FESADev/docs/superpowers/plans/2026-06-11-harden-execute-runner.md
2026-06-11 17:18:03 +09:00

32 KiB

Harden Execute Runner Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Make scripts/execute.py safe enough to use by enforcing codex/ branch names, clean starting state, explicit staging, per-step file allowlists, and validation before every runner-created commit.

Architecture: Keep the runner as a single Python module and add small helper methods to StepExecutor instead of introducing a new framework. Add focused unittest coverage in scripts/test_execute.py using temporary phase directories and mocked subprocess/git calls, then update stale docs only after behavior is covered.

Tech Stack: Python 3 standard library, unittest, unittest.mock, Git command-line interface, existing python scripts/validate_workspace.py.


Scope

This plan does not use the harness-workflow skill. It only prepares scripts/execute.py so that using it later is less likely to damage unrelated work.

Required behavior:

  • Branches created or used by the runner must use codex/<phase-name>.
  • Runner startup must abort on a dirty worktree before branch checkout or Codex execution.
  • Runner commits must never use broad git staging.
  • Each phase step must define an allowlist of paths it may modify.
  • Runner must refuse to commit disallowed file changes.
  • Runner must run Python self-tests and workspace validation before every commit it creates.
  • Commit failure and validation failure must stop the runner instead of only warning.

File Map

  • Modify scripts/execute.py: branch naming, dirty worktree check, allowlist parsing, explicit staging, validation-before-commit, fatal commit failures.
  • Create scripts/test_execute.py: unit tests for runner safety behavior.
  • Modify AGENTS.md: document codex/ branch prefix and safer runner requirements.
  • Modify docs/ARCHITECTURE.md: update Harness execution layer notes if they mention unsafe staging or old branch behavior.
  • Modify .codex/skills/harness-workflow/SKILL.md: update stale documentation that says the runner creates the legacy feat task branch. Do not invoke the skill while making this edit.

Data Contract

Extend each step object in phases/<phase-dir>/index.json with allowed_paths.

{
  "step": 1,
  "name": "Update docs",
  "status": "pending",
  "summary": "",
  "allowed_paths": [
    "docs/*.md",
    "docs/**/*.md",
    "AGENTS.md"
  ]
}

Allowlist semantics:

  • Paths are repository-relative and normalized to forward slashes.
  • Exact file path matches are allowed.
  • Directory prefix entries ending with / allow every path under that directory.
  • Glob entries using *, ?, or [ are matched with fnmatch.fnmatchcase.
  • Runner-managed housekeeping files are always allowed separately:
    • phases/<phase-dir>/index.json
    • phases/<phase-dir>/step<step-num>-output.json
    • phases/index.json

Task 1: Test Harness for scripts/execute.py

Files:

  • Create: scripts/test_execute.py

  • Read: scripts/execute.py

  • Step 1: Add loader and temporary phase helpers

Create scripts/test_execute.py with this initial content:

import importlib.util
import json
import subprocess
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch


def load_execute():
    module_path = Path(__file__).resolve().parent / "execute.py"
    spec = importlib.util.spec_from_file_location("execute", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def write_phase(root: Path, phase_dir: str = "0-mvp", phase_name: str = "0-mvp", steps=None):
    phase_path = root / "phases" / phase_dir
    phase_path.mkdir(parents=True)
    if steps is None:
        steps = [
            {
                "step": 1,
                "name": "Docs",
                "status": "pending",
                "summary": "",
                "allowed_paths": ["docs/*.md"],
            }
        ]
    (phase_path / "index.json").write_text(
        json.dumps({"project": "FESA", "phase": phase_name, "steps": steps}, indent=2),
        encoding="utf-8",
    )
    (phase_path / "step1.md").write_text("# Step 1\n", encoding="utf-8")
    return phase_path


def make_executor(execute, root: Path, phase_dir: str = "0-mvp"):
    with patch.object(execute, "ROOT", root):
        return execute.StepExecutor(phase_dir)


class ExecuteRunnerSafetyTests(unittest.TestCase):
    pass


if __name__ == "__main__":
    unittest.main()
  • Step 2: Run the new empty test module

Run:

python -m unittest scripts.test_execute

Expected: pass with one empty test class loaded and no failures.

  • Step 3: Commit the test scaffold

Run:

git add scripts/test_execute.py
git commit -m "test: add execute runner safety test scaffold"

Task 2: codex/ Branch Prefix

Files:

  • Modify: scripts/test_execute.py

  • Modify: scripts/execute.py

  • Step 1: Add failing tests for branch naming and push branch

Add these methods to ExecuteRunnerSafetyTests:

    def test_branch_name_uses_codex_prefix_and_sanitized_phase(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root, phase_name="linear truss/1d")
            executor = make_executor(execute, root)

            self.assertEqual(executor._branch_name(), "codex/linear-truss-1d")

    def test_finalize_push_uses_codex_branch_name(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root, phase_name="0-mvp")
            executor = make_executor(execute, root)
            executor._auto_push = True
            calls = []

            def fake_git(*args):
                calls.append(args)
                if args == ("diff", "--cached", "--quiet"):
                    return subprocess.CompletedProcess(args, 0, "", "")
                return subprocess.CompletedProcess(args, 0, "", "")

            with patch.object(executor, "_run_git", side_effect=fake_git):
                with patch.object(executor, "_validate_before_commit", create=True):
                    executor._finalize()

            self.assertIn(("push", "-u", "origin", "codex/0-mvp"), calls)
  • Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because StepExecutor has no _branch_name() and _finalize() still uses feat-.

  • Step 3: Implement branch naming

In scripts/execute.py, import re near the other imports:

import re

Add this method to StepExecutor under the # --- git --- section:

    def _branch_name(self) -> str:
        slug = re.sub(r"[^A-Za-z0-9._-]+", "-", self._phase_name.strip())
        slug = slug.strip("/.-")
        if not slug:
            slug = self._phase_dir_name
        return f"codex/{slug}"

Replace both direct branch constructions:

branch = self._branch_name()

with:

branch = self._branch_name()
  • Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

  • Step 5: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: use codex branch prefix in execute runner"

Task 3: Dirty Worktree Protection

Files:

  • Modify: scripts/test_execute.py

  • Modify: scripts/execute.py

  • Step 1: Add failing tests for dirty worktree guard

Add these methods:

    def test_assert_clean_worktree_exits_when_git_status_has_changes(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)

            with patch.object(
                executor,
                "_run_git",
                return_value=subprocess.CompletedProcess([], 0, " M AGENTS.md\n?? scratch.txt\n", ""),
            ):
                with self.assertRaises(SystemExit) as cm:
                    executor._assert_clean_worktree("before checkout")

            self.assertEqual(cm.exception.code, 1)

    def test_run_checks_clean_worktree_before_checkout(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)
            calls = []

            def record(name):
                def inner(*args, **kwargs):
                    calls.append(name)
                return inner

            with patch.object(executor, "_assert_clean_worktree", side_effect=record("clean")):
                with patch.object(executor, "_checkout_branch", side_effect=record("checkout")):
                    with patch.object(executor, "_print_header"):
                        with patch.object(executor, "_check_blockers"):
                            with patch.object(executor, "_load_guardrails", return_value=""):
                                with patch.object(executor, "_ensure_created_at"):
                                    with patch.object(executor, "_execute_all_steps"):
                                        with patch.object(executor, "_finalize"):
                                            executor.run()

            self.assertLess(calls.index("clean"), calls.index("checkout"))
  • Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because _assert_clean_worktree() does not exist and run() does not call it.

  • Step 3: Implement dirty worktree guard

Add this method under # --- git ---:

    def _assert_clean_worktree(self, context: str):
        r = self._run_git("status", "--porcelain")
        if r.returncode != 0:
            print("  ERROR: git status failed.")
            print(f"  {r.stderr.strip()}")
            sys.exit(1)
        dirty = r.stdout.strip()
        if dirty:
            print(f"  ERROR: dirty worktree detected {context}.")
            print("  Commit, stash, or remove these changes before running scripts/execute.py:")
            for line in dirty.splitlines():
                print(f"    {line}")
            sys.exit(1)

Change run() so the guard runs before branch checkout:

        self._print_header()
        self._check_blockers()
        self._assert_clean_worktree("before branch checkout")
        self._checkout_branch()
  • Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

  • Step 5: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: block execute runner on dirty worktree"

Task 4: Per-Step File Allowlist

Files:

  • Modify: scripts/test_execute.py

  • Modify: scripts/execute.py

  • Step 1: Add failing tests for allowlist matching and missing allowlist

Add these methods:

    def test_step_allowlist_accepts_exact_prefix_and_glob_paths(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)
            patterns = ["AGENTS.md", "docs/", "scripts/*.py"]

            self.assertTrue(executor._path_allowed("AGENTS.md", patterns))
            self.assertTrue(executor._path_allowed("docs/PRD.md", patterns))
            self.assertTrue(executor._path_allowed("scripts/execute.py", patterns))
            self.assertFalse(executor._path_allowed(".codex/hooks.json", patterns))

    def test_step_without_allowed_paths_is_rejected_before_codex_invocation(self):
        execute = load_execute()
        steps = [{"step": 1, "name": "Unsafe", "status": "pending", "summary": ""}]
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root, steps=steps)
            executor = make_executor(execute, root)

            with self.assertRaises(SystemExit) as cm:
                executor._validate_step_allowlist(steps[0])

            self.assertEqual(cm.exception.code, 1)
  • Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because allowlist helpers do not exist.

  • Step 3: Implement allowlist helpers

Add fnmatch import:

import fnmatch

Add these methods under # --- git --- or a new # --- file allowlist --- section:

    @staticmethod
    def _normalize_rel_path(path: str) -> str:
        return path.replace("\\", "/").lstrip("./")

    def _path_allowed(self, path: str, patterns: list[str]) -> bool:
        rel = self._normalize_rel_path(path)
        for raw in patterns:
            pattern = self._normalize_rel_path(str(raw))
            if not pattern:
                continue
            if pattern.endswith("/") and rel.startswith(pattern):
                return True
            if any(ch in pattern for ch in "*?[") and fnmatch.fnmatchcase(rel, pattern):
                return True
            if rel == pattern:
                return True
        return False

    def _validate_step_allowlist(self, step: dict):
        allowed = step.get("allowed_paths")
        if not isinstance(allowed, list) or not allowed or not all(isinstance(p, str) and p.strip() for p in allowed):
            print(f"  ERROR: Step {step.get('step')} must define non-empty allowed_paths.")
            sys.exit(1)

Call _validate_step_allowlist(pending) before _execute_single_step(pending, guardrails) in _execute_all_steps().

  • Step 4: Add allowlist to prompt context

In _build_preamble, add a parameter:

    def _build_preamble(self, guardrails: str, step_context: str,
                        allowed_paths: list[str],
                        prev_error: Optional[str] = None) -> str:

Add this block before ## 작업 규칙:

            f"## Step 파일 allowlist\n\n"
            f"이 step은 아래 repository-relative path만 수정할 수 있습니다.\n"
            f"{chr(10).join(f'- {p}' for p in allowed_paths)}\n\n"

Update the caller in _execute_single_step():

preamble = self._build_preamble(guardrails, step_context, step.get("allowed_paths", []), prev_error)
  • Step 5: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

  • Step 6: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: require execute step file allowlists"

Task 5: Detect and Block Disallowed Step Changes

Files:

  • Modify: scripts/test_execute.py

  • Modify: scripts/execute.py

  • Step 1: Add failing tests for changed path classification

Add this method:

    def test_classify_step_changes_splits_allowed_housekeeping_and_disallowed_paths(self):
        execute = load_execute()
        step = {
            "step": 1,
            "name": "Docs",
            "status": "completed",
            "summary": "",
            "allowed_paths": ["docs/*.md"],
        }
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)
            changed = [
                "docs/PRD.md",
                "phases/0-mvp/index.json",
                "phases/0-mvp/step1-output.json",
                "scripts/execute.py",
            ]

            allowed, housekeeping, disallowed = executor._classify_step_changes(1, step, changed)

            self.assertEqual(allowed, ["docs/PRD.md"])
            self.assertEqual(housekeeping, ["phases/0-mvp/index.json", "phases/0-mvp/step1-output.json"])
            self.assertEqual(disallowed, ["scripts/execute.py"])
  • Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because _classify_step_changes() does not exist.

  • Step 3: Implement changed path collection and classification

Add these methods:

    def _changed_paths(self) -> list[str]:
        paths: list[str] = []
        tracked = self._run_git("diff", "--name-only")
        if tracked.returncode != 0:
            print("  ERROR: git diff --name-only failed.")
            print(f"  {tracked.stderr.strip()}")
            sys.exit(1)
        paths.extend(tracked.stdout.splitlines())

        staged = self._run_git("diff", "--cached", "--name-only")
        if staged.returncode != 0:
            print("  ERROR: git diff --cached --name-only failed.")
            print(f"  {staged.stderr.strip()}")
            sys.exit(1)
        paths.extend(staged.stdout.splitlines())

        untracked = self._run_git("ls-files", "--others", "--exclude-standard")
        if untracked.returncode != 0:
            print("  ERROR: git ls-files --others failed.")
            print(f"  {untracked.stderr.strip()}")
            sys.exit(1)
        paths.extend(untracked.stdout.splitlines())

        return sorted({self._normalize_rel_path(p) for p in paths if p.strip()})

    def _housekeeping_paths(self, step_num: int) -> set[str]:
        return {
            f"phases/{self._phase_dir_name}/index.json",
            f"phases/{self._phase_dir_name}/step{step_num}-output.json",
            "phases/index.json",
        }

    def _classify_step_changes(self, step_num: int, step: dict, changed_paths: list[str]) -> tuple[list[str], list[str], list[str]]:
        allowed_patterns = step.get("allowed_paths", [])
        housekeeping_set = self._housekeeping_paths(step_num)
        allowed: list[str] = []
        housekeeping: list[str] = []
        disallowed: list[str] = []
        for path in changed_paths:
            rel = self._normalize_rel_path(path)
            if rel in housekeeping_set:
                housekeeping.append(rel)
            elif self._path_allowed(rel, allowed_patterns):
                allowed.append(rel)
            else:
                disallowed.append(rel)
        return allowed, housekeeping, disallowed
  • Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

  • Step 5: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: classify execute step file changes"

Task 6: Remove Broad Staging and Stage Explicit Paths Only

Files:

  • Modify: scripts/test_execute.py

  • Modify: scripts/execute.py

  • Step 1: Add failing test that commit path never uses broad staging

Add this method:

    def test_commit_step_stages_only_explicit_allowed_and_housekeeping_paths(self):
        execute = load_execute()
        step = {
            "step": 1,
            "name": "Docs",
            "status": "completed",
            "summary": "",
            "allowed_paths": ["docs/*.md"],
        }
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)
            calls = []

            def fake_git(*args):
                calls.append(args)
                if args in {
                    ("diff", "--quiet", "--cached", "--"),
                    ("diff", "--cached", "--quiet"),
                }:
                    return subprocess.CompletedProcess(args, 1, "", "")
                return subprocess.CompletedProcess(args, 0, "", "")

            with patch.object(executor, "_changed_paths", return_value=["docs/PRD.md", "phases/0-mvp/index.json", "phases/0-mvp/step1-output.json"]):
                with patch.object(executor, "_run_git", side_effect=fake_git):
                    with patch.object(executor, "_validate_before_commit"):
                        executor._commit_step(step, "Docs")

            self.assertNotIn(("add", "-A"), calls)
            self.assertIn(("add", "--", "docs/PRD.md"), calls)
            self.assertIn(("add", "--", "phases/0-mvp/index.json", "phases/0-mvp/step1-output.json"), calls)
  • Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because _commit_step() still accepts (step_num, step_name) and uses broad staging.

  • Step 3: Implement explicit staging helper

Add this method:

    def _stage_paths(self, paths: list[str]):
        if not paths:
            return
        r = self._run_git("add", "--", *paths)
        if r.returncode != 0:
            print("  ERROR: git add failed.")
            print(f"  {r.stderr.strip()}")
            sys.exit(1)

Change _commit_step signature:

    def _commit_step(self, step: dict, step_name: str):
        step_num = step["step"]

Replace the old broad staging logic with:

        changed = self._changed_paths()
        allowed, housekeeping, disallowed = self._classify_step_changes(step_num, step, changed)
        if disallowed:
            print(f"  ERROR: Step {step_num} modified files outside allowed_paths:")
            for path in disallowed:
                print(f"    {path}")
            sys.exit(1)

        if allowed:
            msg = self.FEAT_MSG.format(phase=self._phase_name, num=step_num, name=step_name)
            self._validate_before_commit(msg)
            self._stage_paths(allowed)
            if self._run_git("diff", "--cached", "--quiet").returncode != 0:
                r = self._run_git("commit", "-m", msg)
                if r.returncode != 0:
                    print(f"  ERROR: 코드 커밋 실패: {r.stderr.strip()}")
                    sys.exit(1)
                print(f"  Commit: {msg}")

        if housekeeping:
            msg = self.CHORE_MSG.format(phase=self._phase_name, num=step_num)
            self._validate_before_commit(msg)
            self._stage_paths(housekeeping)
            if self._run_git("diff", "--cached", "--quiet").returncode != 0:
                r = self._run_git("commit", "-m", msg)
                if r.returncode != 0:
                    print(f"  ERROR: housekeeping 커밋 실패: {r.stderr.strip()}")
                    sys.exit(1)

Update caller:

self._commit_step(step, step_name)
  • Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: fail only if _validate_before_commit() is not implemented yet. If so, continue to Task 7 before expecting full pass.

Task 7: Validation Before Commit

Files:

  • Modify: scripts/test_execute.py

  • Modify: scripts/execute.py

  • Step 1: Add failing tests for validation command behavior

Add these methods:

    def test_validate_before_commit_runs_python_selftest_then_workspace_validation(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)
            commands = []

            def fake_run(cmd, **kwargs):
                commands.append(cmd)
                return subprocess.CompletedProcess(cmd, 0, "ok", "")

            with patch.object(execute.subprocess, "run", side_effect=fake_run):
                executor._validate_before_commit("feat(0-mvp): step 1")

            self.assertEqual(
                commands,
                [
                    [sys.executable, "-m", "unittest", "discover", "-s", "scripts", "-p", "test_*.py"],
                    [sys.executable, "scripts/validate_workspace.py"],
                ],
            )

    def test_validate_before_commit_exits_before_commit_when_validation_fails(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)

            def fake_run(cmd, **kwargs):
                return subprocess.CompletedProcess(cmd, 1, "bad", "failed")

            with patch.object(execute.subprocess, "run", side_effect=fake_run):
                with self.assertRaises(SystemExit) as cm:
                    executor._validate_before_commit("feat(0-mvp): step 1")

            self.assertEqual(cm.exception.code, 1)

Add import sys to scripts/test_execute.py.

  • Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because _validate_before_commit() does not exist.

  • Step 3: Implement validation-before-commit

Add this class constant near MAX_RETRIES:

    VALIDATION_COMMANDS = (
        [sys.executable, "-m", "unittest", "discover", "-s", "scripts", "-p", "test_*.py"],
        [sys.executable, "scripts/validate_workspace.py"],
    )

Add this method:

    def _validate_before_commit(self, commit_message: str):
        print(f"  Validation before commit: {commit_message}")
        for cmd in self.VALIDATION_COMMANDS:
            r = subprocess.run(cmd, cwd=self._root, capture_output=True, text=True)
            if r.returncode != 0:
                print(f"  ERROR: validation failed before commit: {' '.join(cmd)}")
                if r.stdout:
                    print(r.stdout[-2000:])
                if r.stderr:
                    print(r.stderr[-2000:])
                sys.exit(1)
  • Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

  • Step 5: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: validate execute runner before commits"

Task 8: Finalize Without Broad Staging

Files:

  • Modify: scripts/test_execute.py

  • Modify: scripts/execute.py

  • Step 1: Add failing test for final commit explicit staging

Add this method:

    def test_finalize_stages_only_phase_indexes(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            (root / "phases" / "index.json").write_text('{"phases":[]}', encoding="utf-8")
            executor = make_executor(execute, root)
            calls = []

            def fake_git(*args):
                calls.append(args)
                if args == ("diff", "--cached", "--quiet"):
                    return subprocess.CompletedProcess(args, 1, "", "")
                return subprocess.CompletedProcess(args, 0, "", "")

            with patch.object(executor, "_run_git", side_effect=fake_git):
                with patch.object(executor, "_validate_before_commit"):
                    executor._finalize()

            self.assertNotIn(("add", "-A"), calls)
            self.assertIn(("add", "--", "phases/0-mvp/index.json", "phases/index.json"), calls)
  • Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because _finalize() still uses broad staging.

  • Step 3: Replace final staging

In _finalize(), replace:

        self._run_git("add", "-A")

with:

        final_paths = [f"phases/{self._phase_dir_name}/index.json"]
        if self._top_index_file.exists():
            final_paths.append("phases/index.json")
        self._validate_before_commit(f"chore({self._phase_name}): mark phase completed")
        self._stage_paths(final_paths)

Keep the existing cached diff check and commit message, but change commit failure from silent no-op to fatal:

            if r.returncode == 0:
                print(f"  ✓ {msg}")
            else:
                print(f"  ERROR: phase completion commit failed: {r.stderr.strip()}")
                sys.exit(1)
  • Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

  • Step 5: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: stage execute finalization explicitly"

Task 9: Documentation and Existing Contract Tests

Files:

  • Modify: AGENTS.md

  • Modify: docs/ARCHITECTURE.md

  • Modify: .codex/skills/harness-workflow/SKILL.md only to remove stale legacy feat branch wording; do not use the skill.

  • Modify: existing scripts/test_* files only if they assert stale branch or staging behavior.

  • Step 1: Update AGENTS.md runner rules

Add these bullets under the Harness runner rules:

- `scripts/execute.py``codex/<phase-name>` branch prefix만 사용한다.
- `scripts/execute.py` 실행 전 worktree는 clean 상태여야 한다.
- phase step은 `allowed_paths`를 선언해야 하며 runner는 allowlist 밖 변경을 커밋하지 않는다.
- runner는 broad staging을 사용하지 않고 explicit path만 stage한다.
- runner가 만드는 모든 commit 전에는 Python self-test와 `python scripts/validate_workspace.py`가 통과해야 한다.
  • Step 2: Update architecture docs

In docs/ARCHITECTURE.md, update the Harness execution layer section so it states:

scripts/execute.py:
- creates/checks out `codex/<phase-name>`
- refuses to run on a dirty worktree
- requires per-step `allowed_paths`
- stages only explicit allowed paths and runner housekeeping files
- runs Python self-test and workspace validation before every runner-created commit
  • Step 3: Update stale harness workflow description without invoking the skill

In .codex/skills/harness-workflow/SKILL.md, replace the sentence that says:

`scripts/execute.py` creates or checks out the legacy feat task branch

with:

`scripts/execute.py` creates or checks out `codex/{task-name}`

Also add one sentence that phase steps must provide allowed_paths.

  • Step 4: Run docs-related searches

Run:

rg -n "legacy-feat-branch|broad staging|allowed_paths|codex/" AGENTS.md docs .codex scripts

Expected:

  • No stale scripts/execute.py branch docs saying the legacy feat task branch.

  • No broad staging in scripts/execute.py.

  • allowed_paths appears in runner docs and tests.

  • Step 5: Commit

Run:

git add AGENTS.md docs/ARCHITECTURE.md .codex/skills/harness-workflow/SKILL.md
git commit -m "docs: document safe execute runner contract"

Task 10: Full Verification

Files:

  • Read: all changed files

  • Step 1: Run Python self-tests

Run:

python -m unittest discover -s scripts -p "test_*.py"

Expected:

OK
  • Step 2: Run workspace validation

Run:

python scripts/validate_workspace.py

Expected in current no-CMake workspace:

No C++ validation commands configured.
Add CMakeLists.txt or set HARNESS_VALIDATION_COMMANDS.

Exit code must be 0.

  • Step 3: Run whitespace check

Run:

git diff --check

Expected: exit code 0. LF-to-CRLF warnings are acceptable if there are no whitespace error lines.

  • Step 4: Run safety searches

Run:

rg -n "legacy-feat-branch|broad staging" scripts/execute.py AGENTS.md docs .codex

Expected: no matches.

Run:

rg -n "allowed_paths|codex/" scripts/execute.py scripts/test_execute.py AGENTS.md docs/ARCHITECTURE.md .codex/skills/harness-workflow/SKILL.md

Expected: matches showing the new branch prefix and allowlist contract.

  • Step 5: Final commit if verification-only edits were needed

Run only if Task 10 caused additional edits:

git add scripts/execute.py scripts/test_execute.py AGENTS.md docs/ARCHITECTURE.md .codex/skills/harness-workflow/SKILL.md
git commit -m "fix: harden execute runner safety checks"

Acceptance Checklist

  • scripts/execute.py contains no broad staging.
  • _branch_name() returns a codex/ branch name and _checkout_branch()/_finalize() both use it.
  • run() calls _assert_clean_worktree() before _checkout_branch().
  • Every pending step must have a non-empty allowed_paths.
  • Runner-managed phase index/output files are allowed as housekeeping, not mixed with code/doc step commits.
  • Any path outside allowed_paths and housekeeping paths blocks commit and exits non-zero.
  • _validate_before_commit() runs Python self-tests and scripts/validate_workspace.py before every runner-created commit.
  • Validation failure exits before staging/committing.
  • Commit failure exits non-zero.
  • Documentation no longer describes feat- branches for scripts/execute.py.