Files

T

김경종 742f311be1 modify docu

2026-06-11 17:18:03 +09:00

32 KiB

Raw Blame History

Harden Execute Runner Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Make scripts/execute.py safe enough to use by enforcing codex/ branch names, clean starting state, explicit staging, per-step file allowlists, and validation before every runner-created commit.

Architecture: Keep the runner as a single Python module and add small helper methods to StepExecutor instead of introducing a new framework. Add focused unittest coverage in scripts/test_execute.py using temporary phase directories and mocked subprocess/git calls, then update stale docs only after behavior is covered.

Tech Stack: Python 3 standard library, unittest, unittest.mock, Git command-line interface, existing python scripts/validate_workspace.py.

Scope

This plan does not use the harness-workflow skill. It only prepares scripts/execute.py so that using it later is less likely to damage unrelated work.

Required behavior:

Branches created or used by the runner must use codex/<phase-name>.
Runner startup must abort on a dirty worktree before branch checkout or Codex execution.
Runner commits must never use broad git staging.
Each phase step must define an allowlist of paths it may modify.
Runner must refuse to commit disallowed file changes.
Runner must run Python self-tests and workspace validation before every commit it creates.
Commit failure and validation failure must stop the runner instead of only warning.

File Map

Modify scripts/execute.py: branch naming, dirty worktree check, allowlist parsing, explicit staging, validation-before-commit, fatal commit failures.
Create scripts/test_execute.py: unit tests for runner safety behavior.
Modify AGENTS.md: document codex/ branch prefix and safer runner requirements.
Modify docs/ARCHITECTURE.md: update Harness execution layer notes if they mention unsafe staging or old branch behavior.
Modify .codex/skills/harness-workflow/SKILL.md: update stale documentation that says the runner creates the legacy feat task branch. Do not invoke the skill while making this edit.

Data Contract

Extend each step object in phases/<phase-dir>/index.json with allowed_paths.

{
  "step": 1,
  "name": "Update docs",
  "status": "pending",
  "summary": "",
  "allowed_paths": [
    "docs/*.md",
    "docs/**/*.md",
    "AGENTS.md"
  ]
}

Allowlist semantics:

Paths are repository-relative and normalized to forward slashes.
Exact file path matches are allowed.
Directory prefix entries ending with / allow every path under that directory.
Glob entries using *, ?, or [ are matched with fnmatch.fnmatchcase.
Runner-managed housekeeping files are always allowed separately:
- phases/<phase-dir>/index.json
- phases/<phase-dir>/step<step-num>-output.json
- phases/index.json

Task 1: Test Harness for `scripts/execute.py`

Files:

Create: scripts/test_execute.py
Read: scripts/execute.py
Step 1: Add loader and temporary phase helpers

Create scripts/test_execute.py with this initial content:

import importlib.util
import json
import subprocess
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch


def load_execute():
    module_path = Path(__file__).resolve().parent / "execute.py"
    spec = importlib.util.spec_from_file_location("execute", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def write_phase(root: Path, phase_dir: str = "0-mvp", phase_name: str = "0-mvp", steps=None):
    phase_path = root / "phases" / phase_dir
    phase_path.mkdir(parents=True)
    if steps is None:
        steps = [
            {
                "step": 1,
                "name": "Docs",
                "status": "pending",
                "summary": "",
                "allowed_paths": ["docs/*.md"],
            }
        ]
    (phase_path / "index.json").write_text(
        json.dumps({"project": "FESA", "phase": phase_name, "steps": steps}, indent=2),
        encoding="utf-8",
    )
    (phase_path / "step1.md").write_text("# Step 1\n", encoding="utf-8")
    return phase_path


def make_executor(execute, root: Path, phase_dir: str = "0-mvp"):
    with patch.object(execute, "ROOT", root):
        return execute.StepExecutor(phase_dir)


class ExecuteRunnerSafetyTests(unittest.TestCase):
    pass


if __name__ == "__main__":
    unittest.main()

Step 2: Run the new empty test module

Run:

python -m unittest scripts.test_execute

Expected: pass with one empty test class loaded and no failures.

Step 3: Commit the test scaffold

Run:

git add scripts/test_execute.py
git commit -m "test: add execute runner safety test scaffold"

Task 2: `codex/` Branch Prefix

Files:

Modify: scripts/test_execute.py
Modify: scripts/execute.py
Step 1: Add failing tests for branch naming and push branch

Add these methods to ExecuteRunnerSafetyTests:

    def test_branch_name_uses_codex_prefix_and_sanitized_phase(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root, phase_name="linear truss/1d")
            executor = make_executor(execute, root)

            self.assertEqual(executor._branch_name(), "codex/linear-truss-1d")

    def test_finalize_push_uses_codex_branch_name(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root, phase_name="0-mvp")
            executor = make_executor(execute, root)
            executor._auto_push = True
            calls = []

            def fake_git(*args):
                calls.append(args)
                if args == ("diff", "--cached", "--quiet"):
                    return subprocess.CompletedProcess(args, 0, "", "")
                return subprocess.CompletedProcess(args, 0, "", "")

            with patch.object(executor, "_run_git", side_effect=fake_git):
                with patch.object(executor, "_validate_before_commit", create=True):
                    executor._finalize()

            self.assertIn(("push", "-u", "origin", "codex/0-mvp"), calls)

Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because StepExecutor has no _branch_name() and _finalize() still uses feat-.

Step 3: Implement branch naming

In scripts/execute.py, import re near the other imports:

import re

Add this method to StepExecutor under the # --- git --- section:

    def _branch_name(self) -> str:
        slug = re.sub(r"[^A-Za-z0-9._-]+", "-", self._phase_name.strip())
        slug = slug.strip("/.-")
        if not slug:
            slug = self._phase_dir_name
        return f"codex/{slug}"

Replace both direct branch constructions:

branch = self._branch_name()

with:

branch = self._branch_name()

Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

Step 5: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: use codex branch prefix in execute runner"

Task 3: Dirty Worktree Protection

Files:

Modify: scripts/test_execute.py
Modify: scripts/execute.py
Step 1: Add failing tests for dirty worktree guard

Add these methods:

    def test_assert_clean_worktree_exits_when_git_status_has_changes(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)

            with patch.object(
                executor,
                "_run_git",
                return_value=subprocess.CompletedProcess([], 0, " M AGENTS.md\n?? scratch.txt\n", ""),
            ):
                with self.assertRaises(SystemExit) as cm:
                    executor._assert_clean_worktree("before checkout")

            self.assertEqual(cm.exception.code, 1)

    def test_run_checks_clean_worktree_before_checkout(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)
            calls = []

            def record(name):
                def inner(*args, **kwargs):
                    calls.append(name)
                return inner

            with patch.object(executor, "_assert_clean_worktree", side_effect=record("clean")):
                with patch.object(executor, "_checkout_branch", side_effect=record("checkout")):
                    with patch.object(executor, "_print_header"):
                        with patch.object(executor, "_check_blockers"):
                            with patch.object(executor, "_load_guardrails", return_value=""):
                                with patch.object(executor, "_ensure_created_at"):
                                    with patch.object(executor, "_execute_all_steps"):
                                        with patch.object(executor, "_finalize"):
                                            executor.run()

            self.assertLess(calls.index("clean"), calls.index("checkout"))

Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because _assert_clean_worktree() does not exist and run() does not call it.

Step 3: Implement dirty worktree guard

Add this method under # --- git ---:

    def _assert_clean_worktree(self, context: str):
        r = self._run_git("status", "--porcelain")
        if r.returncode != 0:
            print("  ERROR: git status failed.")
            print(f"  {r.stderr.strip()}")
            sys.exit(1)
        dirty = r.stdout.strip()
        if dirty:
            print(f"  ERROR: dirty worktree detected {context}.")
            print("  Commit, stash, or remove these changes before running scripts/execute.py:")
            for line in dirty.splitlines():
                print(f"    {line}")
            sys.exit(1)

Change run() so the guard runs before branch checkout:

        self._print_header()
        self._check_blockers()
        self._assert_clean_worktree("before branch checkout")
        self._checkout_branch()

Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

Step 5: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: block execute runner on dirty worktree"

Task 4: Per-Step File Allowlist

Files:

Modify: scripts/test_execute.py
Modify: scripts/execute.py
Step 1: Add failing tests for allowlist matching and missing allowlist

Add these methods:

    def test_step_allowlist_accepts_exact_prefix_and_glob_paths(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)
            patterns = ["AGENTS.md", "docs/", "scripts/*.py"]

            self.assertTrue(executor._path_allowed("AGENTS.md", patterns))
            self.assertTrue(executor._path_allowed("docs/PRD.md", patterns))
            self.assertTrue(executor._path_allowed("scripts/execute.py", patterns))
            self.assertFalse(executor._path_allowed(".codex/hooks.json", patterns))

    def test_step_without_allowed_paths_is_rejected_before_codex_invocation(self):
        execute = load_execute()
        steps = [{"step": 1, "name": "Unsafe", "status": "pending", "summary": ""}]
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root, steps=steps)
            executor = make_executor(execute, root)

            with self.assertRaises(SystemExit) as cm:
                executor._validate_step_allowlist(steps[0])

            self.assertEqual(cm.exception.code, 1)

Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because allowlist helpers do not exist.

Step 3: Implement allowlist helpers

Add fnmatch import:

import fnmatch

Add these methods under # --- git --- or a new # --- file allowlist --- section:

    @staticmethod
    def _normalize_rel_path(path: str) -> str:
        return path.replace("\\", "/").lstrip("./")

    def _path_allowed(self, path: str, patterns: list[str]) -> bool:
        rel = self._normalize_rel_path(path)
        for raw in patterns:
            pattern = self._normalize_rel_path(str(raw))
            if not pattern:
                continue
            if pattern.endswith("/") and rel.startswith(pattern):
                return True
            if any(ch in pattern for ch in "*?[") and fnmatch.fnmatchcase(rel, pattern):
                return True
            if rel == pattern:
                return True
        return False

    def _validate_step_allowlist(self, step: dict):
        allowed = step.get("allowed_paths")
        if not isinstance(allowed, list) or not allowed or not all(isinstance(p, str) and p.strip() for p in allowed):
            print(f"  ERROR: Step {step.get('step')} must define non-empty allowed_paths.")
            sys.exit(1)

Call _validate_step_allowlist(pending) before _execute_single_step(pending, guardrails) in _execute_all_steps().

Step 4: Add allowlist to prompt context

In _build_preamble, add a parameter:

    def _build_preamble(self, guardrails: str, step_context: str,
                        allowed_paths: list[str],
                        prev_error: Optional[str] = None) -> str:

Add this block before ## 작업 규칙:

            f"## Step 파일 allowlist\n\n"
            f"이 step은 아래 repository-relative path만 수정할 수 있습니다.\n"
            f"{chr(10).join(f'- {p}' for p in allowed_paths)}\n\n"

Update the caller in _execute_single_step():

preamble = self._build_preamble(guardrails, step_context, step.get("allowed_paths", []), prev_error)

Step 5: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

Step 6: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: require execute step file allowlists"

Task 5: Detect and Block Disallowed Step Changes

Files:

Modify: scripts/test_execute.py
Modify: scripts/execute.py
Step 1: Add failing tests for changed path classification

Add this method:

    def test_classify_step_changes_splits_allowed_housekeeping_and_disallowed_paths(self):
        execute = load_execute()
        step = {
            "step": 1,
            "name": "Docs",
            "status": "completed",
            "summary": "",
            "allowed_paths": ["docs/*.md"],
        }
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)
            changed = [
                "docs/PRD.md",
                "phases/0-mvp/index.json",
                "phases/0-mvp/step1-output.json",
                "scripts/execute.py",
            ]

            allowed, housekeeping, disallowed = executor._classify_step_changes(1, step, changed)

            self.assertEqual(allowed, ["docs/PRD.md"])
            self.assertEqual(housekeeping, ["phases/0-mvp/index.json", "phases/0-mvp/step1-output.json"])
            self.assertEqual(disallowed, ["scripts/execute.py"])

Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because _classify_step_changes() does not exist.

Step 3: Implement changed path collection and classification

Add these methods:

    def _changed_paths(self) -> list[str]:
        paths: list[str] = []
        tracked = self._run_git("diff", "--name-only")
        if tracked.returncode != 0:
            print("  ERROR: git diff --name-only failed.")
            print(f"  {tracked.stderr.strip()}")
            sys.exit(1)
        paths.extend(tracked.stdout.splitlines())

        staged = self._run_git("diff", "--cached", "--name-only")
        if staged.returncode != 0:
            print("  ERROR: git diff --cached --name-only failed.")
            print(f"  {staged.stderr.strip()}")
            sys.exit(1)
        paths.extend(staged.stdout.splitlines())

        untracked = self._run_git("ls-files", "--others", "--exclude-standard")
        if untracked.returncode != 0:
            print("  ERROR: git ls-files --others failed.")
            print(f"  {untracked.stderr.strip()}")
            sys.exit(1)
        paths.extend(untracked.stdout.splitlines())

        return sorted({self._normalize_rel_path(p) for p in paths if p.strip()})

    def _housekeeping_paths(self, step_num: int) -> set[str]:
        return {
            f"phases/{self._phase_dir_name}/index.json",
            f"phases/{self._phase_dir_name}/step{step_num}-output.json",
            "phases/index.json",
        }

    def _classify_step_changes(self, step_num: int, step: dict, changed_paths: list[str]) -> tuple[list[str], list[str], list[str]]:
        allowed_patterns = step.get("allowed_paths", [])
        housekeeping_set = self._housekeeping_paths(step_num)
        allowed: list[str] = []
        housekeeping: list[str] = []
        disallowed: list[str] = []
        for path in changed_paths:
            rel = self._normalize_rel_path(path)
            if rel in housekeeping_set:
                housekeeping.append(rel)
            elif self._path_allowed(rel, allowed_patterns):
                allowed.append(rel)
            else:
                disallowed.append(rel)
        return allowed, housekeeping, disallowed

Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

Step 5: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: classify execute step file changes"

Task 6: Remove Broad Staging and Stage Explicit Paths Only

Files:

Modify: scripts/test_execute.py
Modify: scripts/execute.py
Step 1: Add failing test that commit path never uses broad staging

Add this method:

    def test_commit_step_stages_only_explicit_allowed_and_housekeeping_paths(self):
        execute = load_execute()
        step = {
            "step": 1,
            "name": "Docs",
            "status": "completed",
            "summary": "",
            "allowed_paths": ["docs/*.md"],
        }
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)
            calls = []

            def fake_git(*args):
                calls.append(args)
                if args in {
                    ("diff", "--quiet", "--cached", "--"),
                    ("diff", "--cached", "--quiet"),
                }:
                    return subprocess.CompletedProcess(args, 1, "", "")
                return subprocess.CompletedProcess(args, 0, "", "")

            with patch.object(executor, "_changed_paths", return_value=["docs/PRD.md", "phases/0-mvp/index.json", "phases/0-mvp/step1-output.json"]):
                with patch.object(executor, "_run_git", side_effect=fake_git):
                    with patch.object(executor, "_validate_before_commit"):
                        executor._commit_step(step, "Docs")

            self.assertNotIn(("add", "-A"), calls)
            self.assertIn(("add", "--", "docs/PRD.md"), calls)
            self.assertIn(("add", "--", "phases/0-mvp/index.json", "phases/0-mvp/step1-output.json"), calls)

Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because _commit_step() still accepts (step_num, step_name) and uses broad staging.

Step 3: Implement explicit staging helper

Add this method:

    def _stage_paths(self, paths: list[str]):
        if not paths:
            return
        r = self._run_git("add", "--", *paths)
        if r.returncode != 0:
            print("  ERROR: git add failed.")
            print(f"  {r.stderr.strip()}")
            sys.exit(1)

Change _commit_step signature:

    def _commit_step(self, step: dict, step_name: str):
        step_num = step["step"]

Replace the old broad staging logic with:

        changed = self._changed_paths()
        allowed, housekeeping, disallowed = self._classify_step_changes(step_num, step, changed)
        if disallowed:
            print(f"  ERROR: Step {step_num} modified files outside allowed_paths:")
            for path in disallowed:
                print(f"    {path}")
            sys.exit(1)

        if allowed:
            msg = self.FEAT_MSG.format(phase=self._phase_name, num=step_num, name=step_name)
            self._validate_before_commit(msg)
            self._stage_paths(allowed)
            if self._run_git("diff", "--cached", "--quiet").returncode != 0:
                r = self._run_git("commit", "-m", msg)
                if r.returncode != 0:
                    print(f"  ERROR: 코드 커밋 실패: {r.stderr.strip()}")
                    sys.exit(1)
                print(f"  Commit: {msg}")

        if housekeeping:
            msg = self.CHORE_MSG.format(phase=self._phase_name, num=step_num)
            self._validate_before_commit(msg)
            self._stage_paths(housekeeping)
            if self._run_git("diff", "--cached", "--quiet").returncode != 0:
                r = self._run_git("commit", "-m", msg)
                if r.returncode != 0:
                    print(f"  ERROR: housekeeping 커밋 실패: {r.stderr.strip()}")
                    sys.exit(1)

Update caller:

self._commit_step(step, step_name)

Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: fail only if _validate_before_commit() is not implemented yet. If so, continue to Task 7 before expecting full pass.

Task 7: Validation Before Commit

Files:

Modify: scripts/test_execute.py
Modify: scripts/execute.py
Step 1: Add failing tests for validation command behavior

Add these methods:

    def test_validate_before_commit_runs_python_selftest_then_workspace_validation(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)
            commands = []

            def fake_run(cmd, **kwargs):
                commands.append(cmd)
                return subprocess.CompletedProcess(cmd, 0, "ok", "")

            with patch.object(execute.subprocess, "run", side_effect=fake_run):
                executor._validate_before_commit("feat(0-mvp): step 1")

            self.assertEqual(
                commands,
                [
                    [sys.executable, "-m", "unittest", "discover", "-s", "scripts", "-p", "test_*.py"],
                    [sys.executable, "scripts/validate_workspace.py"],
                ],
            )

    def test_validate_before_commit_exits_before_commit_when_validation_fails(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            executor = make_executor(execute, root)

            def fake_run(cmd, **kwargs):
                return subprocess.CompletedProcess(cmd, 1, "bad", "failed")

            with patch.object(execute.subprocess, "run", side_effect=fake_run):
                with self.assertRaises(SystemExit) as cm:
                    executor._validate_before_commit("feat(0-mvp): step 1")

            self.assertEqual(cm.exception.code, 1)

Add import sys to scripts/test_execute.py.

Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because _validate_before_commit() does not exist.

Step 3: Implement validation-before-commit

Add this class constant near MAX_RETRIES:

    VALIDATION_COMMANDS = (
        [sys.executable, "-m", "unittest", "discover", "-s", "scripts", "-p", "test_*.py"],
        [sys.executable, "scripts/validate_workspace.py"],
    )

Add this method:

    def _validate_before_commit(self, commit_message: str):
        print(f"  Validation before commit: {commit_message}")
        for cmd in self.VALIDATION_COMMANDS:
            r = subprocess.run(cmd, cwd=self._root, capture_output=True, text=True)
            if r.returncode != 0:
                print(f"  ERROR: validation failed before commit: {' '.join(cmd)}")
                if r.stdout:
                    print(r.stdout[-2000:])
                if r.stderr:
                    print(r.stderr[-2000:])
                sys.exit(1)

Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

Step 5: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: validate execute runner before commits"

Task 8: Finalize Without Broad Staging

Files:

Modify: scripts/test_execute.py
Modify: scripts/execute.py
Step 1: Add failing test for final commit explicit staging

Add this method:

    def test_finalize_stages_only_phase_indexes(self):
        execute = load_execute()
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            write_phase(root)
            (root / "phases" / "index.json").write_text('{"phases":[]}', encoding="utf-8")
            executor = make_executor(execute, root)
            calls = []

            def fake_git(*args):
                calls.append(args)
                if args == ("diff", "--cached", "--quiet"):
                    return subprocess.CompletedProcess(args, 1, "", "")
                return subprocess.CompletedProcess(args, 0, "", "")

            with patch.object(executor, "_run_git", side_effect=fake_git):
                with patch.object(executor, "_validate_before_commit"):
                    executor._finalize()

            self.assertNotIn(("add", "-A"), calls)
            self.assertIn(("add", "--", "phases/0-mvp/index.json", "phases/index.json"), calls)

Step 2: Run tests and verify failure

Run:

python -m unittest scripts.test_execute -v

Expected: fail because _finalize() still uses broad staging.

Step 3: Replace final staging

In _finalize(), replace:

        self._run_git("add", "-A")

with:

        final_paths = [f"phases/{self._phase_dir_name}/index.json"]
        if self._top_index_file.exists():
            final_paths.append("phases/index.json")
        self._validate_before_commit(f"chore({self._phase_name}): mark phase completed")
        self._stage_paths(final_paths)

Keep the existing cached diff check and commit message, but change commit failure from silent no-op to fatal:

            if r.returncode == 0:
                print(f"  ✓ {msg}")
            else:
                print(f"  ERROR: phase completion commit failed: {r.stderr.strip()}")
                sys.exit(1)

Step 4: Run targeted tests

Run:

python -m unittest scripts.test_execute -v

Expected: pass.

Step 5: Commit

Run:

git add scripts/test_execute.py scripts/execute.py
git commit -m "fix: stage execute finalization explicitly"

Task 9: Documentation and Existing Contract Tests

Files:

Modify: AGENTS.md
Modify: docs/ARCHITECTURE.md
Modify: .codex/skills/harness-workflow/SKILL.md only to remove stale legacy feat branch wording; do not use the skill.
Modify: existing scripts/test_* files only if they assert stale branch or staging behavior.
Step 1: Update AGENTS.md runner rules

Add these bullets under the Harness runner rules:

- `scripts/execute.py`는 `codex/<phase-name>` branch prefix만 사용한다.
- `scripts/execute.py` 실행 전 worktree는 clean 상태여야 한다.
- phase step은 `allowed_paths`를 선언해야 하며 runner는 allowlist 밖 변경을 커밋하지 않는다.
- runner는 broad staging을 사용하지 않고 explicit path만 stage한다.
- runner가 만드는 모든 commit 전에는 Python self-test와 `python scripts/validate_workspace.py`가 통과해야 한다.

Step 2: Update architecture docs

In docs/ARCHITECTURE.md, update the Harness execution layer section so it states:

scripts/execute.py:
- creates/checks out `codex/<phase-name>`
- refuses to run on a dirty worktree
- requires per-step `allowed_paths`
- stages only explicit allowed paths and runner housekeeping files
- runs Python self-test and workspace validation before every runner-created commit

Step 3: Update stale harness workflow description without invoking the skill

In .codex/skills/harness-workflow/SKILL.md, replace the sentence that says:

`scripts/execute.py` creates or checks out the legacy feat task branch

with:

`scripts/execute.py` creates or checks out `codex/{task-name}`

Also add one sentence that phase steps must provide allowed_paths.

Step 4: Run docs-related searches

Run:

rg -n "legacy-feat-branch|broad staging|allowed_paths|codex/" AGENTS.md docs .codex scripts

Expected:

No stale scripts/execute.py branch docs saying the legacy feat task branch.
No broad staging in scripts/execute.py.
allowed_paths appears in runner docs and tests.
Step 5: Commit

Run:

git add AGENTS.md docs/ARCHITECTURE.md .codex/skills/harness-workflow/SKILL.md
git commit -m "docs: document safe execute runner contract"

Task 10: Full Verification

Files:

Read: all changed files
Step 1: Run Python self-tests

Run:

python -m unittest discover -s scripts -p "test_*.py"

Expected:

OK

Step 2: Run workspace validation

Run:

python scripts/validate_workspace.py

Expected in current no-CMake workspace:

No C++ validation commands configured.
Add CMakeLists.txt or set HARNESS_VALIDATION_COMMANDS.

Exit code must be 0.

Step 3: Run whitespace check

Run:

git diff --check

Expected: exit code 0. LF-to-CRLF warnings are acceptable if there are no whitespace error lines.

Step 4: Run safety searches

Run:

rg -n "legacy-feat-branch|broad staging" scripts/execute.py AGENTS.md docs .codex

Expected: no matches.

Run:

rg -n "allowed_paths|codex/" scripts/execute.py scripts/test_execute.py AGENTS.md docs/ARCHITECTURE.md .codex/skills/harness-workflow/SKILL.md

Expected: matches showing the new branch prefix and allowlist contract.

Step 5: Final commit if verification-only edits were needed

Run only if Task 10 caused additional edits:

git add scripts/execute.py scripts/test_execute.py AGENTS.md docs/ARCHITECTURE.md .codex/skills/harness-workflow/SKILL.md
git commit -m "fix: harden execute runner safety checks"

Acceptance Checklist

scripts/execute.py contains no broad staging.
_branch_name() returns a codex/ branch name and _checkout_branch()/_finalize() both use it.
run() calls _assert_clean_worktree() before _checkout_branch().
Every pending step must have a non-empty allowed_paths.
Runner-managed phase index/output files are allowed as housekeeping, not mixed with code/doc step commits.
Any path outside allowed_paths and housekeeping paths blocks commit and exits non-zero.
_validate_before_commit() runs Python self-tests and scripts/validate_workspace.py before every runner-created commit.
Validation failure exits before staging/committing.
Commit failure exits non-zero.
Documentation no longer describes feat- branches for scripts/execute.py.

32 KiB Raw Blame History

Harden Execute Runner Implementation Plan

Scope

File Map

Data Contract

Task 1: Test Harness for scripts/execute.py

Task 2: codex/ Branch Prefix

Task 3: Dirty Worktree Protection

Task 4: Per-Step File Allowlist

Task 5: Detect and Block Disallowed Step Changes

Task 6: Remove Broad Staging and Stage Explicit Paths Only

Task 7: Validation Before Commit

Task 8: Finalize Without Broad Staging

Task 9: Documentation and Existing Contract Tests

Task 10: Full Verification

Acceptance Checklist

32 KiB

Raw Blame History

Task 1: Test Harness for `scripts/execute.py`

Task 2: `codex/` Branch Prefix