2026-06-10 17:12:23 +09:00

CLI reference (`scripts/image_gen.py`)

This file covers the fallback CLI mode only. Read it when the user explicitly asks to use scripts/image_gen.py, the CLI, the API, or model controls, or after the user explicitly confirms that a transparent-output request should use the gpt-image-1.5 true-transparency fallback path.

generate-batch is a CLI subcommand in this fallback path. It is not a top-level mode of the skill. The word batch in a user request is not CLI opt-in by itself.

What this CLI does

generate: generate a new image from a prompt
edit: edit one or more existing images
generate-batch: run many generation jobs from a JSONL file after the user explicitly chooses CLI/API/model controls

Real API calls require network access and OPENAI_API_KEY; --dry-run requires neither.

Quick start (works from any repo)

Set a stable path to the skill CLI (default CODEX_HOME is ~/.codex):

export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export IMAGE_GEN="$CODEX_HOME/skills/.system/imagegen/scripts/image_gen.py"

Install dependencies into the Python environment you will run the CLI from, using that environment's package manager. In uv-managed environments, uv pip install ... remains the preferred path.
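
To check where the exports above point without opening a shell, the same fallback logic can be sketched in Python. This is an illustration only, not part of scripts/image_gen.py; the function name resolve_image_gen is hypothetical. Like ${CODEX_HOME:-$HOME/.codex}, it treats an empty CODEX_HOME the same as an unset one.

```python
import os
from pathlib import Path


def resolve_image_gen(env=None):
    """Return the CLI path, falling back to ~/.codex when CODEX_HOME is unset or empty."""
    env = os.environ if env is None else env
    # `or` mirrors the shell's ":-" expansion: empty string counts as unset.
    codex_home = env.get("CODEX_HOME") or str(Path.home() / ".codex")
    return Path(codex_home) / "skills" / ".system" / "imagegen" / "scripts" / "image_gen.py"
```

Calling resolve_image_gen().is_file() is a quick way to confirm the script exists before composing a command.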

Quick start

Dry-run (no API call; no network required; does not require the openai package):

python "$IMAGE_GEN" generate \
  --prompt "Test" \
  --out output/imagegen/test.png \
  --dry-run

Notes:

One-off dry-runs print the API payload and the computed output path(s).
Repo-local finals should live under output/imagegen/.

Generate (requires OPENAI_API_KEY + network):

python "$IMAGE_GEN" generate \
  --prompt "A cozy alpine cabin at dawn" \
  --size 1024x1024 \
  --out output/imagegen/alpine-cabin.png

Edit:

python "$IMAGE_GEN" edit \
  --image input.png \
  --prompt "Replace only the background with a warm sunset" \
  --out output/imagegen/sunset-edit.png

Guardrails

Use the bundled CLI directly (python "$IMAGE_GEN" ...) after activating the correct environment.
Do not create one-off runners (for example gen_images.py) unless the user explicitly asks for a custom wrapper.
Never modify scripts/image_gen.py. If something is missing, ask the user before doing anything else.
Do not silently downgrade from CLI gpt-image-2 or built-in image_gen to CLI gpt-image-1.5; ask first unless the user already explicitly requested gpt-image-1.5, scripts/image_gen.py, or CLI fallback.

Defaults

Model: gpt-image-2
Supported model family for this CLI: GPT Image models (gpt-image-*)
Size: auto
Quality: medium
Output format: png
Default one-off output path: output/imagegen/output.png
Background: unspecified unless --background is set

gpt-image-2 size and model guidance

gpt-image-2 is the default model for new CLI fallback work.

Use --quality low for fast drafts, thumbnails, and quick iterations.
Use --quality medium, --quality high, or --quality auto for final assets, dense text, diagrams, identity-sensitive edits, and high-resolution outputs.
Square images are typically fastest. Use --size 1024x1024 for quick square drafts.
If the user asks for 4K-style output, use --size 3840x2160 for landscape or --size 2160x3840 for portrait.
Do not pass --input-fidelity with gpt-image-2; this model always uses high fidelity for image inputs.
Do not use --background transparent with gpt-image-2; the default transparent-image workflow uses built-in image_gen on a flat chroma-key background plus local removal. Use gpt-image-1.5 only after the user explicitly confirms the true-transparent CLI fallback, unless they already requested gpt-image-1.5, scripts/image_gen.py, or CLI fallback.

Popular gpt-image-2 sizes:

1024x1024
1536x1024
1024x1536
2048x2048
2048x1152
3840x2160
2160x3840
auto

gpt-image-2 size constraints:

max edge <= 3840px
both edges multiples of 16px
long edge to short edge ratio <= 3:1
total pixels between 655,360 and 8,294,400
outputs with more total pixels than 2560x1440 (3,686,400) are experimental
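
The constraints above can be checked before a --size value is chosen. The following is a minimal sketch, not the CLI's own validation; the function names are hypothetical, and it assumes the pixel bounds are inclusive (3840x2160 is exactly 8,294,400 pixels and is listed as a supported size).

```python
def check_gpt_image_2_size(size):
    """Return a list of constraint violations for "WIDTHxHEIGHT" or "auto" (empty = OK)."""
    if size == "auto":
        return []
    try:
        width, height = (int(part) for part in size.lower().split("x"))
    except ValueError:
        return [f"expected WIDTHxHEIGHT or auto, got {size!r}"]
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]

    problems = []
    long_edge, short_edge = max(width, height), min(width, height)
    if long_edge > 3840:
        problems.append("max edge exceeds 3840px")
    if width % 16 or height % 16:
        problems.append("both edges must be multiples of 16px")
    if long_edge > 3 * short_edge:
        problems.append("long:short edge ratio exceeds 3:1")
    if not 655_360 <= width * height <= 8_294_400:
        problems.append("total pixels outside 655,360..8,294,400")
    return problems


def is_experimental(size):
    """True when the size has more pixels than 2560x1440 (3,686,400)."""
    if size == "auto":
        return False
    width, height = (int(part) for part in size.lower().split("x"))
    return width * height > 2560 * 1440
```

For example, 1000x1000 fails only the multiple-of-16 rule, and 2048x2048 passes every rule but counts as experimental.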

Fast draft:

python "$IMAGE_GEN" generate \
  --prompt "A product thumbnail of a matte ceramic mug on a stone surface" \
  --quality low \
  --size 1024x1024 \
  --out output/imagegen/mug-draft.png

Final 2K landscape:

python "$IMAGE_GEN" generate \
  --prompt "A polished landing-page hero image of a matte ceramic mug on a stone surface" \
  --quality high \
  --size 2048x1152 \
  --out output/imagegen/mug-hero.png

4K landscape:

python "$IMAGE_GEN" generate \
  --prompt "A detailed architectural visualization at golden hour" \
  --size 3840x2160 \
  --quality high \
  --out output/imagegen/architecture-4k.png

True transparent fallback request:

Ask for confirmation before using this command unless the user already explicitly requested gpt-image-1.5, scripts/image_gen.py, or CLI fallback.

python "$IMAGE_GEN" generate \
  --model gpt-image-1.5 \
  --prompt "A clean product cutout on a transparent background" \
  --background transparent \
  --output-format png \
  --out output/imagegen/product-cutout.png

When using this path, explain briefly that built-in image_gen plus chroma-key removal is the default transparent-image path, but this request needs true model-native transparency. gpt-image-2 does not support background=transparent, so gpt-image-1.5 is required for this confirmed fallback.
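
The confirmation rule for transparent output can be summarized as a small decision function. This is a sketch of the policy as described in this file, not code from the skill; the function and parameter names are hypothetical.

```python
def transparent_route(needs_true_transparency, user_requested_cli, user_confirmed_fallback):
    """Pick a route for a transparent-output request.

    user_requested_cli: the user already explicitly asked for gpt-image-1.5,
    scripts/image_gen.py, or the CLI fallback.
    user_confirmed_fallback: the user confirmed the true-transparency fallback when asked.
    """
    if user_requested_cli or user_confirmed_fallback:
        return "CLI: --model gpt-image-1.5 --background transparent"
    if needs_true_transparency:
        # Never downgrade silently: confirmation comes first.
        return "ask the user to confirm the gpt-image-1.5 fallback"
    return "built-in image_gen on a chroma-key background + local removal"
```

The ordering matters: an explicit request or confirmation short-circuits the question, and everything else stays on the default built-in path.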

Quality, input fidelity, and masks (CLI fallback only)

These are explicit CLI controls. They are not built-in image_gen tool arguments.

--quality works for generate, edit, and generate-batch: low|medium|high|auto
--input-fidelity is edit-only and validated as low|high; it is not supported for gpt-image-2
--mask is edit-only
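
The three rules above can be expressed as a preflight check on a planned command. This sketch mirrors the documented rules only; it is not the script's actual validation code, and check_controls and its parameters are hypothetical names.

```python
QUALITIES = {"low", "medium", "high", "auto"}
FIDELITIES = {"low", "high"}


def check_controls(subcommand, model="gpt-image-2", quality=None, input_fidelity=None, mask=None):
    """Return a list of problems with the planned CLI controls (empty = OK)."""
    problems = []
    if quality is not None and quality not in QUALITIES:
        problems.append(f"--quality must be one of {sorted(QUALITIES)}")
    if input_fidelity is not None:
        if subcommand != "edit":
            problems.append("--input-fidelity is edit-only")
        if input_fidelity not in FIDELITIES:
            problems.append("--input-fidelity must be low or high")
        if model == "gpt-image-2":
            problems.append("--input-fidelity is not supported for gpt-image-2")
    if mask is not None and subcommand != "edit":
        problems.append("--mask is edit-only")
    return problems
```

An edit with --model gpt-image-1.5, --quality high, and --input-fidelity high returns no problems, while the same fidelity flag on generate with the default model is flagged twice.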

Example:

python "$IMAGE_GEN" edit \
  --model gpt-image-1.5 \
  --image input.png \
  --prompt "Change only the background" \
  --quality high \
  --input-fidelity high \
  --out output/imagegen/background-edit.png

Mask notes:

For multi-image edits, pass repeated --image flags. Their order is meaningful, so describe each image by index and role in the prompt.
The CLI accepts a single --mask.
Image and mask must be the same size and format and each under 50MB.
Masks must include an alpha channel.
If multiple input images are provided, the mask applies to the first image.
Masking is prompt-guided; do not promise exact pixel-perfect mask boundaries.
Use a PNG mask when possible; the script treats mask handling as best-effort and does not perform full preflight validation beyond file checks/warnings.
In the edit prompt, repeat invariants (change only the background; keep the subject unchanged) to reduce drift.
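
Because the script does not fully validate masks, a local check can catch the most common mistakes first. The sketch below handles PNG files only and reads just the IHDR header with the standard library. The names are hypothetical; "50MB" is read as 50,000,000 bytes (the stricter interpretation, an assumption); and palette PNGs that carry transparency in a tRNS chunk are reported as having no alpha channel.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 50 * 1000 * 1000  # assumption: "50MB" means 50,000,000 bytes


def png_header(data):
    """Return (width, height, has_alpha) parsed from the IHDR chunk of PNG bytes."""
    if len(data) < 33 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    # Color types 4 (gray + alpha) and 6 (RGBA) carry a full alpha channel.
    return width, height, color_type in (4, 6)


def check_mask(image_bytes, mask_bytes):
    """List problems for a PNG image/mask pair, following the mask notes above."""
    problems = []
    for label, data in (("image", image_bytes), ("mask", mask_bytes)):
        if len(data) >= MAX_BYTES:
            problems.append(f"{label} is not under 50MB")
    image_w, image_h, _ = png_header(image_bytes)
    mask_w, mask_h, mask_has_alpha = png_header(mask_bytes)
    if (image_w, image_h) != (mask_w, mask_h):
        problems.append("image and mask sizes differ")
    if not mask_has_alpha:
        problems.append("mask has no alpha channel")
    return problems
```

Pass the raw file contents, for example check_mask(Path("input.png").read_bytes(), Path("mask.png").read_bytes()) with pathlib imported.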

Output handling

Use tmp/imagegen/ for temporary JSONL inputs or scratch files.
Use output/imagegen/ for final outputs.
Reruns fail if a target file already exists unless you pass --force.
--out-dir changes one-off naming to image_1.<ext>, image_2.<ext>, and so on.
Downscaled copies use the default suffix -web unless you override it.
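
To anticipate which files a run will touch, the naming rules above can be sketched as follows. The helper names are hypothetical, and placing the -web suffix between the file stem and the extension is an assumption; a --dry-run prints the paths the CLI actually computes.

```python
from pathlib import Path


def out_dir_names(out_dir, count, ext="png"):
    """Names used with --out-dir: image_1.<ext>, image_2.<ext>, and so on."""
    return [Path(out_dir) / f"image_{i}.{ext}" for i in range(1, count + 1)]


def web_copy_name(path, suffix="-web"):
    """Assumed name of the downscaled copy: <stem><suffix><extension>."""
    path = Path(path)
    return path.with_name(f"{path.stem}{suffix}{path.suffix}")


def blocked_targets(paths, force=False):
    """Targets that already exist and would make a rerun fail without --force."""
    return [] if force else [p for p in map(Path, paths) if p.exists()]
```

For example, web_copy_name("output/imagegen/alpine-cabin.png") yields output/imagegen/alpine-cabin-web.png under the stated assumption.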

Common recipes

Generate with augmentation fields:

python "$IMAGE_GEN" generate \
  --prompt "A minimal hero image of a ceramic coffee mug" \
  --use-case "product-mockup" \
  --style "clean product photography" \
  --composition "wide product shot with usable negative space for page copy" \
  --constraints "no logos, no text" \
  --out output/imagegen/mug-hero.png

Generate + also write a downscaled copy for fast web loading:

python "$IMAGE_GEN" generate \
  --prompt "A cozy alpine cabin at dawn" \
  --size 1024x1024 \
  --downscale-max-dim 1024 \
  --out output/imagegen/alpine-cabin.png

Generate multiple prompts concurrently (async batch):

mkdir -p tmp/imagegen output/imagegen/batch
cat > tmp/imagegen/prompts.jsonl << 'EOF'
{"prompt":"Cavernous hangar interior with a compact shuttle parked near the center","use_case":"stylized-concept","composition":"wide-angle, low-angle","lighting":"volumetric light rays through drifting fog","constraints":"no logos or trademarks; no watermark","size":"1536x1024"}
{"prompt":"Gray wolf in profile in a snowy forest","use_case":"photorealistic-natural","composition":"eye-level","constraints":"no logos or trademarks; no watermark","size":"1024x1024"}
EOF

python "$IMAGE_GEN" generate-batch \
  --input tmp/imagegen/prompts.jsonl \
  --out-dir output/imagegen/batch \
  --concurrency 5

rm -f tmp/imagegen/prompts.jsonl
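
Hand-written JSONL is easy to break with an unescaped quote or a stray newline. As an alternative to the heredoc, the input file can be produced with json.dumps, which always emits one valid JSON object per line. This is a sketch: to_jsonl is a hypothetical helper, the field names are the ones used in the example above, and it does not invoke the CLI.

```python
import json


def to_jsonl(jobs):
    """Serialize job dicts to JSONL text, one object per line; each job needs a prompt."""
    lines = []
    for index, job in enumerate(jobs, start=1):
        if not isinstance(job, dict) or not str(job.get("prompt", "")).strip():
            raise ValueError(f"job {index} needs a non-empty 'prompt'")
        # json.dumps escapes embedded newlines, so each job stays on one line.
        lines.append(json.dumps(job, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


jobs = [
    {
        "prompt": "Gray wolf in profile in a snowy forest",
        "use_case": "photorealistic-natural",
        "composition": "eye-level",
        "constraints": "no logos or trademarks; no watermark",
        "size": "1024x1024",
    },
]
jsonl_text = to_jsonl(jobs)
```

Write jsonl_text to tmp/imagegen/prompts.jsonl, then pass that path to generate-batch with --input as shown above.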