CLI reference (scripts/image_gen.py)
This file is for the fallback CLI mode only. Read it when the user explicitly asks to use scripts/image_gen.py / CLI / API / model controls, or after the user explicitly confirms that a transparent-output request should use the gpt-image-1.5 true-transparency fallback path.
generate-batch is a CLI subcommand in this fallback path. It is not a top-level mode of the skill.
The word batch in a user request is not CLI opt-in by itself.
What this CLI does
- generate: generate a new image from a prompt
- edit: edit one or more existing images
- generate-batch: run many generation jobs from a JSONL file after the user explicitly chooses CLI/API/model controls
Real API calls require network access + OPENAI_API_KEY. --dry-run does not.
Quick start (works from any repo)
Set a stable path to the skill CLI (default CODEX_HOME is ~/.codex):
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export IMAGE_GEN="$CODEX_HOME/skills/.system/imagegen/scripts/image_gen.py"
Install dependencies into the Python environment that runs the CLI, using that environment's package manager. In uv-managed environments, uv pip install ... remains the preferred path.
Quick start
Dry-run (no API call; no network required; does not require the openai package):
python "$IMAGE_GEN" generate \
--prompt "Test" \
--out output/imagegen/test.png \
--dry-run
Notes:
- One-off dry-runs print the API payload and the computed output path(s).
- Repo-local finals should live under output/imagegen/.
Generate (requires OPENAI_API_KEY + network):
python "$IMAGE_GEN" generate \
--prompt "A cozy alpine cabin at dawn" \
--size 1024x1024 \
--out output/imagegen/alpine-cabin.png
Edit:
python "$IMAGE_GEN" edit \
--image input.png \
--prompt "Replace only the background with a warm sunset" \
--out output/imagegen/sunset-edit.png
Guardrails
- Use the bundled CLI directly (python "$IMAGE_GEN" ...) after activating the correct environment.
- Do not create one-off runners (for example gen_images.py) unless the user explicitly asks for a custom wrapper.
- Never modify scripts/image_gen.py. If something is missing, ask the user before doing anything else.
- Do not silently downgrade from CLI gpt-image-2 or built-in image_gen to CLI gpt-image-1.5; ask first unless the user already explicitly requested gpt-image-1.5, scripts/image_gen.py, or CLI fallback.
Defaults
- Model: gpt-image-2
- Supported model family for this CLI: GPT Image models (gpt-image-*)
- Size: auto
- Quality: medium
- Output format: png
- Default one-off output path: output/imagegen/output.png
- Background: unspecified unless --background is set
gpt-image-2 size and model guidance
gpt-image-2 is the default model for new CLI fallback work.
- Use --quality low for fast drafts, thumbnails, and quick iterations.
- Use --quality medium, --quality high, or --quality auto for final assets, dense text, diagrams, identity-sensitive edits, and high-resolution outputs.
- Square images are typically fastest. Use --size 1024x1024 for quick square drafts.
- If the user asks for 4K-style output, use --size 3840x2160 for landscape or --size 2160x3840 for portrait.
- Do not pass --input-fidelity with gpt-image-2; this model always uses high fidelity for image inputs.
- Do not use --background transparent with gpt-image-2; the default transparent-image workflow uses built-in image_gen on a flat chroma-key background plus local removal. Use gpt-image-1.5 only after the user explicitly confirms the true-transparent CLI fallback, unless they already requested gpt-image-1.5, scripts/image_gen.py, or CLI fallback.
Popular gpt-image-2 sizes:
- 1024x1024
- 1536x1024
- 1024x1536
- 2048x2048
- 2048x1152
- 3840x2160
- 2160x3840
- auto
gpt-image-2 size constraints:
- max edge <= 3840px
- both edges multiples of 16px
- long edge to short edge ratio <= 3:1
- total pixels between 655,360 and 8,294,400
- outputs above 2560x1440 total pixels are experimental
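The constraints above are plain arithmetic, so a candidate size can be checked before a run. The following is a minimal sketch, not part of scripts/image_gen.py: the function names are hypothetical, and treating the pixel bounds as inclusive is an assumption (3840x2160 is exactly 8,294,400 pixels and appears in the popular sizes list).

```python
def check_gpt_image_2_size(width: int, height: int) -> list[str]:
    """Return the violated constraints; an empty list means the size passes."""
    long_edge, short_edge = max(width, height), min(width, height)
    if short_edge <= 0:
        return ["both edges must be positive"]
    problems = []
    if long_edge > 3840:
        problems.append("max edge exceeds 3840px")
    if width % 16 or height % 16:
        problems.append("both edges must be multiples of 16px")
    if long_edge > 3 * short_edge:
        problems.append("long:short ratio exceeds 3:1")
    # Assumption: both pixel bounds are inclusive.
    if not 655_360 <= width * height <= 8_294_400:
        problems.append("total pixels outside 655,360..8,294,400")
    return problems


def is_experimental(width: int, height: int) -> bool:
    """True when the pixel count exceeds that of 2560x1440 (3,686,400)."""
    return width * height > 2560 * 1440
```

For example, 2048x1152 passes every check and is not experimental, while 3840x2160 passes but falls in the experimental range.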
Fast draft:
python "$IMAGE_GEN" generate \
--prompt "A product thumbnail of a matte ceramic mug on a stone surface" \
--quality low \
--size 1024x1024 \
--out output/imagegen/mug-draft.png
Final 2K landscape:
python "$IMAGE_GEN" generate \
--prompt "A polished landing-page hero image of a matte ceramic mug on a stone surface" \
--quality high \
--size 2048x1152 \
--out output/imagegen/mug-hero.png
4K landscape:
python "$IMAGE_GEN" generate \
--prompt "A detailed architectural visualization at golden hour" \
--size 3840x2160 \
--quality high \
--out output/imagegen/architecture-4k.png
True transparent fallback request:
Ask for confirmation before using this command unless the user already explicitly requested gpt-image-1.5, scripts/image_gen.py, or CLI fallback.
python "$IMAGE_GEN" generate \
--model gpt-image-1.5 \
--prompt "A clean product cutout on a transparent background" \
--background transparent \
--output-format png \
--out output/imagegen/product-cutout.png
When using this path, explain briefly that built-in image_gen plus chroma-key removal is the default transparent-image path, but this request needs true model-native transparency. gpt-image-2 does not support background=transparent, so gpt-image-1.5 is required for this confirmed fallback.
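The confirmation rule can be summarized as a small decision function. This is only a sketch of the policy described above, with hypothetical names; fallback_approved stands for "the user confirmed the fallback, or already explicitly requested gpt-image-1.5, scripts/image_gen.py, or CLI fallback".

```python
def pick_cli_model(wants_true_transparency: bool, fallback_approved: bool) -> str:
    """Choose the CLI model, refusing to downgrade silently."""
    if not wants_true_transparency:
        # Default model for new CLI fallback work.
        return "gpt-image-2"
    if not fallback_approved:
        # gpt-image-2 cannot produce background=transparent; ask before switching.
        raise PermissionError("ask the user before falling back to gpt-image-1.5")
    return "gpt-image-1.5"
```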
Quality, input fidelity, and masks (CLI fallback only)
These are explicit CLI controls. They are not built-in image_gen tool arguments.
- --quality works for generate, edit, and generate-batch: low|medium|high|auto
- --input-fidelity is edit-only and validated as low|high; it is not supported for gpt-image-2
- --mask is edit-only
Example:
python "$IMAGE_GEN" edit \
--model gpt-image-1.5 \
--image input.png \
--prompt "Change only the background" \
--quality high \
--input-fidelity high \
--out output/imagegen/background-edit.png
Mask notes:
- For multi-image edits, pass repeated --image flags. Their order is meaningful, so describe each image by index and role in the prompt.
- The CLI accepts a single --mask.
- Image and mask must be the same size and format and each under 50MB.
- Masks must include an alpha channel.
- If multiple input images are provided, the mask applies to the first image.
- Masking is prompt-guided; do not promise exact pixel-perfect mask boundaries.
- Use a PNG mask when possible; the script treats mask handling as best-effort and does not perform full preflight validation beyond file checks/warnings.
- In the edit prompt, repeat invariants (change only the background; keep the subject unchanged) to reduce drift.
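Because the script does not fully validate masks, a local check can catch the common mistakes first. The sketch below is a hypothetical helper, not part of the CLI. It assumes both files are PNGs and reads only the IHDR header with the standard library. It treats "50MB" as 50 MiB, which is an assumption, and it does not detect palette transparency (tRNS).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB


def read_png_header(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, color_type) from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, color_type


def mask_problems(image: bytes, mask: bytes) -> list[str]:
    """Return the mask rules violated by this image/mask pair."""
    problems = []
    image_w, image_h, _ = read_png_header(image)
    mask_w, mask_h, mask_color_type = read_png_header(mask)
    if (image_w, image_h) != (mask_w, mask_h):
        problems.append("image and mask differ in size")
    # PNG color types 4 (gray+alpha) and 6 (RGBA) carry an alpha channel.
    if mask_color_type not in (4, 6):
        problems.append("mask has no alpha channel")
    if len(image) >= MAX_BYTES or len(mask) >= MAX_BYTES:
        problems.append("file is not under 50MB")
    return problems
```

Pass the raw file contents, for example Path("input.png").read_bytes() and Path("mask.png").read_bytes(); an empty result means none of these checks failed.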
Output handling
- Use tmp/imagegen/ for temporary JSONL inputs or scratch files.
- Use output/imagegen/ for final outputs.
- Reruns fail if a target file already exists unless you pass --force.
- --out-dir changes one-off naming to image_1.<ext>, image_2.<ext>, and so on.
- Downscaled copies use the default suffix -web unless you override it.
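Since reruns fail on existing targets, it can help to see which planned outputs are already on disk before deciding whether --force is appropriate. A minimal sketch with a hypothetical helper name; it only inspects the filesystem and does not call the CLI.

```python
from pathlib import Path


def existing_targets(paths: list[str]) -> list[str]:
    """Return the planned output paths that already exist on disk."""
    return [p for p in paths if Path(p).exists()]
```

A non-empty result means the rerun needs either new filenames or an explicit --force.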
Common recipes
Generate with augmentation fields:
python "$IMAGE_GEN" generate \
--prompt "A minimal hero image of a ceramic coffee mug" \
--use-case "product-mockup" \
--style "clean product photography" \
--composition "wide product shot with usable negative space for page copy" \
--constraints "no logos, no text" \
--out output/imagegen/mug-hero.png
Generate + also write a downscaled copy for fast web loading:
python "$IMAGE_GEN" generate \
--prompt "A cozy alpine cabin at dawn" \
--size 1024x1024 \
--downscale-max-dim 1024 \
--out output/imagegen/alpine-cabin.png
Generate multiple prompts concurrently (async batch):
mkdir -p tmp/imagegen output/imagegen/batch
cat > tmp/imagegen/prompts.jsonl << 'EOF'
{"prompt":"Cavernous hangar interior with a compact shuttle parked near the center","use_case":"stylized-concept","composition":"wide-angle, low-angle","lighting":"volumetric light rays through drifting fog","constraints":"no logos or trademarks; no watermark","size":"1536x1024"}
{"prompt":"Gray wolf in profile in a snowy forest","use_case":"photorealistic-natural","composition":"eye-level","constraints":"no logos or trademarks; no watermark","size":"1024x1024"}
EOF
python "$IMAGE_GEN" generate-batch \
--input tmp/imagegen/prompts.jsonl \
--out-dir output/imagegen/batch \
--concurrency 5
rm -f tmp/imagegen/prompts.jsonl
Notes:
- generate-batch requires --out-dir.
- Use --concurrency to control parallelism (default 5).
- Per-job overrides are supported in JSONL (for example size, quality, background, output_format, output_compression, moderation, n, model, out, and prompt-augmentation fields).
- --n generates multiple variants for a single prompt; generate-batch is for many different prompts.
- In batch mode, per-job out is treated as a filename under --out-dir.
- For many requested deliverable assets, provide one prompt/job per distinct asset and use semantic filenames when possible.
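When the job list is built programmatically, the JSONL input can be generated instead of hand-written. The sketch below uses only the per-job keys named above (prompt, size, out). The helper name is hypothetical, and requiring a prompt on every job is an assumption based on the example file.

```python
import json


def to_jsonl(jobs: list[dict]) -> str:
    """Serialize jobs as JSONL: one JSON object per line."""
    lines = []
    for index, job in enumerate(jobs, start=1):
        # Assumption: every job needs its own prompt.
        if not job.get("prompt"):
            raise ValueError(f"job {index} has no prompt")
        lines.append(json.dumps(job, ensure_ascii=False))
    return "\n".join(lines) + "\n"


jobs = [
    {"prompt": "Gray wolf in profile in a snowy forest", "size": "1024x1024", "out": "wolf.png"},
    {"prompt": "Cavernous hangar interior with a compact shuttle", "size": "1536x1024", "out": "hangar.png"},
]
jsonl_text = to_jsonl(jobs)
```

Write jsonl_text to a file under tmp/imagegen/ and pass that file to generate-batch with --input, as in the recipe above.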
CLI notes
- Supported sizes depend on the model. gpt-image-2 supports flexible constrained sizes; older GPT Image models support 1024x1024, 1536x1024, 1024x1536, or auto.
- True transparent CLI outputs require output_format to be png or webp and are not supported by gpt-image-2.
- --prompt-file, --output-compression, --moderation, --max-attempts, --fail-fast, --force, and --no-augment are supported.
- This CLI is intended for GPT Image models. Do not assume older non-GPT image-model behavior applies here.
See also
- API parameter quick reference for fallback CLI mode: references/image-api.md
- Prompt examples shared across both top-level modes: references/sample-prompts.md
- Network/sandbox notes for fallback CLI mode: references/codex-network.md
- Built-in-first transparent image workflow: SKILL.md and $CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py