modify template

This commit is contained in:
김경종
2026-06-10 17:12:23 +09:00
parent 2d59191df2
commit df3cc3e890
186 changed files with 24935 additions and 2 deletions
+201
View File
@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf of
any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don\'t include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
+356
View File
@@ -0,0 +1,356 @@
---
name: "imagegen"
description: "Generate or edit raster images when the task benefits from AI-created bitmap visuals such as photos, illustrations, textures, sprites, mockups, or transparent-background cutouts. Use when Codex should create a brand-new image, transform an existing image, or derive visual variants from references, and the output should be a bitmap asset rather than repo-native code or vector. Do not use when the task is better handled by editing existing SVG/vector/code-native assets, extending an established icon or logo system, or building the visual directly in HTML/CSS/canvas."
---
# Image Generation Skill
Generates or edits images for the current project (for example website assets, game assets, UI mockups, product mockups, wireframes, logo design, photorealistic images, or infographics).
## Top-level modes and rules
This skill has exactly two top-level modes:
- **Default built-in tool mode (preferred):** built-in `image_gen` tool for normal image generation, editing, and simple transparent-image requests. Does not require `OPENAI_API_KEY`.
- **Fallback CLI mode:** `scripts/image_gen.py` CLI. Use when the user explicitly asks for the CLI/API/model path, or after the user explicitly confirms a true model-native transparency fallback with `gpt-image-1.5`. Requires `OPENAI_API_KEY`.
Within CLI fallback, the CLI exposes three subcommands:
- `generate`
- `edit`
- `generate-batch`
Rules:
- Use the built-in `image_gen` tool by default for normal image generation and editing requests.
- Do not switch to CLI fallback for ordinary quality, size, or file-path control.
- If the user explicitly asks for a transparent image/background, stay on built-in `image_gen` first: prompt for a flat removable chroma-key background, then remove it locally with the installed helper at `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`.
- Never silently switch from built-in `image_gen` or CLI `gpt-image-2` to CLI `gpt-image-1.5`. Treat this as a model/path downgrade and ask the user before doing it, unless the user has already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
- If a transparent request appears too complex for clean chroma-key removal, asks for true/native transparency, or local removal fails validation, explain that true transparency requires CLI `gpt-image-1.5 --background transparent --output-format png` because `gpt-image-2` does not support `background=transparent`, then ask whether to proceed. Run the CLI fallback only after the user confirms.
- The word `batch` by itself does not mean CLI fallback. If the user asks for many assets or says to batch-generate assets without explicitly asking for CLI/API/model controls, stay on the built-in path and issue one built-in call per requested asset or variant.
- If the built-in tool fails or is unavailable, tell the user the CLI fallback exists and that it requires `OPENAI_API_KEY`. Proceed only if the user explicitly asks for that fallback.
- If the user explicitly asks for CLI mode, use the bundled `scripts/image_gen.py` workflow. Do not create one-off SDK runners.
- Never modify `scripts/image_gen.py`. If something is missing, ask the user before doing anything else.
Built-in save-path policy:
- In built-in tool mode, Codex saves generated images under `$CODEX_HOME/*` by default.
- Do not describe or rely on OS temp as the default built-in destination.
- Do not describe or rely on a destination-path argument (if any) on the built-in `image_gen` tool. If a specific location is needed, generate first and then move or copy the selected output from `$CODEX_HOME/generated_images/...`.
- Save-path precedence in built-in mode:
1. If the user names a destination, move or copy the selected output there.
2. If the image is meant for the current project, move or copy the final selected image into the workspace before finishing.
3. If the image is only for preview or brainstorming, render it inline; the underlying file can remain at the default `$CODEX_HOME/*` path.
- Never leave a project-referenced asset only at the default `$CODEX_HOME/*` path.
- Do not overwrite an existing asset unless the user explicitly asked for replacement; otherwise create a sibling versioned filename such as `hero-v2.png` or `item-icon-edited.png`.
Shared prompt guidance for both modes lives in `references/prompting.md` and `references/sample-prompts.md`.
Fallback-only docs/resources for CLI mode:
- `references/cli.md`
- `references/image-api.md`
- `references/codex-network.md`
- `scripts/image_gen.py`
Local post-processing helper:
- `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`: removes a flat chroma-key background from a generated image and writes a PNG/WebP with alpha. Prefer auto-key sampling, soft matte, and despill for antialiased edges.
## When to use
- Generate a new image (concept art, product shot, cover, website hero)
- Generate a new image using one or more reference images for style, composition, or mood
- Edit an existing image (inpainting, lighting or weather transformations, background replacement, object removal, compositing, transparent background)
- Produce many assets or variants for one task
## When not to use
- Extending or matching an existing SVG/vector icon set, logo system, or illustration library inside the repo
- Creating simple shapes, diagrams, wireframes, or icons that are better produced directly in SVG, HTML/CSS, or canvas
- Making a small project-local asset edit when the source file already exists in an editable native format
- Any task where the user clearly wants deterministic code-native output instead of a generated bitmap
## Decision tree
Think about two separate questions:
1. **Intent:** is this a new image or an edit of an existing image?
2. **Execution strategy:** is this one asset or many assets/variants?
Intent:
- If the user wants to modify an existing image while preserving parts of it, treat the request as **edit**.
- If the user provides images only as references for style, composition, mood, or subject guidance, treat the request as **generate**.
- If the user provides no images, treat the request as **generate**.
Built-in edit semantics:
- Built-in edit mode is for images already visible in the conversation context, such as attached images or images generated earlier in the thread.
- If the user wants to edit a local image file with the built-in tool, first load it with built-in `view_image` tool so the image is visible in the conversation context, then proceed with the built-in edit flow.
- Do not promise arbitrary filesystem-path editing through the built-in tool.
- If a local file still needs direct file-path control, masks, or other explicit CLI-only parameters, use the explicit CLI fallback only when the user asks for it.
- For edits, preserve invariants aggressively and save non-destructively by default.
Execution strategy:
- In the built-in default path, produce many assets or variants by issuing one `image_gen` call per requested asset or variant.
- In the CLI fallback path, use the CLI `generate-batch` subcommand only when the user explicitly chose CLI mode and needs many prompts/assets.
- For many distinct assets, do not use `n` as a substitute for separate prompts. `n` is for variants of one prompt; distinct assets need distinct built-in calls or distinct CLI `generate-batch` jobs.
Assume the user wants a new image unless they clearly ask to change an existing one.
## Workflow
1. Decide the top-level mode: built-in by default, including simple transparent-output requests; fallback CLI only if explicitly requested or after the user explicitly confirms a transparent-output fallback.
2. Decide the intent: `generate` or `edit`.
3. Decide whether the output is preview-only or meant to be consumed by the current project.
4. Decide the execution strategy: single asset vs repeated built-in calls vs CLI `generate-batch`.
5. Collect inputs up front: prompt(s), exact text (verbatim), constraints/avoid list, and any input images.
6. For every input image, label its role explicitly:
- reference image
- edit target
- supporting insert/style/compositing input
7. If the edit target is only on the local filesystem and you are staying on the built-in path, inspect it with `view_image` first so the image is available in conversation context.
8. If the user asked for a photo, illustration, sprite, product image, banner, or other explicitly raster-style asset, use `image_gen` rather than substituting SVG/HTML/CSS placeholders. If the request is for an icon, logo, or UI graphic that should match existing repo-native SVG/vector/code assets, prefer editing those directly instead.
9. Augment the prompt based on specificity:
- If the user's prompt is already specific and detailed, normalize it into a clear spec without adding creative requirements.
- If the user's prompt is generic, add tasteful augmentation only when it materially improves output quality.
10. Use the built-in `image_gen` tool by default.
11. For transparent-output requests, follow the transparent image guidance below: generate with built-in `image_gen` on a flat chroma-key background, copy the selected output into the workspace or `tmp/imagegen/`, run the installed `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py` helper, and validate the alpha result before using it. If this path looks unsuitable or fails, ask before switching to CLI `gpt-image-1.5`.
12. Inspect outputs and validate: subject, style, composition, text accuracy, and invariants/avoid items.
13. Iterate with a single targeted change, then re-check.
14. For preview-only work, render the image inline; the underlying file may remain at the default `$CODEX_HOME/generated_images/...` path.
15. For project-bound work, move or copy the selected artifact into the workspace and update any consuming code or references. Never leave a project-referenced asset only at the default `$CODEX_HOME/generated_images/...` path.
16. For batches or multi-asset requests, persist every requested deliverable final in the workspace unless the user explicitly asked to keep outputs preview-only. Discarded variants do not need to be kept unless requested.
17. If the user explicitly chooses or confirms the CLI fallback, then use the fallback-only docs for model, quality, size, `input_fidelity`, masks, output format, output paths, and network setup.
18. Always report the final saved path(s) for any workspace-bound asset(s), plus the final prompt or prompt set and whether the built-in tool or fallback CLI mode was used.
## Transparent image requests
Transparent-image requests still use built-in `image_gen` first. Because the built-in tool does not expose a true transparent-background control, create a removable chroma-key source image and then convert the key color to alpha locally.
Default sequence:
1. Use built-in `image_gen` to generate the requested subject on a perfectly flat solid chroma-key background.
2. Choose a key color that is unlikely to appear in the subject: default `#00ff00`, use `#ff00ff` for green subjects, and avoid `#0000ff` for blue subjects.
3. After generation, move or copy the selected source image from `$CODEX_HOME/generated_images/...` into the workspace or `tmp/imagegen/`.
4. Run the installed helper path, not a project-relative script path:
```bash
python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py" \
--input <source> \
--out <final.png> \
--auto-key border \
--soft-matte \
--transparent-threshold 12 \
--opaque-threshold 220 \
--despill
```
5. Validate that the output has an alpha channel, transparent corners, plausible subject coverage, and no obvious key-color fringe. If a thin fringe remains, retry once with `--edge-contract 1`; use `--edge-feather 0.25` only when the edge is visibly stair-stepped and the subject is not shiny or reflective.
6. Save the final alpha PNG/WebP in the project if the asset is project-bound. Never leave a project-referenced transparent asset only under `$CODEX_HOME/*`.
Prompt transparent requests like this:
```text
Create the requested subject on a perfectly flat solid #00ff00 chroma-key background for background removal.
The background must be one uniform color with no shadows, gradients, texture, reflections, floor plane, or lighting variation.
Keep the subject fully separated from the background with crisp edges and generous padding.
Do not use #00ff00 anywhere in the subject.
No cast shadow, no contact shadow, no reflection, no watermark, and no text unless explicitly requested.
```
Do not automatically use CLI `gpt-image-1.5 --background transparent --output-format png` instead of chroma keying. Ask the user first when the user asks for true/native transparency, when local removal fails validation, or when the requested image is complex: hair, fur, feathers, smoke, glass, liquids, translucent materials, reflective objects, soft shadows, realistic product grounding, or subject colors that conflict with all practical key colors.
Use a concise confirmation like:
```text
This likely needs true native transparency. The default built-in path uses a chroma-key background plus local removal, but true transparency requires the CLI fallback with gpt-image-1.5 because gpt-image-2 does not support background=transparent. It also requires OPENAI_API_KEY. Should I proceed with that CLI fallback?
```
## Prompt augmentation
Reformat user prompts into a structured, production-oriented spec. Make the user's goal clearer and more actionable, but do not blindly add detail.
Treat this as prompt-shaping guidance, not a closed schema. Use only the lines that help, and add a short extra labeled line when it materially improves clarity.
### Specificity policy
Use the user's prompt specificity to decide how much augmentation is appropriate:
- If the prompt is already specific and detailed, preserve that specificity and only normalize/structure it.
- If the prompt is generic, you may add tasteful augmentation when it will materially improve the result.
Allowed augmentations:
- composition or framing hints
- polish level or intended-use hints
- practical layout guidance
- reasonable scene concreteness that supports the stated request
Not allowed augmentations:
- extra characters or objects that are not implied by the request
- brand names, slogans, palettes, or narrative beats that are not implied
- arbitrary side-specific placement unless the surrounding layout supports it
## Use-case taxonomy (exact slugs)
Classify each request into one of these buckets and keep the slug consistent across prompts and references.
Generate:
- photorealistic-natural — candid/editorial lifestyle scenes with real texture and natural lighting.
- product-mockup — product/packaging shots, catalog imagery, merch concepts.
- ui-mockup — app/web interface mockups and wireframes; specify the desired fidelity.
- infographic-diagram — diagrams/infographics with structured layout and text.
- scientific-educational — classroom explainers, scientific diagrams, and learning visuals with required labels and accuracy constraints.
- ads-marketing — campaign concepts and ad creatives with audience, brand position, scene, and exact tagline/copy.
- productivity-visual — slide, chart, workflow, and data-heavy business visuals.
- logo-brand — logo/mark exploration, vector-friendly.
- illustration-story — comics, childrens book art, narrative scenes.
- stylized-concept — style-driven concept art, 3D/stylized renders.
- historical-scene — period-accurate/world-knowledge scenes.
Edit:
- text-localization — translate/replace in-image text, preserve layout.
- identity-preserve — try-on, person-in-scene; lock face/body/pose.
- precise-object-edit — remove/replace a specific element (including interior swaps).
- lighting-weather — time-of-day/season/atmosphere changes only.
- background-extraction — transparent background / clean cutout. Use built-in `image_gen` with chroma-key removal first for simple opaque subjects; ask before using CLI true transparency for complex subjects.
- style-transfer — apply reference style while changing subject/scene.
- compositing — multi-image insert/merge with matched lighting/perspective.
- sketch-to-render — drawing/line art to photoreal render.
## Shared prompt schema
Use the following labeled spec as shared prompt scaffolding for both top-level modes:
```text
Use case: <taxonomy slug>
Asset type: <where the asset will be used>
Primary request: <user's main prompt>
Input images: <Image 1: role; Image 2: role> (optional)
Scene/backdrop: <environment>
Subject: <main subject>
Style/medium: <photo/illustration/3D/etc>
Composition/framing: <wide/close/top-down; placement>
Lighting/mood: <lighting + mood>
Color palette: <palette notes>
Materials/textures: <surface details>
Text (verbatim): "<exact text>"
Constraints: <must keep/must avoid>
Avoid: <negative constraints>
```
Notes:
- `Asset type` and `Input images` are prompt scaffolding, not dedicated CLI flags.
- `Scene/backdrop` refers to the visual setting. It is not the same as the fallback CLI `background` parameter, which controls output transparency behavior.
- Fallback-only execution notes such as `Quality:`, `Input fidelity:`, masks, output format, and output paths belong in the CLI path only. Do not treat them as built-in `image_gen` tool arguments.
Augmentation rules:
- Keep it short.
- Add only the details needed to improve the prompt materially.
- For edits, explicitly list invariants (`change only X; keep Y unchanged`).
- If any critical detail is missing and blocks success, ask a question; otherwise proceed.
## Examples
### Generation example (hero image)
```text
Use case: product-mockup
Asset type: landing page hero
Primary request: a minimal hero image of a ceramic coffee mug
Style/medium: clean product photography
Composition/framing: wide composition with usable negative space for page copy if needed
Lighting/mood: soft studio lighting
Constraints: no logos, no text, no watermark
```
### Edit example (invariants)
```text
Use case: precise-object-edit
Asset type: product photo background replacement
Primary request: replace only the background with a warm sunset gradient
Constraints: change only the background; keep the product and its edges unchanged; no text; no watermark
```
## Prompting best practices
- Structure prompt as scene/backdrop -> subject -> details -> constraints.
- Include intended use (ad, UI mock, infographic) to set the mode and polish level.
- Use camera/composition language for photorealism.
- Only use SVG/vector stand-ins when the user explicitly asked for vector output or a non-image placeholder.
- Quote exact text and specify typography + placement.
- For tricky words, spell them letter-by-letter and require verbatim rendering.
- For multi-image inputs, reference images by index and describe how they should be used.
- For edits, repeat invariants every iteration to reduce drift.
- Iterate with single-change follow-ups.
- If the prompt is generic, add only the extra detail that will materially help.
- If the prompt is already detailed, normalize it instead of expanding it.
- For CLI fallback only, see `references/cli.md` and `references/image-api.md` for model, `quality`, `input_fidelity`, masks, output format, and output-path guidance.
- For transparent images, use the built-in-first chroma-key workflow unless the request is complex enough to need true CLI transparency; ask before switching to CLI `gpt-image-1.5`.
More principles shared by both modes: `references/prompting.md`.
Copy/paste specs shared by both modes: `references/sample-prompts.md`.
## Guidance by asset type
Asset-type templates (website assets, game assets, wireframes, logo) are consolidated in `references/sample-prompts.md`.
## gpt-image-2 guidance for CLI fallback
The fallback CLI defaults to `gpt-image-2`.
- Use `gpt-image-2` for new CLI/API workflows unless the request needs true model-native transparent output.
- If a transparent request may need CLI fallback, ask before using `gpt-image-1.5` unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. Explain that the built-in chroma-key path is the default, but true transparency requires `gpt-image-1.5` because `gpt-image-2` does not support `background=transparent`.
- `gpt-image-2` always uses high fidelity for image inputs; do not set `input_fidelity` with this model.
- `gpt-image-2` supports `quality` values `low`, `medium`, `high`, and `auto`.
- Use `quality low` for fast drafts, thumbnails, and quick iterations. Use `medium`, `high`, or `auto` for final assets, dense text, diagrams, identity-sensitive edits, or high-resolution outputs.
- Square images are typically fastest to generate. Use `1024x1024` for fast square drafts.
- If the user asks for 4K-style output, use `3840x2160` for landscape or `2160x3840` for portrait.
- `gpt-image-2` size may be `auto` or `WIDTHxHEIGHT` if all constraints hold: max edge `<= 3840px`, both edges multiples of `16px`, long-to-short ratio `<= 3:1`, total pixels between `655,360` and `8,294,400`.
Popular `gpt-image-2` sizes:
- `1024x1024` square
- `1536x1024` landscape
- `1024x1536` portrait
- `2048x2048` 2K square
- `2048x1152` 2K landscape
- `3840x2160` 4K landscape
- `2160x3840` 4K portrait
- `auto`
## Fallback CLI mode only
### Temp and output conventions
These conventions apply only to the CLI fallback. They do not describe built-in `image_gen` output behavior.
- Use `tmp/imagegen/` for intermediate files (for example JSONL batches); delete them when done.
- Write final artifacts under `output/imagegen/`.
- Use `--out` or `--out-dir` to control output paths; keep filenames stable and descriptive.
### Dependencies
Prefer `uv` for dependency management in this repo.
Required Python package:
```bash
uv pip install openai
```
Required for local chroma-key removal and optional downscaling:
```bash
uv pip install pillow
```
Portability note:
- If you are using the installed skill outside this repo, install dependencies into that environment with its package manager.
- In uv-managed environments, `uv pip install ...` remains the preferred path.
### Environment
- `OPENAI_API_KEY` must be set for live API calls.
- Do not ask the user for `OPENAI_API_KEY` when using the built-in `image_gen` tool.
- Never ask the user to paste the full key in chat. Ask them to set it locally and confirm when ready.
If the key is missing, give the user these steps:
1. Create an API key in the OpenAI platform UI: https://platform.openai.com/api-keys
2. Set `OPENAI_API_KEY` as an environment variable in their system.
3. Offer to guide them through setting the environment variable for their OS/shell if needed.
If installation is not possible in this environment, tell the user which dependency is missing and how to install it into their active environment.
### Script-mode notes
- CLI commands + examples: `references/cli.md`
- API parameter quick reference: `references/image-api.md`
- Network approvals / sandbox settings for CLI mode: `references/codex-network.md`
## Reference map
- `references/prompting.md`: shared prompting principles for both modes.
- `references/sample-prompts.md`: shared copy/paste prompt recipes for both modes.
- `references/cli.md`: fallback-only CLI usage via `scripts/image_gen.py`.
- `references/image-api.md`: fallback-only API/CLI parameter reference.
- `references/codex-network.md`: fallback-only network/sandbox troubleshooting for CLI mode.
- `scripts/image_gen.py`: fallback-only CLI implementation. Do not load or use it unless the user explicitly chooses CLI mode or explicitly confirms a transparent request's true CLI transparency fallback.
- `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`: local post-processing helper for built-in transparent-image requests.
@@ -0,0 +1,6 @@
interface:
display_name: "Image Gen"
short_description: "Generate or edit images for websites, games, and more"
icon_small: "./assets/imagegen-small.svg"
icon_large: "./assets/imagegen.png"
default_prompt: "Use $imagegen to make or edit an image for this project."
@@ -0,0 +1,5 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" viewBox="0 0 16 16">
<path fill="currentColor" d="M7.51 6.827a1 1 0 1 1 .278 1.982 1 1 0 0 1-.278-1.982Z"/>
<path fill="currentColor" fill-rule="evenodd" d="M8.31 4.47c.368-.016.699.008 1.016.124l.186.075c.423.194.786.5 1.047.888l.067.107c.148.253.235.533.3.848.073.354.126.797.193 1.343l.277 2.25.088.745c.024.224.041.425.049.605.013.322-.004.615-.085.896l-.04.12a2.53 2.53 0 0 1-.802 1.115l-.16.118c-.281.189-.596.292-.956.366a9.46 9.46 0 0 1-.6.1l-.743.094-2.25.277c-.547.067-.99.121-1.35.136a2.765 2.765 0 0 1-.896-.085l-.12-.039a2.533 2.533 0 0 1-1.115-.802l-.118-.161c-.189-.28-.292-.596-.366-.956a9.42 9.42 0 0 1-.1-.599l-.094-.744-.276-2.25a17.884 17.884 0 0 1-.137-1.35c-.015-.367.009-.698.124-1.015l.076-.185c.193-.423.5-.787.887-1.048l.107-.067c.253-.148.534-.234.849-.3.354-.073.796-.126 1.343-.193l2.25-.277.744-.088c.224-.024.425-.041.606-.049Zm-2.905 5.978a1.47 1.47 0 0 0-.875.074c-.127.052-.267.146-.475.344-.212.204-.462.484-.822.889l-.314.351c.018.115.036.219.055.313.061.295.127.458.206.575l.07.094c.167.211.39.372.645.465l.109.032c.119.027.273.038.499.029.308-.013.7-.06 1.264-.13l2.25-.275.727-.093.198-.03-2.05-1.64a16.848 16.848 0 0 0-.96-.738c-.18-.121-.31-.19-.421-.23l-.106-.03Zm2.95-4.915c-.154.006-.33.021-.536.043l-.729.086-2.25.276c-.564.07-.956.118-1.257.18a1.937 1.937 0 0 0-.478.15l-.097.057a1.47 1.47 0 0 0-.515.608l-.044.107c-.048.133-.073.307-.06.608.012.307.06.7.129 1.264l.22 1.8.178-.197c.145-.159.278-.298.403-.418.255-.243.507-.437.809-.56l.181-.067a2.526 2.526 0 0 1 1.328-.06l.118.029c.27.079.517.215.772.387.287.194.619.46 1.03.789l2.52 2.016c.146-.148.26-.326.332-.524l.031-.109c.027-.119.039-.273.03-.499a8.311 8.311 0 0 0-.044-.536l-.086-.728-.276-2.25c-.07-.564-.118-.956-.18-1.258a1.935 1.935 0 0 0-.15-.477l-.057-.098a1.468 1.468 0 0 0-.608-.515l-.107-.043c-.133-.049-.306-.074-.607-.061Z" clip-rule="evenodd"/>
<path fill="currentColor" d="M7.783 1.272c.36.014.803.07 1.35.136l2.25.277.743.095c.224.03.423.062.6.099.36.074.675.177.955.366l.161.118c.364.29.642.675.802 1.115l.04.12c.081.28.098.574.085.896a9.42 9.42 0 0 1-.05.605l-.087.745-.277 2.25c-.067.547-.12.989-.193 1.343a2.765 2.765 0 0 1-.3.848l-.067.107a2.534 2.534 0 0 1-.415.474l-.086.064a.532.532 0 0 1-.622-.858l.13-.13c.04-.046.077-.094.111-.145l.057-.098c.055-.109.104-.256.15-.477.062-.302.11-.694.18-1.258l.276-2.25.086-.728c.022-.207.037-.382.043-.536.01-.226-.002-.38-.029-.5l-.032-.108a1.469 1.469 0 0 0-.464-.646l-.094-.069c-.118-.08-.28-.145-.575-.206a8.285 8.285 0 0 0-.53-.088l-.728-.092-2.25-.276c-.565-.07-.956-.117-1.264-.13a1.94 1.94 0 0 0-.5.029l-.108.032a1.469 1.469 0 0 0-.647.465l-.068.094c-.054.08-.102.18-.146.33l-.04.1a.533.533 0 0 1-.98-.403l.055-.166c.059-.162.133-.314.23-.457l.117-.16c.29-.365.675-.643 1.115-.803l.12-.04c.28-.08.574-.097.896-.084Z"/>
</svg>

After

Width:  |  Height:  |  Size: 2.8 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.7 KiB

@@ -0,0 +1,242 @@
# CLI reference (`scripts/image_gen.py`)
This file is for the fallback CLI mode only. Read it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or after the user explicitly confirms that a transparent-output request should use the `gpt-image-1.5` true-transparency fallback path.
`generate-batch` is a CLI subcommand in this fallback path. It is not a top-level mode of the skill.
The word `batch` in a user request is not CLI opt-in by itself.
## What this CLI does
- `generate`: generate a new image from a prompt
- `edit`: edit one or more existing images
- `generate-batch`: run many generation jobs from a JSONL file after the user explicitly chooses CLI/API/model controls
Real API calls require **network access** + `OPENAI_API_KEY`. `--dry-run` does not.
## Quick start (works from any repo)
Set a stable path to the skill CLI (default `CODEX_HOME` is `~/.codex`):
```
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export IMAGE_GEN="$CODEX_HOME/skills/.system/imagegen/scripts/image_gen.py"
```
Install dependencies into that environment with its package manager. In uv-managed environments, `uv pip install ...` remains the preferred path.
## Quick start
Dry-run (no API call; no network required; does not require the `openai` package):
```bash
python "$IMAGE_GEN" generate \
--prompt "Test" \
--out output/imagegen/test.png \
--dry-run
```
Notes:
- One-off dry-runs print the API payload and the computed output path(s).
- Repo-local finals should live under `output/imagegen/`.
Generate (requires `OPENAI_API_KEY` + network):
```bash
python "$IMAGE_GEN" generate \
--prompt "A cozy alpine cabin at dawn" \
--size 1024x1024 \
--out output/imagegen/alpine-cabin.png
```
Edit:
```bash
python "$IMAGE_GEN" edit \
--image input.png \
--prompt "Replace only the background with a warm sunset" \
--out output/imagegen/sunset-edit.png
```
## Guardrails
- Use the bundled CLI directly (`python "$IMAGE_GEN" ...`) after activating the correct environment.
- Do **not** create one-off runners (for example `gen_images.py`) unless the user explicitly asks for a custom wrapper.
- **Never modify** `scripts/image_gen.py`. If something is missing, ask the user before doing anything else.
- Do not silently downgrade from CLI `gpt-image-2` or built-in `image_gen` to CLI `gpt-image-1.5`; ask first unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
## Defaults
- Model: `gpt-image-2`
- Supported model family for this CLI: GPT Image models (`gpt-image-*`)
- Size: `auto`
- Quality: `medium`
- Output format: `png`
- Default one-off output path: `output/imagegen/output.png`
- Background: unspecified unless `--background` is set
## gpt-image-2 size and model guidance
`gpt-image-2` is the default model for new CLI fallback work.
- Use `--quality low` for fast drafts, thumbnails, and quick iterations.
- Use `--quality medium`, `--quality high`, or `--quality auto` for final assets, dense text, diagrams, identity-sensitive edits, and high-resolution outputs.
- Square images are typically fastest. Use `--size 1024x1024` for quick square drafts.
- If the user asks for 4K-style output, use `--size 3840x2160` for landscape or `--size 2160x3840` for portrait.
- Do not pass `--input-fidelity` with `gpt-image-2`; this model always uses high fidelity for image inputs.
- Do not use `--background transparent` with `gpt-image-2`; the default transparent-image workflow uses built-in `image_gen` on a flat chroma-key background plus local removal. Use `gpt-image-1.5` only after the user explicitly confirms the true-transparent CLI fallback, unless they already requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
Popular `gpt-image-2` sizes:
- `1024x1024`
- `1536x1024`
- `1024x1536`
- `2048x2048`
- `2048x1152`
- `3840x2160`
- `2160x3840`
- `auto`
`gpt-image-2` size constraints:
- max edge `<= 3840px`
- both edges multiples of `16px`
- long edge to short edge ratio `<= 3:1`
- total pixels between `655,360` and `8,294,400`
- outputs above `2560x1440` total pixels are experimental
Fast draft:
```bash
python "$IMAGE_GEN" generate \
--prompt "A product thumbnail of a matte ceramic mug on a stone surface" \
--quality low \
--size 1024x1024 \
--out output/imagegen/mug-draft.png
```
Final 2K landscape:
```bash
python "$IMAGE_GEN" generate \
--prompt "A polished landing-page hero image of a matte ceramic mug on a stone surface" \
--quality high \
--size 2048x1152 \
--out output/imagegen/mug-hero.png
```
4K landscape:
```bash
python "$IMAGE_GEN" generate \
--prompt "A detailed architectural visualization at golden hour" \
--size 3840x2160 \
--quality high \
--out output/imagegen/architecture-4k.png
```
True transparent fallback request:
Ask for confirmation before using this command unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
```bash
python "$IMAGE_GEN" generate \
--model gpt-image-1.5 \
--prompt "A clean product cutout on a transparent background" \
--background transparent \
--output-format png \
--out output/imagegen/product-cutout.png
```
When using this path, explain briefly that built-in `image_gen` plus chroma-key removal is the default transparent-image path, but this request needs true model-native transparency. `gpt-image-2` does not support `background=transparent`, so `gpt-image-1.5` is required for this confirmed fallback.
## Quality, input fidelity, and masks (CLI fallback only)
These are explicit CLI controls. They are not built-in `image_gen` tool arguments.
- `--quality` works for `generate`, `edit`, and `generate-batch`: `low|medium|high|auto`
- `--input-fidelity` is **edit-only** and validated as `low|high`; it is not supported for `gpt-image-2`
- `--mask` is **edit-only**
Example:
```bash
python "$IMAGE_GEN" edit \
--model gpt-image-1.5 \
--image input.png \
--prompt "Change only the background" \
--quality high \
--input-fidelity high \
--out output/imagegen/background-edit.png
```
Mask notes:
- For multi-image edits, pass repeated `--image` flags. Their order is meaningful, so describe each image by index and role in the prompt.
- The CLI accepts a single `--mask`.
- Image and mask must be the same size and format and each under 50MB.
- Masks must include an alpha channel.
- If multiple input images are provided, the mask applies to the first image.
- Masking is prompt-guided; do not promise exact pixel-perfect mask boundaries.
- Use a PNG mask when possible; the script treats mask handling as best-effort and does not perform full preflight validation beyond file checks/warnings.
- In the edit prompt, repeat invariants (`change only the background; keep the subject unchanged`) to reduce drift.
## Output handling
- Use `tmp/imagegen/` for temporary JSONL inputs or scratch files.
- Use `output/imagegen/` for final outputs.
- Reruns fail if a target file already exists unless you pass `--force`.
- `--out-dir` changes one-off naming to `image_1.<ext>`, `image_2.<ext>`, and so on.
- Downscaled copies use the default suffix `-web` unless you override it.
## Common recipes
Generate with augmentation fields:
```bash
python "$IMAGE_GEN" generate \
--prompt "A minimal hero image of a ceramic coffee mug" \
--use-case "product-mockup" \
--style "clean product photography" \
--composition "wide product shot with usable negative space for page copy" \
--constraints "no logos, no text" \
--out output/imagegen/mug-hero.png
```
Generate + also write a downscaled copy for fast web loading:
```bash
python "$IMAGE_GEN" generate \
--prompt "A cozy alpine cabin at dawn" \
--size 1024x1024 \
--downscale-max-dim 1024 \
--out output/imagegen/alpine-cabin.png
```
Generate multiple prompts concurrently (async batch):
```bash
mkdir -p tmp/imagegen output/imagegen/batch
cat > tmp/imagegen/prompts.jsonl << 'EOF'
{"prompt":"Cavernous hangar interior with a compact shuttle parked near the center","use_case":"stylized-concept","composition":"wide-angle, low-angle","lighting":"volumetric light rays through drifting fog","constraints":"no logos or trademarks; no watermark","size":"1536x1024"}
{"prompt":"Gray wolf in profile in a snowy forest","use_case":"photorealistic-natural","composition":"eye-level","constraints":"no logos or trademarks; no watermark","size":"1024x1024"}
EOF
python "$IMAGE_GEN" generate-batch \
--input tmp/imagegen/prompts.jsonl \
--out-dir output/imagegen/batch \
--concurrency 5
rm -f tmp/imagegen/prompts.jsonl
```
Notes:
- `generate-batch` requires `--out-dir`.
- generate-batch requires --out-dir.
- Use `--concurrency` to control parallelism (default `5`).
- Per-job overrides are supported in JSONL (for example `size`, `quality`, `background`, `output_format`, `output_compression`, `moderation`, `n`, `model`, `out`, and prompt-augmentation fields).
- `--n` generates multiple variants for a single prompt; `generate-batch` is for many different prompts.
- In batch mode, per-job `out` is treated as a filename under `--out-dir`.
- For many requested deliverable assets, provide one prompt/job per distinct asset and use semantic filenames when possible.
## CLI notes
- Supported sizes depend on the model. `gpt-image-2` supports flexible constrained sizes; older GPT Image models support `1024x1024`, `1536x1024`, `1024x1536`, or `auto`.
- True transparent CLI outputs require `output_format` to be `png` or `webp` and are not supported by `gpt-image-2`.
- `--prompt-file`, `--output-compression`, `--moderation`, `--max-attempts`, `--fail-fast`, `--force`, and `--no-augment` are supported.
- This CLI is intended for GPT Image models. Do not assume older non-GPT image-model behavior applies here.
## See also
- API parameter quick reference for fallback CLI mode: `references/image-api.md`
- Prompt examples shared across both top-level modes: `references/sample-prompts.md`
- Network/sandbox notes for fallback CLI mode: `references/codex-network.md`
- Built-in-first transparent image workflow: `SKILL.md` and `$CODEX_HOME/skills/.system/imagegen/scripts/remove_chroma_key.py`
@@ -0,0 +1,33 @@
# Codex network approvals / sandbox notes
This file is for the fallback CLI mode only. Read it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or after the user explicitly confirms that a transparent-output request should use the `gpt-image-1.5` true-transparency fallback path.
This guidance is intentionally isolated from `SKILL.md` because it can vary by environment and may become stale. Prefer the defaults in your environment when in doubt.
## Why am I asked to approve image generation calls?
The fallback CLI uses the OpenAI Image API, so it needs outbound network access. In many Codex setups, network access is disabled by default and/or the approval policy requires confirmation before networked commands run.
## Important note about approvals vs network
- `--ask-for-approval never` suppresses approval prompts.
- It does **not** by itself enable network access.
- In `workspace-write`, network access still depends on your Codex configuration (for example `[sandbox_workspace_write] network_access = true`).
## How do I reduce repeated approval prompts?
If you trust the repo and want fewer prompts, use a configuration or profile that both:
- enables network for the sandbox mode you plan to use
- sets an approval policy that matches your risk tolerance
Example `~/.codex/config.toml` pattern:
```toml
approval_policy = "on-request"
sandbox_mode = "workspace-write"
[sandbox_workspace_write]
network_access = true
```
If you want quieter automation after network is enabled, you can choose a stricter approval policy, but do that intentionally and with care.
## Safety note
Enabling network and reducing approvals lowers friction, but increases risk if you run untrusted code or work in an untrusted repository.
@@ -0,0 +1,90 @@
# Image API quick reference
This file is for the fallback CLI mode only. Use it when the user explicitly asks to use `scripts/image_gen.py` / CLI / API / model controls, or after the user explicitly confirms that a transparent-output request should use the `gpt-image-1.5` true-transparency fallback path.
These parameters describe the Image API and bundled CLI fallback surface. Do not assume they are normal arguments on the built-in `image_gen` tool.
## Scope
- This fallback CLI is intended for GPT Image models (`gpt-image-2`, `gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini`).
- The built-in `image_gen` tool and the fallback CLI do not expose the same controls.
## Model summary
| Model | Quality | Input fidelity | Resolutions | Recommended use |
| --- | --- | --- | --- | --- |
| `gpt-image-2` | `low`, `medium`, `high`, `auto` | Always high fidelity for image inputs; do not set `input_fidelity` | `auto` or flexible sizes that satisfy the constraints below | Default for new CLI/API workflows: high-quality generation and editing, text-heavy images, photorealism, compositing, identity-sensitive edits, and workflows where fewer retries matter |
| `gpt-image-1.5` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | True transparent-background fallback and backward-compatible workflows |
| `gpt-image-1` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Legacy compatibility |
| `gpt-image-1-mini` | `low`, `medium`, `high`, `auto` | `low`, `high` | `1024x1024`, `1024x1536`, `1536x1024`, `auto` | Cost-sensitive draft batches and lower-stakes previews |
## gpt-image-2 sizes
`gpt-image-2` accepts `auto` or any `WIDTHxHEIGHT` size that satisfies all constraints:
- Maximum edge length must be less than or equal to `3840px`.
- Both edges must be multiples of `16px`.
- Long edge to short edge ratio must not exceed `3:1`.
- Total pixels must be at least `655,360` and no more than `8,294,400`.
Popular sizes:
| Label | Size | Notes |
| --- | --- | --- |
| Square | `1024x1024` | Typical fast default |
| Landscape | `1536x1024` | Standard landscape |
| Portrait | `1024x1536` | Standard portrait |
| 2K square | `2048x2048` | Larger square output |
| 2K landscape | `2048x1152` | Widescreen output |
| 4K landscape | `3840x2160` | Widescreen 4K output |
| 4K portrait | `2160x3840` | Vertical 4K output |
| Auto | `auto` | Default size |
Square images are typically fastest to generate. For 4K-style output, use `3840x2160` or `2160x3840`.
## Endpoints
- Generate: `POST /v1/images/generations` (`client.images.generate(...)`)
- Edit: `POST /v1/images/edits` (`client.images.edit(...)`)
## Core parameters for GPT Image models
- `prompt`: text prompt
- `model`: image model
- `n`: number of images (1-10)
- `size`: `auto` by default for `gpt-image-2`; flexible `WIDTHxHEIGHT` sizes are allowed only for `gpt-image-2`; older GPT Image models use `1024x1024`, `1536x1024`, `1024x1536`, or `auto`
- `quality`: `low`, `medium`, `high`, or `auto`
- `background`: output transparency behavior (`transparent`, `opaque`, or `auto`) for generated output; this is not the same thing as the prompt's visual scene/backdrop
- `output_format`: `png` (default), `jpeg`, `webp`
- `output_compression`: 0-100 (jpeg/webp only)
- `moderation`: `auto` (default) or `low`
## Edit-specific parameters
- `image`: one or more input images. For GPT Image models, you can provide up to 16 images.
- `mask`: optional mask image
- `input_fidelity`: `low` or `high` only for models that support it; do not set this for `gpt-image-2`
Model-specific note for `input_fidelity`:
- `gpt-image-2` always uses high fidelity for image inputs and does not support setting `input_fidelity`.
- `gpt-image-1` and `gpt-image-1-mini` preserve all input images, but the first image gets richer textures and finer details.
- `gpt-image-1.5` preserves the first 5 input images with higher fidelity.
## Transparent backgrounds
`gpt-image-2` does not currently support the Image API `background=transparent` parameter. The skill's default transparent-image path is built-in `image_gen` with a flat chroma-key background, followed by local alpha extraction with `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py"`.
Use CLI `gpt-image-1.5` with `background=transparent` and a transparent-capable output format such as `png` or `webp` only after the user explicitly confirms that fallback, unless they already requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. If the user asks for true/native transparency, the subject is too complex for clean chroma-key removal, or local background removal fails validation, explain the tradeoff and ask before switching.
## Output
- `data[]` list with `b64_json` per image
- The bundled `scripts/image_gen.py` CLI decodes `b64_json` and writes output files for you.
## Limits and notes
- Input images and masks must be under 50MB.
- Use the edits endpoint when the user requests changes to an existing image.
- Masking is prompt-guided; exact shapes are not guaranteed.
- Large sizes and high quality increase latency and cost.
- Use `quality=low` for fast drafts, thumbnails, and quick iterations. Use `medium` or `high` for final assets, dense text, diagrams, identity-sensitive edits, or high-resolution outputs.
- High `input_fidelity` can materially increase input token usage on models that support it.
- If a request fails because a specific option is unsupported by the selected GPT Image model, retry manually without that option only when the option is not required by the user. If true transparent CLI output is required, ask before switching to `gpt-image-1.5` instead of dropping `background=transparent`, unless the user already explicitly chose that fallback.
## Important boundary
- `quality`, `input_fidelity`, explicit masks, `background`, `output_format`, and related parameters are fallback-only execution controls.
- Do not assume they are built-in `image_gen` tool arguments.
@@ -0,0 +1,118 @@
# Prompting best practices
These prompting principles are shared by both top-level modes of the skill:
- built-in `image_gen` tool (default)
- explicit `scripts/image_gen.py` CLI fallback
This file is about prompt structure, specificity, and iteration. Fallback-only execution controls such as `quality`, `input_fidelity`, masks, output format, and output paths live in the fallback docs.
## Contents
- [Structure](#structure)
- [Specificity policy](#specificity-policy)
- [Allowed and disallowed augmentation](#allowed-and-disallowed-augmentation)
- [Composition and layout](#composition-and-layout)
- [Constraints and invariants](#constraints-and-invariants)
- [Text in images](#text-in-images)
- [Input images and references](#input-images-and-references)
- [Iterate deliberately](#iterate-deliberately)
- [Transparent images](#transparent-images)
- [Fallback-only execution controls](#fallback-only-execution-controls)
- [Use-case tips](#use-case-tips)
- [Where to find copy/paste recipes](#where-to-find-copypaste-recipes)
## Structure
- Use a consistent order: scene/backdrop -> subject -> key details -> constraints -> output intent.
- Include intended use (ad, UI mock, infographic) to set the level of polish.
- For complex requests, use short labeled lines instead of one long paragraph.
## Specificity policy
- If the user prompt is already specific and detailed, normalize it into a clean spec without adding creative requirements.
- If the prompt is generic, you may add tasteful detail when it materially improves the output.
- Treat examples in `sample-prompts.md` as fully-authored recipes, not as the default amount of augmentation to add to every request.
- For photorealism, include `photorealistic` directly when that is the goal, plus concrete real-world texture such as pores, wrinkles, fabric wear, material grain, or imperfect everyday detail.
## Allowed and disallowed augmentation
Allowed augmentation for generic prompts:
- composition and framing cues
- intended-use or polish-level hints
- practical layout guidance
- reasonable scene concreteness that supports the request
Do not add:
- extra characters, props, or objects that are not implied
- brand palettes, slogans, or story beats that are not implied
- arbitrary side-specific placement unless the surrounding layout supports it
## Composition and layout
- Specify framing and viewpoint (close-up, wide, top-down) and placement only when it materially helps.
- Call out negative space if the asset clearly needs room for UI or copy.
- Avoid making left/right layout decisions unless the user or surrounding layout supports them.
- For people, describe body framing, scale, gaze, and object interactions when they matter (`full body visible`, `looking down at the book`, `hands naturally gripping the handlebars`).
## Constraints and invariants
- State what must not change (`keep background unchanged`).
- For edits, say `change only X; keep Y unchanged` and repeat invariants on every iteration to reduce drift.
## Text in images
- Put literal text in quotes or ALL CAPS and specify typography (font style, size, color, placement).
- Spell uncommon words letter-by-letter if accuracy matters.
- For in-image copy, require verbatim rendering and no extra characters.
- In CLI fallback mode, use `medium` or `high` quality for small text, dense infographics, data-heavy slides, multi-font layouts, legends, axes, and footnotes.
## Input images and references
- Do not assume that every provided image is an edit target.
- Label each image by index and role (`Image 1: edit target`, `Image 2: style reference`).
- If the user provides images for style, composition, or mood guidance and does not ask to modify them, treat the request as generation with references.
- If the user asks to preserve an existing image while changing specific parts, treat the request as an edit.
- For compositing, describe how the images interact (`place the subject from Image 2 into Image 1`).
## Iterate deliberately
- Start with a clean base prompt, then make small single-change edits.
- Re-specify critical constraints when you iterate.
- Prefer one targeted follow-up at a time over rewriting the whole prompt.
## Transparent images
- Use built-in `image_gen` first for transparent-image requests. If the subject is clearly too complex for chroma-key removal, explain the fallback and ask before switching to CLI.
- Prompt for a perfectly flat solid chroma-key background, usually `#00ff00`; use `#ff00ff` when the subject is green, and avoid key colors that appear in the subject.
- Explicitly prohibit shadows, gradients, floor planes, reflections, texture, and lighting variation in the background.
- Ask for crisp edges, generous padding, and no use of the key color inside the subject.
- After generation, remove the background locally with `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py" --input <source> --out <final.png> --auto-key border --soft-matte --transparent-threshold 12 --opaque-threshold 220 --despill` and validate the alpha result before shipping it.
- Use soft matte and despill for antialiased edges; hard tolerance-only removal is mainly for flat pixel-art or exact-color fixtures.
- Use CLI `gpt-image-1.5 --background transparent --output-format png` only after the user explicitly confirms the fallback, or when the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. Ask first for true/native transparency requests, failed chroma-key validation, or complex transparent subjects such as hair, fur, glass, smoke, liquids, translucent materials, reflective objects, or soft shadows.
## Fallback-only execution controls
- `quality`, `input_fidelity`, explicit masks, output format, and output paths are fallback-only execution controls.
- Do not assume they are built-in `image_gen` tool arguments.
- If the user explicitly chooses CLI fallback, see `references/cli.md` and `references/image-api.md` for those controls.
- In CLI fallback mode, `gpt-image-2` is the default. It supports `quality=low|medium|high|auto`; use `low` for fast drafts and thumbnails, and move to `medium`, `high`, or `auto` for final assets.
- `gpt-image-2` always uses high fidelity for image inputs, so do not set `input_fidelity` with that model.
- If a transparent request needs true CLI transparency, ask before using `gpt-image-1.5` unless the user already explicitly chose it. Explain that built-in chroma-key removal is the default path, but `gpt-image-2` does not support `background=transparent`.
- If the user asks for 4K-style output with `gpt-image-2`, use `3840x2160` for landscape or `2160x3840` for portrait.
## Use-case tips
Generate:
- photorealistic-natural: Prompt as if a real photo is captured in the moment; use photography language (lens, lighting, framing); call for real texture; avoid over-stylized polish unless requested.
- product-mockup: Describe the product/packaging and materials; ensure clean silhouette and label clarity; if in-image text is needed, require verbatim rendering and specify typography.
- ui-mockup: Describe the target fidelity first (shippable mockup or low-fi wireframe), then focus on layout, hierarchy, and practical UI elements; avoid concept-art language.
- infographic-diagram: Define the audience and layout flow; label parts explicitly; require verbatim text; prefer higher quality in CLI mode for dense labels.
- logo-brand: Keep it simple and scalable; ask for a strong silhouette and balanced negative space; avoid decorative flourishes unless requested.
- ads-marketing: Write like a creative brief; include brand positioning, audience, desired vibe, scene, and exact tagline if text must appear.
- productivity-visual: Name the exact artifact (slide, chart, workflow diagram), define the canvas and hierarchy, provide real labels/data, and ask for readable typography and polished spacing.
- scientific-educational: Define audience, lesson objective, required labels, scientific constraints, arrows, and scan-friendly whitespace.
- illustration-story: Define panels or scene beats; keep each action concrete.
- stylized-concept: Specify style cues, material finish, and rendering approach (3D, painterly, clay) without inventing new story elements.
- historical-scene: State the location/date and required period accuracy; constrain clothing, props, and environment to match the era.
Edit:
- text-localization: Change only the text; preserve layout, typography, spacing, and hierarchy; no extra words or reflow unless needed.
- identity-preserve: Lock identity (face, body, pose, hair, expression); change only the specified elements; match lighting and shadows.
- precise-object-edit: Specify exactly what to remove/replace; preserve surrounding texture and lighting; keep everything else unchanged.
- lighting-weather: Change only environmental conditions (light, shadows, atmosphere, precipitation); keep geometry, framing, and subject identity.
- background-extraction: For simple opaque subjects, request a clean cutout on a perfectly flat chroma-key background; crisp silhouette; generous padding; no shadows; no halos; preserve label text exactly; no restyling. Ask before using true CLI transparency for complex subjects.
- style-transfer: Specify style cues to preserve (palette, texture, brushwork) and what must change; add `no extra elements` to prevent drift.
- compositing: Reference inputs by index; specify what moves where; match lighting, perspective, and scale; keep the base framing unchanged.
- sketch-to-render: Preserve layout, proportions, and perspective; choose materials and lighting that support the supplied sketch without adding new elements.
## Where to find copy/paste recipes
For copy/paste prompt specs (examples only), see `references/sample-prompts.md`. This file focuses on principles, specificity, and iteration patterns.
@@ -0,0 +1,433 @@
# Sample prompts (copy/paste)
These prompt recipes are shared across both top-level modes of the skill:
- built-in `image_gen` tool (default)
- `scripts/image_gen.py` CLI fallback for explicit CLI/API/model requests or user-confirmed true-transparent-output fallback requests
Use these as starting points. They are intentionally complete prompt recipes, not the default amount of augmentation to add to every user request.
When adapting a user's prompt:
- keep user-provided requirements
- only add detail according to the specificity policy in `SKILL.md`
- do not treat every example below as permission to invent extra story elements
The labeled lines are prompt scaffolding, not a closed schema. `Asset type` and `Input images` are prompt-only scaffolding; the CLI does not expose them as dedicated flags.
Execution details such as explicit CLI flags, `quality`, `input_fidelity`, masks, output formats, and local output paths depend on mode. Use the built-in tool by default, including simple transparent-image requests. For transparent images, prompt for a flat chroma-key background and remove it locally with `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py"`; only apply CLI-specific controls when the user explicitly opts into fallback mode or explicitly confirms that the transparent request should use true CLI transparency.
CLI model notes:
- `gpt-image-2` is the fallback CLI default for new workflows.
- `gpt-image-2` supports `quality` values `low`, `medium`, `high`, and `auto`.
- For 4K-style `gpt-image-2` output, use `3840x2160` or `2160x3840`.
- If transparent output needs true CLI fallback, ask before using `gpt-image-1.5` unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback. Explain that built-in chroma-key removal is the default path, but `gpt-image-2` does not support `background=transparent`.
- Do not set `input_fidelity` with `gpt-image-2`; image inputs already use high fidelity.
For prompting principles (structure, specificity, invariants, iteration), see `references/prompting.md`.
## Generate
### photorealistic-natural
```
Use case: photorealistic-natural
Primary request: candid photo of an elderly sailor on a small fishing boat adjusting a net
Scene/backdrop: coastal water with soft haze
Subject: weathered skin with wrinkles and sun texture
Style/medium: photorealistic candid photo
Composition/framing: medium close-up, eye-level
Lighting/mood: soft coastal daylight, shallow depth of field, subtle film grain
Materials/textures: real skin texture, worn fabric, salt-worn wood
Constraints: natural color balance; no heavy retouching; no glamorization; no watermark
Avoid: studio polish; staged look
```
### product-mockup
```
Use case: product-mockup
Primary request: premium product photo of a matte black shampoo bottle with a minimal label
Scene/backdrop: clean studio gradient from light gray to white
Subject: single bottle centered with subtle reflection
Style/medium: premium product photography
Composition/framing: centered, slight three-quarter angle, generous padding
Lighting/mood: softbox lighting, clean highlights, controlled shadows
Materials/textures: matte plastic, crisp label printing
Constraints: no logos or trademarks; no watermark
```
### ui-mockup
```
Use case: ui-mockup
Primary request: mobile app home screen for a local farmers market with vendors and daily specials
Asset type: mobile app screen
Style/medium: realistic product UI, not concept art
Composition/framing: clean vertical mobile layout with clear hierarchy
Constraints: practical layout, clear typography, no logos or trademarks, no watermark
```
### infographic-diagram
```
Use case: infographic-diagram
Primary request: detailed infographic of an automatic coffee machine flow
Scene/backdrop: clean, light neutral background
Subject: bean hopper -> grinder -> brew group -> boiler -> water tank -> drip tray
Style/medium: clean vector-like infographic with clear callouts and arrows
Composition/framing: vertical poster layout, top-to-bottom flow
Text (verbatim): "Bean Hopper", "Grinder", "Brew Group", "Boiler", "Water Tank", "Drip Tray"
Constraints: clear labels, strong contrast, no logos or trademarks, no watermark
```
### scientific-educational
```
Use case: scientific-educational
Primary request: biology diagram titled "Cellular Respiration at a Glance" for high school students
Scene/backdrop: clean white classroom handout background
Subject: glucose turns into energy inside a cell; include glycolysis, Krebs cycle, and electron transport chain
Style/medium: flat scientific diagram with consistent icons, arrows, and readable labels
Composition/framing: landscape slide-style layout with clear hierarchy and generous whitespace
Text (verbatim): "Cellular Respiration at a Glance", "Glucose", "Pyruvate", "ATP", "NADH", "FADH2", "CO2", "O2", "H2O"
Constraints: scientifically plausible; avoid tiny text; no extra decoration; no watermark
```
### logo-brand
```
Use case: logo-brand
Primary request: original logo for "Field & Flour", a local bakery
Style/medium: vector logo mark; flat colors; minimal
Composition/framing: single centered logo on a plain background with generous padding
Constraints: strong silhouette, balanced negative space; original design only; no gradients unless essential; no trademarks; no watermark
```
### illustration-story
```
Use case: illustration-story
Primary request: 4-panel comic about a pet left alone at home
Scene/backdrop: cozy living room across panels
Subject: pet reacting to the owner leaving, then relaxing, then returning to a composed pose
Style/medium: comic illustration with clear panels
Composition/framing: 4 equal-sized vertical panels, readable actions per panel
Constraints: no text; no logos or trademarks; no watermark
```
### stylized-concept
```
Use case: stylized-concept
Primary request: cavernous hangar interior with tall support beams and drifting fog
Scene/backdrop: industrial hangar interior, deep scale, light haze
Subject: compact shuttle parked near the center
Style/medium: cinematic concept art, industrial realism
Composition/framing: wide-angle, low-angle
Lighting/mood: volumetric light rays cutting through fog
Constraints: no logos or trademarks; no watermark
```
### ads-marketing
```
Use case: ads-marketing
Primary request: campaign image for a streetwear brand called Thread
Subject: group of friends hanging out together in a stylish urban setting
Style/medium: polished youth streetwear campaign photography
Composition/framing: vertical ad layout with natural poses and integrated headline space
Lighting/mood: contemporary, energetic, tasteful
Text (verbatim): "Yours to Create."
Constraints: render the tagline exactly once; clean legible typography; no extra text; no watermarks; no unrelated logos
```
### productivity-visual
```
Use case: productivity-visual
Primary request: one pitch-deck slide titled "Market Opportunity"
Asset type: fundraising slide image
Style/medium: clean modern deck slide, white background, crisp sans-serif typography
Subject: TAM/SAM/SOM concentric-circle diagram plus a small growth bar chart from 2021 to 2026
Composition/framing: 16:9 landscape slide, clear data hierarchy, polished spacing
Text (verbatim): "Market Opportunity", "TAM: $42B", "SAM: $8.7B", "SOM: $340M", "AGI Research, 2024", "Internal analysis"
Constraints: readable labels, no clip art, no stock photography, no decorative clutter, no watermark
```
### historical-scene
```
Use case: historical-scene
Primary request: outdoor crowd scene in Bethel, New York on August 16, 1969
Scene/backdrop: open field with period-appropriate staging
Subject: crowd in period-accurate clothing, authentic environment
Style/medium: photorealistic photo
Composition/framing: wide shot, eye-level
Constraints: period-accurate details; no modern objects; no logos or trademarks; no watermark
```
## Asset type templates (taxonomy-aligned)
### Website assets template
```
Use case: <photorealistic-natural|stylized-concept|product-mockup|infographic-diagram|ui-mockup>
Asset type: <hero image / section illustration / blog header>
Primary request: <short description>
Scene/backdrop: <environment or abstract backdrop>
Subject: <main subject>
Style/medium: <photo/illustration/3D>
Composition/framing: <wide/centered; note usable negative space only if needed>
Lighting/mood: <soft/bright/neutral>
Color palette: <brand colors or neutral>
Constraints: <no text; no logos; no watermark; leave room for UI if needed>
```
### Website assets example: minimal hero background
```
Use case: stylized-concept
Asset type: landing page hero background
Primary request: minimal abstract background with a soft gradient and subtle texture
Style/medium: matte illustration / soft-rendered abstract background
Composition/framing: wide composition with usable negative space for page copy
Lighting/mood: gentle studio glow
Color palette: restrained neutral palette
Constraints: no text; no logos; no watermark
```
### Website assets example: feature section illustration
```
Use case: stylized-concept
Asset type: feature section illustration
Primary request: simple abstract shapes suggesting connection and flow
Scene/backdrop: subtle light-gray backdrop with faint texture
Style/medium: flat illustration; soft shadows; restrained contrast
Composition/framing: centered cluster; open margins for UI
Color palette: muted neutral palette
Constraints: no text; no logos; no watermark
```
### Website assets example: blog header image
```
Use case: photorealistic-natural
Asset type: blog header image
Primary request: overhead desk scene with notebook, pen, and coffee cup
Scene/backdrop: warm wooden tabletop
Style/medium: photorealistic photo
Composition/framing: wide crop with clean room for page copy
Lighting/mood: soft morning light
Constraints: no text; no logos; no watermark
```
### Game assets template
```
Use case: stylized-concept
Asset type: <game environment concept art / game character concept / game UI icon / tileable game texture>
Primary request: <biome/scene/character/icon/material>
Scene/backdrop: <location + set dressing> (if applicable)
Subject: <main focal element(s)>
Style/medium: <realistic/stylized>; <concept art / character render / UI icon / texture>
Composition/framing: <wide/establishing/top-down>; <camera angle>; <focal point placement>
Lighting/mood: <time of day>; <mood>; <volumetric/fog/etc>
Constraints: no logos or trademarks; no watermark
```
### Game assets example: environment concept art
```
Use case: stylized-concept
Asset type: game environment concept art
Primary request: cavernous hangar interior with tall support beams and drifting fog
Scene/backdrop: industrial hangar interior, deep scale, light haze
Subject: compact shuttle parked near the center
Style/medium: cinematic concept art, industrial realism
Composition/framing: wide-angle, low-angle
Lighting/mood: volumetric light rays cutting through fog
Constraints: no logos or trademarks; no watermark
```
### Game assets example: character concept
```
Use case: stylized-concept
Asset type: game character concept
Primary request: desert scout character with layered travel gear
Subject: long coat, satchel, practical travel clothing
Style/medium: character render; stylized realism
Composition/framing: neutral hero pose on a simple backdrop
Constraints: no logos or trademarks; no watermark
```
### Game assets example: UI icon
```
Use case: stylized-concept
Asset type: game UI icon
Primary request: round shield icon with a subtle rune pattern
Style/medium: painted game UI icon
Composition/framing: centered icon; generous padding; clear silhouette
Constraints: no text; no background scene elements; no logos or trademarks; no watermark
```
### Game assets example: tileable texture
```
Use case: stylized-concept
Asset type: tileable game texture
Primary request: worn sandstone blocks
Style/medium: seamless tileable texture; PBR-ish look
Scene/backdrop: neutral lighting reference only
Constraints: seamless edges; no obvious focal elements; no text; no logos or trademarks; no watermark
```
### Wireframe template
```
Use case: ui-mockup
Asset type: website wireframe
Primary request: <page or flow to sketch>
Style/medium: low-fi grayscale wireframe
Composition/framing: <landscape or portrait to match expected device>
Subject: <sections in order; grid/columns; key labels>
Constraints: no color; no logos; no real photos; no watermark
```
### Wireframe example: homepage (desktop)
```
Use case: ui-mockup
Asset type: website wireframe
Primary request: SaaS homepage layout with clear hierarchy
Style/medium: low-fi grayscale wireframe
Subject: top nav; hero with headline and CTA; three feature cards; testimonial strip; pricing preview; footer
Composition/framing: landscape desktop layout
Constraints: label major blocks; no color; no logos; no real photos; no watermark
```
### Wireframe example: pricing page
```
Use case: ui-mockup
Asset type: website wireframe
Primary request: pricing page layout with comparison table
Style/medium: low-fi grayscale wireframe
Subject: header; plan toggle; 3 pricing cards; comparison table; FAQ accordion; footer
Composition/framing: desktop or tablet layout
Constraints: label key areas; no color; no logos; no real photos; no watermark
```
### Wireframe example: mobile onboarding flow
```
Use case: ui-mockup
Asset type: mobile onboarding wireframe
Primary request: three-screen mobile onboarding flow
Style/medium: low-fi grayscale wireframe
Subject: screen 1 headline and CTA; screen 2 feature bullets; screen 3 form fields and CTA
Composition/framing: portrait mobile layout
Constraints: label screens and blocks; no color; no logos; no real photos; no watermark
```
### Logo template
```
Use case: logo-brand
Asset type: logo concept
Primary request: <brand idea or symbol concept>
Style/medium: vector logo mark; flat colors; minimal
Composition/framing: centered mark; clear silhouette; generous margin
Color palette: <1-2 colors; high contrast>
Text (verbatim): "<exact name>" (only if needed)
Constraints: no gradients; no mockups; no 3D; no watermark
```
### Logo example: abstract symbol mark
```
Use case: logo-brand
Asset type: logo concept
Primary request: geometric leaf symbol suggesting sustainability and growth
Style/medium: vector logo mark; flat colors; minimal
Composition/framing: centered mark; clear silhouette
Color palette: deep green and off-white
Constraints: no text unless requested; no gradients; no mockups; no 3D; no watermark
```
### Logo example: monogram mark
```
Use case: logo-brand
Asset type: logo concept
Primary request: interlocking monogram of the letters "AV"
Style/medium: vector logo mark; flat colors; minimal
Composition/framing: centered mark; balanced spacing
Color palette: black on white
Constraints: no gradients; no mockups; no 3D; no watermark
```
### Logo example: wordmark
```
Use case: logo-brand
Asset type: logo concept
Primary request: clean wordmark for a modern studio
Style/medium: vector wordmark; flat colors; minimal
Text (verbatim): "Studio North"
Composition/framing: centered text; even letter spacing
Constraints: no gradients; no mockups; no 3D; no watermark
```
## Edit
### text-localization
```
Use case: text-localization
Input images: Image 1: original infographic
Primary request: replace "Bean Hopper", "Grinder", "Brew Group", "Boiler", "Water Tank", and "Drip Tray" with "Tolva", "Molino", "Grupo de infusión", "Caldera", "Depósito de agua", and "Bandeja de goteo"
Constraints: change only the text; preserve layout, typography, spacing, and hierarchy; no extra words; do not alter logos or imagery
```
### identity-preserve
```
Use case: identity-preserve
Input images: Image 1: person photo; Image 2..N: clothing references
Primary request: replace only the clothing with the provided garments
Constraints: preserve face, body shape, pose, hair, expression, and identity; match lighting and shadows; keep the background unchanged; no accessories or text
```
### precise-object-edit
```
Use case: precise-object-edit
Input images: Image 1: room photo
Primary request: replace only the white chairs with wooden chairs
Constraints: preserve camera angle, room lighting, floor shadows, and surrounding objects; keep all other aspects unchanged
```
### lighting-weather
```
Use case: lighting-weather
Input images: Image 1: original photo
Primary request: make it look like a winter evening with gentle snowfall
Constraints: preserve subject identity, geometry, camera angle, and composition; change only lighting, atmosphere, and weather
```
### background-extraction
```
Use case: background-extraction
Input images: Image 1: product photo
Primary request: isolate the product on a clean transparent background
Scene/backdrop: perfectly flat solid #00ff00 chroma-key background for local background removal
Constraints: background must be one uniform color with no shadows, gradients, texture, reflections, floor plane, or lighting variation; crisp silhouette; generous padding; no halos or fringing; preserve label text exactly; no restyling; do not use #00ff00 anywhere in the subject
```
Post-process note: after built-in generation, run `python "${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/scripts/remove_chroma_key.py" --input <source> --out <final.png> --auto-key border --soft-matte --transparent-threshold 12 --opaque-threshold 220 --despill`. Ask before using CLI `gpt-image-1.5 --background transparent --output-format png` for true/native transparency, failed chroma-key validation, or complex subjects such as hair, fur, glass, smoke, liquids, translucent materials, reflections, or soft shadows, unless the user already explicitly requested `gpt-image-1.5`, `scripts/image_gen.py`, or CLI fallback.
### style-transfer
```
Use case: style-transfer
Input images: Image 1: style reference
Primary request: apply Image 1's visual style to a man riding a motorcycle on a plain white backdrop
Constraints: preserve palette, texture, and brushwork; no extra elements
```
### compositing
```
Use case: compositing
Input images: Image 1: base scene; Image 2: subject to insert
Primary request: place the subject from Image 2 next to the person in Image 1
Constraints: match lighting, perspective, and scale; keep the base framing unchanged; no extra elements
```
### character consistency workflow
```
Use case: identity-preserve
Input images: Image 1: previous character anchor illustration
Primary request: continue the story with the same character in a new scene and action
Scene/backdrop: snowy forest after a winter storm
Subject: same young forest hero gently helping a frightened squirrel out of a fallen tree
Style/medium: same children's book watercolor illustration style as Image 1
Constraints: do not redesign the character; preserve facial features, proportions, outfit, color palette, and personality; no text; no watermark
```
### sketch-to-render
```
Use case: sketch-to-render
Input images: Image 1: drawing
Primary request: turn the drawing into a photorealistic image
Constraints: preserve layout, proportions, and perspective; choose realistic materials and lighting; do not add new elements or text
```
@@ -0,0 +1,995 @@
#!/usr/bin/env python3
"""Fallback CLI for explicit image generation or editing with GPT Image models.
Used only when the user explicitly opts into CLI fallback mode, or when explicit
transparent output requires the `gpt-image-1.5` fallback path.
Defaults to gpt-image-2 and a structured prompt augmentation workflow.
"""
from __future__ import annotations
import argparse
import asyncio
import base64
import json
import os
from pathlib import Path
import re
import sys
import time
from typing import Any, Dict, Iterable, List, Optional, Tuple
from io import BytesIO
DEFAULT_MODEL = "gpt-image-2"
DEFAULT_SIZE = "auto"
DEFAULT_QUALITY = "medium"
DEFAULT_OUTPUT_FORMAT = "png"
DEFAULT_CONCURRENCY = 5
DEFAULT_DOWNSCALE_SUFFIX = "-web"
DEFAULT_OUTPUT_PATH = "output/imagegen/output.png"
GPT_IMAGE_MODEL_PREFIX = "gpt-image-"
ALLOWED_LEGACY_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
ALLOWED_QUALITIES = {"low", "medium", "high", "auto"}
ALLOWED_BACKGROUNDS = {"transparent", "opaque", "auto", None}
ALLOWED_INPUT_FIDELITIES = {"low", "high", None}
GPT_IMAGE_2_MODEL = "gpt-image-2"
GPT_IMAGE_2_MIN_PIXELS = 655_360
GPT_IMAGE_2_MAX_PIXELS = 8_294_400
GPT_IMAGE_2_MAX_EDGE = 3840
GPT_IMAGE_2_MAX_RATIO = 3.0
MAX_IMAGE_BYTES = 50 * 1024 * 1024
MAX_BATCH_JOBS = 500
def _die(message: str, code: int = 1) -> None:
print(f"Error: {message}", file=sys.stderr)
raise SystemExit(code)
def _warn(message: str) -> None:
print(f"Warning: {message}", file=sys.stderr)
def _dependency_hint(package: str, *, upgrade: bool = False) -> str:
command = f"uv pip install {'-U ' if upgrade else ''}{package}"
return (
"Activate the repo-selected environment first, then install it with "
f"`{command}`. If this repo uses a local virtualenv, start with "
"`source .venv/bin/activate`; otherwise use this repo's configured shared fallback "
"environment. If your project declares dependencies, prefer that project's normal "
"`uv sync` flow."
)
def _ensure_api_key(dry_run: bool) -> None:
if os.getenv("OPENAI_API_KEY"):
print("OPENAI_API_KEY is set.", file=sys.stderr)
return
if dry_run:
_warn("OPENAI_API_KEY is not set; dry-run only.")
return
_die("OPENAI_API_KEY is not set. Export it before running.")
def _read_prompt(prompt: Optional[str], prompt_file: Optional[str]) -> str:
if prompt and prompt_file:
_die("Use --prompt or --prompt-file, not both.")
if prompt_file:
path = Path(prompt_file)
if not path.exists():
_die(f"Prompt file not found: {path}")
return path.read_text(encoding="utf-8").strip()
if prompt:
return prompt.strip()
_die("Missing prompt. Use --prompt or --prompt-file.")
return "" # unreachable
def _check_image_paths(paths: Iterable[str]) -> List[Path]:
resolved: List[Path] = []
for raw in paths:
path = Path(raw)
if not path.exists():
_die(f"Image file not found: {path}")
if path.stat().st_size > MAX_IMAGE_BYTES:
_warn(f"Image exceeds 50MB limit: {path}")
resolved.append(path)
return resolved
def _normalize_output_format(fmt: Optional[str]) -> str:
if not fmt:
return DEFAULT_OUTPUT_FORMAT
fmt = fmt.lower()
if fmt not in {"png", "jpeg", "jpg", "webp"}:
_die("output-format must be png, jpeg, jpg, or webp.")
return "jpeg" if fmt == "jpg" else fmt
def _parse_size(size: str) -> Optional[Tuple[int, int]]:
match = re.fullmatch(r"([1-9][0-9]*)x([1-9][0-9]*)", size)
if not match:
return None
return int(match.group(1)), int(match.group(2))
def _validate_gpt_image_2_size(size: str) -> None:
if size == "auto":
return
parsed = _parse_size(size)
if parsed is None:
_die("size must be auto or WIDTHxHEIGHT, for example 1024x1024.")
width, height = parsed
max_edge = max(width, height)
min_edge = min(width, height)
total_pixels = width * height
if max_edge > GPT_IMAGE_2_MAX_EDGE:
_die("gpt-image-2 size maximum edge length must be less than or equal to 3840px.")
if width % 16 != 0 or height % 16 != 0:
_die("gpt-image-2 size width and height must be multiples of 16px.")
if max_edge / min_edge > GPT_IMAGE_2_MAX_RATIO:
_die("gpt-image-2 size long edge to short edge ratio must not exceed 3:1.")
if total_pixels < GPT_IMAGE_2_MIN_PIXELS or total_pixels > GPT_IMAGE_2_MAX_PIXELS:
_die(
"gpt-image-2 size total pixels must be at least 655,360 and no more than 8,294,400."
)
def _validate_size(size: str, model: str) -> None:
if model == GPT_IMAGE_2_MODEL:
_validate_gpt_image_2_size(size)
return
if size not in ALLOWED_LEGACY_SIZES:
_die(
"size must be one of 1024x1024, 1536x1024, 1024x1536, or auto for this GPT Image model."
)
def _validate_quality(quality: str) -> None:
if quality not in ALLOWED_QUALITIES:
_die("quality must be one of low, medium, high, or auto.")
def _validate_background(background: Optional[str]) -> None:
if background not in ALLOWED_BACKGROUNDS:
_die("background must be one of transparent, opaque, or auto.")
def _validate_input_fidelity(input_fidelity: Optional[str]) -> None:
if input_fidelity not in ALLOWED_INPUT_FIDELITIES:
_die("input-fidelity must be one of low or high.")
def _validate_model(model: str) -> None:
if not model.startswith(GPT_IMAGE_MODEL_PREFIX):
_die(
"model must be a GPT Image model (for example gpt-image-1.5, gpt-image-1, or gpt-image-1-mini)."
)
def _validate_transparency(background: Optional[str], output_format: str) -> None:
if background == "transparent" and output_format not in {"png", "webp"}:
_die("transparent background requires output-format png or webp.")
def _validate_model_specific_options(
*,
model: str,
background: Optional[str],
input_fidelity: Optional[str] = None,
) -> None:
if model != GPT_IMAGE_2_MODEL:
return
if background == "transparent":
_die(
"transparent backgrounds are not supported in gpt-image-2, the latest model. "
"Use --model gpt-image-1.5 --background transparent --output-format png instead."
)
if input_fidelity is not None:
_die(
"input_fidelity is not supported in gpt-image-2 because image inputs always use high fidelity for this model."
)
def _validate_generate_payload(payload: Dict[str, Any]) -> None:
model = str(payload.get("model", DEFAULT_MODEL))
_validate_model(model)
n = int(payload.get("n", 1))
if n < 1 or n > 10:
_die("n must be between 1 and 10")
size = str(payload.get("size", DEFAULT_SIZE))
quality = str(payload.get("quality", DEFAULT_QUALITY))
background = payload.get("background")
_validate_size(size, model)
_validate_quality(quality)
_validate_background(background)
_validate_model_specific_options(model=model, background=background)
oc = payload.get("output_compression")
if oc is not None and not (0 <= int(oc) <= 100):
_die("output_compression must be between 0 and 100")
def _build_output_paths(
out: str,
output_format: str,
count: int,
out_dir: Optional[str],
) -> List[Path]:
ext = "." + output_format
if out_dir:
out_base = Path(out_dir)
out_base.mkdir(parents=True, exist_ok=True)
return [out_base / f"image_{i}{ext}" for i in range(1, count + 1)]
out_path = Path(out)
if out_path.exists() and out_path.is_dir():
out_path.mkdir(parents=True, exist_ok=True)
return [out_path / f"image_{i}{ext}" for i in range(1, count + 1)]
if out_path.suffix == "":
out_path = out_path.with_suffix(ext)
elif output_format and out_path.suffix.lstrip(".").lower() != output_format:
_warn(
f"Output extension {out_path.suffix} does not match output-format {output_format}."
)
if count == 1:
return [out_path]
return [
out_path.with_name(f"{out_path.stem}-{i}{out_path.suffix}")
for i in range(1, count + 1)
]
def _augment_prompt(args: argparse.Namespace, prompt: str) -> str:
fields = _fields_from_args(args)
return _augment_prompt_fields(args.augment, prompt, fields)
def _augment_prompt_fields(augment: bool, prompt: str, fields: Dict[str, Optional[str]]) -> str:
if not augment:
return prompt
sections: List[str] = []
if fields.get("use_case"):
sections.append(f"Use case: {fields['use_case']}")
sections.append(f"Primary request: {prompt}")
if fields.get("scene"):
sections.append(f"Scene/background: {fields['scene']}")
if fields.get("subject"):
sections.append(f"Subject: {fields['subject']}")
if fields.get("style"):
sections.append(f"Style/medium: {fields['style']}")
if fields.get("composition"):
sections.append(f"Composition/framing: {fields['composition']}")
if fields.get("lighting"):
sections.append(f"Lighting/mood: {fields['lighting']}")
if fields.get("palette"):
sections.append(f"Color palette: {fields['palette']}")
if fields.get("materials"):
sections.append(f"Materials/textures: {fields['materials']}")
if fields.get("text"):
sections.append(f"Text (verbatim): \"{fields['text']}\"")
if fields.get("constraints"):
sections.append(f"Constraints: {fields['constraints']}")
if fields.get("negative"):
sections.append(f"Avoid: {fields['negative']}")
return "\n".join(sections)
def _fields_from_args(args: argparse.Namespace) -> Dict[str, Optional[str]]:
return {
"use_case": getattr(args, "use_case", None),
"scene": getattr(args, "scene", None),
"subject": getattr(args, "subject", None),
"style": getattr(args, "style", None),
"composition": getattr(args, "composition", None),
"lighting": getattr(args, "lighting", None),
"palette": getattr(args, "palette", None),
"materials": getattr(args, "materials", None),
"text": getattr(args, "text", None),
"constraints": getattr(args, "constraints", None),
"negative": getattr(args, "negative", None),
}
def _print_request(payload: dict) -> None:
print(json.dumps(payload, indent=2, sort_keys=True))
def _decode_and_write(images: List[str], outputs: List[Path], force: bool) -> None:
for idx, image_b64 in enumerate(images):
if idx >= len(outputs):
break
out_path = outputs[idx]
if out_path.exists() and not force:
_die(f"Output already exists: {out_path} (use --force to overwrite)")
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_bytes(base64.b64decode(image_b64))
print(f"Wrote {out_path}")
def _derive_downscale_path(path: Path, suffix: str) -> Path:
if suffix and not suffix.startswith("-") and not suffix.startswith("_"):
suffix = "-" + suffix
return path.with_name(f"{path.stem}{suffix}{path.suffix}")
def _downscale_image_bytes(image_bytes: bytes, *, max_dim: int, output_format: str) -> bytes:
try:
from PIL import Image
except Exception:
_die(f"Downscaling requires Pillow. {_dependency_hint('pillow')}")
if max_dim < 1:
_die("--downscale-max-dim must be >= 1")
with Image.open(BytesIO(image_bytes)) as img:
img.load()
w, h = img.size
scale = min(1.0, float(max_dim) / float(max(w, h)))
target = (max(1, int(round(w * scale))), max(1, int(round(h * scale))))
resized = img if target == (w, h) else img.resize(target, Image.Resampling.LANCZOS)
fmt = output_format.lower()
if fmt == "jpg":
fmt = "jpeg"
if fmt == "jpeg":
if resized.mode in ("RGBA", "LA") or ("transparency" in getattr(resized, "info", {})):
bg = Image.new("RGB", resized.size, (255, 255, 255))
bg.paste(resized.convert("RGBA"), mask=resized.convert("RGBA").split()[-1])
resized = bg
else:
resized = resized.convert("RGB")
out = BytesIO()
resized.save(out, format=fmt.upper())
return out.getvalue()
def _decode_write_and_downscale(
images: List[str],
outputs: List[Path],
*,
force: bool,
downscale_max_dim: Optional[int],
downscale_suffix: str,
output_format: str,
) -> None:
for idx, image_b64 in enumerate(images):
if idx >= len(outputs):
break
out_path = outputs[idx]
if out_path.exists() and not force:
_die(f"Output already exists: {out_path} (use --force to overwrite)")
out_path.parent.mkdir(parents=True, exist_ok=True)
raw = base64.b64decode(image_b64)
out_path.write_bytes(raw)
print(f"Wrote {out_path}")
if downscale_max_dim is None:
continue
derived = _derive_downscale_path(out_path, downscale_suffix)
if derived.exists() and not force:
_die(f"Output already exists: {derived} (use --force to overwrite)")
derived.parent.mkdir(parents=True, exist_ok=True)
resized = _downscale_image_bytes(raw, max_dim=downscale_max_dim, output_format=output_format)
derived.write_bytes(resized)
print(f"Wrote {derived}")
def _create_client():
try:
from openai import OpenAI
except ImportError:
_die(f"openai SDK not installed in the active environment. {_dependency_hint('openai')}")
return OpenAI()
def _create_async_client():
try:
from openai import AsyncOpenAI
except ImportError:
try:
import openai as _openai # noqa: F401
except ImportError:
_die(
f"openai SDK not installed in the active environment. {_dependency_hint('openai')}"
)
_die(
"AsyncOpenAI not available in this openai SDK version. "
f"{_dependency_hint('openai', upgrade=True)}"
)
return AsyncOpenAI()
def _slugify(value: str) -> str:
value = value.strip().lower()
value = re.sub(r"[^a-z0-9]+", "-", value)
value = re.sub(r"-{2,}", "-", value).strip("-")
return value[:60] if value else "job"
def _normalize_job(job: Any, idx: int) -> Dict[str, Any]:
if isinstance(job, str):
prompt = job.strip()
if not prompt:
_die(f"Empty prompt at job {idx}")
return {"prompt": prompt}
if isinstance(job, dict):
if "prompt" not in job or not str(job["prompt"]).strip():
_die(f"Missing prompt for job {idx}")
return job
_die(f"Invalid job at index {idx}: expected string or object.")
return {} # unreachable
def _read_jobs_jsonl(path: str) -> List[Dict[str, Any]]:
p = Path(path)
if not p.exists():
_die(f"Input file not found: {p}")
jobs: List[Dict[str, Any]] = []
for line_no, raw in enumerate(p.read_text(encoding="utf-8").splitlines(), start=1):
line = raw.strip()
if not line or line.startswith("#"):
continue
try:
item: Any
if line.startswith("{"):
item = json.loads(line)
else:
item = line
jobs.append(_normalize_job(item, idx=line_no))
except json.JSONDecodeError as exc:
_die(f"Invalid JSON on line {line_no}: {exc}")
if not jobs:
_die("No jobs found in input file.")
if len(jobs) > MAX_BATCH_JOBS:
_die(f"Too many jobs ({len(jobs)}). Max is {MAX_BATCH_JOBS}.")
return jobs
def _merge_non_null(dst: Dict[str, Any], src: Dict[str, Any]) -> Dict[str, Any]:
merged = dict(dst)
for k, v in src.items():
if v is not None:
merged[k] = v
return merged
def _job_output_paths(
*,
out_dir: Path,
output_format: str,
idx: int,
prompt: str,
n: int,
explicit_out: Optional[str],
) -> List[Path]:
out_dir.mkdir(parents=True, exist_ok=True)
ext = "." + output_format
if explicit_out:
base = Path(explicit_out)
if base.suffix == "":
base = base.with_suffix(ext)
elif base.suffix.lstrip(".").lower() != output_format:
_warn(
f"Job {idx}: output extension {base.suffix} does not match output-format {output_format}."
)
base = out_dir / base.name
else:
slug = _slugify(prompt[:80])
base = out_dir / f"{idx:03d}-{slug}{ext}"
if n == 1:
return [base]
return [
base.with_name(f"{base.stem}-{i}{base.suffix}")
for i in range(1, n + 1)
]
def _extract_retry_after_seconds(exc: Exception) -> Optional[float]:
# Best-effort: openai SDK errors vary by version. Prefer a conservative fallback.
for attr in ("retry_after", "retry_after_seconds"):
val = getattr(exc, attr, None)
if isinstance(val, (int, float)) and val >= 0:
return float(val)
msg = str(exc)
m = re.search(r"retry[- ]after[:= ]+([0-9]+(?:\\.[0-9]+)?)", msg, re.IGNORECASE)
if m:
try:
return float(m.group(1))
except Exception:
return None
return None
def _is_rate_limit_error(exc: Exception) -> bool:
name = exc.__class__.__name__.lower()
if "ratelimit" in name or "rate_limit" in name:
return True
msg = str(exc).lower()
return "429" in msg or "rate limit" in msg or "too many requests" in msg
def _is_transient_error(exc: Exception) -> bool:
if _is_rate_limit_error(exc):
return True
name = exc.__class__.__name__.lower()
if "timeout" in name or "timedout" in name or "tempor" in name:
return True
msg = str(exc).lower()
return "timeout" in msg or "timed out" in msg or "connection reset" in msg
async def _generate_one_with_retries(
client: Any,
payload: Dict[str, Any],
*,
attempts: int,
job_label: str,
) -> Any:
last_exc: Optional[Exception] = None
for attempt in range(1, attempts + 1):
try:
return await client.images.generate(**payload)
except Exception as exc:
last_exc = exc
if not _is_transient_error(exc):
raise
if attempt == attempts:
raise
sleep_s = _extract_retry_after_seconds(exc)
if sleep_s is None:
sleep_s = min(60.0, 2.0**attempt)
print(
f"{job_label} attempt {attempt}/{attempts} failed ({exc.__class__.__name__}); retrying in {sleep_s:.1f}s",
file=sys.stderr,
)
await asyncio.sleep(sleep_s)
raise last_exc or RuntimeError("unknown error")
async def _run_generate_batch(args: argparse.Namespace) -> int:
jobs = _read_jobs_jsonl(args.input)
out_dir = Path(args.out_dir)
base_fields = _fields_from_args(args)
base_payload = {
"model": args.model,
"n": args.n,
"size": args.size,
"quality": args.quality,
"background": args.background,
"output_format": args.output_format,
"output_compression": args.output_compression,
"moderation": args.moderation,
}
if args.dry_run:
for i, job in enumerate(jobs, start=1):
prompt = str(job["prompt"]).strip()
fields = _merge_non_null(base_fields, job.get("fields", {}))
# Allow flat job keys as well (use_case, scene, etc.)
fields = _merge_non_null(fields, {k: job.get(k) for k in base_fields.keys()})
augmented = _augment_prompt_fields(args.augment, prompt, fields)
job_payload = dict(base_payload)
job_payload["prompt"] = augmented
job_payload = _merge_non_null(job_payload, {k: job.get(k) for k in base_payload.keys()})
job_payload = {k: v for k, v in job_payload.items() if v is not None}
_validate_generate_payload(job_payload)
effective_output_format = _normalize_output_format(job_payload.get("output_format"))
_validate_transparency(job_payload.get("background"), effective_output_format)
job_payload["output_format"] = effective_output_format
n = int(job_payload.get("n", 1))
outputs = _job_output_paths(
out_dir=out_dir,
output_format=effective_output_format,
idx=i,
prompt=prompt,
n=n,
explicit_out=job.get("out"),
)
downscaled = None
if args.downscale_max_dim is not None:
downscaled = [
str(_derive_downscale_path(p, args.downscale_suffix)) for p in outputs
]
_print_request(
{
"endpoint": "/v1/images/generations",
"job": i,
"outputs": [str(p) for p in outputs],
"outputs_downscaled": downscaled,
**job_payload,
}
)
return 0
client = _create_async_client()
sem = asyncio.Semaphore(args.concurrency)
any_failed = False
async def run_job(i: int, job: Dict[str, Any]) -> Tuple[int, Optional[str]]:
nonlocal any_failed
prompt = str(job["prompt"]).strip()
job_label = f"[job {i}/{len(jobs)}]"
fields = _merge_non_null(base_fields, job.get("fields", {}))
fields = _merge_non_null(fields, {k: job.get(k) for k in base_fields.keys()})
augmented = _augment_prompt_fields(args.augment, prompt, fields)
payload = dict(base_payload)
payload["prompt"] = augmented
payload = _merge_non_null(payload, {k: job.get(k) for k in base_payload.keys()})
payload = {k: v for k, v in payload.items() if v is not None}
n = int(payload.get("n", 1))
_validate_generate_payload(payload)
effective_output_format = _normalize_output_format(payload.get("output_format"))
_validate_transparency(payload.get("background"), effective_output_format)
payload["output_format"] = effective_output_format
outputs = _job_output_paths(
out_dir=out_dir,
output_format=effective_output_format,
idx=i,
prompt=prompt,
n=n,
explicit_out=job.get("out"),
)
try:
async with sem:
print(f"{job_label} starting", file=sys.stderr)
started = time.time()
result = await _generate_one_with_retries(
client,
payload,
attempts=args.max_attempts,
job_label=job_label,
)
elapsed = time.time() - started
print(f"{job_label} completed in {elapsed:.1f}s", file=sys.stderr)
images = [item.b64_json for item in result.data]
_decode_write_and_downscale(
images,
outputs,
force=args.force,
downscale_max_dim=args.downscale_max_dim,
downscale_suffix=args.downscale_suffix,
output_format=effective_output_format,
)
return i, None
except Exception as exc:
any_failed = True
print(f"{job_label} failed: {exc}", file=sys.stderr)
if args.fail_fast:
raise
return i, str(exc)
tasks = [asyncio.create_task(run_job(i, job)) for i, job in enumerate(jobs, start=1)]
try:
await asyncio.gather(*tasks)
except Exception:
for t in tasks:
if not t.done():
t.cancel()
raise
return 1 if any_failed else 0
def _generate_batch(args: argparse.Namespace) -> None:
exit_code = asyncio.run(_run_generate_batch(args))
if exit_code:
raise SystemExit(exit_code)
def _generate(args: argparse.Namespace) -> None:
prompt = _read_prompt(args.prompt, args.prompt_file)
prompt = _augment_prompt(args, prompt)
payload = {
"model": args.model,
"prompt": prompt,
"n": args.n,
"size": args.size,
"quality": args.quality,
"background": args.background,
"output_format": args.output_format,
"output_compression": args.output_compression,
"moderation": args.moderation,
}
payload = {k: v for k, v in payload.items() if v is not None}
output_format = _normalize_output_format(args.output_format)
_validate_transparency(args.background, output_format)
payload["output_format"] = output_format
output_paths = _build_output_paths(args.out, output_format, args.n, args.out_dir)
downscaled = None
if args.downscale_max_dim is not None:
downscaled = [str(_derive_downscale_path(p, args.downscale_suffix)) for p in output_paths]
if args.dry_run:
_print_request(
{
"endpoint": "/v1/images/generations",
"outputs": [str(p) for p in output_paths],
"outputs_downscaled": downscaled,
**payload,
}
)
return
print(
"Calling Image API (generation). This can take up to a couple of minutes.",
file=sys.stderr,
)
started = time.time()
client = _create_client()
result = client.images.generate(**payload)
elapsed = time.time() - started
print(f"Generation completed in {elapsed:.1f}s.", file=sys.stderr)
images = [item.b64_json for item in result.data]
_decode_write_and_downscale(
images,
output_paths,
force=args.force,
downscale_max_dim=args.downscale_max_dim,
downscale_suffix=args.downscale_suffix,
output_format=output_format,
)
def _edit(args: argparse.Namespace) -> None:
prompt = _read_prompt(args.prompt, args.prompt_file)
prompt = _augment_prompt(args, prompt)
image_paths = _check_image_paths(args.image)
mask_path = Path(args.mask) if args.mask else None
if mask_path:
if not mask_path.exists():
_die(f"Mask file not found: {mask_path}")
if mask_path.suffix.lower() != ".png":
_warn(f"Mask should be a PNG with an alpha channel: {mask_path}")
if mask_path.stat().st_size > MAX_IMAGE_BYTES:
_warn(f"Mask exceeds 50MB limit: {mask_path}")
payload = {
"model": args.model,
"prompt": prompt,
"n": args.n,
"size": args.size,
"quality": args.quality,
"background": args.background,
"output_format": args.output_format,
"output_compression": args.output_compression,
"input_fidelity": args.input_fidelity,
"moderation": args.moderation,
}
payload = {k: v for k, v in payload.items() if v is not None}
output_format = _normalize_output_format(args.output_format)
_validate_transparency(args.background, output_format)
payload["output_format"] = output_format
_validate_input_fidelity(args.input_fidelity)
output_paths = _build_output_paths(args.out, output_format, args.n, args.out_dir)
downscaled = None
if args.downscale_max_dim is not None:
downscaled = [str(_derive_downscale_path(p, args.downscale_suffix)) for p in output_paths]
if args.dry_run:
payload_preview = dict(payload)
payload_preview["image"] = [str(p) for p in image_paths]
if mask_path:
payload_preview["mask"] = str(mask_path)
_print_request(
{
"endpoint": "/v1/images/edits",
"outputs": [str(p) for p in output_paths],
"outputs_downscaled": downscaled,
**payload_preview,
}
)
return
print(
f"Calling Image API (edit) with {len(image_paths)} image(s).",
file=sys.stderr,
)
started = time.time()
client = _create_client()
with _open_files(image_paths) as image_files, _open_mask(mask_path) as mask_file:
request = dict(payload)
request["image"] = image_files if len(image_files) > 1 else image_files[0]
if mask_file is not None:
request["mask"] = mask_file
result = client.images.edit(**request)
elapsed = time.time() - started
print(f"Edit completed in {elapsed:.1f}s.", file=sys.stderr)
images = [item.b64_json for item in result.data]
_decode_write_and_downscale(
images,
output_paths,
force=args.force,
downscale_max_dim=args.downscale_max_dim,
downscale_suffix=args.downscale_suffix,
output_format=output_format,
)
def _open_files(paths: List[Path]):
return _FileBundle(paths)
def _open_mask(mask_path: Optional[Path]):
if mask_path is None:
return _NullContext()
return _SingleFile(mask_path)
class _NullContext:
def __enter__(self):
return None
def __exit__(self, exc_type, exc, tb):
return False
class _SingleFile:
def __init__(self, path: Path):
self._path = path
self._handle = None
def __enter__(self):
self._handle = self._path.open("rb")
return self._handle
def __exit__(self, exc_type, exc, tb):
if self._handle:
try:
self._handle.close()
except Exception:
pass
return False
class _FileBundle:
def __init__(self, paths: List[Path]):
self._paths = paths
self._handles: List[object] = []
def __enter__(self):
self._handles = [p.open("rb") for p in self._paths]
return self._handles
def __exit__(self, exc_type, exc, tb):
for handle in self._handles:
try:
handle.close()
except Exception:
pass
return False
def _add_shared_args(parser: argparse.ArgumentParser) -> None:
parser.add_argument("--model", default=DEFAULT_MODEL)
parser.add_argument("--prompt")
parser.add_argument("--prompt-file")
parser.add_argument("--n", type=int, default=1)
parser.add_argument("--size", default=DEFAULT_SIZE)
parser.add_argument("--quality", default=DEFAULT_QUALITY)
parser.add_argument("--background")
parser.add_argument("--output-format")
parser.add_argument("--output-compression", type=int)
parser.add_argument("--moderation")
parser.add_argument("--out", default=DEFAULT_OUTPUT_PATH)
parser.add_argument("--out-dir")
parser.add_argument("--force", action="store_true")
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--augment", dest="augment", action="store_true")
parser.add_argument("--no-augment", dest="augment", action="store_false")
parser.set_defaults(augment=True)
# Prompt augmentation hints
parser.add_argument("--use-case")
parser.add_argument("--scene")
parser.add_argument("--subject")
parser.add_argument("--style")
parser.add_argument("--composition")
parser.add_argument("--lighting")
parser.add_argument("--palette")
parser.add_argument("--materials")
parser.add_argument("--text")
parser.add_argument("--constraints")
parser.add_argument("--negative")
# Post-processing (optional): generate an additional downscaled copy for fast web loading.
parser.add_argument("--downscale-max-dim", type=int)
parser.add_argument("--downscale-suffix", default=DEFAULT_DOWNSCALE_SUFFIX)
def main() -> int:
parser = argparse.ArgumentParser(
description="Fallback CLI for explicit image generation or editing via GPT Image models"
)
subparsers = parser.add_subparsers(dest="command", required=True)
gen_parser = subparsers.add_parser("generate", help="Create a new image")
_add_shared_args(gen_parser)
gen_parser.set_defaults(func=_generate)
batch_parser = subparsers.add_parser(
"generate-batch",
help="Generate multiple prompts concurrently (JSONL input)",
)
_add_shared_args(batch_parser)
batch_parser.add_argument("--input", required=True, help="Path to JSONL file (one job per line)")
batch_parser.add_argument("--concurrency", type=int, default=DEFAULT_CONCURRENCY)
batch_parser.add_argument("--max-attempts", type=int, default=3)
batch_parser.add_argument("--fail-fast", action="store_true")
batch_parser.set_defaults(func=_generate_batch)
edit_parser = subparsers.add_parser("edit", help="Edit an existing image")
_add_shared_args(edit_parser)
edit_parser.add_argument("--image", action="append", required=True)
edit_parser.add_argument("--mask")
edit_parser.add_argument("--input-fidelity")
edit_parser.set_defaults(func=_edit)
args = parser.parse_args()
if args.n < 1 or args.n > 10:
_die("--n must be between 1 and 10")
if getattr(args, "concurrency", 1) < 1 or getattr(args, "concurrency", 1) > 25:
_die("--concurrency must be between 1 and 25")
if getattr(args, "max_attempts", 3) < 1 or getattr(args, "max_attempts", 3) > 10:
_die("--max-attempts must be between 1 and 10")
if args.output_compression is not None and not (0 <= args.output_compression <= 100):
_die("--output-compression must be between 0 and 100")
if args.command == "generate-batch" and not args.out_dir:
_die("generate-batch requires --out-dir")
if getattr(args, "downscale_max_dim", None) is not None and args.downscale_max_dim < 1:
_die("--downscale-max-dim must be >= 1")
_validate_model(args.model)
_validate_size(args.size, args.model)
_validate_quality(args.quality)
_validate_background(args.background)
_validate_model_specific_options(
model=args.model,
background=args.background,
input_fidelity=getattr(args, "input_fidelity", None),
)
_ensure_api_key(args.dry_run)
args.func(args)
return 0
if __name__ == "__main__":
raise SystemExit(main())
@@ -0,0 +1,440 @@
#!/usr/bin/env python3
"""Remove a solid chroma-key background from an image.
This helper supports the imagegen skill's built-in-first transparent workflow:
generate an image on a flat key color, then convert that key color to alpha.
"""
from __future__ import annotations
import argparse
from io import BytesIO
from pathlib import Path
import re
from statistics import median
import sys
from typing import Tuple
Color = Tuple[int, int, int]
KEY_DOMINANCE_THRESHOLD = 16.0
ALPHA_NOISE_FLOOR = 8
def _die(message: str, code: int = 1) -> None:
print(f"Error: {message}", file=sys.stderr)
raise SystemExit(code)
def _dependency_hint(package: str) -> str:
return (
"Activate the repo-selected environment first, then install it with "
f"`uv pip install {package}`. If this repo uses a local virtualenv, start with "
"`source .venv/bin/activate`; otherwise use this repo's configured shared fallback "
"environment."
)
def _load_pillow():
try:
from PIL import Image, ImageFilter
except ImportError:
_die(f"Pillow is required for chroma-key removal. {_dependency_hint('pillow')}")
return Image, ImageFilter
def _parse_key_color(raw: str) -> Color:
value = raw.strip()
match = re.fullmatch(r"#?([0-9a-fA-F]{6})", value)
if not match:
_die("key color must be a hex RGB value like #00ff00.")
hex_value = match.group(1)
return (
int(hex_value[0:2], 16),
int(hex_value[2:4], 16),
int(hex_value[4:6], 16),
)
def _validate_args(args: argparse.Namespace) -> None:
if args.tolerance < 0 or args.tolerance > 255:
_die("--tolerance must be between 0 and 255.")
if args.transparent_threshold < 0 or args.transparent_threshold > 255:
_die("--transparent-threshold must be between 0 and 255.")
if args.opaque_threshold < 0 or args.opaque_threshold > 255:
_die("--opaque-threshold must be between 0 and 255.")
if args.soft_matte and args.transparent_threshold >= args.opaque_threshold:
_die("--transparent-threshold must be lower than --opaque-threshold.")
if args.edge_feather < 0 or args.edge_feather > 64:
_die("--edge-feather must be between 0 and 64.")
if args.edge_contract < 0 or args.edge_contract > 16:
_die("--edge-contract must be between 0 and 16.")
src = Path(args.input)
if not src.exists():
_die(f"Input image not found: {src}")
out = Path(args.out)
if out.exists() and not args.force:
_die(f"Output already exists: {out} (use --force to overwrite)")
if out.suffix.lower() not in {".png", ".webp"}:
_die("--out must end in .png or .webp so the alpha channel is preserved.")
def _channel_distance(a: Color, b: Color) -> int:
return max(abs(a[0] - b[0]), abs(a[1] - b[1]), abs(a[2] - b[2]))
def _clamp_channel(value: float) -> int:
return max(0, min(255, int(round(value))))
def _smoothstep(value: float) -> float:
value = max(0.0, min(1.0, value))
return value * value * (3.0 - 2.0 * value)
def _soft_alpha(distance: int, transparent_threshold: float, opaque_threshold: float) -> int:
if distance <= transparent_threshold:
return 0
if distance >= opaque_threshold:
return 255
ratio = (float(distance) - transparent_threshold) / (
opaque_threshold - transparent_threshold
)
return _clamp_channel(255.0 * _smoothstep(ratio))
def _dominance_alpha(rgb: Color, key: Color) -> int:
spill_channels = _spill_channels(key)
if not spill_channels:
return 255
channels = [float(value) for value in rgb]
non_spill = [idx for idx in range(3) if idx not in spill_channels]
key_strength = (
min(channels[idx] for idx in spill_channels)
if len(spill_channels) > 1
else channels[spill_channels[0]]
)
non_key_strength = max((channels[idx] for idx in non_spill), default=0.0)
dominance = key_strength - non_key_strength
if dominance <= 0:
return 255
denominator = max(1.0, float(max(key)) - non_key_strength)
alpha = 1.0 - min(1.0, dominance / denominator)
return _clamp_channel(alpha * 255.0)
def _spill_channels(key: Color) -> list[int]:
key_max = max(key)
if key_max < 128:
return []
return [idx for idx, value in enumerate(key) if value >= key_max - 16 and value >= 128]
def _key_channel_dominance(rgb: Color, key: Color) -> float:
spill_channels = _spill_channels(key)
if not spill_channels:
return 0.0
channels = [float(value) for value in rgb]
non_spill = [idx for idx in range(3) if idx not in spill_channels]
key_strength = (
min(channels[idx] for idx in spill_channels)
if len(spill_channels) > 1
else channels[spill_channels[0]]
)
non_key_strength = max((channels[idx] for idx in non_spill), default=0.0)
return key_strength - non_key_strength
def _looks_key_colored(rgb: Color, key: Color, distance: int) -> bool:
if distance <= 32:
return True
spill_channels = _spill_channels(key)
if not spill_channels:
return True
return _key_channel_dominance(rgb, key) >= KEY_DOMINANCE_THRESHOLD
def _cleanup_spill(rgb: Color, key: Color, alpha: int = 255) -> Color:
if alpha >= 252:
return rgb
spill_channels = _spill_channels(key)
if not spill_channels:
return rgb
channels = [float(value) for value in rgb]
non_spill = [idx for idx in range(3) if idx not in spill_channels]
if non_spill:
anchor = max(channels[idx] for idx in non_spill)
cap = max(0.0, anchor - 1.0)
for idx in spill_channels:
if channels[idx] > cap:
channels[idx] = cap
return (
_clamp_channel(channels[0]),
_clamp_channel(channels[1]),
_clamp_channel(channels[2]),
)
def _apply_alpha_to_image(
image,
*,
key: Color,
tolerance: int,
spill_cleanup: bool,
soft_matte: bool,
transparent_threshold: float,
opaque_threshold: float,
) -> int:
pixels = image.load()
width, height = image.size
transparent = 0
for y in range(height):
for x in range(width):
red, green, blue, alpha = pixels[x, y]
rgb = (red, green, blue)
distance = _channel_distance(rgb, key)
key_like = _looks_key_colored(rgb, key, distance)
output_alpha = (
min(
_soft_alpha(distance, transparent_threshold, opaque_threshold),
_dominance_alpha(rgb, key),
)
if soft_matte and key_like
else (0 if distance <= tolerance else 255)
)
output_alpha = int(round(output_alpha * (alpha / 255.0)))
if 0 < output_alpha <= ALPHA_NOISE_FLOOR:
output_alpha = 0
if output_alpha == 0:
pixels[x, y] = (0, 0, 0, 0)
transparent += 1
continue
if spill_cleanup and key_like:
red, green, blue = _cleanup_spill(rgb, key, output_alpha)
pixels[x, y] = (red, green, blue, output_alpha)
return transparent
def _contract_alpha(image, pixels: int):
if pixels == 0:
return image
_, ImageFilter = _load_pillow()
alpha = image.getchannel("A")
for _ in range(pixels):
alpha = alpha.filter(ImageFilter.MinFilter(3))
image.putalpha(alpha)
return image
def _apply_edge_feather(image, radius: float):
if radius == 0:
return image
_, ImageFilter = _load_pillow()
alpha = image.getchannel("A")
alpha = alpha.filter(ImageFilter.GaussianBlur(radius=radius))
image.putalpha(alpha)
return image
def _encode_image(image, output_format: str) -> bytes:
out = BytesIO()
image.save(out, format=output_format.upper())
return out.getvalue()
def _alpha_counts(image) -> tuple[int, int, int]:
pixels = image.load()
width, height = image.size
total = 0
transparent = 0
partial = 0
for y in range(height):
for x in range(width):
alpha = pixels[x, y][3]
total += 1
if alpha == 0:
transparent += 1
elif alpha < 255:
partial += 1
return total, transparent, partial
def _sample_border_key(image, mode: str) -> Color:
width, height = image.size
pixels = image.load()
samples: list[Color] = []
if mode == "corners":
patch = max(1, min(width, height, 12))
boxes = [
(0, 0, patch, patch),
(width - patch, 0, width, patch),
(0, height - patch, patch, height),
(width - patch, height - patch, width, height),
]
for left, top, right, bottom in boxes:
for y in range(top, bottom):
for x in range(left, right):
red, green, blue = pixels[x, y][:3]
samples.append((red, green, blue))
else:
band = max(1, min(width, height, 6))
step = max(1, min(width, height) // 256)
for x in range(0, width, step):
for y in range(band):
red, green, blue = pixels[x, y][:3]
samples.append((red, green, blue))
red, green, blue = pixels[x, height - 1 - y][:3]
samples.append((red, green, blue))
for y in range(0, height, step):
for x in range(band):
red, green, blue = pixels[x, y][:3]
samples.append((red, green, blue))
red, green, blue = pixels[width - 1 - x, y][:3]
samples.append((red, green, blue))
if not samples:
_die("Could not sample background key color from image border.")
return (
int(round(median(sample[0] for sample in samples))),
int(round(median(sample[1] for sample in samples))),
int(round(median(sample[2] for sample in samples))),
)
def _remove_chroma_key(args: argparse.Namespace) -> None:
Image, _ = _load_pillow()
src = Path(args.input)
out = Path(args.out)
with Image.open(src) as image:
rgba = image.convert("RGBA")
key = (
_sample_border_key(rgba, args.auto_key)
if args.auto_key != "none"
else _parse_key_color(args.key_color)
)
transparent = _apply_alpha_to_image(
rgba,
key=key,
tolerance=args.tolerance,
spill_cleanup=args.spill_cleanup,
soft_matte=args.soft_matte,
transparent_threshold=args.transparent_threshold,
opaque_threshold=args.opaque_threshold,
)
rgba = _contract_alpha(rgba, args.edge_contract)
rgba = _apply_edge_feather(rgba, args.edge_feather)
total, transparent_after, partial_after = _alpha_counts(rgba)
out.parent.mkdir(parents=True, exist_ok=True)
output_format = "PNG" if out.suffix.lower() == ".png" else "WEBP"
out.write_bytes(_encode_image(rgba, output_format))
print(f"Wrote {out}")
print(f"Key color: #{key[0]:02x}{key[1]:02x}{key[2]:02x}")
print(f"Transparent pixels: {transparent_after}/{total}")
print(f"Partially transparent pixels: {partial_after}/{total}")
if transparent == 0:
print("Warning: no pixels matched the key color before feathering.", file=sys.stderr)
def _build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="Remove a solid chroma-key background and write an image with alpha."
)
parser.add_argument("--input", required=True, help="Input image path.")
parser.add_argument("--out", required=True, help="Output .png or .webp path.")
parser.add_argument(
"--key-color",
default="#00ff00",
help="Hex RGB key color to remove, for example #00ff00.",
)
parser.add_argument(
"--tolerance",
type=int,
default=12,
help="Hard-key per-channel tolerance for matching the key color, 0-255.",
)
parser.add_argument(
"--auto-key",
choices=["none", "corners", "border"],
default="none",
help="Sample the key color from image corners or border instead of --key-color.",
)
parser.add_argument(
"--soft-matte",
action="store_true",
help="Use a smooth alpha ramp between transparent and opaque thresholds.",
)
parser.add_argument(
"--transparent-threshold",
type=float,
default=12.0,
help="Soft-matte distance at or below which pixels become fully transparent.",
)
parser.add_argument(
"--opaque-threshold",
type=float,
default=96.0,
help="Soft-matte distance at or above which pixels become fully opaque.",
)
parser.add_argument(
"--edge-feather",
type=float,
default=0.0,
help="Optional alpha blur radius for softened edges, 0-64.",
)
parser.add_argument(
"--edge-contract",
type=int,
default=0,
help="Shrink the visible alpha matte by this many pixels before feathering.",
)
parser.add_argument(
"--spill-cleanup",
dest="spill_cleanup",
action="store_true",
help="Reduce obvious key-color spill on opaque pixels.",
)
parser.add_argument(
"--despill",
dest="spill_cleanup",
action="store_true",
help="Alias for --spill-cleanup; decontaminate key-color edge spill.",
)
parser.add_argument("--force", action="store_true", help="Overwrite an existing output file.")
return parser
def main() -> None:
parser = _build_parser()
args = parser.parse_args()
_validate_args(args)
_remove_chroma_key(args)
if __name__ == "__main__":
main()