Image Generation
POST /api/v1/agents/images is a one-shot image generation and editing primitive. You send a prompt (and optionally a reference image and inpainting mask); the platform calls Azure OpenAI or Google Gemini, uploads the result to Azure Blob Storage, and returns a public URL.
It is a sibling of /api/v1/agents/structured — same shape, same auth, same gateway timeout (120s).
When to use this
- A skill or agent tool needs to produce an image (illustration, mockup, social asset, generated avatar).
- A frontend feature needs a generated image and would rather have a URL than handle 1–10 MB of inline base64.
- An agent wants to edit an existing image — change the sky, add a watermark, repaint a region — chaining the URL of a previously generated image into the next call.
Do not use this for: high-volume batch jobs, streaming partial images, or any flow that needs n > 1 images per call. v1 returns one image per request.
Endpoint
POST /api/v1/agents/images
Authorization: Bearer <JWT> # JWT must carry scope: write:image-gen
Content-Type: application/json
The gateway sets X-Org-ID from your token; the agents service requires it. Request timeout is 120 seconds — image generation typically takes 5–35 seconds depending on model and size.
Request
{
  // Required.
  "provider": "azure-openai",           // "azure-openai" | "gemini"
  "model": "gpt-image-2",               // provider-native deployment / model id
  "prompt": "A photograph of a red fox in an autumn forest",

  // Optional generation knobs — caller-supplied, provider-validated.
  "size": "1024x1024",                  // see Provider quirks below
  "quality": "low",                     // "low" | "medium" | "high" | "auto" (gpt-image)
  "style": "vivid",                     // "vivid" | "natural" (dall-e-3 only)
  "negativePrompt": "no text, no logo", // gemini-only

  // Edit mode — present only if you want to edit an existing image.
  // Pass EITHER url OR b64, not both. mimeType is required.
  "referenceImage": {
    "url": "https://eloquentsc.blob.core.windows.net/files/images/.../abc.png",
    // "b64": "iVBORw0KGgoA...",        // alternative
    "mimeType": "image/png"             // "image/png" | "image/jpeg"
  },

  // Inpainting mask — Azure-only, edit-mode only, PNG only.
  // Transparent pixels (alpha = 0) mark the editable region.
  "mask": { "url": "https://.../mask.png" }
}
Response
{
  "data": {
    "url": "https://eloquentsc.blob.core.windows.net/files/images/{orgId}/2026-04-26/9f8a...c2.png",
    "mimeType": "image/png",
    "bytes": 1843212,
    "width": 1024,                          // when reported by the provider
    "height": 1024,
    "revisedPrompt": "A cinematic 35mm...", // passthrough if provider returns one
    "provider": "azure-openai",
    "model": "gpt-image-2",
    "mode": "generate",                     // "generate" | "edit"
    "usage": {
      "inputTokens": 16,
      "outputTokens": 208,
      "totalTokens": 224
    }
  },
  "metadata": { "success": true, "timestamp": 1777225241 }
}
The blob lives at `{container}/images/{orgId}/{yyyy-mm-dd}/{uuid}.{png|jpg}`. The URL renders directly in browsers — no signed-URL handling needed in dev (the dev container is publicly readable).
Picking a provider and model
| Use case | Provider | Model | Why |
|---|---|---|---|
| Photo-realistic generation, predictable Azure billing | azure-openai | gpt-image-2 | Highest quality, supports edit + mask |
| Quick conceptual / illustrative image | gemini | gemini-3.1-flash-image-preview | Faster, cheaper, accepts URL references |
| Edit with inpainting mask | azure-openai | gpt-image-2 | Only Azure supports masks |
| Edit by passing a reference image URL | gemini | gemini-3.1-flash-image-preview | Native multimodal input — Azure decodes URLs server-side |
| Legacy DALL-E flows | azure-openai | dall-e-3 | Supports `style` (`vivid` / `natural`) |
`model` is not a logical name — it must be the provider-native deployment id (Azure) or the published model name (Gemini). Unknown models return `INVALID_INPUT` from the upstream provider.
Examples
Generate
curl -X POST $GATEWAY/api/v1/agents/images \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "azure-openai",
    "model": "gpt-image-2",
    "prompt": "Studio photo of a small ceramic mug, soft morning light",
    "size": "1024x1024",
    "quality": "high"
  }'
Edit by URL (chain a previous result)
PREV_URL="https://eloquentsc.blob.core.windows.net/files/images/.../abc.png"
curl -X POST $GATEWAY/api/v1/agents/images \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d "{
    \"provider\": \"gemini\",
    \"model\": \"gemini-3.1-flash-image-preview\",
    \"prompt\": \"Replace the sky with a sunset\",
    \"referenceImage\": {
      \"url\": \"$PREV_URL\",
      \"mimeType\": \"image/png\"
    }
  }"
Edit by base64
B64=$(base64 -i input.png | tr -d '\n')
curl -X POST $GATEWAY/api/v1/agents/images \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d "{
    \"provider\": \"azure-openai\",
    \"model\": \"gpt-image-2\",
    \"prompt\": \"Make it look like a watercolor painting\",
    \"size\": \"1024x1024\",
    \"quality\": \"low\",
    \"referenceImage\": { \"b64\": \"$B64\", \"mimeType\": \"image/png\" }
  }"
Edit with inpainting mask (Azure-only)
curl -X POST $GATEWAY/api/v1/agents/images \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d "{
    \"provider\": \"azure-openai\",
    \"model\": \"gpt-image-2\",
    \"prompt\": \"Replace the masked area with a wooden table\",
    \"size\": \"1024x1024\",
    \"referenceImage\": { \"url\": \"$REF_URL\", \"mimeType\": \"image/png\" },
    \"mask\": { \"url\": \"$MASK_URL\" }
  }"
The mask must be a PNG with the same dimensions as the reference image. Pixels where alpha = 0 are repainted; opaque pixels are kept.
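If you assemble masks dynamically, it is worth pre-flighting them client-side before spending a request. A hedged sketch in TypeScript using sharp (`checkMask` and the file paths are illustrative; the service performs its own authoritative validation):

```ts
import sharp from "sharp";

// Pre-flight a mask against its reference: PNG format, same dimensions,
// and at least one transparent (alpha = 0) pixel to edit.
async function checkMask(referencePath: string, maskPath: string): Promise<void> {
  const ref = await sharp(referencePath).metadata();
  const mask = await sharp(maskPath).metadata();
  if (mask.format !== "png") throw new Error("mask must be a PNG");
  if (mask.width !== ref.width || mask.height !== ref.height) {
    throw new Error(
      `mask is ${mask.width}x${mask.height}, reference is ${ref.width}x${ref.height}`,
    );
  }
  // ensureAlpha() guarantees a 4-channel RGBA buffer so the alpha plane is scannable.
  const { data } = await sharp(maskPath).ensureAlpha().raw().toBuffer({ resolveWithObject: true });
  let editable = 0;
  for (let i = 3; i < data.length; i += 4) if (data[i] === 0) editable++;
  if (editable === 0) throw new Error("mask has no transparent pixels; nothing would be edited");
}
```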
Provider quirks
These are caller-visible details — pick the model first, then read the matching column.
| Behavior | azure-openai (gpt-image-*) | azure-openai (dall-e-3) | gemini |
|---|---|---|---|
| `size` accepted | 1024x1024, 1536x1024, 1024x1536, auto | 1024x1024, 1792x1024, 1024x1792 | 1:1, 16:9, 9:16, 4:3, 3:4, or pixel string |
| `quality` accepted | low, medium, high, auto | standard, hd | ignored (uses model defaults) |
| `style` accepted | ignored | vivid, natural | ignored |
| `negativePrompt` | ignored (warn-logged server-side) | ignored | honored |
| Reference image | server-side decoded → multipart image | not supported (use gpt-image-2) | inlined into the request |
| Mask | supported (PNG, alpha = editable) | not supported | not supported |
| Returned MIME | image/png (gpt-image-2 may return image/jpeg) | image/png | image/jpeg typical, image/png possible |
| `usage` reported | yes (input/output/total tokens) | not reported | not reported |
| Typical latency | 15–35 s | 10–25 s | 8–15 s |
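Because the accepted `size` values differ per backend, a thin mapping keeps callers provider-agnostic. An illustrative TypeScript sketch (the helper name is ours; the values are copied from the table above):

```ts
// Map a generic aspect choice to a size value each backend accepts.
type Provider = "azure-openai" | "gemini";
type Aspect = "square" | "landscape" | "portrait";

function sizeFor(provider: Provider, model: string, aspect: Aspect): string {
  if (provider === "gemini") {
    // Gemini takes aspect-ratio strings (a pixel string also works).
    return { square: "1:1", landscape: "16:9", portrait: "9:16" }[aspect];
  }
  if (model.startsWith("dall-e")) {
    return { square: "1024x1024", landscape: "1792x1024", portrait: "1024x1792" }[aspect];
  }
  // gpt-image-* deployments.
  return { square: "1024x1024", landscape: "1536x1024", portrait: "1024x1536" }[aspect];
}
```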
Reference-image URL allowlist (security)
When you pass a `url` in `referenceImage` or `mask`, the agents service:
- Requires `https`. Rejects `http`, `file`, `data`, etc.
- Validates the hostname is in the allowlist. The default list contains the Azure storage account that hosts our generated blobs (`eloquentsc.blob.core.windows.net`). Operators can extend it with `IMAGE_REF_ALLOWED_HOSTS` (a comma-separated env var on the agents service).
- Refuses to dial loopback / link-local / private / metadata IPs, even after DNS resolution (defends against DNS rebinding).
- Caps fetch size at 20 MB and timeout at 10 s.
- Sniffs the magic bytes and rejects anything that isn't a real PNG or JPEG.
Practical implication for agents: the only URLs you can pass in normal use are URLs we issued ourselves (i.e. blob URLs returned by a previous call to this endpoint). Chaining works out of the box; arbitrary external URLs are rejected.
If you have an arbitrary URL you need to feed in (a Slack attachment, a user upload from somewhere else), download it yourself first and pass b64 instead.
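A hedged sketch of that workaround in TypeScript (assumes Node 18+ for global `fetch`; `GATEWAY` and `JWT` are stand-ins for however your environment supplies them):

```ts
// Feed in an arbitrary external image by downloading it yourself,
// then passing b64 instead of a URL.
async function editExternal(externalUrl: string, prompt: string) {
  const img = await fetch(externalUrl);
  if (!img.ok) throw new Error(`download failed: ${img.status}`);
  const mimeType = img.headers.get("content-type") ?? "";
  if (mimeType !== "image/png" && mimeType !== "image/jpeg") {
    throw new Error(`unsupported content type: ${mimeType}`);
  }
  const b64 = Buffer.from(await img.arrayBuffer()).toString("base64");

  const res = await fetch(`${process.env.GATEWAY}/api/v1/agents/images`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.JWT}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      provider: "gemini",
      model: "gemini-3.1-flash-image-preview",
      prompt,
      referenceImage: { b64, mimeType },
    }),
  });
  if (!res.ok) throw new Error(`image call failed: ${res.status}`);
  return res.json();
}
```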
Validation rules
The handler returns 400 `INVALID_INPUT` or 400 `VALIDATION_ERROR` for these conditions before reaching the provider:
- `provider` is not one of `{"azure-openai", "gemini"}`
- `model` is empty
- `prompt` is empty or longer than 4000 characters
- `referenceImage` has both or neither of `url` / `b64`, or a `mimeType` outside `{image/png, image/jpeg}`
- `mask` is set but `referenceImage` is not
- `mask` is set with `provider != "azure-openai"`
- `mask` bytes are not a real PNG (alpha-aware)
- Reference URL is `http`, `file`, etc., or its host isn't in the allowlist
Error model
Errors come back in the standard envelope:
{
  "error": { "code": "VALIDATION_ERROR", "message": "...", "details": { ... } },
  "metadata": { "success": false, "timestamp": 1777225265 }
}
The relevant codes:
| Status | Code | Meaning |
|---|---|---|
| 400 | INVALID_INPUT | Bad request, unknown provider/model, content-policy violation |
| 400 | VALIDATION_ERROR | Mask without reference, mask with Gemini, URL not in allowlist, bad reference bytes |
| 401 | UNAUTHORIZED | Missing or expired JWT |
| 403 | FORBIDDEN | JWT lacks write:image-gen scope |
| 500 | SYSTEM_ERROR | Blob upload failed |
| 503 | SERVICE_UNAVAILABLE | Provider not registered (env vars missing), upstream 5xx, rate-limited |
| 504 | TIMEOUT | Provider exceeded the 120s gateway timeout |
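Of these, `SERVICE_UNAVAILABLE` and `TIMEOUT` are the transient ones worth retrying. A hedged backoff sketch (it leans on the `callImage` helper sketched under Recipes below; the attempt count and delays are arbitrary choices, not platform guidance):

```ts
// Retry transient failures (503 provider hiccups, 504 gateway timeout) with
// exponential backoff. Assumes callImage throws an Error whose message
// starts with the envelope's error code.
async function callImageWithRetry(req: ImageRequest, attempts = 3) {
  for (let i = 0; ; i++) {
    try {
      return await callImage(req);
    } catch (err) {
      const msg = err instanceof Error ? err.message : String(err);
      const transient = msg.includes("SERVICE_UNAVAILABLE") || msg.includes("TIMEOUT");
      if (!transient || i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 2 ** i * 1_000)); // 1 s, 2 s, 4 s
    }
  }
}
```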
Recipes for agents
"Generate, then iterate"
A common multi-step pattern: produce a base image, then edit it 1–2 times. Pass the url from each response into the next call's referenceImage.url. Because every URL is on our allowlisted Azure account, no extra plumbing is needed.
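The snippets here use a thin `callImage` wrapper. It isn't a published SDK, just a minimal sketch (Node 18+ `fetch`; `GATEWAY` and `JWT` are stand-ins for your environment's config; the client-side timeout sits slightly past the gateway's 120 s):

```ts
// Minimal callImage sketch matching the request/response shapes documented above.
type ImageRequest = {
  provider: "azure-openai" | "gemini";
  model: string;
  prompt: string;
  size?: string;
  quality?: string;
  style?: string;
  negativePrompt?: string;
  referenceImage?: { url?: string; b64?: string; mimeType: "image/png" | "image/jpeg" };
  mask?: { url?: string; b64?: string }; // b64 per the mask recipe below
};

async function callImage(req: ImageRequest) {
  const res = await fetch(`${process.env.GATEWAY}/api/v1/agents/images`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.JWT}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(req),
    // Give up slightly after the gateway's own 120 s cutoff.
    signal: AbortSignal.timeout(125_000),
  });
  const body = await res.json();
  if (!body.metadata?.success) {
    // Standard envelope: { error: { code, message, details }, metadata }
    throw new Error(`${body.error?.code}: ${body.error?.message}`);
  }
  return body;
}
```

With that in place, the recipe itself: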
const base = await callImage({ provider: "azure-openai", model: "gpt-image-2",
  prompt: "...", size: "1024x1024", quality: "low" });
const edited = await callImage({ provider: "gemini", model: "gemini-3.1-flash-image-preview",
  prompt: "Add a subtle sepia tone",
  referenceImage: { url: base.data.url, mimeType: "image/png" } });
"Mask a known region"
If you know the editable region in pixel space, generate the mask programmatically (e.g. with sharp or pillow) and pass it as base64. Mask must be PNG, same dimensions as the reference, and use transparency (not white) for the editable pixels.
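A sketch of the sharp route (the dimensions and region here are hypothetical; they must match your reference image):

```ts
import sharp from "sharp";

// Build an inpainting mask: opaque everywhere ("keep these pixels"),
// alpha = 0 inside the editable region.
const width = 1024;
const height = 1024;
const editable = { left: 256, top: 640, width: 512, height: 256 };

const maskPng = await sharp({
  create: { width, height, channels: 4, background: { r: 0, g: 0, b: 0, alpha: 1 } },
})
  // "dest-out" zeroes the destination alpha wherever the composited
  // rectangle is opaque, punching a transparent hole in the canvas.
  .composite([
    {
      input: await sharp({
        create: {
          width: editable.width,
          height: editable.height,
          channels: 4,
          background: { r: 255, g: 255, b: 255, alpha: 1 },
        },
      })
        .png()
        .toBuffer(),
      left: editable.left,
      top: editable.top,
      blend: "dest-out",
    },
  ])
  .png()
  .toBuffer();

const maskB64 = maskPng.toString("base64");
```

The resulting buffer then goes into the request as base64, mirroring `referenceImage.b64`.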
Cost-conscious calls
For first-pass exploration use gemini — fastest and cheapest. Fall back to gpt-image-2 only when you need the edit-with-mask flow or higher fidelity. Gemini ignores quality and style, so don't bother sending them.
Avoid base64 in JSON when possible
Edit calls with `b64` push large strings through JSON parsers in the gateway and agents service (base64 inflates the bytes by about a third). If the source bytes are already on our blob account (e.g. they came from a previous call), pass `referenceImage.url` instead — much smaller request, lower memory churn.
Implementation notes (for service operators)
- The endpoint lives in the agents service (`backend/services/agents/http/image_handler.go`), not a new service. It shares the LLM provider factory pattern with `/agents/structured`.
- Provider implementations are in `backend/common/llm/providers/`: `gemini_image_provider.go`, `azure_openai_image_provider.go`. The Gemini code was lifted from the existing `integrations/media/providers` (which still serves NATS-based image-gen tool calls — left untouched).
- Env vars read in `services/agents/main.go` (per platform convention):
  - `GOOGLE_API_KEY` — Gemini key (also used by the existing media service)
  - `AZURE_OPENAI_IMAGE_ENDPOINT` — image-specific endpoint (`*.cognitiveservices.azure.com`); falls back to `AZURE_OPENAI_ENDPOINT` if unset
  - `AZURE_OPENAI_IMAGE_API_KEY` — falls back to `AZURE_OPENAI_API_KEY`
  - `AZURE_OPENAI_IMAGE_API_VERSION` — defaults to `2024-02-01`; must be `2025-04-01-preview` or newer for edit + mask
  - `IMAGE_REF_ALLOWED_HOSTS` — optional, extends the URL allowlist
- The Azure edit path bypasses the `openai-go` SDK and POSTs `multipart/form-data` directly to `/openai/deployments/{model}/images/edits` — the SDK doesn't produce that URL pattern for Azure today.
- Spec: `docs/agents/image-generation.md`.
Limitations (v1)
- One image per call (`n` is not exposed).
- No streaming partial images.
- No DALL-E variations endpoint (`Images.CreateVariation`).
- No per-org quota / billing — relies on provider rate limits.
- Generated blobs persist indefinitely; lifecycle policy is a follow-up.