Image Generation
POST /api/v1/agents/images is a one-shot image generation and editing primitive. You send a prompt (and optionally a reference image and inpainting mask); the platform calls Azure OpenAI or Google Gemini, uploads the result to Azure Blob Storage, and returns a public URL.
It is a sibling of /api/v1/agents/structured — same shape, same auth, same gateway timeout (120s).
When to use this
- A skill or agent tool needs to produce an image (illustration, mockup, social asset, generated avatar).
- A frontend feature needs a generated image and would rather have a URL than handle 1–10 MB of inline base64.
- An agent wants to edit an existing image — change the sky, add a watermark, repaint a region — chaining the URL of a previously generated image into the next call.
Do not use this for: high-volume batch jobs, streaming partial images, or any flow that needs n > 1 images per call. v1 returns one image per request.
Endpoint
POST /api/v1/agents/images
Authorization: Bearer <JWT> # JWT must carry scope: write:image-gen
Content-Type: application/json
The gateway sets X-Org-ID from your token; the agents service requires it. Request timeout is 120 seconds — image generation typically takes 5–35 seconds depending on model and size.
Request
{
  // Required.
  "provider": "azure-openai",           // "azure-openai" | "gemini"
  "model": "gpt-image-2",               // provider-native deployment / model id
  "prompt": "A photograph of a red fox in an autumn forest",

  // Optional generation knobs — caller-supplied, provider-validated.
  "size": "1024x1024",                  // see Provider quirks below
  "quality": "low",                     // "low" | "medium" | "high" | "auto" (gpt-image)
  "style": "vivid",                     // "vivid" | "natural" (dall-e-3 only)
  "negativePrompt": "no text, no logo", // gemini-only

  // Edit mode — present only if you want to edit an existing image.
  // Pass EITHER url OR b64, not both. mimeType is required.
  "referenceImage": {
    "url": "https://eloquentsc.blob.core.windows.net/files/images/.../abc.png",
    // "b64": "iVBORw0KGgoA...",        // alternative
    "mimeType": "image/png"             // "image/png" | "image/jpeg"
  },

  // Inpainting mask — Azure-only, edit-mode only, PNG only.
  // Transparent pixels (alpha = 0) mark the editable region.
  "mask": { "url": "https://.../mask.png" }
}
Response
{
  "data": {
    "url": "https://eloquentsc.blob.core.windows.net/files/images/{orgId}/2026-04-26/9f8a...c2.png",
    "mimeType": "image/png",
    "bytes": 1843212,
    "width": 1024,                          // when reported by the provider
    "height": 1024,
    "revisedPrompt": "A cinematic 35mm...", // passthrough if provider returns one
    "provider": "azure-openai",
    "model": "gpt-image-2",
    "mode": "generate",                     // "generate" | "edit"
    "usage": {
      "inputTokens": 16,
      "outputTokens": 208,
      "totalTokens": 224
    }
  },
  "metadata": { "success": true, "timestamp": 1777225241 }
}
The blob lives at `{container}/images/{orgId}/{yyyy-mm-dd}/{uuid}.{png|jpg}`. The URL renders directly in browsers — no signed-URL handling needed in dev (the dev container is publicly readable).
Picking a provider and model
| Use case | Provider | Model | Why |
|---|---|---|---|
| Photo-realistic generation, predictable Azure billing | azure-openai | gpt-image-2 | Highest quality, supports edit + mask |
| Quick conceptual / illustrative image | gemini | gemini-3.1-flash-image-preview | Faster, cheaper, accepts URL references |
| Edit with inpainting mask | azure-openai | gpt-image-2 | Only Azure supports masks |
| Edit by passing a reference image URL | gemini | gemini-3.1-flash-image-preview | Native multimodal input — Azure decodes URLs server-side |
| Legacy DALL-E flows | azure-openai | dall-e-3 | Supports `style` (`vivid` / `natural`) |
`model` is not a logical name — it must be the provider-native deployment id (Azure) or the published model name (Gemini). Unknown models return `INVALID_INPUT` from the upstream provider.
Examples
Generate
curl -X POST $GATEWAY/api/v1/agents/images \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "azure-openai",
    "model": "gpt-image-2",
    "prompt": "Studio photo of a small ceramic mug, soft morning light",
    "size": "1024x1024",
    "quality": "high"
  }'
Edit by URL (chain a previous result)
PREV_URL="https://eloquentsc.blob.core.windows.net/files/images/.../abc.png"
curl -X POST $GATEWAY/api/v1/agents/images \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d "{
    \"provider\": \"gemini\",
    \"model\": \"gemini-3.1-flash-image-preview\",
    \"prompt\": \"Replace the sky with a sunset\",
    \"referenceImage\": {
      \"url\": \"$PREV_URL\",
      \"mimeType\": \"image/png\"
    }
  }"
Edit by base64
B64=$(base64 -i input.png | tr -d '\n')
curl -X POST $GATEWAY/api/v1/agents/images \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d "{
    \"provider\": \"azure-openai\",
    \"model\": \"gpt-image-2\",
    \"prompt\": \"Make it look like a watercolor painting\",
    \"size\": \"1024x1024\",
    \"quality\": \"low\",
    \"referenceImage\": { \"b64\": \"$B64\", \"mimeType\": \"image/png\" }
  }"
Edit with inpainting mask (Azure-only)
curl -X POST $GATEWAY/api/v1/agents/images \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d "{
    \"provider\": \"azure-openai\",
    \"model\": \"gpt-image-2\",
    \"prompt\": \"Replace the masked area with a wooden table\",
    \"size\": \"1024x1024\",
    \"referenceImage\": { \"url\": \"$REF_URL\", \"mimeType\": \"image/png\" },
    \"mask\": { \"url\": \"$MASK_URL\" }
  }"
The mask must be a PNG with the same dimensions as the reference image. Pixels where alpha = 0 are repainted; opaque pixels are kept.
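If you assemble masks dynamically, it is worth pre-flighting them client-side before spending a request. A hedged sketch in TypeScript using sharp (`checkMask` and the file paths are illustrative; the service performs its own authoritative validation):

```ts
import sharp from "sharp";

// Pre-flight a mask against its reference: PNG format, same dimensions,
// and at least one transparent (alpha = 0) pixel to edit.
async function checkMask(referencePath: string, maskPath: string): Promise<void> {
  const ref = await sharp(referencePath).metadata();
  const mask = await sharp(maskPath).metadata();
  if (mask.format !== "png") throw new Error("mask must be a PNG");
  if (mask.width !== ref.width || mask.height !== ref.height) {
    throw new Error(
      `mask is ${mask.width}x${mask.height}, reference is ${ref.width}x${ref.height}`,
    );
  }
  // ensureAlpha() guarantees a 4-channel RGBA buffer so the alpha plane is scannable.
  const { data } = await sharp(maskPath).ensureAlpha().raw().toBuffer({ resolveWithObject: true });
  let editable = 0;
  for (let i = 3; i < data.length; i += 4) if (data[i] === 0) editable++;
  if (editable === 0) throw new Error("mask has no transparent pixels; nothing would be edited");
}
```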
Provider quirks
These are caller-visible details — pick the model first, then read the matching column.
| Behavior | azure-openai (gpt-image-*) | azure-openai (dall-e-3) | gemini |
|---|---|---|---|
| `size` accepted | 1024x1024, 1536x1024, 1024x1536, auto | 1024x1024, 1792x1024, 1024x1792 | 1:1, 16:9, 9:16, 4:3, 3:4, or pixel string |
| `quality` accepted | low, medium, high, auto | standard, hd | ignored (uses model defaults) |
| `style` accepted | ignored | vivid, natural | ignored |
| `negativePrompt` | ignored (warn-logged server-side) | ignored | honored |
| Reference image | server-side decoded → multipart image | not supported (use gpt-image-2) | inlined into the request |
| Mask | supported (PNG, alpha = editable) | not supported | not supported |
| Returned MIME | image/png (gpt-image-2 may return image/jpeg) | image/png | image/jpeg typical, image/png possible |
| `usage` reported | yes (input/output/total tokens) | not reported | not reported |
| Typical latency | 15–35 s | 10–25 s | 8–15 s |
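Because the accepted `size` values differ per backend, a thin mapping keeps callers provider-agnostic. An illustrative TypeScript sketch (the helper name is ours; the values are copied from the table above):

```ts
// Map a generic aspect choice to a size value each backend accepts.
type Provider = "azure-openai" | "gemini";
type Aspect = "square" | "landscape" | "portrait";

function sizeFor(provider: Provider, model: string, aspect: Aspect): string {
  if (provider === "gemini") {
    // Gemini takes aspect-ratio strings (a pixel string also works).
    return { square: "1:1", landscape: "16:9", portrait: "9:16" }[aspect];
  }
  if (model.startsWith("dall-e")) {
    return { square: "1024x1024", landscape: "1792x1024", portrait: "1024x1792" }[aspect];
  }
  // gpt-image-* deployments.
  return { square: "1024x1024", landscape: "1536x1024", portrait: "1024x1536" }[aspect];
}
```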
Reference-image URL allowlist (security)
When you pass a `url` in `referenceImage` or `mask`, the agents service:
- Requires `https`. Rejects `http`, `file`, `data`, etc.
- Validates the hostname is in the allowlist. The default list contains the Azure storage account that hosts our generated blobs (`eloquentsc.blob.core.windows.net`). Operators can extend it with `IMAGE_REF_ALLOWED_HOSTS` (a comma-separated env var on the agents service).
- Refuses to dial loopback / link-local / private / metadata IPs, even after DNS resolution (defends against DNS rebinding).
- Caps fetch size at 20 MB and timeout at 10 s.
- Sniffs the magic bytes and rejects anything that isn't a real PNG or JPEG.
Practical implication for agents: the only URLs you can pass in normal use are URLs we issued ourselves (i.e. blob URLs returned by a previous call to this endpoint). Chaining works out of the box; arbitrary external URLs are rejected.
If you have an arbitrary URL you need to feed in (a Slack attachment, a user upload from somewhere else), download it yourself first and pass b64 instead.
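A hedged sketch of that workaround in TypeScript (assumes Node 18+ for global `fetch`; `GATEWAY` and `JWT` are stand-ins for however your environment supplies them):

```ts
// Feed in an arbitrary external image by downloading it yourself,
// then passing b64 instead of a URL.
async function editExternal(externalUrl: string, prompt: string) {
  const img = await fetch(externalUrl);
  if (!img.ok) throw new Error(`download failed: ${img.status}`);
  const mimeType = img.headers.get("content-type") ?? "";
  if (mimeType !== "image/png" && mimeType !== "image/jpeg") {
    throw new Error(`unsupported content type: ${mimeType}`);
  }
  const b64 = Buffer.from(await img.arrayBuffer()).toString("base64");

  const res = await fetch(`${process.env.GATEWAY}/api/v1/agents/images`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.JWT}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      provider: "gemini",
      model: "gemini-3.1-flash-image-preview",
      prompt,
      referenceImage: { b64, mimeType },
    }),
  });
  if (!res.ok) throw new Error(`image call failed: ${res.status}`);
  return res.json();
}
```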
Validation rules
The handler returns 400 `INVALID_INPUT` or 400 `VALIDATION_ERROR` for these conditions before reaching the provider:
- `provider` is not one of `{"azure-openai", "gemini"}`
- `model` is empty
- `prompt` is empty or longer than 4000 characters
- `referenceImage` has both or neither of `url` / `b64`, or a `mimeType` outside `{image/png, image/jpeg}`
- `mask` is set but `referenceImage` is not
- `mask` is set with `provider != "azure-openai"`
- `mask` bytes are not a real PNG (alpha-aware)
- Reference URL is `http`, `file`, etc., or its host isn't in the allowlist
Error model
Errors come back in the standard envelope:
{
  "error": { "code": "VALIDATION_ERROR", "message": "...", "details": { ... } },
  "metadata": { "success": false, "timestamp": 1777225265 }
}
The relevant codes:
| Status | Code | Meaning |
|---|---|---|
| 400 | INVALID_INPUT | Bad request, unknown provider/model, content-policy violation |
| 400 | VALIDATION_ERROR | Mask without reference, mask with Gemini, URL not in allowlist, bad reference bytes |
| 401 | UNAUTHORIZED | Missing or expired JWT |
| 403 | FORBIDDEN | JWT lacks write:image-gen scope |
| 500 | SYSTEM_ERROR | Blob upload failed |
| 503 | SERVICE_UNAVAILABLE | Provider not registered (env vars missing), upstream 5xx, rate-limited |
| 504 | TIMEOUT | Provider exceeded the 120s gateway timeout |
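Of these, `SERVICE_UNAVAILABLE` and `TIMEOUT` are the transient ones worth retrying. A hedged backoff sketch (it leans on the `callImage` helper sketched under Recipes below; the attempt count and delays are arbitrary choices, not platform guidance):

```ts
// Retry transient failures (503 provider hiccups, 504 gateway timeout) with
// exponential backoff. Assumes callImage throws an Error whose message
// starts with the envelope's error code.
async function callImageWithRetry(req: ImageRequest, attempts = 3) {
  for (let i = 0; ; i++) {
    try {
      return await callImage(req);
    } catch (err) {
      const msg = err instanceof Error ? err.message : String(err);
      const transient = msg.includes("SERVICE_UNAVAILABLE") || msg.includes("TIMEOUT");
      if (!transient || i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 2 ** i * 1_000)); // 1 s, 2 s, 4 s
    }
  }
}
```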
Recipes for agents
"Generate, then iterate"
A common multi-step pattern: produce a base image, then edit it 1–2 times. Pass the url from each response into the next call's referenceImage.url. Because every URL is on our allowlisted Azure account, no extra plumbing is needed.
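The snippets here use a thin `callImage` wrapper. It isn't a published SDK, just a minimal sketch (Node 18+ `fetch`; `GATEWAY` and `JWT` are stand-ins for your environment's config; the client-side timeout sits slightly past the gateway's 120 s):

```ts
// Minimal callImage sketch matching the request/response shapes documented above.
type ImageRequest = {
  provider: "azure-openai" | "gemini";
  model: string;
  prompt: string;
  size?: string;
  quality?: string;
  style?: string;
  negativePrompt?: string;
  referenceImage?: { url?: string; b64?: string; mimeType: "image/png" | "image/jpeg" };
  mask?: { url?: string; b64?: string }; // b64 per the mask recipe below
};

async function callImage(req: ImageRequest) {
  const res = await fetch(`${process.env.GATEWAY}/api/v1/agents/images`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.JWT}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(req),
    // Give up slightly after the gateway's own 120 s cutoff.
    signal: AbortSignal.timeout(125_000),
  });
  const body = await res.json();
  if (!body.metadata?.success) {
    // Standard envelope: { error: { code, message, details }, metadata }
    throw new Error(`${body.error?.code}: ${body.error?.message}`);
  }
  return body;
}
```

With that in place, the recipe itself: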
const base = await callImage({ provider: "azure-openai", model: "gpt-image-2",
  prompt: "...", size: "1024x1024", quality: "low" });
const edited = await callImage({ provider: "gemini", model: "gemini-3.1-flash-image-preview",
  prompt: "Add a subtle sepia tone",
  referenceImage: { url: base.data.url, mimeType: "image/png" } });
"Mask a known region"
If you know the editable region in pixel space, generate the mask programmatically (e.g. with sharp or pillow) and pass it as base64. Mask must be PNG, same dimensions as the reference, and use transparency (not white) for the editable pixels.
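A sketch of the sharp route (the dimensions and region here are hypothetical; they must match your reference image):

```ts
import sharp from "sharp";

// Build an inpainting mask: opaque everywhere ("keep these pixels"),
// alpha = 0 inside the editable region.
const width = 1024;
const height = 1024;
const editable = { left: 256, top: 640, width: 512, height: 256 };

const maskPng = await sharp({
  create: { width, height, channels: 4, background: { r: 0, g: 0, b: 0, alpha: 1 } },
})
  // "dest-out" zeroes the destination alpha wherever the composited
  // rectangle is opaque, punching a transparent hole in the canvas.
  .composite([
    {
      input: await sharp({
        create: {
          width: editable.width,
          height: editable.height,
          channels: 4,
          background: { r: 255, g: 255, b: 255, alpha: 1 },
        },
      })
        .png()
        .toBuffer(),
      left: editable.left,
      top: editable.top,
      blend: "dest-out",
    },
  ])
  .png()
  .toBuffer();

const maskB64 = maskPng.toString("base64");
```

The resulting buffer then goes into the request as base64, mirroring `referenceImage.b64`.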
Cost-conscious calls
For first-pass exploration use gemini — fastest and cheapest. Fall back to gpt-image-2 only when you need the edit-with-mask flow or higher fidelity. Gemini ignores quality and style, so don't bother sending them.
Avoid base64 in JSON when possible
Edit calls with `b64` push large strings through JSON parsers in the gateway and agents service (base64 inflates the bytes by about a third). If the source bytes are already on our blob account (e.g. they came from a previous call), pass `referenceImage.url` instead — much smaller request, lower memory churn.
Implementation notes (for service operators)
- The endpoint lives in the agents service (`backend/services/agents/http/image_handler.go`), not a new service. It shares the LLM provider factory pattern with `/agents/structured`.
- Provider implementations are in `backend/common/llm/providers/`: `gemini_image_provider.go`, `azure_openai_image_provider.go`. The Gemini code was lifted from the existing `integrations/media/providers` (which still serves NATS-based image-gen tool calls — left untouched).
- Env vars read in `services/agents/main.go` (per platform convention):
  - `GOOGLE_API_KEY` — Gemini key (also used by the existing media service)
  - `AZURE_OPENAI_IMAGE_ENDPOINT` — image-specific endpoint (`*.cognitiveservices.azure.com`); falls back to `AZURE_OPENAI_ENDPOINT` if unset
  - `AZURE_OPENAI_IMAGE_API_KEY` — falls back to `AZURE_OPENAI_API_KEY`
  - `AZURE_OPENAI_IMAGE_API_VERSION` — defaults to `2024-02-01`; must be `2025-04-01-preview` or newer for edit + mask
  - `IMAGE_REF_ALLOWED_HOSTS` — optional, extends the URL allowlist
- The Azure edit path bypasses the `openai-go` SDK and POSTs `multipart/form-data` directly to `/openai/deployments/{model}/images/edits` — the SDK doesn't produce that URL pattern for Azure today.
- Spec: `docs/agents/image-generation.md`.
Limitations (v1)
- One image per call (`n` is not exposed).
- No streaming partial images.
- No DALL-E variations endpoint (`Images.CreateVariation`).
- No per-org quota / billing — relies on provider rate limits.
- Generated blobs persist indefinitely; lifecycle policy is a follow-up.