Gemini Imagen — Google's Visual Intelligence
Imagen 3 via Google AI Studio is effectively free for content pipeline volumes. It handles text rendering, inpainting, and aspect ratio control through a clean API — and it integrates directly into Claude Code sessions via the mcp-image MCP server. This is your primary slot.
Gemini Imagen 3 is the primary slot in the production fallback chain for one reason: the free tier provides enough capacity to run a meaningful content pipeline at zero marginal cost. Fifteen requests per minute, 1,500 per day. For a blog autopilot generating one hero image per article, that is effectively unlimited.
Quality is competitive with gpt-image-1 across most use cases and exceeds it on text rendering — historically the weakest point of diffusion models. The API is synchronous (no polling), the MCP integration is native, and the seed parameter gives you reproducibility that most other providers do not.
Imagen 3 — Quality Profile
Imagen 3 represents a significant quality jump from Imagen 2. The improvements that matter for production content pipelines:
Text rendering. Earlier diffusion models — Stable Diffusion, early Midjourney versions — produced illegible or distorted text in generated images. Imagen 3 renders clean, legible text as part of the image when prompted. For blog thumbnails, social cards, or any image that incorporates text elements, this is significant.
Semantic understanding. Complex scene descriptions with multiple interacting elements are handled more accurately. A prompt describing spatial relationships between objects, character interactions, or environmental layering produces more literal output.
Photorealism at scale. Skin textures, fabric behavior, architectural geometry — the physical coherence that makes an image feel grounded in reality rather than AI-generated — is reliable across the free tier, not just in premium quality modes.
Integration Path 1 — REST API (Google AI Studio)
The direct API path uses your Google AI Studio API key.
import base64
import os

import httpx

GOOGLE_AI_KEY = os.getenv("GOOGLE_AI_KEY")

class RateLimitError(Exception):
    """Raised on HTTP 429 so the caller can fall through to the next provider."""

async def generate_image_gemini(prompt: str, aspect_ratio: str = "16:9") -> bytes:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://generativelanguage.googleapis.com/v1beta/models/"
            "imagen-3.0-generate-001:predict",
            headers={
                "x-goog-api-key": GOOGLE_AI_KEY,
                "Content-Type": "application/json",
            },
            json={
                "instances": [{"prompt": prompt}],
                "parameters": {
                    "sampleCount": 1,
                    "aspectRatio": aspect_ratio,
                    "safetyFilterLevel": "block_some",
                    "personGeneration": "allow_adult",
                },
            },
            timeout=60.0,
        )
        if response.status_code == 429:
            raise RateLimitError("Gemini rate limit exceeded — fall through to Leonardo")
        response.raise_for_status()
        image_b64 = response.json()["predictions"][0]["bytesBase64Encoded"]
        return base64.b64decode(image_b64)
Imagen 3 returns image data synchronously as base64-encoded bytes in the response. No polling loop required — the request blocks until the image is ready, typically 3-8 seconds.
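Before the decoded bytes enter the pipeline, it is worth confirming they are actually an image. This validation helper is our own addition, not part of the API; the magic-byte table assumes PNG or JPEG output, so check the response's reported MIME type if your payloads differ:

```python
import base64

# Magic-byte prefixes for the formats Imagen plausibly returns (an assumption;
# inspect the actual response if your payloads differ)
_MAGIC = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff")

def validate_image_payload(image_b64: str) -> bytes:
    """Decode the base64 payload and confirm it starts like a real image file."""
    image_bytes = base64.b64decode(image_b64)
    if not image_bytes.startswith(_MAGIC):
        raise ValueError("decoded payload does not look like a PNG or JPEG")
    return image_bytes
```

A check like this catches truncated or error-shaped responses at the provider boundary instead of downstream, where a corrupt file is harder to trace back.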
Integration Path 2 — MCP Server (mcp-image)
The mcp-image MCP server wraps Gemini Imagen in a Claude Code-native tool. When configured, any Claude Code session can generate images without writing API code.
Configuration in ~/.claude.json:
{
  "mcpServers": {
    "mcp-image": {
      "command": "npx",
      "args": ["-y", "mcp-image"]
    }
  }
}
In a Claude Code session, the generate_image tool becomes available. The tool accepts a text prompt and returns the image saved to a local path. This is the pattern used in the Knox content pipeline — when generating blog hero images during a coding session, the mcp-image tool fires against Imagen 3 without leaving the coding context.
The MCP path uses the same free-tier rate limits as the REST API. It is not an alternative rate limit path — it is the same API with a more convenient interface for session-level use.
API Parameters
aspectRatio: 1:1, 4:3, 3:4, 16:9, 9:16. Specify based on output destination. The model generates natively at the specified ratio — no post-crop required.
sampleCount: Number of images to generate per request. Free tier: 1 per request recommended to minimize rate limit pressure.
safetyFilterLevel: block_most, block_some, block_few. block_some is the correct default for content pipelines — it blocks genuinely problematic content while avoiding false positives on legitimate creative work.
personGeneration: dont_allow, allow_adult, allow_all. Set based on your use case. allow_adult is the standard setting for content pipelines featuring professional photography-style imagery.
seed: Integer seed for reproducibility. Use when you need to regenerate a specific image with slight prompt variations and want consistent composition between runs.
Inpainting and Image Editing
Imagen 3 supports inpainting via the imagegeneration endpoint with a mask image. The API accepts a base image and a mask PNG (transparent areas indicate regions to regenerate) with a prompt describing the replacement content.
json={
    "instances": [{
        "prompt": "A modern city skyline at dusk",
        "image": {"bytesBase64Encoded": source_image_b64},
        "mask": {"image": {"bytesBase64Encoded": mask_b64}},
    }],
    "parameters": {"sampleCount": 1}
}
The inpainting quality on Imagen 3 is strong — the model blends regenerated regions into the existing image's lighting and color space coherently. For content pipelines that need to swap backgrounds or add/remove elements in generated images, this is a significant capability.
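Building that request body from raw image bytes is mostly base64 plumbing. A minimal sketch, with a helper name of our choosing and the field layout taken from the snippet above:

```python
import base64

def build_inpaint_request(prompt: str, source_png: bytes, mask_png: bytes) -> dict:
    """Build the JSON body for a masked edit.

    Transparent regions of the mask mark the areas to regenerate; the prompt
    describes the replacement content.
    """
    encode = lambda raw: base64.b64encode(raw).decode("ascii")
    return {
        "instances": [{
            "prompt": prompt,
            "image": {"bytesBase64Encoded": encode(source_png)},
            "mask": {"image": {"bytesBase64Encoded": encode(mask_png)}},
        }],
        "parameters": {"sampleCount": 1},
    }
```

The body can then be passed as the `json=` argument of the same `httpx` POST pattern used for plain generation.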
Rate Limit Handling
The free tier's rate limit (15 RPM, 1,500/day) is the only operational constraint that matters for most pipelines. The correct handling:
try:
    image_bytes = await generate_image_gemini(prompt, aspect_ratio)
except RateLimitError:
    # Immediate fallback — no retry on Gemini
    image_bytes = await generate_image_leonardo(prompt)
except Exception as e:
    logger.error(f"Gemini generation failed: {e}")
    image_bytes = await generate_image_leonardo(prompt)
Do not retry on rate limit. The rate limit window is per-minute, so a retry will hit the same limit immediately. Fall through to Leonardo and let the primary restore capacity for the next request.
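The fall-through handles the per-minute limit; the 1,500/day ceiling can additionally be tracked client-side so the pipeline switches providers before spending requests on guaranteed 429s. A minimal sketch, with the class name and approach being our own rather than anything the API provides:

```python
import time

class DailyBudget:
    """Count requests against the free tier's daily ceiling.

    When the budget is exhausted, callers skip Gemini entirely and go
    straight to the fallback provider for the rest of the day.
    """

    def __init__(self, limit: int = 1500):
        self.limit = limit
        self.count = 0
        self.day = time.strftime("%Y-%m-%d")

    def try_spend(self) -> bool:
        today = time.strftime("%Y-%m-%d")
        if today != self.day:
            # New day: the free-tier allocation resets
            self.day, self.count = today, 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

A guard like `if not budget.try_spend(): use Leonardo` in front of the `try` block keeps the 429 path as a safety net rather than the primary control mechanism.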
Comparison: Gemini vs Competitors at the Free Tier
DALL-E 3 has no meaningful free tier — every image costs money. gpt-image-1 is the same. Leonardo AI has a free allocation that depletes quickly under production use. Gemini's free tier is the most generous of any major provider and the only one where content pipeline volumes realistically operate at zero cost.
This is the fundamental reason Gemini occupies the primary slot: cost architecture. You are not choosing Gemini because it is definitively the best model. You are choosing it because it changes the cost structure of the entire pipeline from variable per-image spend to zero — with a quality level that is more than adequate for the use cases that matter.
Lesson 102 Drill
Obtain a Google AI Studio API key. Implement the REST API integration above with proper rate limit exception handling. Generate 10 images across different aspect ratios. Test the block_some safety filter level against a set of your typical content prompts and document any false positives. Verify that the returned base64 decodes to a valid image file. You now have the primary provider integration ready for the complete pipeline in Lesson 103.
Bottom Line
Gemini Imagen 3 is the primary slot because the free tier is real, the quality is production-grade, and the synchronous API is the cleanest integration pattern in the chain. The rate limit is the only variable — handle it with an immediate fall-through to Leonardo and your pipeline will spend most of its image budget at zero cost. That is the correct architecture for any content pipeline that does not require Midjourney quality on every single image.