Five Lessons from Refactoring Monoliths for Production-Ready AI Agents
Hard-won lessons on architecture, structured data, dynamic RAG, resilience, and observability for teams layering AI agents onto legacy monoliths.

The dream of seamless AI automation is often sold as a flick of a switch. But the reality of deploying AI agents in production, especially when migrating from legacy monoliths, is a complex dance of architecture, resilience, and rigorous oversight. Forget brittle prototypes; we’re talking about robust, scalable systems. Google’s recent experiences, particularly from their “AI Agent Clinic,” offer a hard-won blueprint. Here are five critical lessons learned from refactoring monoliths to truly power production-ready AI agents.
You’ve got a monolith. It’s a single, imposing codebase. And you want to layer AI agents on top, perhaps to automate complex workflows or assist in modernization efforts. The temptation is to build one massive “God Agent” to handle everything. This is where the refactoring imperative kicks in. Treating AI agents as mere extensions of a monolithic script is a recipe for disaster, fraught with fragility and unexpected costs. The path to production readiness demands a fundamental architectural shift, and we can learn a lot from the trials of those who have already navigated this treacherous terrain.
Lesson one: the biggest architectural mistake is building a single, monolithic AI agent tasked with performing every conceivable function. Instead, embrace orchestrated sub-agents. Think in specialized components: one for triage, another for billing, a third for data retrieval. This mirrors the microservices philosophy and makes agents more manageable, testable, and resilient. Frameworks like Google’s own Agent Development Kit (ADK) promote this pattern, often via a SequentialAgent or similar composition.
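The orchestration pattern can be sketched without any framework. Below is a minimal, framework-free stand-in — `SubAgent`, `SequentialOrchestrator`, and the triage/billing callables are illustrative names, not ADK APIs — showing how specialized sub-agents each handle one concern and enrich shared state in sequence:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative stand-in for a specialized sub-agent; in ADK you would use
# LlmAgent/SequentialAgent rather than plain callables.
@dataclass
class SubAgent:
    name: str
    run: Callable[[dict], dict]  # takes shared state, returns state updates

class SequentialOrchestrator:
    def __init__(self, sub_agents: list[SubAgent]):
        self.sub_agents = sub_agents

    def run(self, state: dict) -> dict:
        # Each agent owns one concern and enriches shared state, so a
        # failure is isolated to a single, independently testable component.
        for agent in self.sub_agents:
            state.update(agent.run(state))
        return state

triage = SubAgent("triage", lambda s: {"intent": "billing" if "invoice" in s["query"] else "general"})
billing = SubAgent("billing", lambda s: {"answer": "Invoice lookup started"} if s["intent"] == "billing" else {})

pipeline = SequentialOrchestrator([triage, billing])
result = pipeline.run({"query": "Where is my invoice?"})
```

The key design point is the same as microservices: the orchestrator owns the flow, while each sub-agent owns exactly one responsibility.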
Why this matters: Monolithic agents become black boxes, impossible to debug. Specialized agents isolate failures and allow for independent scaling and updates.
Lesson two: early AI agent development often leans on hardcoded JSON outputs for structured responses. This is brittle: a slight change in prompt wording, and your parsing logic breaks. The solution is native Pydantic models with explicit schema definitions. This enforces structural integrity and eliminates fragile JSON parsing. The ADK handles this dynamically, letting agents define and consume structured data with confidence.
Example:
```python
from pydantic import BaseModel, Field

class BillingDetails(BaseModel):
    invoice_number: str = Field(description="The unique identifier for the invoice.")
    amount_due: float = Field(description="The total amount outstanding.")
    due_date: str = Field(description="The date the payment is due (YYYY-MM-DD).")

# The agent's tool output is a Pydantic object, not raw JSON
def get_billing_info(customer_id: str) -> BillingDetails:
    # ... logic to fetch billing data ...
    pass
```
Why this matters: Predictable data structures are non-negotiable for reliable automation. Pydantic provides this clarity at the schema level, making agent interactions robust.
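To see why this beats hand-rolled JSON parsing, here is a short sketch (assuming Pydantic v2's `model_validate_json`; the invoice values are made up) of validating a model's response at the schema boundary:

```python
from pydantic import BaseModel, Field, ValidationError

class BillingDetails(BaseModel):
    invoice_number: str = Field(description="The unique identifier for the invoice.")
    amount_due: float = Field(description="The total amount outstanding.")
    due_date: str = Field(description="The date the payment is due (YYYY-MM-DD).")

# A well-formed model response parses into a typed object...
ok = BillingDetails.model_validate_json(
    '{"invoice_number": "INV-42", "amount_due": 19.99, "due_date": "2025-01-31"}'
)

# ...while a malformed one fails loudly at the schema boundary,
# instead of silently corrupting downstream automation.
try:
    BillingDetails.model_validate_json('{"invoice_number": "INV-42"}')
    parse_failed = False
except ValidationError:
    parse_failed = True
```

Failing fast at parse time turns a vague downstream bug into an immediate, attributable error.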
Lesson three: hardcoding context into Retrieval-Augmented Generation (RAG) pipelines is a relic of early prototyping. For production, you need dynamic RAG: integrating tools like Playwright for real-time web crawling and leveraging services like Google Cloud Vector Search to fetch relevant, up-to-date information on the fly.
Why this matters: Static context quickly becomes stale, leading to outdated or irrelevant agent responses. Dynamic RAG ensures agents operate with the freshest available information.
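The retrieval step itself can be sketched framework-free. In the sketch below, `toy_embed` is an illustrative stand-in for a real embedding model, and the in-memory chunk list stands in for content fetched by a live Playwright crawl or a managed vector index:

```python
import math
from typing import Callable

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query: str, chunks: list[str], embed: Callable[[str], list[float]], k: int = 2) -> list[str]:
    # Rank freshly fetched chunks by similarity to the query embedding.
    # In production, `embed` calls an embedding model and `chunks` come
    # from a live crawl or a vector search service, not a Python list.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]

# Toy letter-frequency embedder standing in for a real embedding model.
def toy_embed(text: str) -> list[float]:
    t = text.lower()
    return [float(t.count(ch)) for ch in "abcdefghijklmnopqrstuvwxyz"]
```

Because the chunk set is rebuilt at query time rather than baked into the prompt, the agent's context is only as old as the last fetch.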
Lesson four: production AI agents face an unpredictable world: rate limits, network hiccups, runaway token costs, and transient errors. Operational resilience isn’t an afterthought; it’s a core requirement. Orchestration frameworks are vital here, providing built-in guardrails such as automatic retries with exponential backoff, rate limiting, timeouts, and hard token budgets.
Why this matters: Without these guardrails, agents can become self-destructive, consuming excessive resources or failing catastrophically.
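As a sketch of one such guardrail, here is a minimal retry-with-exponential-backoff wrapper; `TransientError` and `flaky_tool` are illustrative stand-ins, not framework APIs:

```python
import time

class TransientError(Exception):
    """Stand-in for rate-limit or network errors."""

def call_with_resilience(tool, *, max_retries=3, base_delay=0.01):
    # Retry transient failures with exponential backoff — the kind of
    # guardrail an orchestration framework provides out of the box.
    for attempt in range(max_retries + 1):
        try:
            return tool()
        except TransientError:
            if attempt == max_retries:
                raise  # budget exhausted: fail loudly, not silently
            time.sleep(base_delay * (2 ** attempt))

# Simulated flaky tool: fails twice with a rate-limit error, then succeeds.
calls = {"n": 0}
def flaky_tool():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("429: rate limited")
    return "billing data"

result = call_with_resilience(flaky_tool)
```

The same wrapper shape extends naturally to timeouts and token budgets: each is a bounded loop with an explicit, loud failure mode.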
Lesson five: the non-deterministic nature of AI means you will encounter issues. Observability, powered by OpenTelemetry, is critical for production agents. Live telemetry dashboards let you detect problems early, understand agent behavior, and debug complex failures that would be opaque in a monolithic system.
Why this matters: Debugging a monolithic script is hard. Debugging a distributed system of AI agents requires real-time visibility into every step.
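As a rough illustration of step-level telemetry, here is a minimal in-memory span recorder; in production you would use OpenTelemetry's tracer and an exporter rather than this list, but the shape — one timed, attributed span per agent step — is the same:

```python
import time
from contextlib import contextmanager

SPANS: list[dict] = []  # in production, spans are exported via OpenTelemetry

@contextmanager
def span(name: str, **attributes):
    # Minimal stand-in for a tracer's start-span API: records what each
    # agent step did and how long it took, so failures are visible per step.
    record = {"name": name, "attributes": attributes}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        SPANS.append(record)

# Nested spans mirror the sub-agent call tree.
with span("triage", customer_id="c-123"):
    with span("billing_lookup", invoice="INV-42"):
        pass  # the billing sub-agent would run here
```

With spans per step, a hallucinated tool call or a slow retrieval shows up as a specific, attributed record instead of a mystery inside one opaque run.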
While AI agents are undeniably powerful tools for tasks like initial code drafting and complex analysis, they are not a silver bullet. Their non-deterministic behavior, susceptibility to hallucinations, and potential for unexpected prompt injection mean human oversight, rigorous testing, and comprehensive observability are non-negotiable for production reliability.
AI agents can be transformative in automating the analysis and planning phases of monolith-to-microservices migrations; tools like Byteable and Moderne already show promise in automating service-boundary detection. However, agents perform best on well-structured codebases. Building production-ready AI agents requires a commitment to robust architecture, explicit data handling, operational resilience, and unwavering vigilance. The monolith may be gone, but the principles of sound engineering remain paramount.