Cloud Run GPU Hackathon AI Agent Sample Repo: Deploy Your ADK Agent to Cloud Run with GPU

In this sample repo, you'll complete the prototype-to-production journey by taking a working ADK agent and deploying it as a scalable, robust application on Google Cloud Run with GPU support.

🏗️ What You'll Build

You'll deploy a Production Gemma3 Agent with conversational capabilities:

Gemma Agent (GPU-Accelerated):

  • General conversations and Q&A
  • Creative writing assistance
  • Production-ready deployment on Cloud Run

📋 Prerequisites

  • Google Cloud Project with billing enabled. (Follow the instructions in the Hackathon handbook to apply the credit coupon to your project; DO NOT use your personal credit card.)
  • Google Cloud SDK installed and configured (a quick check is shown after this list)
  • Basic understanding of containers and cloud deployment
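
Optional: before starting, confirm the SDK is installed and points at the account and project you expect:

# Check the SDK installation and active configuration
gcloud --version
gcloud auth list
gcloud config list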

🚀 Lab Overview

Part 1: Understanding the Production Agent (10 minutes)

Let's first explore the agent we'll be deploying:

Agent Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   User Request  │ -> │   ADK Agent     │ -> │  Gemma Backend  │
│                 │    │  (Cloud Run)    │    │ (Cloud Run+GPU) │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │
                              v
                       ┌─────────────────┐
                       │ FastAPI Server  │
                       │ Health Checks   │
                       └─────────────────┘

Prerequisites

# Set your Google Cloud project
export PROJECT_ID="your-project-id"
gcloud config set project $PROJECT_ID
gcloud config set run/region europe-west1

# Enable APIs
gcloud services enable run.googleapis.com cloudbuild.googleapis.com aiplatform.googleapis.com
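
# Optional sanity check (not in the original steps): the three APIs above should now show as enabled
gcloud services list --enabled | grep -E "run.googleapis.com|cloudbuild.googleapis.com|aiplatform.googleapis.com"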

# Ensure the default permissions that `gcloud run deploy` needs (primarily for Cloud Build) are set in the project
export PROJECT_NUMBER=$(gcloud projects describe $(gcloud config get-value project) --format="value(projectNumber)")
gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
	--role roles/run.viewer \
	--member "serviceAccount:$PROJECT_NUMBER-compute@developer.gserviceaccount.com"
gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
	--role roles/storage.objectAdmin \
	--member "serviceAccount:$PROJECT_NUMBER-compute@developer.gserviceaccount.com"
gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
	--role roles/artifactregistry.createOnPushRepoAdmin \
	--member "serviceAccount:$PROJECT_NUMBER-compute@developer.gserviceaccount.com"
gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
	--role roles/logging.logWriter \
	--member "serviceAccount:$PROJECT_NUMBER-compute@developer.gserviceaccount.com"
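
As an optional check, you can list the roles now granted to the Compute Engine default service account; the output should include the four roles bound above:

# List the roles held by the compute service account
gcloud projects get-iam-policy $PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:$PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --format="value(bindings.role)"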

Deploy Gemma Backend

cd hackathon-cloudrun/ollama-backend

gcloud run deploy ollama-gemma3-4b-gpu \
  --source . \
  --concurrency 4 \
  --cpu 8 \
  --set-env-vars OLLAMA_NUM_PARALLEL=4 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --max-instances 1 \
  --memory 32Gi \
  --allow-unauthenticated \
  --no-cpu-throttling \
  --no-gpu-zonal-redundancy \
  --timeout=600

# Disable Invoker IAM check
gcloud run services update ollama-gemma3-4b-gpu --no-invoker-iam-check

## Optional: download the ollama CLI and test the Cloud Run GPU service created above
# curl -fsSL https://ollama.com/install.sh | sh
# OLLAMA_HOST=<Cloud Run SERVICE URL generated above> ollama run gemma3:4b
export OLLAMA_URL=$(gcloud run services describe ollama-gemma3-4b-gpu \
  --region europe-west1 \
  --format='value(status.url)')
curl "$OLLAMA_URL"

Deploy the ADK Cloud Run Agent that Calls the Gemma Backend

# go to the ADK agent directory
cd hackathon-cloudrun/adk-agent

export OLLAMA_URL=$(gcloud run services describe ollama-gemma3-4b-gpu \
  --region europe-west1 \
  --format='value(status.url)')

# Create environment file

cat > .env << EOF
GOOGLE_CLOUD_PROJECT=$PROJECT_ID
GOOGLE_CLOUD_LOCATION=europe-west1
GEMMA_MODEL_NAME=gemma3:4b
OLLAMA_API_BASE=$OLLAMA_URL
EOF

# Deploy the ADK-based AI agent to Cloud Run with the ADK web UI

gcloud run deploy production-adk-agent \
    --source . \
    --region europe-west1 \
    --allow-unauthenticated \
    --memory 4Gi \
    --cpu 2 \
    --max-instances 1 \
    --concurrency 50 \
    --timeout 300 \
    --set-env-vars GOOGLE_CLOUD_PROJECT=$PROJECT_ID \
    --set-env-vars GOOGLE_CLOUD_LOCATION=europe-west1 \
    --set-env-vars GEMMA_MODEL_NAME=gemma3:4b \
    --set-env-vars OLLAMA_API_BASE=$OLLAMA_URL \
    --set-env-vars USE_OPENAI_FAKE=True

gcloud run services update production-adk-agent --no-invoker-iam-check

Environment Variables

The application's behavior can be configured through the following environment variables.

  • GEMMA_MODEL_NAME (default: gemma3:4b): Specifies the name of the Gemma model to be used.
  • OLLAMA_API_BASE (default: http://localhost:10010): The base URL for the Ollama API endpoint.
  • USE_OPENAI_FAKE (default: False): Set to true to use an OpenAI-compatible API wrapper around Ollama. This enables context-aware, multi-modal conversations.
  • USE_OLLAMA_NO_CONTEXT (default: False): Set to true to use the direct Ollama API for multi-modal input. Note: this mode may not retain conversational context.

Connection Modes

The agent connects to the Ollama model in one of three ways, controlled by the boolean flags USE_OPENAI_FAKE and USE_OLLAMA_NO_CONTEXT (the curl probes after this list show the two underlying API routes):

  1. Default (Context-aware Chat):

    • Configuration: USE_OPENAI_FAKE and USE_OLLAMA_NO_CONTEXT are both False.
    • Behavior: Uses the ollama_chat provider for standard, context-aware chat. According to code comments, this mode may have issues with multi-modal inputs.
  2. OpenAI Fake (Context-aware, Multi-modal):

    • Configuration: USE_OPENAI_FAKE=true
    • Behavior: Routes requests through an OpenAI-compatible endpoint (/v1) on the Ollama server. This is the recommended mode for achieving context-aware, multi-modal chat.
  3. Ollama Direct (Multi-modal, No Context):

    • Configuration: USE_OLLAMA_NO_CONTEXT=true
    • Behavior: Uses the standard ollama provider. This mode supports multi-modal inputs directly but may fail to retain conversation history.
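
To see what the modes differ on, you can probe the two underlying routes on the Ollama service directly. Both endpoints are part of the stock Ollama server; this only illustrates the endpoints themselves, not the agent's internal provider wiring, which lives in the repo's Python code:

# Native Ollama chat API (what the default and no-context modes build on)
curl "$OLLAMA_URL/api/chat" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "Hi"}], "stream": false}'

# OpenAI-compatible route (what USE_OPENAI_FAKE=true builds on)
curl "$OLLAMA_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "Hi"}]}'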

Test Your Agent's Health

# Get service URL
export SERVICE_URL=$(gcloud run services describe production-adk-agent \
    --region=europe-west1 \
    --format='value(status.url)')

# Test health endpoint
curl $SERVICE_URL/health
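
If the health check passes, you can also ask the server which agent apps it serves. The /list-apps route is exposed by ADK's standard FastAPI server; this assumes the repo wires up that server unchanged, so a 404 here just means the app is assembled differently:

# Optional: list the agent apps registered with the ADK server
curl $SERVICE_URL/list-apps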

🎉 Test your Agent with the ADK WebUI

Your production ADK agent is now running on Cloud Run with GPU acceleration!

Interact with your agent by opening the SERVICE_URL for your production-adk-agent in a new browser tab. You should see the ADK web interface.

Try these queries with the Gemma Agent (Conversational):

  • "What is the color of a polar bear's skin ?"

  • "What is the primary food source of a Giant Panda, a frequently exhibited endangered species?"

Clean up

Follow these steps to delete the resources you created in this lab to avoid incurring further charges.

Below are examples of how to delete the two Cloud Run services deployed in this repo; you can also delete them from the Cloud Run page in the Google Cloud console. Remember to delete any other Google Cloud resources you may have used as well.

# Delete the ADK agent Cloud Run service
gcloud run services delete production-adk-agent --region europe-west1
# Delete the Gemma backend Cloud Run service
gcloud run services delete ollama-gemma3-4b-gpu --region europe-west1
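
Deploying with --source also creates a container repository in Artifact Registry to hold the built images (by default named cloud-run-source-deploy); deleting it stops the associated storage charges. The repository name below assumes the gcloud default:

# Delete the Artifact Registry repo created by the source deploys
gcloud artifacts repositories delete cloud-run-source-deploy \
  --location europe-west1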
