Phase 17 – KEDA – Event-Driven Autoscaling

Phase 9 demonstrated CPU-based scaling with podinfo (HPA target: 50% CPU). Phase 17 demonstrates the next step: event-driven scaling, where pods come into existence because a real signal fires (JetStream consumer lag, queue depth, message rate) rather than because CPU crossed a threshold.

The hero feature is scale-to-zero: an idle worker Deployment sits at replicas: 0, costing zero RAM, becomes 5 pods within seconds of 50 messages arriving, drains the queue, and scales back to 0. No manual button-pushing, no wasteful idle pod, no cron schedule that doesn't match actual load.

KEDA is paired with NATS JetStream in this phase (see Phase 17 – NATS). The companion deliverable is the event-demo workload deployed via the gitops repo.


What KEDA does

KEDA is a Kubernetes-native operator that watches ScaledObject Custom Resources. Each ScaledObject links a Deployment to one or more triggers (NATS, Kafka, RabbitMQ, Prometheus, AWS SQS, cron, etc.). When a trigger reports activity, KEDA creates a standard HorizontalPodAutoscaler behind the scenes targeting that Deployment, extending HPA to scale on signals beyond CPU/memory.

The minReplicaCount can be 0 (scale-to-zero), unlike standard HPA which has a 1-replica floor.

ScaledObject (CR)
       │
       │ KEDA operator watches
       ▼
HPA (auto-generated)
       │
       │ targets
       ▼
Deployment (replicas controlled by HPA)
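
The generated HPA is a normal object you can inspect. A quick check against the echo-worker ScaledObject defined later in this phase (output trimmed; exact columns vary by kubectl version). Note the 1-pod floor on the HPA itself: KEDA handles the 0↔1 activation step on its own, and the generated HPA only operates between 1 and maxReplicaCount.

kubectl get hpa -n event-demo
# NAME                   REFERENCE                MINPODS   MAXPODS
# keda-hpa-echo-worker   Deployment/echo-worker   1         5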

Architecture (this cluster)

┌──────────────────────────────────────────────────────────────────┐
│ event-demo namespace (ArgoCD-managed via minicloud-gitops)       │
│                                                                  │
│  ┌────────────────────┐            ┌──────────────────────┐      │
│  │ ScaledObject       │ ─────────▶ │ HPA keda-hpa-...     │      │
│  │  echo-worker       │            │  min:0 max:5         │      │
│  │  trigger:          │            │  (auto-generated)    │      │
│  │   nats-jetstream   │            └──────────┬───────────┘      │
│  │   lagThreshold:10  │                       │                  │
│  └─────────┬──────────┘                       ▼                  │
│            │                      ┌─────────────────────┐        │
│            │ polls                │ Deployment          │        │
│            │ every 5s             │  echo-worker        │        │
│            ▼                      │  replicas: 0..5     │        │
│  ┌────────────────────┐           └─────────────────────┘        │
│  │ KEDA operator      │                                          │
│  │ (keda namespace)   │                                          │
│  └─────────┬──────────┘                                          │
│            │ HTTP GET                                            │
│            ▼                                                     │
│  http://nats-headless.messaging:8222/jsz?...&consumers=true      │
└──────────────────────────────────────────────────────────────────┘

Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Install method | Helm chart kedacore/keda v2.19.0 | Standard install, auto-creates CRDs |
| minReplicaCount | 0 (scale-to-zero) | Demonstrates the canonical "no idle pods cost RAM" pattern |
| maxReplicaCount | 5 | Within cluster headroom; aligns with the 5-worker burst-test target |
| pollingInterval | 5s (default 30s) | Visible scale-up speed for the demo; production typically uses 30s |
| cooldownPeriod | 60s (default 300s) | Faster scale-down for demo viewing; production keeps 300s to avoid thrashing |
| Trigger type | nats-jetstream | Matches Phase 17's NATS install; the canonical KEDA-NATS integration |
| Trigger metadata: natsServerMonitoringEndpoint | nats-headless.messaging:8222 | Important: KEDA's NATS scaler queries <pod>.<endpoint>:8222 per pod (e.g. nats-0.nats-headless.messaging:8222), which requires a headless Service for per-pod DNS; the chart's plain nats Service only exposes 4222 |
| lagThreshold | 10 | 1 worker per 10 pending messages → a queue of 50 → 5 workers (matches maxReplicaCount) |
| activationLagThreshold | 0 | Any pending message (> 0) wakes from zero, vs. waiting for 10 to accumulate |
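
To see what the scaler sees, you can hit the monitoring endpoint yourself. A sketch using a throwaway curl pod; streams=true and consumers=true are standard /jsz query options, though the exact query string KEDA sends is elided in the diagram above, and nats-0 is the per-pod DNS name the headless Service provides:

# nats-0.<headless-service> resolves only because the headless Service gives each pod its own DNS record
kubectl run jsz-probe -n messaging --rm -i --image=curlimages/curl --restart=Never -- \
  curl -s 'http://nats-0.nats-headless.messaging:8222/jsz?streams=true&consumers=true'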

Pre-flight

helm repo add kedacore https://kedacore.github.io/charts
helm repo update kedacore
helm search repo kedacore/keda # confirm v2.19+

Install KEDA

keda-values.yaml:

resources:
  operator:
    requests: { cpu: 50m, memory: 128Mi }
    limits: { cpu: 500m, memory: 512Mi }
  metricServer:
    requests: { cpu: 50m, memory: 128Mi }
    limits: { cpu: 500m, memory: 512Mi }
  webhooks:
    requests: { cpu: 25m, memory: 64Mi }
    limits: { cpu: 100m, memory: 128Mi }

# Wire all 3 components into kube-prometheus-stack via ServiceMonitors.
# The release label is what kube-prometheus-stack's serviceMonitorSelector
# picks up (set in Phase 8 to be wide-open via serviceMonitorSelectorNilUsesHelmValues=false).
prometheus:
  metricServer:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack
  operator:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack
  webhooks:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack

kubectl create namespace keda
helm install keda kedacore/keda -n keda -f keda-values.yaml --wait --timeout 5m

kubectl get pods -n keda
# 3 pods Running: keda-operator, keda-operator-metrics-apiserver, keda-admission-webhooks
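
Worth confirming the three ServiceMonitors carry the label the Phase 8 Prometheus selects on; a quick check using kubectl custom-columns:

kubectl get servicemonitor -n keda \
  -o custom-columns=NAME:.metadata.name,RELEASE:.metadata.labels.release
# All three rows should show release=kube-prometheus-stack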

ScaledObject for the demo workload

apps/event-demo.yaml (ArgoCD Application) points at manifests/event-demo/02-scaledobject.yaml in the gitops repo:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: echo-worker
  namespace: event-demo
spec:
  scaleTargetRef:
    name: echo-worker   # the Deployment to scale
  minReplicaCount: 0
  maxReplicaCount: 5
  pollingInterval: 5
  cooldownPeriod: 60
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: "nats-headless.messaging:8222"
        account: "$G"
        stream: JOBS
        consumer: echo-workers
        lagThreshold: "10"
        activationLagThreshold: "0"
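
Once ArgoCD syncs it, the ScaledObject's conditions are the first thing to check. A sketch: Ready=True means the trigger metadata resolved, and Active flips True while messages are pending.

kubectl get scaledobject echo-worker -n event-demo \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status} {end}'
# Typically: Ready=True Active=False Fallback=False   (Active toggles with pending messages)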

End-to-end burst test

The headline test: from a clean idle state (replicas=0), publish 50 messages and observe scale up + drain + scale to zero.

NATS_BOX=$(kubectl get pods -n messaging --no-headers | grep nats-box | awk '{print $1}')

# Idle: 0 pods, queue empty
kubectl get pods -n event-demo
# (no resources)

# Publish 50 messages in a tight loop inside the nats-box pod
T0=$(date +%s%3N)
kubectl exec -n messaging $NATS_BOX -- sh -c '
  for i in $(seq 1 50); do
    nats --server nats://nats.messaging:4222 pub jobs.echo "msg-$i" >/dev/null 2>&1
  done
'

# Watch scale-up
kubectl get scaledobject,pods -n event-demo -w
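
T0 is captured above so you can timestamp what the watch shows. A minimal helper (hypothetical, not part of the test itself), assuming GNU date on the host since %3N is a GNU extension:

# Print elapsed time since T0 whenever something changes in the watch
elapsed() { echo "t+$(( $(date +%s%3N) - T0 ))ms"; }
elapsed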

Observed timeline

| Event | Time relative to T0 |
| --- | --- |
| 50 messages published | t+1.8s |
| First worker pod Running | t+1.8s (image cache-warm in containerd) |
| All 5 workers Running | t+33s |
| Queue fully drained | t+165s (~5s per message average × 5 parallel workers) |
| Scale-down begins (cooldown elapsed) | t+212s |
| All pods terminated | t+225s |

The "first pod Running in ~1.8s" number assumes the worker image is already cached on the target node's containerd. On a true cold start (image not yet on the node), add ~3-5s for the cache-warm pull through Harbor's docker-hub proxy (Phase 16).

What you can verify after the test

# Stream sequence advanced from 1 to 50 (or wherever previous tests left it)
nats stream info JOBS

# Consumer ack floor matches stream sequence: all messages were acked, no redeliveries
nats consumer info JOBS echo-workers
# num_pending: 0
# num_redelivered: 0
# ack_floor.consumer_seq == delivered.consumer_seq

# Workqueue retention: acked messages are physically removed (state.messages: 0)
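
The same checks work as one-liners without opening a shell in the pod, reusing the NATS_BOX variable from the burst test:

kubectl exec -n messaging $NATS_BOX -- nats --server nats://nats.messaging:4222 stream info JOBS
kubectl exec -n messaging $NATS_BOX -- nats --server nats://nats.messaging:4222 consumer info JOBS echo-workers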

Other KEDA trigger types worth knowing

The same ScaledObject shape works with other triggers, which is useful when adding more event-driven workloads later:

# Cron-based scaling
- type: cron
  metadata:
    timezone: "Europe/Paris"
    start: "0 8 * * 1-5"
    end: "0 20 * * 1-5"
    desiredReplicas: "5"

# Prometheus query (scale on any metric in Phase 8's Prometheus)
- type: prometheus
  metadata:
    serverAddress: http://kps-prometheus.monitoring:9090
    metricName: http_requests_per_second
    threshold: "100"
    query: sum(rate(podinfo_http_requests_total[1m]))

# Kafka lag (when we install Kafka in the data-layer phases)
- type: kafka
  metadata:
    bootstrapServers: kafka.data-platform:9092
    consumerGroup: my-app
    topic: events
    lagThreshold: "100"

Verification (regression)

# 3 KEDA pods Running
kubectl get pods -n keda
# Expected: keda-admission-webhooks, keda-operator, keda-operator-metrics-apiserver, all 1/1 Running

# CRDs registered
kubectl get crd | grep keda.sh
# Expected at minimum: scaledobjects.keda.sh, scaledjobs.keda.sh, triggerauthentications.keda.sh

# event-demo ScaledObject Ready
kubectl get scaledobject -n event-demo
# Expected: READY=True, ACTIVE depends on whether messages are pending
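
One more signal worth checking: KEDA's metrics adapter registers the external-metrics API, and the APIService status tells you whether HPAs can actually read trigger metrics:

kubectl get apiservice v1beta1.external.metrics.k8s.io
# AVAILABLE should be True; the SERVICE column points at keda/keda-operator-metrics-apiserver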

Done When

✔ 3 KEDA pods Running in keda namespace
✔ ScaledObjects + ScaledJobs CRDs registered
✔ event-demo ScaledObject reports READY=True
✔ Burst test: 50 msg → first pod Running in <5s → 5 pods peak → drain → 0 pods
✔ Workqueue retention verified: acked messages physically removed
✔ Both KEDA and NATS metrics visible in Prometheus Targets (spot-checked below)
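
A sketch of that last spot-check from inside the cluster, reusing the kps-prometheus service name from the Prometheus trigger example above:

kubectl run target-check -n monitoring --rm -i --image=curlimages/curl --restart=Never -- \
  curl -s 'http://kps-prometheus.monitoring:9090/api/v1/targets?state=active' \
  | grep -oE '(keda|nats)[a-zA-Z-]*' | sort -u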

Real-world skills demonstrated

| Skill | Industry context |
| --- | --- |
| Scale-to-zero | The defining feature of "serverless on Kubernetes"; the same pattern Knative, OpenFaaS, and AWS Lambda use |
| Event-driven autoscaling | The next step after CPU-based HPA; required for any workload whose load doesn't correlate with CPU (queue workers, batch jobs, schedulers) |
| ScaledObject declarative scaling | KEDA's API surface; the same shape on any cluster running KEDA (Bunq, Walmart, and Microsoft Azure all use it in production) |
| NATS headless Service for per-pod metrics queries | Real KEDA-NATS deployment knowledge: a regular ClusterIP doesn't give per-pod DNS; KEDA's scaler discovers individual JetStream cluster members |
| activationLagThreshold for wake-from-zero | Distinguishes "any work to do" (activation) from "should we add another worker" (scaling); subtle but critical config |
| pollingInterval and cooldownPeriod tuning | Demo settings (5s/60s) vs. production (30s/300s); the same trade-off every shop tunes |
| ServiceMonitor wiring across components | KEDA exports metrics from 3 different Deployments; each one's labels need to align with kube-prometheus-stack's selector |