# Phase 17 – KEDA – Event-Driven Autoscaling
Phase 9 demonstrated CPU-based scaling with podinfo (HPA target: 50% CPU). Phase 17 demonstrates the next step: event-driven scaling, where pods come into existence because a real signal fired (JetStream consumer lag, queue depth, message rate) rather than because CPU climbed.
The hero feature is scale-to-zero: an idle worker Deployment with `replicas: 0`, costing zero RAM, becomes 5 pods within seconds of 50 messages arriving, drains the queue, and scales back to 0. No manual button-pushing, no wasteful idle pod, no cron schedule that doesn't match actual load.
KEDA is paired with NATS JetStream in this phase (see Phase 17 – NATS). The companion deliverable is the `event-demo` workload deployed via the gitops repo.
## What KEDA does
KEDA is a Kubernetes-native operator that watches ScaledObject Custom Resources. Each ScaledObject links a Deployment to one or more triggers (NATS, Kafka, RabbitMQ, Prometheus, AWS SQS, cron, etc.). When a trigger reports activity, KEDA creates a standard HorizontalPodAutoscaler behind the scenes targeting that Deployment, extending HPA to scale on signals beyond CPU/memory. Unlike a standard HPA, which has a 1-replica floor, a ScaledObject's `minReplicaCount` can be 0 (scale-to-zero). The indirection is observable on a live cluster, as shown after the diagram below.
```
ScaledObject (CR)
      │
      │ KEDA operator watches
      ▼
HPA (auto-generated)
      │
      │ targets
      ▼
Deployment (replicas controlled by HPA)
```
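On a live cluster you can see this indirection directly; the HPA is generated and owned by KEDA, never written by hand. A quick check, assuming the `echo-worker` ScaledObject in `event-demo` defined later in this phase:

```bash
# KEDA names the generated HPA keda-hpa-<scaledobject-name>
kubectl get hpa -n event-demo
# NAME                   REFERENCE                MINPODS   MAXPODS
# keda-hpa-echo-worker   Deployment/echo-worker   1         5

# Its ownerReference points back at the ScaledObject
kubectl get hpa keda-hpa-echo-worker -n event-demo \
  -o jsonpath='{.metadata.ownerReferences[0].kind}{"\n"}'
# ScaledObject
```

Note the generated HPA's floor is 1, not 0: KEDA itself handles the 0↔1 transition by watching the trigger, and hands off to the HPA only for scaling between 1 and `maxReplicaCount`.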
## Architecture (this cluster)
```
┌────────────────────────────────────────────────────────────────────┐
│ event-demo namespace (ArgoCD-managed via minicloud-gitops)         │
│                                                                    │
│  ┌────────────────────┐               ┌──────────────────────┐     │
│  │ ScaledObject       │ ─────────────▶│ HPA keda-hpa-...     │     │
│  │ echo-worker        │               │ min:0 max:5          │     │
│  │ trigger:           │               │ (auto-generated)     │     │
│  │  nats-jetstream    │               └──────────┬───────────┘     │
│  │  lagThreshold:10   │                          │                 │
│  └─────────┬──────────┘                          ▼                 │
│            │                        ┌──────────────────────┐       │
│            │ polls                  │ Deployment           │       │
│            │ every 5s               │ echo-worker          │       │
│            ▼                        │ replicas: 0..5       │       │
│  ┌────────────────────┐             └──────────────────────┘       │
│  │ KEDA operator      │                                            │
│  │ (keda namespace)   │                                            │
│  └─────────┬──────────┘                                            │
│            │ HTTP GET                                              │
│            ▼                                                       │
│  http://nats-headless.messaging:8222/jsz?...&consumers=true        │
└────────────────────────────────────────────────────────────────────┘
```
## Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Install method | Helm chart `kedacore/keda` v2.19.0 | Standard install, auto-creates CRDs |
| `minReplicaCount` | 0 (scale-to-zero) | Demonstrates the canonical "no idle pods cost RAM" pattern |
| `maxReplicaCount` | 5 | Within cluster headroom; aligns with the 5-worker burst-test target |
| `pollingInterval` | 5s (default 30s) | Visible scale-up speed for demo; production typically uses 30s |
| `cooldownPeriod` | 60s (default 300s) | Faster scale-down for demo viewing; production keeps 300s to avoid thrashing |
| Trigger type | `nats-jetstream` | Matches Phase 17's NATS install; the canonical KEDA-NATS integration |
| Trigger metadata `natsServerMonitoringEndpoint` | `nats-headless.messaging:8222` | Important: KEDA's NATS scaler queries `<pod>.<endpoint>:8222` per pod (e.g., `nats-0.nats-headless.messaging:8222`). Requires a headless Service for per-pod DNS; the chart's plain `nats` Service only exposes 4222 (see the probe after this table). |
| `lagThreshold` | 10 | 1 worker per 10 pending messages → queue of 50 → 5 workers (matches `maxReplicaCount`) |
| `activationLagThreshold` | 0 | Any pending message > 0 wakes from zero (vs. waiting for 10 to accumulate) |
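Before wiring the endpoint into a trigger, it's worth probing `/jsz` the same way the scaler does. A throwaway curl pod is enough for a sketch like this; `nats-0` assumes the NATS chart's default StatefulSet pod naming:

```bash
# Per-pod DNS only resolves through the headless Service; 8222 is the
# monitoring port the scaler scrapes
kubectl run curl-probe --rm -it --restart=Never -n messaging \
  --image=curlimages/curl -- \
  curl -s "http://nats-0.nats-headless.messaging:8222/jsz?streams=true&consumers=true"
# Expect JSON that includes the JOBS stream and echo-workers consumer state
```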
## Pre-flight
```bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update kedacore
helm search repo kedacore/keda   # confirm v2.19+
```
## Install KEDA
`keda-values.yaml`:

```yaml
resources:
  operator:
    requests: { cpu: 50m, memory: 128Mi }
    limits: { cpu: 500m, memory: 512Mi }
  metricServer:
    requests: { cpu: 50m, memory: 128Mi }
    limits: { cpu: 500m, memory: 512Mi }
  webhooks:
    requests: { cpu: 25m, memory: 64Mi }
    limits: { cpu: 100m, memory: 128Mi }

# Wire all 3 components into kube-prometheus-stack via ServiceMonitors.
# The release label is what kube-prometheus-stack's serviceMonitorSelector
# picks up (set in Phase 8 to be wide-open via serviceMonitorSelectorNilUsesHelmValues=false).
prometheus:
  metricServer:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack
  operator:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack
  webhooks:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack
```
```bash
kubectl create namespace keda
helm install keda kedacore/keda -n keda -f keda-values.yaml --wait --timeout 5m

kubectl get pods -n keda
# 3 pods Running: keda-operator, keda-operator-metrics-apiserver, keda-admission-webhooks
```
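Two quick post-install checks are worth running before creating any ScaledObject; both should pass on a healthy install (service names per the chart's defaults):

```bash
# KEDA's metrics apiserver must be registered as the cluster's
# external-metrics API, or the generated HPAs will have no metric source
kubectl get apiservice v1beta1.external.metrics.k8s.io
# Expected: SERVICE keda/keda-operator-metrics-apiserver, AVAILABLE True

# The three ServiceMonitors created by keda-values.yaml, carrying the release label
kubectl get servicemonitor -n keda
```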
## ScaledObject for the demo workload
`apps/event-demo.yaml` (ArgoCD Application) points at `manifests/event-demo/02-scaledobject.yaml` in the gitops repo:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: echo-worker
  namespace: event-demo
spec:
  scaleTargetRef:
    name: echo-worker        # the Deployment to scale
  minReplicaCount: 0
  maxReplicaCount: 5
  pollingInterval: 5
  cooldownPeriod: 60
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: "nats-headless.messaging:8222"
        account: "$G"
        stream: JOBS
        consumer: echo-workers
        lagThreshold: "10"
        activationLagThreshold: "0"
```
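The trigger assumes the `JOBS` stream and the durable `echo-workers` consumer already exist; they're created in the NATS half of this phase. For rebuilding from scratch, a minimal sketch with the `nats` CLI, run inside nats-box; flags are illustrative, so check `nats stream add --help` on your CLI version:

```bash
# Workqueue retention: messages are deleted once acked, which is exactly
# what the post-test verification below checks (state.messages: 0)
nats --server nats://nats.messaging:4222 stream add JOBS \
  --subjects "jobs.>" --retention work --storage file --replicas 1 --defaults

# Durable pull consumer referenced by both the workers and KEDA's lag query
nats --server nats://nats.messaging:4222 consumer add JOBS echo-workers \
  --pull --ack explicit --deliver all --defaults
```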
## End-to-end burst test
The headline test: from a clean idle state (replicas=0), publish 50 messages and observe scale up + drain + scale to zero.
```bash
NATS_BOX=$(kubectl get pods -n messaging --no-headers | grep nats-box | awk '{print $1}')

# Idle: 0 pods, queue empty
kubectl get pods -n event-demo
# (no resources)

# Publish 50 messages in a tight loop inside the nats-box pod
T0=$(date +%s%3N)
kubectl exec -n messaging $NATS_BOX -- sh -c '
  for i in $(seq 1 50); do
    nats --server nats://nats.messaging:4222 pub jobs.echo "msg-$i" >/dev/null 2>&1
  done
'

# Watch scale-up
kubectl get scaledobject,pods -n event-demo -w
```
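To capture the timeline below without eyeballing three terminals, a small polling loop (a sketch, reusing `T0` and `NATS_BOX` from above) stamps replica count and consumer lag once per second:

```bash
# Prints: t+<seconds> ready=<ready replicas> pending=<messages waiting>
while true; do
  NOW=$(( ($(date +%s%3N) - T0) / 1000 ))
  READY=$(kubectl get deploy echo-worker -n event-demo \
    -o jsonpath='{.status.readyReplicas}' 2>/dev/null)
  PENDING=$(kubectl exec -n messaging $NATS_BOX -- \
    nats --server nats://nats.messaging:4222 consumer info JOBS echo-workers --json 2>/dev/null \
    | grep -o '"num_pending": *[0-9]*' | grep -o '[0-9]*$')
  echo "t+${NOW}s ready=${READY:-0} pending=${PENDING:-?}"
  sleep 1
done
```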
## Observed timeline
| Event | Time relative to T0 |
|---|---|
| 50 messages published | t+1.8s |
| First worker pod Running | t+1.8s (image cache-warm in containerd) |
| All 5 workers Running | t+33s |
| Queue fully drained | t+165s (~5s per message average × 5 parallel workers) |
| Scale-down begins (cooldown elapsed) | t+212s |
| All pods terminated | t+225s |
The "first pod Running in ~1.8s" number assumes the worker image is already cached on the target node's containerd. On a true cold start (image not yet on the node), add ~3-5s for the cache-warm pull through Harbor's docker-hub proxy (Phase 16).
## What you can verify after the test
```bash
# Stream sequence advanced from 1 to 50 (or wherever previous tests left it)
kubectl exec -n messaging $NATS_BOX -- nats stream info JOBS

# Consumer ack floor matches stream sequence: all messages were acked, no redeliveries
kubectl exec -n messaging $NATS_BOX -- nats consumer info JOBS echo-workers
# num_pending: 0
# num_redelivered: 0
# ack_floor.consumer_seq == delivered.consumer_seq

# Workqueue retention: acked messages are physically removed (state.messages: 0)
```
## Other KEDA trigger types worth knowing
The same ScaledObject shape works with other triggers, which is useful when adding more event-driven workloads later:
```yaml
# Cron-based scaling
- type: cron
  metadata:
    timezone: "Europe/Paris"
    start: "0 8 * * 1-5"
    end: "0 20 * * 1-5"
    desiredReplicas: "5"

# Prometheus query (scale on any metric in Phase 8's Prometheus)
- type: prometheus
  metadata:
    serverAddress: http://kps-prometheus.monitoring:9090
    metricName: http_requests_per_second
    threshold: "100"
    query: sum(rate(podinfo_http_requests_total[1m]))

# Kafka lag (when we install Kafka in the data-layer phases)
- type: kafka
  metadata:
    bootstrapServers: kafka.data-platform:9092
    consumerGroup: my-app
    topic: events
    lagThreshold: "100"
```
## Verification (regression)
```bash
# 3 KEDA pods Running
kubectl get pods -n keda
# Expected: keda-admission-webhooks, keda-operator, keda-operator-metrics-apiserver (all 1/1 Running)

# CRDs registered
kubectl get crd | grep keda.sh
# Expected at minimum: scaledobjects.keda.sh, scaledjobs.keda.sh, triggerauthentications.keda.sh

# event-demo ScaledObject Ready
kubectl get scaledobject -n event-demo
# Expected: READY=True, ACTIVE depends on whether messages are pending
```
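The Prometheus half of the Done When list below can be checked from the CLI as well; a sketch assuming Phase 8's service name (`kps-prometheus`, as used in the Prometheus trigger example above) and KEDA 2.x's operator metric names:

```bash
# Forward Prometheus locally and ask for KEDA's per-scaler metric value
kubectl port-forward -n monitoring svc/kps-prometheus 9090:9090 &
curl -s 'http://localhost:9090/api/v1/query?query=keda_scaler_metrics_value'
# A non-empty "result" array means the keda-operator target is scraped and
# the NATS trigger's lag value is flowing into Prometheus
```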
## Done When
- ✅ 3 KEDA pods Running in the `keda` namespace
- ✅ ScaledObjects + ScaledJobs CRDs registered
- ✅ `event-demo` ScaledObject reports READY=True
- ✅ Burst test: 50 msg → first pod Running in <5s → 5 pods peak → drain → 0 pods
- ✅ Workqueue retention verified: acked messages physically removed
- ✅ Both KEDA and NATS metrics visible in Prometheus Targets
## Real-world skills demonstrated
| Skill | Industry context |
|---|---|
| Scale-to-zero | The defining feature of "serverless on Kubernetes"; the same pattern Knative, OpenFaaS, and AWS Lambda use |
| Event-driven autoscaling | The next step after CPU-based HPA; required for any workload whose load doesn't correlate with CPU (queue workers, batch jobs, schedulers) |
| ScaledObject declarative scaling | KEDA's API surface; the same shape on any cluster running KEDA. Bunq, Walmart, and Microsoft Azure all use it in production |
| NATS headless Service for per-pod metrics queries | Real KEDA-NATS deployment knowledge: a regular ClusterIP doesn't give per-pod DNS; KEDA's scaler discovers individual JetStream cluster members |
| `activationLagThreshold` for wake-from-zero | Distinguishes "any work to do" (activation) from "should we add another worker" (scaling). Subtle but critical config. |
| `pollingInterval` and `cooldownPeriod` tuning | Demo settings (5s/60s) vs. production (30s/300s). The same trade-off every shop tunes. |
| ServiceMonitor wiring across components | KEDA exports metrics from 3 different deployments; each one's labels need to align with kube-prometheus-stack's selector |