
Mini Cloud Platform: Bare-Metal Infrastructure

Private datacenter-equivalent infrastructure, built from scratch with MAAS, k3s, ArgoCD, and GitOps.


What This Project Is

This documentation covers a complete bare-metal infrastructure built locally using MAAS (Metal as a Service), provisioning a 3-node cluster ready for Kubernetes and production workloads.

Equivalent to AWS EC2 + VPC + automated provisioning, but running locally.

Infrastructure at a Glance

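| Node       | IP       | Role          | Hardware           |
|------------|----------|---------------|--------------------|
| set-hog    | 10.0.0.2 | Control Plane | ThinkPad T15 Gen 1 |
| fast-skunk | 10.0.0.4 | Worker        | ThinkPad T490      |
| fast-heron | 10.0.0.7 | Worker        | ThinkPad T490      |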

MAAS controller: Ubuntu with dual NICs (Wi-Fi → internet, Ethernet → 10.0.0.1)
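This addressing plan can be sanity-checked with a short script. The sketch below uses only the Python standard library and checks four things about the static node IPs:

- they sit inside 10.0.0.0/24;
- they avoid the controller's 10.0.0.1;
- they are unique;
- they stay clear of any dynamic range.

It also checks that the dynamic ranges do not overlap each other. The MAAS DHCP range and MetalLB pool values are hypothetical placeholders, not this cluster's real configuration.

```python
import ipaddress
from typing import Dict, List, Tuple

CLUSTER_NET = ipaddress.ip_network("10.0.0.0/24")
GATEWAY = ipaddress.ip_address("10.0.0.1")  # MAAS controller, Ethernet side

# Static node addresses, copied from the table above.
NODES: Dict[str, str] = {
    "set-hog": "10.0.0.2",
    "fast-skunk": "10.0.0.4",
    "fast-heron": "10.0.0.7",
}

# HYPOTHETICAL ranges for illustration only; replace them with the real
# MAAS dynamic DHCP range and MetalLB address pool.
RANGES: Dict[str, Tuple[str, str]] = {
    "MAAS DHCP": ("10.0.0.100", "10.0.0.199"),
    "MetalLB pool": ("10.0.0.200", "10.0.0.250"),
}


def as_ints(bounds: Tuple[str, str]) -> range:
    """Turn an inclusive (first, last) address pair into a range of integers."""
    first, last = (int(ipaddress.ip_address(b)) for b in bounds)
    return range(first, last + 1)


def check_ip_plan(nodes: Dict[str, str], ranges: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means the plan is consistent."""
    problems: List[str] = []
    spans = {label: as_ints(bounds) for label, bounds in ranges.items()}

    # Ranges must stay inside the subnet, avoid the gateway, and not overlap.
    labels = sorted(spans)
    for i, a in enumerate(labels):
        ra = spans[a]
        ends = (ipaddress.ip_address(ra[0]), ipaddress.ip_address(ra[-1]))
        if any(e not in CLUSTER_NET for e in ends):
            problems.append(f"{a} leaves {CLUSTER_NET}")
        if int(GATEWAY) in ra:
            problems.append(f"{a} contains the gateway {GATEWAY}")
        for b in labels[i + 1:]:
            rb = spans[b]
            if max(ra.start, rb.start) < min(ra.stop, rb.stop):
                problems.append(f"{a} overlaps {b}")

    # Static node IPs must be in the subnet, unique, and outside every range.
    owners: Dict[str, str] = {}
    for name, ip_str in nodes.items():
        ip = ipaddress.ip_address(ip_str)
        if ip not in CLUSTER_NET:
            problems.append(f"{name} ({ip}) is outside {CLUSTER_NET}")
        if ip == GATEWAY:
            problems.append(f"{name} uses the gateway address")
        if ip_str in owners:
            problems.append(f"{name} duplicates {owners[ip_str]} ({ip})")
        owners[ip_str] = name
        for label, span in spans.items():
            if int(ip) in span:
                problems.append(f"{name} ({ip}) falls inside {label}")
    return problems


if __name__ == "__main__":
    print("\n".join(check_ip_plan(NODES, RANGES)) or "IP plan OK")
```

Keeping static node addresses below the dynamic ranges is one simple convention that makes this check easy to satisfy. MAAS reserved ranges could serve the same purpose.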


Complete Roadmap

Each phase builds on the ones before it: no phase requires anything that hasn't already been set up.
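That ordering property can be stated as a checkable rule: every dependency must carry a lower phase number than the phase that needs it. The following is a minimal Python sketch. The edges in `DEPENDS_ON` are an illustrative assumption loosely based on the roadmap, not an authoritative dependency graph, and only a subset of phases is included.

```python
from typing import Dict, List, Set, Tuple

# Phase number -> short label (subset of the roadmap phases).
PHASES: Dict[int, str] = {
    0: "MAAS", 1: "k3s", 2: "kubeconfig", 3: "remote access",
    4: "MetalLB", 5: "Longhorn", 6: "Ingress", 7: "Harbor",
    8: "Prometheus", 9: "podinfo + HPA", 12: "ArgoCD", 13: "CI/CD",
    14: "Velero", 16: "Harbor proxy cache", 20: "Chaos Mesh", 21: "Loki + Alertmanager",
}

# ASSUMED edges (phase -> phases it needs), chosen for illustration only.
DEPENDS_ON: Dict[int, Set[int]] = {
    1: {0}, 2: {1}, 3: {2}, 4: {1}, 5: {1}, 6: {4},
    7: {5, 6}, 8: {5}, 9: {6, 8}, 12: {6}, 13: {12},
    14: {5}, 16: {7}, 20: {9}, 21: {8, 20},
}


def ordering_violations(phases: Dict[int, str], deps: Dict[int, Set[int]]) -> List[Tuple[int, int]]:
    """Return sorted (phase, dependency) pairs where the dependency is unknown
    or is not scheduled strictly earlier than the phase that needs it."""
    return sorted(
        (phase, dep)
        for phase, needs in deps.items()
        for dep in needs
        if dep not in phases or dep >= phase
    )


if __name__ == "__main__":
    bad = ordering_violations(PHASES, DEPENDS_ON)
    for phase, dep in bad:
        print(f"phase {phase} needs phase {dep}, which is not set up earlier")
    print("ordering OK" if not bad else f"{len(bad)} violation(s)")
```

This rule is stricter than just requiring an acyclic graph. If every edge points from a lower phase number to a higher one, the existing numbering is itself a valid topological order of the dependencies.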

| Phase | Topic | Key Technology | Status |
|-------|-------|----------------|--------|
| 0 | MAAS + 3-node provisioning | MAAS, PXE, cloud-init | ✅ Done |
| 1 | Kubernetes cluster | k3s | ✅ Done |
| 2 | kubectl local access | kubeconfig | ✅ Done |
| 3 | Remote access from anywhere | Tailscale, Cloudflare Tunnel, Homer | ✅ Done |
| 4 | Load balancer IPs on bare metal | MetalLB | ✅ Done |
| 5 | Persistent storage | Longhorn, NFS | ✅ Done |
| 6 | Expose apps to the network | F5 NGINX Ingress | ✅ Done |
| 7 | Private container registry | Harbor + Trivy | ✅ Done |
| 8 | Cluster monitoring | Prometheus, Grafana | ✅ Done |
| 9 | First real workload | podinfo, HPA, ServiceMonitor | ✅ Done |
| 10 | Infrastructure automation | Ansible | ✅ Done |
| 11 | Infrastructure as Code | OpenTofu (MAAS); Crossplane deferred | ✅ Done |
| 12 | GitOps deployment | ArgoCD (App-of-Apps) | ✅ Done |
| 13 | CI/CD pipelines | GitHub Actions + ghcr.io + ArgoCD image promotion | ✅ Done |
| 14 | Backup & disaster recovery | Velero + MinIO on controller + hourly k3s SQLite pull | ✅ Done |
| 15 | TLS / cert-manager (internal PKI); Vault + RBAC deferred | cert-manager, self-signed root CA | ✅ Done |
| 16 | Harbor as sovereign registry: 4 proxy-cache projects, mirror + fallback, supply-chain control point. The original n8n/Temporal/Airflow plan is deferred. | Harbor proxy cache | ✅ Done |
| 17 | Event-driven autoscaling: KEDA + NATS JetStream HA, scale-to-zero verified end-to-end | KEDA, NATS | ✅ Done |
| 18 | Backstage minimal IDP: catalog-only, off-the-shelf image; Vault, plugins, and templates deferred | Backstage | ✅ Done |
| 19 | Self-hosted AI: Ollama (CPU, llama3.2:3b, ~13 tokens/s) + Open WebUI chat. MLflow + Kubeflow deferred. | Ollama, Open WebUI | ✅ Done |
| 20 | Reliability & chaos engineering. Three validation experiments on podinfo: PodChaos (0 ms downtime under 5 pod kills), NetworkChaos (200 ms latency injection + clean recovery), StressChaos (contained cgroup OOM, 0 node-mate restarts). NodeChaos, dashboard Ingress, and automated GameDays deferred. | Chaos Mesh | ✅ Done |
| 21 | Logs & alerting: Loki (single-binary), Promtail DaemonSet, Grafana datasource; Alertmanager 3-tier routing tree + in-cluster webhook receiver + custom PodinfoAvailabilityLost rule. Alert path validated end-to-end by a Chaos Mesh kill-both-replicas experiment: the webhook receives the FIRING JSON. Jaeger / distributed tracing deferred, since there is no multi-service topology to trace. | Loki, Promtail, Alertmanager | ✅ Done |
| 22 | eBPF networking: migration runbook authored; execution deferred to a fresh-cluster rebuild. The cilium CLI is installed on the controller and dry-run Helm values are captured. Scope-reduction decision: with 111 live pods and 22 phases of validated infrastructure running on Flannel, a hot CNI swap is not worth the risk at this cluster scale. | Cilium, Hubble | ✅ Done |
| | Data layer | Kafka/Redpanda, ClickHouse, dbt, Superset, OpenMetadata | 🔜 Planned |
| | Security layer | Keycloak, OPA/Gatekeeper, Falco, Cosign + SBOM, kube-bench | 🔜 Planned |
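Phase 21 uses an in-cluster webhook receiver, which only needs to accept Alertmanager's JSON POST. The following is a minimal standard-library sketch of such a receiver, not the one actually deployed. It relies on the documented Alertmanager webhook payload fields `status`, `alerts[].status`, and `alerts[].labels`. The port and the `severity` label are assumptions, and a missing severity is reported as `none`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List


def firing_alerts(payload: Dict[str, Any]) -> List[str]:
    """List 'alertname[severity]' for each alert in the notification that is firing."""
    result: List[str] = []
    for alert in payload.get("alerts", []):
        if alert.get("status") == "firing":
            labels = alert.get("labels", {})
            result.append(f"{labels.get('alertname', '?')}[{labels.get('severity', 'none')}]")
    return result


class AlertHandler(BaseHTTPRequestHandler):
    """Accept Alertmanager webhook POSTs and log the firing alerts to stdout."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)
            self.end_headers()
            return
        print(f"status={payload.get('status')} firing={firing_alerts(payload)}", flush=True)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Serve webhook POSTs forever; the port is an assumption."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

To use it, run `serve(8080)` in a small Deployment and point an Alertmanager receiver's `webhook_configs` `url` at that pod's Service. During a kill-both-replicas experiment, this should print a line naming PodinfoAvailabilityLost once the rule fires.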

Final Stack (When Complete)

── INFRASTRUCTURE ──────────────────────────────────────────────────
MAAS           → bare-metal provisioning
k3s            → Kubernetes cluster
MetalLB        → load balancer IPs
Longhorn       → distributed storage
Harbor         → private container registry

── AUTOMATION & DELIVERY ───────────────────────────────────────────
Ansible        → infrastructure automation
OpenTofu       → infrastructure as code
Crossplane     → Kubernetes-native IaC
ArgoCD         → GitOps
GitHub Actions → CI/CD

── PLATFORM SERVICES ───────────────────────────────────────────────
Velero         → backup & disaster recovery
Vault          → secrets management
n8n            → visual workflow automation
Temporal       → code-based workflow orchestration
Airflow        → data pipeline scheduling
KEDA           → event-driven autoscaling
NATS           → message broker
Backstage      → developer portal

── OBSERVABILITY ───────────────────────────────────────────────────
Prometheus     → metrics
Grafana        → dashboards
Loki           → logs
Jaeger         → traces
Chaos Mesh     → reliability testing
Cilium         → eBPF networking + Hubble
Tailscale      → remote access VPN

── DATA LAYER ──────────────────────────────────────────────────────
Redpanda       → event streaming (Kafka-compatible)
Debezium       → change data capture (CDC)
ClickHouse     → columnar analytics warehouse
dbt            → SQL transformation layer
Superset       → self-hosted BI dashboards
OpenMetadata   → data catalog, lineage, governance

── SECURITY LAYER ──────────────────────────────────────────────────
Keycloak       → SSO / OIDC identity provider
OPA/Gatekeeper → admission control (policy as code)
Falco          → runtime threat detection (eBPF)
Cosign         → image signing + SBOM supply chain
kube-bench     → CIS compliance scoring

── AI / ML ─────────────────────────────────────────────────────────
Ollama         → local LLMs (Mistral, LLaMA 3)
MLflow         → ML experiment tracking
Kubeflow       → ML pipelines + distributed training
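Throughput figures like the ~13 tokens/s for llama3.2:3b on CPU can be derived from Ollama's own timing fields. A non-streaming `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The following is a minimal standard-library sketch. The base URL assumes Ollama's default port 11434 and is a placeholder for the real in-cluster address. Results will vary with the prompt and hardware.

```python
import json
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://localhost:11434"  # ASSUMPTION: replace with the in-cluster Service/Ingress URL


def tokens_per_second(resp: Dict[str, Any]) -> float:
    """Generation throughput from an Ollama response; eval_duration is in nanoseconds."""
    if resp["eval_duration"] <= 0:
        raise ValueError("eval_duration must be positive")
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)


def benchmark(prompt: str, model: str = "llama3.2:3b") -> float:
    """Send one non-streaming generate request and return tokens/s."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as r:
        return tokens_per_second(json.load(r))
```

This measures generation speed only. Ollama reports prompt processing and model loading separately in `prompt_eval_duration` and `load_duration`, so a cold first request will feel slower than the tokens/s number suggests.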

CV / LinkedIn Summary

  • Designed and deployed a 3-node bare-metal infrastructure using MAAS
  • Automated OS provisioning via PXE network boot and cloud-init
  • Built isolated cluster network (10.0.0.0/24) with DHCP/DNS management
  • Resolved complex networking issues (IPv6 conflicts, DHCP overlap, alias interfaces)
  • Deployed a full Kubernetes platform on k3s: ArgoCD, Prometheus/Grafana, Harbor, MetalLB, Longhorn
  • Built a private AI platform with local LLM serving (Ollama) and an Open WebUI chat interface
  • Implemented remote access via Tailscale VPN and Cloudflare Tunnel
  • Applied chaos engineering with Chaos Mesh to validate cluster resilience
  • (Planned) End-to-end data platform: Kafka/Redpanda → ClickHouse → dbt → Superset with OpenMetadata governance
  • (Planned) Enterprise security: Keycloak SSO, OPA/Gatekeeper admission control, Falco runtime detection, Cosign supply-chain signing
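Resilience claims such as "0 ms downtime under 5 pod kills" depend on how downtime is measured. One common approach is to probe the Service at a fixed interval while Chaos Mesh injects faults, then report the longest run of failed probes. This is only an assumption, not necessarily the method used in phase 20. The sketch below uses only the Python standard library. The target URL, timeout, and interval are placeholders.

```python
import http.client
import time
import urllib.request
from typing import List, Optional, Tuple

Sample = Tuple[float, bool]  # (time.monotonic() seconds, probe succeeded)


def probe(url: str, duration_s: float, interval_s: float = 0.1) -> List[Sample]:
    """Request `url` every `interval_s` seconds for `duration_s` seconds."""
    samples: List[Sample] = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        ts = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                ok = 200 <= resp.status < 300
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError, timeouts and connection resets are OSError subclasses.
            ok = False
        samples.append((ts, ok))
        time.sleep(interval_s)
    return samples


def longest_outage(samples: List[Sample]) -> float:
    """Longest span from the first failed probe of a run to the next success.

    A run that is still failing at the end is measured up to the last sample.
    """
    worst = 0.0
    down_since: Optional[float] = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            worst = max(worst, ts - down_since)
            down_since = None
    if down_since is not None:
        worst = max(worst, samples[-1][0] - down_since)
    return worst
```

Pair the result with `sum(not ok for _, ok in samples)`. At this resolution, "0 ms downtime" means zero failed probes. An outage shorter than `interval_s` could still slip between samples.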