---
title: "Finally, Grafana (Part 4)"
date: 2026-05-23
tags: [homelab, kubernetes, grafana, prometheus, monitoring, tailscale]
summary: >
  The payoff. kube-prometheus-stack via HelmRelease, NFS-backed PVs
  for dashboards and metrics, a Tailscale Ingress for a nice URL, and
  one last note on the version-pinning policy I settled on.
---

# Finally, Grafana (Part 4)

This is part 4 of a four-part series. [Index][index] · [Part 1][part1] (cluster) · [Part 2][part2] (Flux + SOPS) · [Part 3][part3] (Tailscale).

By the time I got to this part, the hard work was done. The whole "deploy Grafana" thing took an evening and most of that was me reading documentation for fun. The hard bits were everything I had to put in place *before* this. That's the recurring lesson of this whole series: the visible step (a dashboard at a nice URL) is the tip of a fairly tall iceberg.

## The Helm release

I'm using [`kube-prometheus-stack`][kps], the prometheus-community chart that bundles Prometheus, Grafana, Alertmanager, the operator, and a sensible set of default dashboards and rules in one go. The Helm release looks like this:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-prometheus-stack
  namespace: monitoring
spec:
  interval: 30m
  chart:
    spec:
      chart: kube-prometheus-stack
      version: ">=85.0.0 <86.0.0"  # Minor and patch releases auto; major bumps manual
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
        namespace: flux-system
      interval: 24h
  install: {crds: Create, remediation: {retries: 3}}
  upgrade: {crds: CreateReplace, remediation: {retries: 3}}
  values:
    grafana:
      admin:
        existingSecret: grafana-admin-secret  # SOPS-encrypted
        userKey: adminUser
        passwordKey: adminPassword
      ingress: {enabled: false}  # Handled separately, below
      persistence:
        enabled: true
        storageClassName: truenas-nfs
        size: 5Gi
    prometheus:
      prometheusSpec:
        retention: 30d
        retentionSize: "45GiB"
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: truenas-nfs
              accessModes: ["ReadWriteOnce"]
              resources: {requests: {storage: 50Gi}}
        # Discover ServiceMonitors / PodMonitors in ALL namespaces
        serviceMonitorSelectorNilUsesHelmValues: false
        podMonitorSelectorNilUsesHelmValues: false
        ruleSelectorNilUsesHelmValues: false
        probeSelectorNilUsesHelmValues: false
    alertmanager:
      alertmanagerSpec:
        storage:
          volumeClaimTemplate:
            spec:
              storageClassName: truenas-nfs
              accessModes: ["ReadWriteOnce"]
              resources: {requests: {storage: 1Gi}}
```

There are three deliberate choices in there worth calling out.

**Version pin as a range.** `>=85.0.0 <86.0.0` lets minor and patch releases in the 85.x line flow through automatically (I want bug fixes without having to think about them) but stops major bumps. Major versions of kube-prometheus-stack have a habit of renaming values, and I'd rather find out about that on a Saturday than at 11pm.

**`...SelectorNilUsesHelmValues: false`.** These four lines are the most useful ones in that block. By default Prometheus only discovers `ServiceMonitor` / `PodMonitor` resources that match the Helm release's own label selector. Setting these to `false` makes Prometheus pick up *any* `ServiceMonitor` in any namespace, and likewise for `PodMonitor`, `PrometheusRule`, and `Probe` resources. That means future apps can ship their own `ServiceMonitor` next to their `Deployment` and they'll be scraped automatically, with no changes to this file. Future-me will appreciate that future-me does not need to remember to touch this file every time future-me adds a new workload.

**Storage on NFS via `democratic-csi`.** This is where the work in [Part 3][part3] pays off. Grafana, Prometheus, and Alertmanager all get their PVCs from the `truenas-nfs` StorageClass, which means dashboards and metrics survive any pod restart, any node reboot, and any cluster rebuild (as long as the PVs stay around). The `Retain` reclaim policy on that StorageClass also means a typo'd `kubectl delete pvc` won't silently take 30 days of metrics with it.

## The Tailscale Ingress

The `kube-prometheus-stack` chart can manage its own Grafana `Ingress`, but I wanted to use the [Tailscale Kubernetes Operator][tsop] to expose it as its own Tailnet device. So I disabled the chart's ingress and added one of my own:

```yaml
# kubernetes/apps/monitoring/grafana-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  ingressClassName: tailscale
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kube-prometheus-stack-grafana
                port: {number: 80}
  tls:
    - hosts: [grafana]
```

The Tailscale operator picks this up, provisions a Tailnet device called `grafana` (tagged `tag:k8s` thanks to the operator config), and routes HTTPS traffic from that device to the Grafana Service on port 80. From my laptop I open `https://grafana.<tailnet>.ts.net` and I'm in.

The thing I like most about this is that there's no DNS rewrite, no SWAG config, no Let's Encrypt cert renewal to worry about. MagicDNS handles the name. Tailscale handles the TLS. The operator handles everything in between. If I want to expose Alertmanager too later, it's another short YAML file and nothing else changes.

## Secrets, the boring way

The Grafana admin password lives in `grafana-admin-secret.sops.yaml`, encrypted at rest with SOPS + age (see [Part 2][part2]). The HelmRelease references it via `existingSecret`, so the password never appears in any Flux manifest in plaintext. Adding a new secret to a workload follows exactly the same pattern as any other secret in the cluster, and that consistency is, again, the entire point.

## The actual reward

> Screenshot placeholder: the Grafana home dashboard, showing the default kube-prometheus-stack landing page with cluster CPU, memory, and network panels, plus the navigation sidebar.

> Screenshot placeholder: a "Cluster / Compute Resources / Pod" dashboard with a couple of pod-level CPU and memory graphs, taken on a fresh install before any custom dashboards have been added.

That's it. Out of the box you get:

- Cluster-level dashboards (CPU, memory, network, by node and by pod).
- A handful of well-built Kubernetes-internal dashboards (apiserver, kubelet, scheduler, etc).
- Default alerting rules for things like "node is unreachable" or "PVC is filling up".

It's the kind of monitoring you'd pay a SaaS for at work, running on a Mini PC in someone else's spare room, defined entirely in the same git repo as the rest of the homelab.

## What's next

The repo's [`README.md`][readme] has the full backlog, but the things at the top of it are:

- Add Renovate Bot so HelmRelease / chart versions update automatically (with a PR I can review) rather than via me occasionally running `helm search repo ... --versions`.
- Get a Home Assistant stack onto the cluster.
- Replace the snapshot-of-snapshots backup strategy with a real 3-2-1 setup (restic → Backblaze B2).
- Set up Alertmanager notifications to something I actually read. Probably a Telegram bot.

## A short reflection

The thing nobody warns you about with homelabs is that the *operational* maturity creeps up on you. There was no single moment where I "graduated" from running Docker Compose by hand to running a real GitOps platform with encrypted secrets, dynamic storage, monitoring, and a sane network model. It happened one weekend evening at a time, over months.

If you're at the start of that, the advice I'd give myself is: **don't try to do all of it before you do any of it.** Get the cluster up. Get one workload deployed via Flux. Move on. The Tailscale tag rework, the SOPS hook, the `democratic-csi`-controller egress trick, the Immich migration: all of those are things I'd have happily done as one-evening side quests if I'd known they were one-evening side quests. They only feel insurmountable when you try to plan them all at once.

Future-me, if you're reading this because Grafana is broken again: it's probably the `truenas-nfs` PVC. Or the operator forgot to provision the Ingress device. Or you renamed something in the ACL. Good luck.

[index]: ./2026-05-23-homelab-k8s-journey.md
[part1]: ./2026-05-23-from-truenas-to-kubernetes.md
[part2]: ./2026-05-23-flux-and-sops.md
[part3]: ./2026-05-23-tailscale-the-saga.md
[kps]: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
[tsop]: https://tailscale.com/kb/1236/kubernetes-operator
[readme]: ../../README.md
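
To make the `...SelectorNilUsesHelmValues: false` point from the Helm release section concrete, here is a minimal sketch of the kind of `ServiceMonitor` a future app could ship next to its `Deployment`. The app name `my-app`, the `apps` namespace, the label, and the `metrics` port name are all placeholders I made up for illustration, not anything that exists in the repo.

```yaml
# Hypothetical example: every name below is a placeholder.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: apps              # any namespace; no special "release" label needed
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app   # must match the labels on the app's Service
  endpoints:
    - port: metrics            # the *named* port on the Service, not a number
      path: /metrics
      interval: 30s
```

With the selectors left at the chart default, this object would also need a label matching the Helm release before Prometheus noticed it. With the four flags set to `false`, it should be picked up as-is.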
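
And for the "expose Alertmanager too" idea from the Tailscale Ingress section, the extra manifest would probably look something like the sketch below. It mirrors the Grafana Ingress. The Service name `kube-prometheus-stack-alertmanager` and port `9093` are my assumptions based on the chart's usual naming and Alertmanager's default port, so check them with `kubectl get svc -n monitoring` before trusting this.

```yaml
# Sketch only: Service name and port are assumptions, verify against the cluster.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  ingressClassName: tailscale
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kube-prometheus-stack-alertmanager  # assumed name
                port: {number: 9093}                      # assumed port
  tls:
    - hosts: [alertmanager]   # becomes the Tailnet device / MagicDNS name
```

Nothing in the HelmRelease should need to change for this; the operator would provision a second Tailnet device named `alertmanager`.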