# The Tailscale saga (Part 3): one ACL, four broken things
Author: Mateus Harrington
Published: May 23, 2026
This is part 3 of a four-part series. [Index][index] · [Part 1][part1] brought up the cluster. [Part 2][part2] put it under GitOps. This is the part where I tried to give the cluster network access to the rest of the homelab, and accidentally rewrote my Tailscale ACL three times in one evening.
If you only read one part, read this one. The other parts could have been blog posts; this one was an experience.
Diagram placeholder: a network diagram showing the Tailnet as a cloud, with tag:truenas, tag:k8s-node (×3 Talos nodes), tag:k8s (×1 Grafana Ingress), and tag:family-immich (×1 Immich sidecar) as distinct devices on it. Arrows showing the ACL grants between them.
## The starting point
Before any of this, my Tailscale model was:
- TrueNAS host owned by my user account, visible in the tailnet as `truenas`.
- All my personal devices owned by my user account.
- A handful of family members invited as users, with their devices also on the tailnet.
- A single ACL that effectively said "members can reach everything members own."
To share Immich with family, I'd used [Tailscale's node sharing][ts-share] to share the TrueNAS host with each family member. This worked, but it shared *everything on the TrueNAS host*, including the AdGuard admin page, the \*arr stack, and the Jellyfin server. The ACL pretended that wasn't true, but it was a fiction maintained by trust rather than enforcement.
I wanted three things to be true at once:
1. Kubernetes nodes should be able to mount NFS from TrueNAS.
2. Family members should be able to reach **only Immich**, ideally without me having to maintain per-port ACL rules every time I added a service.
3. Future-me should be able to expose Kubernetes-hosted services (Grafana, eventually Home Assistant) over Tailscale, with sane per-service URLs.
Achieving all three of those at the same time turned out to need four separate fixes. None of them were independently hard. The combination ate an evening.
## Stumbling block 1: the tag split
The first thing I tried was: “right, I’ll put a tag on the Talos nodes, give the operator a tag for its Services, and write ACL grants between them.” Easy.
It was not easy. Here’s where I went wrong.
The [Tailscale Kubernetes Operator][tsop] uses an OAuth client to provision Tailnet devices for each `Ingress` you create. The tag those devices get is set by `oauth.defaultTags` (in Helm values) or by per-Ingress annotations. I'd set this to `tag:k8s`, and the OAuth client's authorised tags also included `tag:k8s`. Fine so far.
Meanwhile the Talos nodes themselves join the tailnet via the [`siderolabs/tailscale`][ext] system extension. Each node runs its own `tailscaled` and advertises tags via `TS_EXTRA_ARGS`. I'd originally set this to `--advertise-tags=tag:k8s` too, on the grounds that "they're all part of the same cluster, right?"
This is wrong for two reasons:
- Talos *nodes* need to make outbound calls to the TrueNAS host for NFS. Operator-managed Services don't.
- Operator-managed Services need to be reachable *inbound* from my personal devices. Talos nodes mostly don't (and shouldn't: the kubelet isn't a thing I want exposed).
Lumping them into one tag means the ACL has to grant the union of both directions to both groups. Splitting them lets each get exactly what it needs.
The fix was straightforward once I'd figured it out: rename the nodes' tag to `tag:k8s-node`, register it in `tagOwners` in the ACL, and update the extension config:

```yaml
# proxmox/tailscale-extension.yaml
apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: tailscale
environment:
  - TS_AUTHKEY=tskey-auth-...
  - TS_EXTRA_ARGS=--advertise-tags=tag:k8s-node --accept-dns=false
```

(The `--accept-dns=false` is there because the nodes have their own DNS setup and I don't want `tailscaled` overwriting it.)
The grants then become small and obvious:

```jsonc
// Cluster nodes can mount NFS from TrueNAS
{ "src": ["tag:k8s-node"], "dst": ["tag:truenas"],
  "ip": ["tcp:2049"] },

// My personal devices can reach operator-managed services
{ "src": ["autogroup:member"], "dst": ["tag:k8s"],
  "ip": ["*"] }
```

I also hit a smaller version of the same problem in the operator Helm values: `defaultTags` was set to `tag:k8s` but the OAuth client wasn't actually *authorised* to request that tag (an admin mistake I'd made earlier). The operator would create an Ingress device, fail to tag it, and the device would end up untagged and unreachable. Two-line fix in the ACL once I noticed the operator logs were complaining about it. Should have read the logs sooner.
## Stumbling block 2: kubelet picked the wrong nodeIP
Right after rolling out the `siderolabs/tailscale` extension, the cluster broke. `flux get all -A` started returning timeouts. `kubectl logs` from my laptop took thirty seconds and then failed.
Here's what had happened. When `tailscale0` came up on each Talos node, kubelet picked the new interface's IP (a `100.x.y.z` Tailscale address) as its `nodeIP`. Kubernetes node-to-node traffic then tried to route over the tailnet, including via DERP relays when a direct connection wasn't available. Everything became extremely slow.
The fix is a one-file patch:

```yaml
# proxmox/patch-node-ip.yaml
machine:
  kubelet:
    nodeIP:
      validSubnets:
        - 192.168.1.0/24
```

…applied with `talosctl patch machineconfig`, followed by a reboot of each node. Now kubelet picks the LAN IP, node-to-node traffic stays on the LAN, and the tailnet is *purely* an egress path for NFS and a few other things.
This one is now part of my baseline machine config, so it only ever needs to be applied again if I add a new node or rebuild an existing one. The full procedure is in [`docs/talos-extensions-rollout.md`][runbook] under "Step 6a".
The lesson here is one I keep relearning: when you add an interface to a host, *something somewhere* will probably try to use it for the wrong purpose by default. Always check what kubelet (and routing tables, and `/etc/resolv.conf`) think the new interface is for.
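
As a rough way to picture what a `validSubnets` restriction does, the sketch below filters a node's candidate addresses against an allow-list of subnets. It is a simplified model, not Talos's or kubelet's actual selection code; the function names and the example addresses are hypothetical. The one hard fact it relies on is that Tailscale assigns node addresses from the CGNAT range `100.64.0.0/10`.

```python
import ipaddress

# Tailscale hands out node addresses from the CGNAT range 100.64.0.0/10.
TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def pick_node_ip(candidates, valid_subnets):
    """Return the first candidate address inside any valid subnet, else None."""
    subnets = [ipaddress.ip_network(s) for s in valid_subnets]
    for candidate in candidates:
        address = ipaddress.ip_address(candidate)
        if any(address in subnet for subnet in subnets):
            return candidate
    return None


def is_tailscale_address(ip: str) -> bool:
    """Return True if ip falls inside the Tailscale CGNAT range."""
    return ipaddress.ip_address(ip) in TAILSCALE_RANGE


if __name__ == "__main__":
    # Hypothetical addresses for one node: tailscale0 first, then the LAN NIC.
    addresses = ["100.101.102.103", "192.168.1.21"]
    print(pick_node_ip(addresses, ["192.168.1.0/24"]))
    print(is_tailscale_address(addresses[0]))
```

A check like `is_tailscale_address` is also a quick way to spot the symptom: if the address a node reports as its internal IP falls in that range, node-to-node traffic is going over the tailnet.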
## Stumbling block 3: the democratic-csi controller couldn't reach TrueNAS
With nodes on the tailnet and ACL grants in place, NFS mounts *from the kubelet* worked fine. I deployed [`democratic-csi`][dcsi] as the NFS provisioner, pointed it at TrueNAS, and watched it fail to create a single PV.
The issue: democratic-csi has two parts.
- **Node** pods, which run on each Talos node and do the actual mount syscalls. These were fine: kubelet is on the host network, and the host can route to TrueNAS over `tailscale0`.
- **Controller** pod, which talks to the **TrueNAS HTTPS API** to provision and destroy datasets. The controller was running in the cluster's pod network, which doesn't have a route to the tailnet.
The fix needed two things at once:
1. **`controller.hostNetwork: true`** in the `democratic-csi` Helm values. This puts the controller pod in the node's network namespace, so it can use `tailscale0` directly.
2. **An `ExternalName` Service** annotated for the Tailscale operator, so the driver config doesn't need to bake in a raw tailnet IP:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: truenas-tailscale
  namespace: democratic-csi
  annotations:
    tailscale.com/tailnet-ip: "100.x.y.z"
spec:
  type: ExternalName
  externalName: placeholder # operator overwrites this
  ports:
    - { name: https, port: 444, protocol: TCP }
```

The Tailscale operator notices the annotation, sets up an egress proxy for that tailnet IP, and the driver config points at `truenas-tailscale.democratic-csi.svc.cluster.local`.
Strictly, `hostNetwork` on its own would have been enough. The `ExternalName` Service is there so that if the TrueNAS tailnet IP ever changes, I update one annotation rather than re-encrypting the SOPS-encrypted driver config.
This took me embarrassingly long to debug, because the failure mode was "the controller pod is Running and Ready, but no PVs are ever created and the events tab is silent." The clue is in `kubectl logs` on the controller pod, but only if you grep specifically for "TrueNAS" or "freenas-api", since the actual error is a connection timeout buried in a stack trace.
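
Since the underlying error was a plain connection timeout, a direct TCP probe is a quicker test than reading stack traces. The sketch below is a generic reachability check using only the Python standard library; the `can_connect` name and the placeholder target are assumptions, and it would need to run from the same network context as the controller (pod network versus host network) to be meaningful.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


if __name__ == "__main__":
    # Placeholder target: substitute the TrueNAS tailnet address and API port.
    print(can_connect("127.0.0.1", 443, timeout=1.0))
```

A `False` from the pod network and a `True` from the host network is the signature of the routing gap described above.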
## Stumbling block 4: Immich needed to leave home
Solving the family-sharing problem turned out to be the thing that forced the Immich migration, not the other way round.
Here’s the chain of reasoning:
- I'd just moved TrueNAS to `tag:truenas` (so the kubelet ACL grant could be tag-based, not user-based).
- Tagged machines cannot be node-shared in Tailscale. (This is documented behaviour: sharing is at the user level; tags replace user ownership.)
- That meant my family members lost access to Immich, because the TrueNAS host was no longer share-able.
The cleanest fix was to give Immich *its own Tailnet device*. To do that, it needed to leave the TrueNAS apps system (which only has one `tailscaled`, owned by the host) and become a Docker Compose stack with a `tailscale/tailscale` sidecar in the same Compose network. The sidecar would join the tailnet under `tag:family-immich`, run `tailscale serve` to terminate HTTPS at the MagicDNS hostname, and proxy traffic to the `immich-server` container.
The Compose looks roughly like this (full version in the repo):

```yaml
services:
  immich-server:
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # No `ports:` here; access is Tailnet-only via the sidecar.
    # ... (rest of the service definition elided)

  immich-family-ts:
    image: tailscale/tailscale:latest
    hostname: immich-family
    environment:
      - TS_AUTHKEY=${TS_AUTHKEY}
      - TS_STATE_DIR=/var/lib/tailscale
      - TS_SERVE_CONFIG=/config/serve.json
      - TS_USERSPACE=true
    volumes:
      - /mnt/HDDs/immich/family-ts:/var/lib/tailscale
    configs:
      - source: immich-family-ts-serve
        target: /config/serve.json

configs:
  immich-family-ts-serve:
    content: |
      {
        "TCP": { "443": { "HTTPS": true } },
        "Web": {
          "$${TS_CERT_DOMAIN}:443": {
            "Handlers": { "/": { "Proxy": "http://immich-server:2283" } }
          }
        },
        "AllowFunnel": { "$${TS_CERT_DOMAIN}:443": false }
      }
```

Two things to call out:
- **`network_mode: service:` is a trap here.** The "obvious" way to set up a tailscale sidecar is to put the app in the sidecar's network namespace (`network_mode: service:immich-family-ts`). That works for single-container apps. Immich isn't one: it needs to resolve `database` and `redis` as service-name DNS, which doesn't work from inside the sidecar's namespace because the sidecar runs in user-space and doesn't have Docker's embedded DNS. The fix is to leave both the app and the sidecar on the Compose network and have the sidecar's `serve.json` reach the app by container name. I learned this by spending about an hour staring at "redis: name does not resolve" errors.
- **Family members get shared *just this device*.** The Tailscale share UI lets me share `immich-family` with each family member's account. They see one device in their tailnet called `immich-family.<my-tailnet>.ts.net`, and that's it. No AdGuard, no Jellyfin, no TrueNAS UI.
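
For anyone templating the sidecar's serve config rather than hand-writing it, the sketch below builds the same shape of document in Python: HTTPS terminated on 443 and everything proxied to one upstream by container name. The helper names are hypothetical. The second function reflects a Compose rule worth remembering: a literal `$` in a Compose file has to be written as `$$`, otherwise Compose tries to interpolate `${TS_CERT_DOMAIN}` itself before the sidecar ever sees it.

```python
import json


def build_serve_config(upstream: str) -> dict:
    """Build a serve config that terminates HTTPS on 443 and proxies to upstream."""
    host = "${TS_CERT_DOMAIN}:443"  # placeholder the tailscale container fills in
    return {
        "TCP": {"443": {"HTTPS": True}},
        "Web": {host: {"Handlers": {"/": {"Proxy": upstream}}}},
        "AllowFunnel": {host: False},  # keep the service off the public internet
    }


def escape_for_compose(text: str) -> str:
    """Double every '$' so Compose does not try to interpolate it."""
    return text.replace("$", "$$")


if __name__ == "__main__":
    rendered = json.dumps(build_serve_config("http://immich-server:2283"), indent=2)
    print(escape_for_compose(rendered))
```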
While I was at it I migrated the [ImmichFrame][iframe] slideshow to the same model: its own sidecar, its own Tailnet device, shared to the same family members. That one does use `network_mode: service:`, because ImmichFrame is a single container and doesn't need to resolve sibling services by name.
Screenshot placeholder: the Tailscale admin console showing the new model — truenas (tagged), talos-cp1, talos-w1, talos-w2 (tagged k8s-node), grafana (tagged k8s), immich-family (tagged family-immich), my laptop, and a couple of family members’ phones in the shared column.
### The Postgres-18 booby trap
The migration came with one final surprise. The upstream Immich Compose template mounts the database volume at `/var/lib/postgresql/data`. That's correct for Postgres 14, which is what their template targets.
My inherited data, from the previous TrueNAS Immich app, was on Postgres 18. Postgres 18 changed the on-disk layout: the data directory is expected to be `/var/lib/postgresql`, with version-numbered subdirectories underneath (e.g. `/var/lib/postgresql/18/`). Mounting the existing data at `/var/lib/postgresql/data` produced a "this looks like an empty data directory" message and Postgres helpfully `initdb`-ed a fresh one, which would have nuked my photos if I'd let it run.
The fix is a one-line change in the Compose:

```yaml
volumes:
  - ${DB_DATA_LOCATION}:/var/lib/postgresql # NOT /data
```

…and a comment block above it that’s about ten times longer than the line itself, because future-me will absolutely forget.
The pull request against the official Postgres Docker image that documents the change is [here][pg-pr] if you want the upstream rationale. The TL;DR is "always check the data-directory layout when bumping a major Postgres version, even if you didn't *think* you were bumping it."
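
A cheap pre-flight check before starting the container is to look at where the `PG_VERSION` marker file sits in the host directory, since every Postgres data directory has one at its root. The sketch below is a hypothetical helper, not part of Immich or the Postgres image; the mapping from its result to a mount target is an interpretation of the behaviour described above ("flat" suggesting the older `/var/lib/postgresql/data` mount, "versioned" suggesting `/var/lib/postgresql`).

```python
from pathlib import Path


def detect_pg_layout(data_root: Path) -> str:
    """Classify a host directory by where its PG_VERSION marker file sits."""
    if (data_root / "PG_VERSION").is_file():
        return "flat"  # the directory itself is a Postgres data directory
    # Look one and two levels down, e.g. <root>/18/... style layouts.
    nested = sorted(data_root.glob("*/PG_VERSION")) + sorted(data_root.glob("*/*/PG_VERSION"))
    if nested:
        return "versioned"  # the data directory lives in a subdirectory
    return "empty"  # no marker found: Postgres would initialise a new cluster here


if __name__ == "__main__":
    import sys

    print(detect_pg_layout(Path(sys.argv[1] if len(sys.argv) > 1 else ".")))
```

An "empty" result on a directory that is supposed to hold years of data is the cue to stop and check the mount path before anything runs `initdb`.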
## What you have at the end of Part 3
- A Tailscale tag model that's actually load-bearing, with each tag granted exactly what it needs.
- A Kubernetes cluster whose kubelet talks to the LAN, and only the LAN, for node-to-node traffic.
- A `democratic-csi` install that can both mount NFS volumes (kubelet) and talk to the TrueNAS API (controller) over the tailnet.
- An Immich (and ImmichFrame) deployment that lives as its own Tailnet device, shareable to family without exposing the rest of the homelab.
This took longer to write than I would like to admit. It also took longer to do than I would like to admit. The thing about a homelab is that nobody is paying you for it, so the only incentive to actually finish the writeup is the suspicion that future-you will need it. Future-me almost certainly will.
Onward to [Part 4][part4], the actual reward.

[index]: ./2026-05-23-homelab-k8s-journey.md
[part1]: ./2026-05-23-from-truenas-to-kubernetes.md
[part2]: ./2026-05-23-flux-and-sops.md
[part4]: ./2026-05-23-grafana-finally.md
[ts-share]: https://tailscale.com/kb/1084/sharing
[tsop]: https://tailscale.com/kb/1236/kubernetes-operator
[ext]: https://github.com/siderolabs/extensions/tree/main/network/tailscale
[runbook]: ../talos-extensions-rollout.md
[dcsi]: https://github.com/democratic-csi/democratic-csi
[iframe]: https://github.com/immichframe/ImmichFrame
[pg-pr]: https://github.com/docker-library/postgres/pull/1259