Designing a Flux Dependency Chain That Doesn’t Fight Itself

Apr 5, 2026

Flux Kustomizations have dependsOn. Sounds simple — A depends on B, B deploys first. But when you have operators that install CRDs, secrets that need operators running, and services that need secrets, the ordering gets tricky fast.
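For reference, dependsOn is just a list on the Kustomization spec. A minimal sketch (names and path here are illustrative, not from a real cluster):

```yaml
# The "apps" Kustomization is only reconciled after "platform" is Ready.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  dependsOn:
    - name: platform   # wait for this Kustomization first
  interval: 10m
  path: ./apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```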

Here’s the dependency chain I landed on after several failed attempts.


The Problem

A typical Kubernetes cluster needs these things in this order:

  1. Namespaces and Helm repos — everything else references them
  2. Operators — install CRDs (cert-manager, ESO, Postgres operator)
  3. Secrets — pulled from external stores (needs ESO operator running)
  4. Platform services — databases, caches, auth (need CRDs + secrets)
  5. Applications — the actual workloads (need everything above)

If you put operators and their CRD-dependent resources in the same Flux Kustomization, you get this on first deploy:

ClusterIssuer dry-run failed: no matches for kind "ClusterIssuer"

Flux does a server-side dry-run of all resources before applying. If the CRD doesn’t exist yet (because the operator hasn’t installed it), the dry-run fails and the entire Kustomization is rejected — including the operator that would have created the CRD.

Failed Attempt: Everything in One Layer

# platform kustomization
resources:
  - cert-manager-helmrelease.yaml    # installs CRDs
  - clusterissuer.yaml               # needs CRDs
  - wildcard-certificate.yaml        # needs CRDs

Result: cert-manager HelmRelease never installs because ClusterIssuer dry-run fails first.

Failed Attempt: Operator in Core

Moving the CRD-dependent resources to a later layer, but keeping the CRD-installing operator bundled with the foundation in core:

core     → namespaces, helm repos, ESO operator
secrets  → ExternalSecrets (needs ESO CRDs from core)

This works only if the ESO operator registers its CRDs before Flux dry-runs the secrets layer. Until then, Flux keeps retrying the secrets Kustomization and logs a dry-run error on every attempt. Messy.

The Solution: Operators in Infra, Everything Else After

core       (no postBuild)
   namespaces, helm repos, cluster-vars

infra      (dependsOn: core)
   ALL operators: ESO, cert-manager, Traefik, Postgres operator
   Only HelmReleases, nothing that uses their CRDs

secrets    (dependsOn: infra)
   1Password Connect bootstrap, ClusterSecretStore, ExternalSecrets
   By now, ESO CRDs exist

platform   (dependsOn: secrets)
   ClusterIssuers, certificates (cert-manager CRDs exist)
   Postgres instances (Zalando CRDs exist)
   Services that need secrets (secrets exist)

Each layer only contains resources whose CRDs were installed by the previous layer.
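Wired up as Flux Kustomizations, two adjacent links of the chain look roughly like this (names and paths are illustrative):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infra
  namespace: flux-system
spec:
  dependsOn:
    - name: core
  interval: 10m
  path: ./clusters/prod/infra   # illustrative path
  prune: true
  wait: true                    # block downstream layers until operators are Ready
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: secrets
  namespace: flux-system
spec:
  dependsOn:
    - name: infra               # ESO CRDs exist once infra is Ready
  interval: 10m
  path: ./clusters/prod/secrets
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```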

Handling Scale: healthChecks

With many ExternalSecrets (50+), wait: true on the secrets layer causes API server throttling — Flux polls every resource for readiness. The fix:

# secrets kustomization
spec:
  wait: false
  healthChecks:
    - apiVersion: external-secrets.io/v1
      kind: ClusterSecretStore
      name: external-secrets-clusterstore

Flux only checks that the ClusterSecretStore is healthy (proves the entire secrets pipeline works), not every individual ExternalSecret. Downstream layers proceed once the store is valid.

Handling postBuild Conflicts

If you use postBuild.substituteFrom for domain variables, it processes ALL ${...} patterns in all resources — including shell scripts and SQL in ConfigMaps. The $$ in PL/pgSQL becomes $. The ${DB_USERNAME} in shell scripts becomes empty.

Per-resource opt-out:

apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    kustomize.toolkit.fluxcd.io/substitute: disabled

Only this resource skips substitution. Everything else still gets variables replaced.
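Putting it together, a sketch of a ConfigMap that carries both shell and PL/pgSQL untouched (the name and script contents are made up for illustration):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-init-scripts   # illustrative name
  annotations:
    # Skip Flux post-build substitution for this resource only
    kustomize.toolkit.fluxcd.io/substitute: disabled
data:
  init.sh: |
    #!/bin/sh
    # ${DB_USERNAME} survives because substitution is disabled
    psql -U "${DB_USERNAME}" -f /scripts/functions.sql
  functions.sql: |
    CREATE FUNCTION touch_updated_at() RETURNS trigger AS $$
    BEGIN
      NEW.updated_at = now();   -- the $$ quoting survives too
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
```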

The Final Chain

core       → foundation (no CRD dependencies, no postBuild)
infra      → operators only (installs all CRDs)
secrets    → external secrets (CRDs exist, pulls from secret stores)
platform   → databases, auth, certs, services (CRDs + secrets exist)

Four Flux Kustomizations. Clear dependencies. No CRD race conditions. No retry noise. Each layer can be reasoned about independently.

Takeaway

Separate CRD-installing HelmReleases from CRD-consuming resources. Put all operators in one layer, everything that uses their CRDs in the next. Use healthChecks instead of wait: true when a layer has many resources. Use the substitute: disabled annotation when ConfigMaps contain code with $ syntax.