Users were registering on the auth system but couldn’t log in afterward. The login page showed a tenant mismatch error. The registration succeeded, but the identity was created without a required field: metadata_public.tenant.
This field was supposed to be set by a webhook. The webhook service wasn’t deployed yet.
The Architecture
The project uses an open-source identity server for user management. When a user registers, the identity server fires an after/registration webhook to the auth UI service. The webhook sets metadata_public to {"tenant": "default"} — a field that every downstream service checks to route the user to the correct organization.
User registers → Identity server creates identity → Webhook fires → Auth UI sets tenant
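As a rough sketch of the last step in that flow, the webhook handler's job reduces to a single assignment. The function name and the simplified dict shape of the identity are assumptions for illustration; the real payload format and the call that persists the change depend on the identity server.

```python
import copy

DEFAULT_TENANT = "default"


def handle_after_registration(identity: dict) -> dict:
    """Hypothetical webhook handler: return the identity with the tenant set.

    `identity` is an assumed, simplified dict. A real handler would parse the
    identity server's webhook payload and persist the change through its API.
    """
    updated = copy.deepcopy(identity)  # leave the caller's copy untouched
    updated["metadata_public"] = {"tenant": DEFAULT_TENANT}
    return updated


new_identity = {"id": "user-1", "metadata_public": None}
print(handle_after_registration(new_identity)["metadata_public"])
```

Everything downstream depends on this one assignment having happened, which is why a missed webhook call is so costly.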
The Problem
The test environment rebuilds frequently. The identity server is part of the platform layer and starts before the application layer (where the auth UI lives). There’s a window — sometimes minutes, sometimes longer — where the identity server is accepting registrations but the webhook target doesn’t exist.
During that window:
- User registers
- Identity server creates identity with metadata_public = null
- Webhook fires, gets connection refused, gives up
- User tries to log in
- Application checks metadata_public.tenant, finds null, rejects login
The user is stuck. The identity exists but stays broken until someone manually patches the database.
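The sequence above can be modeled in a few lines. This is a simplified simulation, not the actual services: `register`, `login_tenant`, and `webhook_unreachable` are hypothetical names, and the identity is an assumed plain dict.

```python
from typing import Callable, Optional


def webhook_unreachable(identity: dict) -> None:
    """Stand-in for the webhook call while the auth UI is not deployed."""
    raise ConnectionRefusedError("auth UI is not running yet")


def register(user_id: str, webhook: Callable[[dict], None]) -> dict:
    """Create the identity first, then fire the webhook on a best-effort basis."""
    identity = {"id": user_id, "metadata_public": None}
    try:
        webhook(identity)
    except ConnectionError:
        # The failed webhook does not roll back the registration.
        pass
    return identity


def login_tenant(identity: dict) -> Optional[str]:
    """Return the tenant used for routing, or None if login must be rejected."""
    metadata = identity.get("metadata_public") or {}
    return metadata.get("tenant")


stuck = register("user-1", webhook_unreachable)
print(login_tenant(stuck))  # None: the login is rejected
```

The registration returns successfully even though the webhook failed, so nothing tells the user or the operator that the identity is incomplete until the first login attempt.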
The Fix
A Postgres trigger on the identities table that sets the tenant at the database level:
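-- Fill in the default tenant when an identity is inserted without metadata.
CREATE OR REPLACE FUNCTION auth.set_default_metadata()
RETURNS TRIGGER AS $$
BEGIN
    -- Only touch rows where nothing has set metadata_public yet.
    IF NEW.metadata_public IS NULL THEN
        NEW.metadata_public = '{"tenant": "default"}'::jsonb;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Run the function before each row is written to auth.identities.
CREATE TRIGGER ensure_metadata
    BEFORE INSERT ON auth.identities
    FOR EACH ROW
    EXECUTE FUNCTION auth.set_default_metadata();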
The trigger fires on every INSERT into auth.identities, before the row is written. If metadata_public is null, it sets the default; if a value is already present, the trigger leaves it alone. If the webhook also fires and sets the field, both write the same value, so there is no conflict.
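To make the trigger's decision explicit, here is a small Python model of its body. It is only an illustration of the IS NULL check, with `None` standing in for SQL NULL; the function name `before_insert` and the tenant value "acme" are made up for the example.

```python
import json
from typing import Optional

DEFAULT_METADATA = '{"tenant": "default"}'


def before_insert(metadata_public: Optional[str]) -> str:
    """Model of the trigger body: fill the column only when it is NULL (None)."""
    if metadata_public is None:
        return DEFAULT_METADATA
    return metadata_public


# Try a NULL column, an existing tenant, and a non-null value with no tenant.
for incoming in (None, '{"tenant": "acme"}', "{}"):
    stored = json.loads(before_insert(incoming))
    print(incoming, "->", stored.get("tenant"))
```

One consequence worth noting: because the check is on the whole column, a non-null value that lacks a tenant key (the `{}` case) passes through unchanged. That matches the failure described here, where the column is null, but it is a limit of the check as written.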
Why a Trigger and Not an Application Fix
I could have made the login flow tolerate null tenants, or added a retry queue for failed webhooks, or added an init container that waits for the auth UI before starting the identity server. Each of those adds complexity to the application layer and introduces new failure modes.
The trigger is:
- Zero-dependency — works regardless of what’s deployed
- Idempotent — setting the same value twice is a no-op
- Invisible — no application code changes needed
- Permanent — survives cluster rebuilds, service restarts, deployment order changes
Takeaway
When a webhook sets a required field, ask: what happens if the webhook target isn’t available? If the answer is “the record is permanently broken,” consider a database-level default as a safety net. Triggers aren’t a replacement for webhooks — they’re a fallback for the deployment window where the webhook target doesn’t exist yet.