A developer couldn’t access the staging database through a GCP IAP tunnel. The error: `Error while connecting [4003: 'failed to connect to backend']`. I assumed it was a missing IAM binding and created a Terraform PR to fix it.
The PR wasn’t the fix. The database pods had just been recreated and were temporarily unhealthy.
The Setup
GCP Identity-Aware Proxy (IAP) lets you tunnel into private resources without a VPN. For database access, the flow is:
Developer laptop → gcloud compute start-iap-tunnel → IAP → Backend Service → GKE Pod
The developer ran the documented command:
```shell
gcloud compute start-iap-tunnel <instance> 5432 \
  --local-host-port=localhost:5432 \
  --zone=us-central1-a
```

It failed with:

```
Error while connecting [4003: 'failed to connect to backend'].
```
The Misdiagnosis
Error 4003 looks like a permission error, and the IAP documentation isn’t explicit about what it means. My first thought: the developer’s Google account was missing from the IAP-secured Tunnel User binding.
I checked the Terraform config — the engineering group was listed, but maybe the binding was stale. I created a PR to explicitly add the IAP role. It seemed like the right fix.
What Actually Happened
Before the PR was merged, the developer tried again and it worked. Nothing had changed on the IAM side.
The real cause: the database pods had been recreated 16 hours earlier during a cluster migration. The backend service that IAP routes to had temporarily unhealthy endpoints. Error 4003 means “IAP authenticated you successfully, but the backend is unreachable.” It’s a connectivity error, not an authorization error.
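The faster first check in a situation like this is backend health, not IAM. A minimal sketch, assuming a GKE-backed setup; the backend service name `db-backend-service`, the `staging` namespace, and the `app=database` label are placeholders, not values from this incident:

```shell
# Is the backend service IAP routes to actually healthy?
# (use --region=<region> instead of --global for a regional backend service)
gcloud compute backend-services get-health db-backend-service --global

# If the backend is a GKE workload, check the pods and the service endpoints.
# Recreated pods that haven't passed readiness probes show up as missing endpoints.
kubectl get pods -n staging -l app=database
kubectl get endpoints -n staging database
```

If `get-health` reports unhealthy instances or the service has no endpoints, no IAM change will help; the tunnel will keep returning 4003 until the backend recovers.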
The Error Code Breakdown
| Code | Meaning |
|---|---|
| 4001 | IAP is not enabled |
| 4003 | Backend unreachable (not permissions) |
| 4010 | Authentication failed |
| 4033 | Authorization failed (missing IAM role) |
4003 and 4033 look similar but mean very different things. 4033 is the permission error. 4003 is “I let you through, but there’s nothing on the other side.”
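The table above boils down to a routing decision: which layer do you debug for a given code? A sketch as a shell helper (`iap_error_layer` is a hypothetical name, not a gcloud feature):

```shell
#!/bin/sh
# Map an IAP tunnel error code to the layer worth debugging.
# Encodes the table above; not an official gcloud utility.
iap_error_layer() {
  case "$1" in
    4001) echo "config: IAP is not enabled on the target" ;;
    4003) echo "connectivity: backend unreachable - check pod and service health, not IAM" ;;
    4010) echo "authentication: the Google login itself failed" ;;
    4033) echo "authorization: missing the IAP-secured Tunnel User IAM role" ;;
    *)    echo "unknown IAP error code: $1" ;;
  esac
}

iap_error_layer 4003   # -> connectivity: backend unreachable - check pod and service health, not IAM
```

The point of writing it out: 4003 and 4033 land in different branches, so they should trigger different runbooks.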
Takeaway
If you get GCP IAP error 4003, check the backend health before changing IAM bindings. 4003 means the tunnel authenticated you but couldn’t reach the target — the backend pod might be restarting, the service might be down, or the port might be wrong. The permission error is 4033, not 4003. Don’t fix the wrong layer.