I’m a Platform Engineer who designs, builds, and operates production infrastructure — from zero-credential Kubernetes clusters and GitOps pipelines to compliance engines and internal developer platforms.
6+ years of experience across bare-metal and cloud, with a focus on systems that don’t need babysitting.
Infrastructure & Platforms: Kubernetes (GKE, bare-metal), Docker, Helm, Terraform, GCP, Hetzner, OVH, KubeOne, Kubermatic, Gardener, Traefik, Linkerd, OpenEBS, cert-manager
GitOps & Secrets: FluxCD (7-layer dependency chains, image automation, postBuild), GCP Workload Identity, 1Password, SOPS, GCP Secret Manager
CI/CD & Automation: GitHub Actions, composite actions, self-hosted ARC runners, Artifact Registry layer caching, Trivy, SonarQube, DefectDojo, Renovate, Playwright, Nix/devenv
Observability & Reliability: Prometheus, Grafana, Grafana Mimir, Loki, Promtail, Alertmanager, BetterStack, pg-exporter, PromQL
Auth & Identity: Ory (Kratos, Keto, Hydra, Oathkeeper), GCP Workload Identity, cert-manager, x509 lifecycle management
Security & Compliance: supply chain security (Trivy vendoring, DefectDojo, SonarQube, Renovate Bot), SOC1/SOC2/SOX compliance automation, kubeconfig rotation, certificate lifecycle management
Databases: PostgreSQL (Zalando Operator, Patroni, Sqitch migrations), Redis, ScyllaDB, Temporal, ClickHouse
Backend & Tooling: Golang, Python, Bash, Node.js, TypeScript, Linux, FastAPI, Cobra CLIs
Seattle, US (Remote) · Jun 2023 – Apr 2026
Sole platform engineer on call for 24/7 production reliability of 20+ blockchain nodes across 8+ Kubernetes clusters; built the entire infrastructure from scratch and owned every incident end to end.
Platform Architecture
Bootstrapped 8+ Kubernetes clusters from scratch across Hetzner, OVH, and GCP: 2 production bare-metal clusters (mainnet daemons: BTC, Litecoin, Zcash, Sia, RSKJ, Trumpcoin; testnet mirrors of each) and 2 GKE app clusters (staging + prod mining pool), all using Traefik ingress and Linkerd service mesh. Also ran Luxorlabs, a dedicated experimental cluster, for ~1 year: joined coin communities, read protocol docs, and worked directly with upstream devs to learn and validate new protocol nodes (Stacks, Babylon, Ethereum, tBTC, Threshold) before promoting them to production. Also built an ARC runner cluster and a BVT pre-production validation cluster, and migrated the full staging cluster end to end.
Built the bare-metal clusters (Nebula, Proto-Nebula) on servers ordered from both Hetzner and OVH across multiple geographic regions, a dual-provider HA design in which the cluster survives an incident at either provider. Bootstrapped control planes with KubeOne, added nodes one by one, and prepared local storage by sanitising drives, grouping multiple disks into LVM volumes, and configuring OpenEBS local PV backed by TBs of Hetzner/OVH external storage.
All clusters run the same 7-layer FluxCD GitOps pipeline, with each layer gated by health checks before the next reconciles: Core (Helm repos + namespaces) → Infra (HelmReleases, CRDs, controllers) → Secrets (ESO/SOPS pull secrets into namespaces) → Datastore (Ory stack: Kratos/Keto/Hydra/Oathkeeper + auth DB schema) → Platform (Traefik, Linkerd, tooling) → Monitoring (Prometheus, Loki, Grafana, ServiceMonitors → centralised Grafana Mimir) → Apps (Luxor services). Because every cluster follows this pattern, environments are reproducible and consistent.
Designed a zero-credential Kubernetes architecture (GCP Workload Identity + External Secrets Operator + 1Password) that eliminated all static secrets from git and cluster state. On the final BVT cluster, went further: replaced SOPS entirely with GCP Workload Identity, service accounts, and IAM, leaving zero hardcoded secrets or encryption keys in git.
Managed 897 database migrations across 5 databases, resolving cross-database ordering dependencies so environments bootstrap reproducibly on fresh Postgres.
Provisioned GKE clusters and static IPs with Terraform/OpenTofu (fluxor-iac repo), and wrote custom Python scripts to manage Hetzner and OVH firewall rules through their APIs, since no suitable Terraform modules existed for these bare-metal providers at the time.
Reliability & Incident Response
A Litecoin mainnet hard fork introduced an active inflation bug that threatened a chain split: led the emergency incident response, coordinated a cross-team build under time pressure, verified the fix on staging, executed a zero-downtime rolling deployment across all production nodes before the deadline, then authored the postmortem and updated the incident runbook.
Kubelet TLS certificates expired on live bare-metal production clusters: diagnosed the root cause under pressure, restored service with zero downtime, implemented CSR auto-approval, and documented the RCA for the engineering team. Then deployed Grafana’s x509 certificate-monitoring extension via Helm across 10 clusters, with PromQL rules that fire P2 BetterStack alerts 7 days before expiry and P1 alerts 24 hours before, so every cluster is now actively monitored.
A new product launched with no Postgres observability: deployed pg-exporter with 6 custom PromQL alert rules covering CPU throttling, pod restarts, connection saturation, and long-running queries, routing P2 alerts to Discord and paging on-call for P1/P2 via BetterStack. Built shared Grafana dashboards used daily by engineering and ops teams, and worked with backend teams to calibrate thresholds against real traffic patterns.
Migrated the entire on-call stack from Opsgenie to BetterStack: rebuilt on-call rotations, escalation chains, status pages, and Grafana Mimir chart integrations from scratch, and developed a Go heartbeat service (Cobra) for continuous endpoint health reporting.
Proactively authored technical runbooks for recurring incident classes (node crashes, storage failures, auth failures, certificate expiries), and reviewed and iterated on them with engineering teams to keep them current and actionable.
CI/CD & Cost
After migrating all repos from GitLab to GitHub, owned CI/CD for the entire org (~50 repos) as the sole workflows engineer for 2 years: built build/test/push pipelines to GCP Artifact Registry whose images FluxCD picks up for auto-deploy, code-signed Windows and Mac release pipelines for customer distribution, and self-hosted ARC runner infrastructure, including bare-metal ARC runners for Tenki (a new Luxor product whose CI I helped design from scratch).
The Playwright E2E test suite depended on a devenv + Cachix + Nix environment that was broken on Linux, so the team ran it on hosted Mac runners at ~40 min/run, costing $5K/month. Spent ~1 month debugging the toolchain gap, isolated the missing pieces, and ported the workflow to Linux ARC runners, cutting run time to 40 seconds and eliminating the Mac runner cost entirely.
Security & Compliance
After aquasecurity/trivy-action was compromised, built a centralised Trivy composite action with a vendored binary, migrated 9+ repos to it, and eliminated 190+ malicious npm packages. Vendoring proved critical: when CVE-2026-33634 (Mar 2026) hit, attackers force-pushed malicious code to 76 trivy-action tags to scrape runner memory for Kubernetes and cloud credentials across 12,000+ repos. Luxor’s vendored binary couldn’t be swapped, and our zero-credential CI had nothing to steal.
Deployed DefectDojo as the org-wide security hub: a daily reusable GitHub Actions workflow aggregates Trivy/SonarQube reports from 30+ repos and auto-raises Linear tickets for critical CVEs via API. Deployed Renovate Bot across 20+ repos; thanks to continuous automated package updates, the org stayed unaffected by React2Shell (CVE-2025-55182), the Axios npm compromise (Mar 2026), and the Shai-Hulud worm.
Built Memphis solo in 4 weeks ahead of Luxor’s planned IPO: a Python compliance binary that connects to GitHub, Bytebase, and GCP (via APIs and service accounts) to automate SOC1/SOC2/SOX evidence collection. It deposits quarterly, half-yearly, and annual audit reports in Google Drive with auditor-ready execution logs, and automatically escalates critical findings to Linear.
Built KubeOrbit solo: a Go binary for self-service kubeconfig provisioning across 20+ clusters, with built-in role templates, revocation, and a SQLite audit trail (70–90% test coverage). It replaced manual per-request provisioning and has been adopted org-wide, running in production for 1+ year.
Hyderabad, India · Oct 2020 – Jun 2023
Full-Stack Instructor · Winter terms (2022, 2023, 2024, 2025)
Teaching final-year BSc Computer Science students Next.js, FastAPI, core DevOps, and deployment fundamentals. Real projects, production patterns, no toy demos.
Tutor · 2018–2020
Taught 10th-standard students (3 batches) before starting my engineering career.
A self-hostable alternative to Supabase — one platform for managed Postgres, authentication, authorization, OAuth2, API gateway, object storage, and feature flags. Built on battle-tested open-source components (Ory Kratos, Keto, Hydra, Oathkeeper, MinIO, flagd) with a unified React dashboard.
Stack: React 19, Express 5, Node.js 22, PostgreSQL 16, Docker, Cloudflare Pages
I take on short engagements with startups and small teams:
If you need someone who understands both systems and business constraints, feel free to reach out.
© 2026 Asutosh Panda · Built slowly. Maintained carefully. No noise.