Gauntlet is hiring a Web3 Infrastructure Engineer
Compensation: $150k - $175k
Location: New York
Infrastructure Engineer
You will help build the infrastructure behind one of the largest asset managers in onchain finance. Gauntlet serves $1.5B+ in client TVL, and the platform that ships and runs it is small, modern, and yours to shape. This is the team's second infrastructure hire: you'll work directly with our infra/platform lead, own real surface area from week one, and have a genuine say in the tools, patterns, and direction we take. If you want hands-on ownership of a cloud-native platform rather than a narrow slice of someone else's, read on.
About Gauntlet
Gauntlet builds the financial systems of the future. While much of onchain finance is focused on point solutions, we operate across the entire stack to offer best-in-class vault products. Today we serve over $1.5B in client TVL across some of the largest fintechs/neobanks, protocols, exchanges, and capital allocators in crypto — and, increasingly, traditional asset management. Our team brings together traditional finance and crypto-native expertise to deliver durable, sophisticated products for institutional clients moving onchain.
The role
Infrastructure & Security keeps Gauntlet's services shipping safely and reliably. Today it's effectively one engineer carrying both large, org-spanning initiatives (platform build-out, SOC 2, deployment security) and the steady stream of day-to-day requests from product teams. You'll take real ownership of that workload across our GCP, Kubernetes, and Terraform stack: unblocking application teams, hardening CI/CD, and driving infrastructure projects end-to-end so the platform can scale with the company. You'll partner closely with the application teams (Aera, Vault Curation) and with Security.
What you'll do
- Support the application teams: turn around infra requests (permissions, roles, service setup, project peering) so product engineers stay focused on shipping.
- Own CI/CD and deployments: maintain and extend our GitHub Actions workflows and help migrate toward a dedicated CD tool with proper permissioning — the goal is fully automated, locked-down deploys via service accounts, no direct engineer access to production.
- Build and maintain infrastructure as code: author and update Terraform modules for new and existing services across GCP environments.
- Run Kubernetes the right way: manage service deployments via Helm (we're on Helm 4) and keep async workloads healthy on Dagster.
- Unify observability (likely first project): consolidate today's per-team alerting into a single view — system-to-system dashboards plus incident alerting that routes upstream service/vendor failures to the right impacted teams and on-call rotations.
- Advance resilience: help move us toward a fully region- and cloud-agnostic posture so services can pick up and move if something fails.
- Strengthen security & access: apply IAM, secrets management, least privilege, and auditability; contribute to SOC 2 readiness.
- Automate with AI: build agent skills / agents.md so routine tasks (provisioning access, simple changes) can be handled by an agent instead of human engineering hours, and use AI to reason through bigger problems.
What success looks like
First 30 days. Ramp on the stack (GCP, Kubernetes/Helm, Terraform, GitHub Actions, Dagster). Meet the application and security stakeholders, and start reliably handling application-team requests.
First 90 days. Operating independently on the reactive workload and proactively creating/updating/managing infrastructure across GCP environments. On-call onboarding complete (Roby shadows then reverse-shadows your first shifts).
In 1 year. Delivered concrete platform improvements — new Terraform modules meeting app-team needs, upstream dependency upgrades, and a unified alerting/observability framework wired into incident reporting and on-call. Trusted to take significant infra projects off the lead's plate.
What you bring
- Strong software-engineering fundamentals in at least one production language (Python, Go, TypeScript, or Rust); Python especially valued, plus comfort scripting and working in the shell.
- Hands-on experience with cloud infrastructure and core cloud services, especially GCP (AWS/Azure transferable).
- Experience operating large-scale Kubernetes production systems.
- Experience with Infrastructure as Code, especially Terraform.
- Familiarity with CI/CD systems, especially GitHub Actions or Octopus Deploy.
- Ability to debug production issues using logs, metrics, traces, shell tools, and source code.
- Security and access-control fundamentals: IAM, secrets management, least privilege, and auditability.
- Clear written communication around incidents, design decisions, and operational procedures.
Bonus points
- Supporting SOC 2 controls: evidence collection, access reviews, change management, or audit readiness.
- Observability with Datadog, Prometheus, Grafana, OpenTelemetry, Honeycomb, or similar.
- Improving developer experience through internal tooling, templates, scripts, or platform APIs.
- Incident response experience, including postmortems and follow-up remediation.
- Experience with Dagster, Helm 3+, high-scale CD tooling (Bazel, Octopus), or AI/agent-assisted ops.
- Basic web3 / DeFi literacy (transactions, wallets) and genuine curiosity about onchain — the role doesn't touch chain directly, but the business is onchain.
Benefits: Async