Company | Location | Salary |
---|---|---|
Ava Labs | New York, NY, United States | $151k - $189k |
DApp360 Workforce | United States | $122k - $123k |
MDA Edge | Austin, TX, United States | $115k - $180k |
Calyptus | United States | $105k - $180k |
Gemini | New York, NY, United States | $152k - $213k |
OpenZeppelin | United States | $36k - $70k |
Osmosis | United States | $175k - $250k |
SharpHeads | Austin, TX, United States | $58k - $103k |
Triton | United States | $72k - $75k |
Burnt Finance | New York, NY, United States | $76k - $100k |
Paxos | New York, NY, United States | $105k - $120k |
This job is closed
We're looking for a Senior Site Reliability Engineer to join our Infrastructure team. This engineer will help our developers work efficiently while building a vibrant ecosystem for the Avalanche blockchain. You'll enable teams across several business units and engineering groups to design, optimize, and implement greenfield technology for a variety of use cases. This role will be a key part of our release schedule and production monitoring.
WHAT YOU WILL DO
- Develop and optimize highly reliable and scalable infrastructure focused on SRE principles.
- Implement and maintain monitoring, logging, and tracing tools to gain insights into service behavior and health.
- Uphold SLOs (Service Level Objectives), SLIs (Service Level Indicators), and error budgets for critical systems.
- Enhance the reliability and resiliency of critical systems by identifying single points of failure and implementing best practices.
- Collaborate with software developers to build reliability and performance into applications from inception.
- Automate and streamline incident management processes to minimize service disruption and improve response times.
- Participate in on-call rotations, ensuring quick restoration of services and fostering a blameless post-mortem culture.
- Foster a continuous improvement mindset by analyzing and learning from incidents and implementing preventive measures.
- Leverage cloud technologies and IaC tools to ensure scalability and repeatability.
- Advocate for best practices in reliability, security, and maintainability within the team.
WHAT YOU WILL BRING
- BS in Computer Science or related field.
- 5+ years of experience as an SRE, DevOps, or Cloud Engineer.
- Strong grasp of SRE principles, including error budgets, SLOs, and SLIs.
- Cloud networking and orchestration with AWS (EKS, ECS, VPC, S3, ELB).
- Strong Kubernetes experience with Docker or rkt containerization.
- Proficiency in Infrastructure as Code (IaC) using tools such as Terraform, Terragrunt, and Ansible.
- Experience with monitoring and observability tools like Prometheus, Grafana, or ELK Stack.
- Experience building and maintaining CI/CD pipelines with GitHub Actions (preferred), Jenkins, Travis CI, or CircleCI.
- Experience with automation and configuration management using Ansible, Puppet or Chef.
- Experience with Linux-based infrastructure (Ubuntu preferred).
- Experience writing scripts and automation code (Python and Go preferred).
- Working knowledge of decentralized architecture design patterns and distributed systems.
Salary Range: $151,200 to $189,200
(This is not a guarantee of compensation or salary; the final offer amount may vary based on factors including but not limited to experience and geographic location.)