Reliability Engineer

458 jobs found

web3.career is now part of the Bondex Ecosystem

Company                          Location                       Salary
Stellar Development Foundation   New York, NY, United States    $165k - $205k
CoinDesk                         New York, NY, United States    $135k - $195k
CoinDesk                         Sao Paulo, Brazil              $90k - $100k
CoinDesk                         London, United Kingdom         $90k - $100k
CoinDesk                         Bangalore, India               $90k - $100k
Gemini                           Gurgaon, India                 $63k - $70k
NFT Now                          Remote                         $54k - $100k
Consensys                        Remote                         $72k - $100k
Consensys                        Remote                         $72k - $100k
Pinax                            Canada                         $54k - $100k
Binance                          Taipei, Taiwan
Nethermind                       London, United Kingdom         $90k - $100k
OKX                              Singapore, Singapore           $105k - $120k
Coinmarketcap                    Remote                         $129k - $149k
OKX                              Singapore, Singapore           $105k - $120k

Engineering Manager Site Reliability

Stellar Development Foundation
$165k - $205k

This job is closed

Interested in working on cutting-edge blockchain technology and creating equitable access to the global financial system? Since 2014, the mission-driven team at the Stellar Development Foundation (SDF) has helped fuel the tremendous growth of the Stellar blockchain network, an open-source platform that operates at high-scale today. Developers and companies around the world build on it, and the SDF team is expanding to support the rapidly growing and changing Stellar ecosystem.

You will lead an experienced Site Reliability Engineering team, ensuring our services and tooling are available, building infrastructure to make our team's production and testing environments available, and greasing the rails of our systems and processes to ensure they're robust, efficient, and easy to deploy.

SDF has a robust career path for both individual contributors and managers.

In this role, you will:

  • Establish a clear vision and mandate for the Site Reliability Engineering team
  • Define the SRE team's quarterly OKRs to best align with the company's goals
  • Define processes of collaboration between SREs and development teams throughout the software development lifecycle
  • Define a career growth path for the SRE team, as well as coach and mentor individual contributors on the team
  • Define and track metrics across engineering and help hold engineering teams accountable for their KPIs
  • Coordinate priorities with other teams and areas of the organization
  • Plan and execute sprints, track progress and oversee day-to-day tactical decisions
  • Design and build reliable systems, and infrastructure that is easy for software engineers to use
  • Monitor and troubleshoot systems in production
  • Define and participate in 24/7 on-call rotations alongside the team
  • Mediate technical discussions and review PRs
  • Jump in as needed with code fixes and hands-on contributions
  • Collaborate across the Stellar ecosystem, engaging with key partners and advising on their integration to set them up for success

You have:

  • 3+ years of experience working as a Site Reliability Engineer
  • 1+ year of experience managing or tech-leading an SRE team
  • Site Reliability Engineering experience:
    • Strong track record of collaborating with dev teams at all stages of product development (design, development/CI, beta testing, production)
    • Strong track record collaborating on defining, measuring and driving improvements in KPIs
    • Strong track record assisting teams during Root Cause Analysis and post mortems

  • Infrastructure and Operations experience:
    • Designing and building out the infrastructure for large distributed systems
    • Maintaining highly-available infrastructure
    • Troubleshooting and understanding complex technical problems
    • Using configuration management tooling such as Terraform, Ansible, Puppet
    • Building and maintaining infrastructure using Kubernetes
  • Highly autonomous; able to find clarity in ambiguous circumstances
  • Excellent communicator; comfortable working with remote team members

Bonus points if:

  • You have 3+ years of experience writing code in a major programming language
  • You have worked on an open source project
  • You have managed a distributed team
  • You build things for fun in your spare time

We offer competitive pay with a base salary range for this position of $165,000 - $205,000 depending on job-related knowledge, skills, experience, and location. In addition, we offer lumen-denominated grants along with the following perks and benefits:

What does a Reliability Engineer do?

A Reliability Engineer is responsible for ensuring the reliability and availability of an organization's systems and equipment.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
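
To make the statistical side of the first task concrete, here is a minimal Python sketch of three textbook reliability metrics: mean time between failures (MTBF), steady-state availability, and survival probability under a constant failure rate. The function names and the input figures are hypothetical and chosen only for illustration. The exponential model is a simplifying assumption, not a method described in any of the listings above.

```python
import math


def mtbf(total_operating_hours: float, failure_count: int) -> float:
    """Mean time between failures: operating time divided by observed failures."""
    if failure_count <= 0:
        raise ValueError("need at least one observed failure")
    return total_operating_hours / failure_count


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + mean time to repair)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of running t hours without failure, assuming a constant
    failure rate (exponential model): R(t) = exp(-t / MTBF)."""
    return math.exp(-t_hours / mtbf_hours)


# Hypothetical maintenance history: 1000 operating hours, 4 failures,
# and an average repair time of 2 hours.
m = mtbf(1000.0, 4)
a = availability(m, 2.0)
r = reliability(100.0, m)

print(f"MTBF: {m:.1f} h")
print(f"Availability: {a:.4f}")
print(f"P(no failure in 100 h): {r:.4f}")
```

In practice the constant-failure-rate assumption often does not hold for equipment that wears out, so a Reliability Engineer would check it against the failure data before using these figures to plan maintenance.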