Reliability Engineer

Director of Infrastructure Engineering

DFINITY

This job is closed

Are you passionate about leading and growing a team of top-notch software engineers and site reliability experts? Do you want to build and operate the infrastructure to qualify, release, and operate the world’s most advanced blockchain network? Building on your solid DevOps experience, we want you to reimagine DevOps practices for a decentralized world. You will collaborate with a motivated team that works on some of the most sophisticated, secure, and efficient distributed messaging protocols ever to be deployed. We’re a vibrant startup that has grown to 200+ employees with offices in the Bay Area and Zurich.

Responsibilities:

  • Strategy and roadmap: Develop and execute a roadmap that continuously improves the efficiency and quality of how we get software changes from our development teams onto mainnet. That includes the tooling used by our engineers to build and test code, the infrastructure used to test changes, and the procedures and pipelines to roll out changes. Moreover, evolve our operational setup, including the monitoring of essential metrics, the definition of alerts, and the procedures to deal with incidents.
  • Team building: You will lead strong engineering managers with years of experience. Coach them to achieve their full potential with their teams and to collaborate seamlessly. Participate in the recruitment process to grow your teams.
  • Delivery: Agree on timelines and commitments of your teams, based on your solid technical understanding. Track progress, adjust to changing priorities and help to resolve ambiguities.
  • Setting quality standards: Build with your teams the tooling and processes that incentivize engineering teams to deliver well-tested quality software.

Requirements:

  • 8+ years of management experience, including leading a team of teams
  • Strong communicator, experienced in leading international and geographically distributed teams
  • Shipped and operated large-scale distributed systems with thousands of customers/users
    • Applied SRE best practices
    • Monitored the system and handled troubleshooting

  • Familiar with frameworks and best practices to build large-scale software systems
  • Track record in establishing a culture of software quality
  • Ability to reimagine DevOps in a decentralized setup
  • Strong interest in web3 and crypto

What does a Reliability Engineer do?

A Reliability Engineer is a professional responsible for ensuring the reliability and availability of an organization's systems and equipment.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
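
The statistical modeling mentioned in item 1 can be illustrated with a few standard reliability formulas. The following is a minimal sketch, not a method taken from this listing: it assumes a constant failure rate (the exponential model), and the function names and sample figures are hypothetical.

```python
import math


def mtbf(total_operating_hours: float, failure_count: int) -> float:
    """Mean time between failures: operating hours divided by observed failures."""
    if failure_count <= 0:
        raise ValueError("failure_count must be positive")
    return total_operating_hours / failure_count


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of running t_hours without failure.

    Assumes a constant failure rate (exponential model): R(t) = exp(-t / MTBF).
    """
    return math.exp(-t_hours / mtbf_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical maintenance history: 4 failures over 1,000 operating hours,
    # with an average repair time of 2.5 hours.
    m = mtbf(1000.0, 4)
    print(f"MTBF: {m:.1f} h")
    print(f"Reliability over 100 h: {reliability(100.0, m):.3f}")
    print(f"Availability: {availability(m, 2.5):.4f}")
```

Real equipment often shows failure rates that change with age, in which case a Weibull or similar model would be a better fit than the exponential assumption used here.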