Reliability Engineer

301 jobs found

Company                          Location                     Salary
Edge & Node                      Remote                       $112k - $156k
Gemini                           Singapore, Singapore         $87k - $102k
Elwood Technologies              Remote                       $95k - $114k
Talos                            New York, NY, United States  $72k - $90k
Gemini                           New York, NY, United States  $136k - $190k
Aurora Labs                      Remote                       $72k - $100k
Shakepay                         Montreal, Canada             $145k - $180k
SwissBorg                        Budapest, Hungary            $83k - $156k
SwissBorg                        Remote                       $115k - $131k
Metaco                           Bangalore, India             $105k - $109k
Triton                           Malaysia                     $72k - $75k
Ramp Network                     Poland                       $90k - $100k
Ripple                           Bangalore, India             $90k - $115k
Luno                             Cape Town, South Africa      $112k - $156k
Chainlink Labs                   Remote
Aragon Association               Remote                       $90k - $100k
Pintu                            Remote                       $90k - $100k
Web 3.0 Technologies Foundation  Zug, Switzerland             $112k - $131k
Kraken Digital Asset Exchange    Remote                       $63k - $88k
Chainlink Labs                   Remote
Pintu                            Remote                       $76k - $81k
Swan                             Remote                       $105k - $113k
Ava Labs                         New York, NY, United States  $175k - $218k
Ava Labs                         New York, NY, United States  $151k - $189k
MoonPay                          Remote                       $90k - $100k

Edge & Node
$112k - $156k est.
Remote

Edge & Node stands as the revolutionary vanguard of web3, a vision of a world powered by individual autonomy, shared self-sovereignty and limitless collaboration. Established by trailblazers behind The Graph, we’re on a mission to make The Graph the internet’s unbreakable foundation of open data. Edge & Node invented and standardized subgraphs across the industry, solidifying The Graph as the definitive way to organize and access blockchain data. Drawing on deep expertise in developing open-source software, tooling, and protocols, we empower builders and entrepreneurs to bring unstoppable applications to life with revolutionary digital infrastructure.

Edge & Node acts on a set of unwavering principles that guide our journey in shaping the future. We champion a decentralized internet—free from concentrated power—where collective consensus aligns what is accepted as truth, rather than authoritative dictation. Our commitment to censorship resistance reinforces our vision of an unyielding information age free from the grasp of a single entity. By building for open-source, we challenge the stagnant landscape of web2, recognizing that true innovation thrives in transparency and collaboration. We imagine a permissionless future where the shackles imposed by central gatekeepers are not only removed, but relegated to the dustbin of a bygone era. And at the foundation of it all, our trust shifts from malevolent middlemen to trustless systems, leveraging smart contracts to eliminate the age-old vulnerabilities of misplaced trust.

The Site Reliability team works closely with Engineering teams across Edge & Node to ensure the services we operate are reliable, performant, and predictable. We focus on a mix of software development, operational automation and collaboration with other teams to help take our service delivery to the next level.

We are looking for a highly motivated engineer with either SRE or DevOps experience who can help us develop and automate the various services E&N operates as part of The Graph ecosystem. In this role, you will have the opportunity to drive availability and reliability across multiple engineering teams and work closely with them to ensure the operational aspects of managing services are automated and observable.

What You’ll Be Doing

  • Building automation and management systems to deliver the various services that enable The Graph to function

  • Coaching teams across The Graph ecosystem on best practices for deployment, observability and scalability

  • Collaborating with other SREs and engineering leaders to ensure our architecture and operations are world-class

  • Cultivating a culture of learning by providing insight into performance and reliability at an operational level

What We Expect

  • Experience building and delivering large-scale software systems

  • Experience operating as an SRE (or in a similar role), with hands-on experience implementing processes that drive reliability and performance

  • History of working across organizations to codify and implement best practices for both the operation and construction of software systems; knowledge of CI/CD best practices and the ability to implement them is a plus

  • Deep working knowledge of Kubernetes (or other container orchestration systems) and associated technologies

  • Clear communication skills (written and verbal) to document processes and architectures

About the Graph

What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.