Reliability Engineer

451 jobs found


Company | Location | Salary
Chainlink Labs | San Francisco, CA, United States | $98k - $112k
Coinbase | Remote | $186k - $218k
Zscaler | Remote | $115k - $165k
Tenderly | Remote |
Avalabs | Remote | $90k - $100k
CleanSpark | Las Vegas, NV, United States | $90k - $110k
Zamp | Bangalore, India | $77k - $85k
Stellar | Remote | $210k - $310k
Parity | Remote | $72k - $72k
Chainlink Labs | United States | $98k - $112k
Coinbase | Remote | $180k - $218k
Zscaler | Remote | $140k - $200k
Coinbase | Remote | $211k - $249k
Coinbase | Remote | $122k - $140k

Chainlink Labs
$98k - $112k estimated
San Francisco, United States

About Us 

Chainlink Labs is the primary contributing developer of Chainlink, the decentralized computing platform powering the verifiable web. Chainlink is the industry-standard platform for providing access to real-world data, offchain computation, and secure cross-chain interoperability across any blockchain. Chainlink Labs helps power verifiable applications for banking, DeFi, global trade, and gaming by collaborating with some of the world’s largest financial institutions, notably Swift, DTCC, and ANZ. Chainlink Labs also works with top Web3 teams, including Aave, Compound, GMX, Maker, and Synthetix. Chainlink Labs was ranked as one of Newsweek's Global Top 100 Most Loved Workplaces in 2025.

The Infrastructure Platform team enables Chainlink development and empowers engineers to keep building and supporting crucial products and services that have a profound impact on the blockchain industry. Reliability is vital to the success of our company. As a Senior SRE, you will help us accelerate and enable other engineering teams by increasing self-service and decreasing cognitive load. Key initiatives in support of this mission include architecting and building a services catalog and an internal developer platform.

This job would be perfect for someone who has a strong DevOps mentality, is passionate about building and maintaining a mature GitOps environment, and has experience building and growing an internal developer platform. The entire engineering team is expanding, and you would have plenty of opportunities to build, learn, and grow.

We are distributed across time zones and continents, and we embrace remote work. Our team shares on-call responsibilities as part of a healthy rotation, with an emphasis on reasonable coverage and strong peer support.

We all have different backgrounds and are determined to help you succeed no matter where you are or who you are. If you think you would do a great job at Chainlink, we look forward to speaking with you, even if you don't match 100% of the job requirements: they describe people we've usually had a great time working with, but they're not a tick-box exercise.

Your Impact

  • Build and orchestrate large, distributed infrastructure with a focus on automation

  • Shape the resilience, efficiency, and scalability of our services

  • Partner with engineers from across the company to help troubleshoot issues, deploy solutions, and accelerate their velocity

  • Innovate and enhance the infrastructure platform team’s product offerings to increase self-service, optimize costs, and reduce toil

  • Provide technical leadership and mentoring to your team and others

  • Champion best practices in reliability, security, and cloud infrastructure to help cultivate a culture of high operational standards

Requirements

  • At least 8 years of relevant professional experience. You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before

  • Ability to develop software outside of the scope of typical infrastructure requirements and configurations

  • Experience in deploying or extending an internal developer platform to reduce cognitive burden and increase efficiency of other engineering teams

  • Experience leading large cross-team initiatives, with a demonstrable track record of success measured by quantifiable business-impact metrics

  • Practical experience in shell scripting and demonstrable skills in at least one higher-level language

  • Excellent understanding of Linux

  • Expert knowledge in all aspects of designing, deploying, and supporting large real-time systems

  • Experience with monitoring, logging, alerting, and end-to-end tracing is a plus

  • Experience with distributed systems and container orchestration.

  • Strong communication skills. You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews

  • Familiarity with most of the tools in our stack (see below)

Desired Qualifications

  • Experience running any infrastructure in the blockchain/web3 space is a plus

  • Ability to scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity

  • Experience with internal, ephemeral test and development environments

  • Experience implementing self-service tooling

  • Experience developing strategies for using a Software Bill of Materials (SBOM) and for artifact signing and verification

  • Passion for security

  • Experience setting team priorities (OKRs) and aligning the business processes required to take a product or service from ideation to production (PRDs, RFCs, etc.)

  • Experience working remotely in a distributed team

  • A strong desire to grow and challenge yourself. We would expect you to constantly find ways to improve and automate services to reduce toil

  • Excitement for blockchain, Web 3.0, and similar decentralized technologies.

Our Stack

  • We adhere to the GitOps approach to infrastructure and state management. Self-service and automation through our internal developer platform are paramount.

  • Some of the tools and services we use daily or almost daily are: AWS; Terraform/Terragrunt; Kubernetes; ArgoCD; GitHub Actions; Grafana

  • We expect you to be comfortable with many of these tools or have a strong understanding of the fundamental concepts the tools are applied to.

All roles with Chainlink Labs are global and remote-based. Unless otherwise stated, we ask that you try to overlap some working hours with Eastern Standard Time (EST).

We carefully review all applications and aim to provide a response to every candidate within two weeks after the job posting closes. The closing date is listed on the job advert, so we encourage you to take the time to thoughtfully prepare your application. We want to fully consider your experience and skills, and you will hear from us regarding the status of your application shortly after the closing date.

Commitment to Equal Opportunity

Chainlink Labs is an equal opportunity employer. All qualified applicants will receive equal consideration for employment in compliance with applicable laws, regulations, or ordinances. If you need assistance or accommodation due to a disability or special need when applying for a role or in our recruitment process, please contact us via this form.

Global Data Privacy Notice for Job Candidates and Applicants

Information collected and processed as part of your Chainlink Labs Careers profile, and in any job applications you choose to submit, is subject to our Privacy Policy. By submitting your application, you agree to our use and processing of your data as required.

What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams, such as operations, maintenance, and engineering, to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up to date with the latest technologies and best practices in the field and identify opportunities for improvement.
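The statistical modeling and FMEA tasks listed above can be illustrated with a small sketch. This is a minimal Python example, not a prescribed method: it assumes a constant failure rate (the exponential reliability model, R(t) = exp(-t / MTBF)) and the common FMEA convention of scoring severity, occurrence, and detection from 1 to 10 and multiplying them into a risk priority number (RPN). The function names, failure modes, and figures are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


def mtbf_hours(total_operating_hours: float, failure_count: int) -> float:
    """Point estimate of mean time between failures (MTBF)."""
    if failure_count <= 0:
        raise ValueError("need at least one observed failure to estimate MTBF")
    return total_operating_hours / failure_count


def reliability(t_hours: float, mtbf: float) -> float:
    """Probability of running t_hours without failure.

    Assumes a constant failure rate (exponential model): R(t) = exp(-t / MTBF).
    """
    return math.exp(-t_hours / mtbf)


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA worksheet, scored on 1-10 scales."""
    name: str
    severity: int    # 1 = negligible effect, 10 = catastrophic
    occurrence: int  # 1 = very unlikely, 10 = almost certain
    detection: int   # 1 = almost certain to be detected, 10 = undetectable

    def __post_init__(self) -> None:
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError(f"{self.name}: scores must be between 1 and 10")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_failure_modes(modes: list[FailureMode]) -> list[FailureMode]:
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    # Hypothetical fleet data: 10,000 operating hours with 4 recorded failures.
    mtbf = mtbf_hours(10_000, 4)
    print(f"MTBF: {mtbf:.0f} h, 30-day reliability: {reliability(720, mtbf):.3f}")

    modes = [
        FailureMode("disk full", severity=7, occurrence=4, detection=3),
        FailureMode("node crash", severity=8, occurrence=2, detection=2),
    ]
    for mode in rank_failure_modes(modes):
        print(mode.name, mode.rpn)
```

With these made-up inputs, "disk full" (RPN 84) ranks ahead of "node crash" (RPN 32), which is the kind of prioritization an FMEA is meant to produce. Real reliability work often uses other lifetime distributions (for example Weibull) when failure rates change over time, so the exponential assumption here is a simplification.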