Reliability Engineer

497 jobs found

web3.career is now part of the Bondex Ecosystem

Company | Location | Salary
Chainlink Labs | United States | $115k - $117k
Keyrock | Brussels, Belgium | $133k - $135k
Zora | Remote | $170k - $225k
asymmetric.re | Remote | $124k - $150k
Chainlink Labs | Argentina | $112k - $156k
Kraken | Remote | $88k - $101k
Layerzerolabs | Remote | $86k - $110k
Zinnia | Remote | $126k - $127k
Gsrmarkets | Remote | $80k - $100k
Douro Labs | North America | $112k - $156k
Polygon Labs | LATAM | $84k - $100k
Zetachain | Remote | $157k - $171k
Parity | Remote | $80k - $120k
D3 | Remote | $112k - $156k
Binance | Dublin, Ireland | not listed

Chainlink Labs
$115k - $117k estimated
United States

About Chainlink
Chainlink is the industry-standard oracle platform bringing the capital markets onchain and powering the majority of decentralized finance (DeFi). The Chainlink stack provides the essential data, interoperability, compliance, and privacy standards needed to power advanced blockchain use cases for institutional tokenized assets, lending, payments, stablecoins, and more. Since inventing decentralized oracle networks, Chainlink has enabled tens of trillions in transaction value and now secures the vast majority of DeFi.

Many of the world’s largest financial services institutions have also adopted Chainlink’s standards and infrastructure, including Swift, Euroclear, Mastercard, Fidelity International, UBS, S&P Dow Jones Indices, FTSE Russell, WisdomTree, ANZ, and top protocols such as Aave, Lido, GMX and many others. Chainlink leverages a novel fee model where offchain and onchain revenue from enterprise adoption is converted to LINK tokens and stored in a strategic Chainlink Reserve. Learn more at chain.link.

The Engineering Team

As adoption of the Chainlink Runtime Environment (CRE) accelerates, you will be part of that growth, ensuring that reliability and security remain at the forefront of development. This role assists in building the Kubernetes-based infrastructure primitives and application control-plane components that CRE runs on, enabling deterministic, horizontally scalable deployments with decentralized consensus built in. By codifying scaling logic into reusable operators and automation, you will ensure CRE continues to scale safely, predictably, and efficiently as demand grows. The impact is foundational: the infrastructure you will help build is not constrained by product expansion, reduces operational risk at scale, and strengthens Chainlink's ability to grow its decentralized network without sacrificing reliability or security.

Your Impact:

You will design and build the infrastructure primitives that define how Chainlink Decentralized Oracle Networks (DONs) scale across internal systems and the decentralized ecosystem.

You will help create the CRE (Kubernetes-based) control plane that enables:

  • Deterministic horizontal scaling of DONs

  • Safe and repeatable infrastructure expansion

  • Improved operational efficiency and scalability

You will develop the core infrastructure components, including Kubernetes Operators and scaling automation, that Product teams will adopt and that might later be distributed to external node operators to improve decentralized scaling.

This is not an operational support role. You will be building the systems that define how Chainlink scales while shaping the reliability, scalability, and decentralization of protocol-level services.

Requirements:

  • 6–9+ years in SRE / Platform / Infrastructure Engineering

  • Proven experience scaling Kubernetes in high-throughput production environments

  • Deep knowledge of:

    • Scheduler behavior

    • StatefulSets & persistent workloads

    • Autoscaling strategies (HPA, VPA, KEDA, custom scaling)

    • Resource management & performance tuning

    • Multi-cluster and multi-region architectures

  • Experience diagnosing production failures at cluster scale

  • Strong Terraform or Crossplane experience

  • GitOps workflows (ArgoCD / Flux) experience

  • CI/CD reliability experience

  • Automation-first mindset

  • AWS production experience

  • Proficiency in Go (strongly preferred) or equivalent systems language

Desired Qualifications:

  • Experience with web3 concepts (e.g. blockchain node lifecycle, forks, reorgs, or RPC issues)

  • Experience with oracle systems, token architectures, or decentralized services

  • Experience scaling stateful high-availability distributed systems

  • Experience building internal platform primitives

  • Experience implementing custom autoscaling logic

  • Experience designing SLO strategies and error-budget usage

  • Experience improving diagnosability and observability frameworks

  • Experience working in high-ambiguity environments

  • Experience operating blockchain infrastructure in production

  • Certified Kubernetes Administrator (CKA)

  • Experience contributing to Kubernetes ecosystem projects

  • Experience building multi-tenant platform infrastructure

  • Experience working in high-security and/or SOC 2/ISO27001 compliant environments

  • Experience with chaos engineering practices or implementation

All roles with Chainlink Labs are global and remote-based. Unless otherwise stated, we ask that you try to overlap some working hours with Eastern Standard Time (EST).

We carefully review all applications and aim to provide a response to every candidate within two weeks after the job posting closes. The closing date is listed on the job advert, so we encourage you to take the time to thoughtfully prepare your application. We want to fully consider your experience and skills, and you will hear from us regarding the status of your application shortly after the closing date.

Commitment to Equal Opportunity

Chainlink Labs is an equal opportunity employer. All qualified applicants will receive equal consideration for employment in compliance with applicable laws, regulations, or ordinances. If you need assistance or accommodation due to a disability or special need when applying for a role or in our recruitment process, please contact us via this form.

Global Data Privacy Notice for Job Candidates and Applicants

Information collected and processed as part of your Chainlink Labs Careers profile, and any job applications you choose to submit, is subject to our Recruiting Privacy Policy. By submitting your application, you are agreeing to our use and processing of your data as required.

What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
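
As a rough illustration of the statistical modeling mentioned in item 1, the sketch below computes three standard reliability figures from a maintenance history: mean time between failures (MTBF), mean time to repair (MTTR), and steady-state availability. It also estimates the probability of running failure-free for a given period, assuming a constant failure rate (an exponential model), which is a simplification and not something the description above specifies. The `MaintenanceRecord` type and the sample numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MaintenanceRecord:
    """Hypothetical summary of one asset's observed history."""
    operating_hours: float  # total time the asset was running
    repair_hours: float     # total time spent restoring it after failures
    failures: int           # number of failures observed


def mtbf(rec: MaintenanceRecord) -> float:
    """Mean time between failures: operating time divided by failure count."""
    if rec.failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return rec.operating_hours / rec.failures


def mttr(rec: MaintenanceRecord) -> float:
    """Mean time to repair: repair time divided by failure count."""
    if rec.failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTTR")
    return rec.repair_hours / rec.failures


def availability(rec: MaintenanceRecord) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    up, down = mtbf(rec), mttr(rec)
    return up / (up + down)


def survival_probability(rec: MaintenanceRecord, hours: float) -> float:
    """Probability of no failure within `hours`.

    Assumes a constant failure rate (exponential model): R(t) = exp(-t / MTBF).
    """
    return math.exp(-hours / mtbf(rec))


if __name__ == "__main__":
    # Made-up figures for illustration only.
    pump = MaintenanceRecord(operating_hours=9900, repair_hours=100, failures=10)
    print(f"MTBF: {mtbf(pump):.1f} h")
    print(f"MTTR: {mttr(pump):.1f} h")
    print(f"Availability: {availability(pump):.4f}")
    print(f"P(no failure in 200 h): {survival_probability(pump, 200):.4f}")
```

In practice the constant-failure-rate assumption often does not hold for equipment that wears out, and a Reliability Engineer would typically check it against the data before relying on such an estimate.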