Reliability Engineer

486 jobs found

web3.career is now part of the Bondex Ecosystem

Company | Location | Salary
Zetachain | Remote | $157k - $171k
Parity | Remote | $80k - $120k
D3 | Remote | $112k - $156k
Binance | Dublin, Ireland | (not listed)
Autumn Compass | Sydney, Australia | $120k - $150k
Chainlink Labs | United States | $98k - $112k
Wormholefoundation | Remote | $112k - $156k
Kiln | Paris, France | $112k - $156k
Kiln | Paris, France | $84k - $112k
Zscaler | Remote | $140k - $200k
Zscaler | Remote | $130k - $131k
Zenith | Remote | (not listed)
Kraken | United States | $127k - $203k
Alpaca | Remote | $120k - $149k
InfStones | Texas | $36k - $54k

Zetachain
$157k - $171k estimated
Remote

About ZetaChain

We're building something ambitious at ZetaChain: the first universal blockchain and AI platform that connects everything (Bitcoin, Ethereum, Solana, and more) while pioneering in the GenAI space. We're backed by top investors, live on mainnet, and building the future of blockchain and AI technology. If you're excited about working on big, meaningful problems with a world-class team, you're in the right place.

About the Role

We are looking for a Senior Site Reliability Engineer (SRE) to ensure the reliability, scalability, and security of ZetaChain’s production infrastructure. This role is highly hands-on and execution-focused. You will operate critical blockchain and AI-adjacent infrastructure, build automation to reduce operational overhead, and partner closely with protocol, platform, and AI teams to design systems that are reliable by default.

What You'll Do

- Operate and maintain production blockchain infrastructure, including validators, RPC services, indexers, and supporting services
- Ensure high availability and performance for AI-enabled developer platforms and internal tooling
- Build and maintain monitoring, alerting, and dashboards for protocol, infrastructure, and application health
- Write high-quality automation and infrastructure code to reduce toil and improve reliability
- Participate in on-call rotations, incident response, and post-incident reviews
- Partner with engineering teams to embed reliability, scalability, and security best practices into system design
- Improve Kubernetes reliability across cloud and bare-metal environments
- Continuously refine deployment, rollback, and recovery strategies

Minimum Qualifications

Our ideal candidate description is a wish list, not a checklist. We don’t expect every applicant to meet every requirement.

Experience

- 4+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or Platform Engineering
- Strong software engineering background with production experience in Go and/or Python
- Deep experience operating Linux systems in production
- Proven experience running Kubernetes at scale
- Experience supporting high-availability distributed systems
- Comfortable working in fast-moving startup environments
- Strong security mindset, especially for infrastructure running on public or adversarial networks
- Excellent collaboration and communication skills

Technologies

- Languages: Go, Python, Bash, Terraform, Ansible
- Infrastructure: Kubernetes, Docker, Linux
- Observability: Prometheus, Grafana, Datadog, Loki, incident.io
- Platforms: AWS, GCP, bare metal
- Blockchain Stack: Cosmos SDK, Tendermint / CometBFT, Ethereum, Bitcoin

Bonus Points

- Exposure to AI-powered infrastructure, observability, or developer tooling
- Experience operating blockchain nodes or validator infrastructure
- Familiarity with Cosmos-based chains or EVM clients
- Experience with DevOps, DevSecOps, or GitOps methodologies
- Contributions to open-source software

Why You’ll Love Working at ZetaChain

- Make a direct impact on infrastructure powering both blockchain and AI platforms
- Work on technically challenging, real-world distributed systems
- Fully remote with quarterly in-person team meetups
- Strong open-source culture and modern engineering practices
- Competitive compensation and meaningful ownership

Compensation

Base Salary: $140,000 – $190,000

This range reflects base salaries for roles in the San Francisco market. For candidates in other locations, compensation is adjusted to remain competitive within their local market. In addition to the base salary, all full-time team members receive an additional 10% to 25% in liquid benefits with upside based on role, experience, and impact. We believe in building together and sharing in the long-term success of the network. Compensation packages are designed to be competitive and aligned with the growth of both the team and the ecosystem.

Let’s build the first Universal Blockchain together.

What does a Reliability Engineer do?

A Reliability Engineer is responsible for ensuring the reliability and availability of an organization's systems and equipment.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up to date with the latest technologies and best practices in the field and identify opportunities for improvement.
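
As a rough illustration of the statistical modeling mentioned in item 1, the sketch below estimates mean time between failures (MTBF) from a maintenance history, then uses it to predict survival probability and steady-state availability. It assumes a constant failure rate (the exponential model), which is only one of several models used in practice. The function names and the sample figures are hypothetical and chosen for illustration only.

```python
import math


def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: total operating time / number of failures."""
    if failures <= 0:
        raise ValueError("need at least one observed failure to estimate MTBF")
    return operating_hours / failures


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of running t_hours without failure.

    Assumes a constant failure rate (exponential model): R(t) = exp(-t / MTBF).
    """
    return math.exp(-t_hours / mtbf_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical history: 10,000 operating hours, 4 failures, 5 hours mean repair time.
m = mtbf(10_000, 4)
r = reliability(500, m)
a = availability(m, 5)

print(f"MTBF: {m:.0f} h")
print(f"P(no failure in 500 h): {r:.3f}")
print(f"Availability: {a:.4f}")
```

With real data, the same three numbers give a simple basis for scheduling preventive maintenance: a shorter inspection interval raises the chance of reaching the next inspection without a failure, at the cost of more planned downtime.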