Reliability Engineer

233 jobs found


Job Position	Company	Location	Posted	Salary	Tags
Site Reliability Engineer Technical Lead	Nethermind	Remote	1y	$112k - $156k	engineer lead reliability +7
Lead Cloud Site Reliability Engineer	Fmr	Bangalore, India	1y	$105k - $120k	cloud engineer lead +5
Engineering Manager Platform Infrastructure Reliability	Coinbase	Remote	2y	$211k - $249k	executive engineer infrastructure +4
Site Reliability Engineer Bucharest Romania Fulltime	Alchemy	Bucharest, Romania	2y	$80k - $85k	engineer full time reliability +8
Sr Site Reliability Engineer Latam	Bitso	Latin America	2y	$112k - $156k	engineer reliability aws +4
Sr Site Reliability Engineer Europe	Bitso	European Economic Area	2y	$112k - $156k	engineer reliability aws +4
Site Reliability Engineer Data AI	Kraken	United States	2y	$92k - $101k	ai engineer reliability +5
Site Reliability Engineer	Asymmetric Research	Remote	2y	$105k - $180k	engineer reliability blockchain +8
Senior Site Reliability Engineer	Limit Break	Tokyo, Japan	2y	$90k - $145k	engineer reliability senior +5
Site Reliability Engineer	Asymmetric Research	Remote	2y	$105k - $180k	engineer reliability blockchain +7
Senior Site Reliability Engineer Onchain	Gemini	Remote	2y	$136k - $170k	engineer reliability senior +9
Site Reliability Engineer II Solutions	Kraken	United States	2y	$63k - $87k	engineer reliability aws +8
DevOps Site Reliability Engineer NigeriaRemote	Token Metrics	Manila, Philippines	2y	$73k - $95k	remote devops engineer +3
Site Reliability Engineer DevOps	Syndr	Delhi, India	2y	$98k - $114k	devops engineer reliability +6
Site Reliability Engineer Trading Technologies	Kraken	United States	2y	$92k - $101k	engineer reliability blockchain +4

Site Reliability Engineer Technical Lead

Nethermind

$112k - $156k estimated

Remote, Europe


What are we all about?

We are a team of builders and researchers on a mission to empower enterprises and developers worldwide to access and build on decentralized systems. Our expertise covers several domains: Ethereum and Starknet protocol engineering, layer-2, cryptography research, protocol research, decentralized finance (DeFi), security auditing, formal verification, real-time monitoring, smart contract development, and dapps and enterprise engineering.

Working to solve some of the most challenging problems in the blockchain space, we frequently collaborate with renowned companies, such as Ethereum Foundation, Starknet Foundation, Gnosis Chain, Flashbots, Forta Protocol, Lido, EigenLayer, Open Zeppelin, RISCZero, Aleph Zero, and many more. Today, we are a 350+ strong team working remotely across 66+ countries.

View all our open positions here: https://www.nethermind.io/open-roles

Are you the one?

We're seeking an experienced Site Reliability Engineer to lead and mentor our SRE team. You're a seasoned professional with a proven track record in designing and implementing robust SRE processes at scale. You excel in cloud and hybrid environments, have a deep understanding of containerization, and are passionate about creating resilient, high-performance systems that can handle extreme traffic peaks.

Beyond technical expertise, you're a skilled communicator and collaborator, able to bridge the gap between technical teams and stakeholders. You thrive in cross-functional environments and can effectively represent SRE concerns at the leadership level.

Responsibilities:

Lead the implementation and refinement of SRE practices across the organization, including SLOs, error budgets, and blameless postmortems
Design and implement automation to eliminate toil and improve system reliability and efficiency
Lead initiatives and architect scalable hybrid cloud solutions for Web3 infrastructure
Manage error budgets and make data-driven decisions about when to prioritize reliability vs. new features
Drive SRE practices to ensure high availability, performance, and reliability under varying load conditions
Collaborate closely with Platform engineering team to build reliability into services from the ground up
Collaborate closely with Nethermind’s Infrastructure Leadership department to align SRE strategies with overall technical vision
Drive the adoption of observability best practices and implement comprehensive monitoring systems
Develop and maintain service level indicators (SLIs) and objectives (SLOs), working with product owners to define appropriate reliability targets
Mentor team members in SRE practices and foster a culture of continuous learning
Lead capacity planning efforts, using quantitative analysis to predict and address future scaling challenges
Contribute to long-term technical roadmaps, balancing reliability concerns with product innovation

Skills:

5+ years of experience in Site Reliability Engineering or DevOps
Expert knowledge of cloud platforms (AWS, GCP)
Expert knowledge of Kubernetes
Proven experience in designing and implementing scalable, efficient, resilient systems
Deep understanding of Linux/Unix systems and networking protocols
Strong programming skills in Python or Go
Strong background in monitoring, observability, and logging systems (e.g., Grafana, Prometheus, Loki)
Expertise in CI/CD tools (e.g. GitHub Actions, ArgoCD)
Excellent communication skills, both written and verbal, with the ability to explain complex technical concepts to various audiences
Experience in producing technical documentation, runbooks, presentations, and post-mortem reports
Experience and passion for mentoring and upskilling team members

Nice to have:

Experience leading technical teams
Contributions to open-source projects or thought leadership in SRE
Familiarity with MLOps and big data technologies
Knowledge of blockchain technology and infrastructure
Experience with chaos engineering principles and tools
Familiarity with traffic management and CDN technologies
Systems or backend engineering background

Disclaimer: I hereby consent to my personal information being stored and processed by Demerzel Solutions Limited (t/a Nethermind) (the “Company”) for recruitment purposes in relation to both the selected job role and any other role the Company considers me a qualified candidate for. All data storing and processing by the Company takes place in accordance with the UK GDPR. Kindly refer to our privacy policy for more details. Your consent to share personal information is entirely voluntary, and you may withdraw your consent at any time. Should you have any questions about this process, or wish to withdraw your consent, please contact: [email protected]


What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
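
As a rough illustration of the statistical modeling mentioned above, the sketch below estimates a failure rate from maintenance history and uses it to predict the chance of a failure within a planning window. It assumes a constant failure rate (an exponential model), which is a simplification and not something the description above prescribes. The function names and the sample figures are hypothetical.

```python
import math


def failure_rate(failures: int, operating_hours: float) -> float:
    """Observed failures per operating hour."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


def mean_time_between_failures(failures: int, operating_hours: float) -> float:
    """Average operating hours between failures (MTBF)."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return operating_hours / failures


def failure_probability(rate: float, horizon_hours: float) -> float:
    """Probability of at least one failure within horizon_hours.

    Assumes a constant failure rate (exponential model).
    """
    return 1.0 - math.exp(-rate * horizon_hours)


# Hypothetical maintenance history for one piece of equipment.
failures = 4
operating_hours = 8000.0

rate = failure_rate(failures, operating_hours)
mtbf = mean_time_between_failures(failures, operating_hours)
p_30_days = failure_probability(rate, horizon_hours=30 * 24)

print(f"failure rate: {rate:.6f} per hour")
print(f"MTBF: {mtbf:.0f} hours")
print(f"probability of a failure in the next 30 days: {p_30_days:.1%}")
```

A figure like the 30-day probability could then feed into the timing of preventive maintenance; real equipment often needs a model that accounts for wear-out, so the constant-rate assumption should be checked against the data first.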
