Reliability Engineer

499 jobs found

web3.career is now part of the Bondex Ecosystem

Company | Location | Salary
Kraken | London, United Kingdom | $119k - $131k
Kraken | London, United Kingdom | $105k - $120k
Limit Break | Tokyo, Japan | $112k - $130k
Hyperbolic Labs | San Francisco, CA, United States | $103k - $120k
Chainlink Labs | United States | $115k - $117k
Keyrock | Brussels, Belgium | $133k - $135k
Zora | Remote | $170k - $225k
asymmetric.re | Remote | $124k - $150k
Chainlink Labs | Argentina | $112k - $156k
Kraken | Remote | $88k - $101k
Zinnia | Remote | $126k - $127k
Gsrmarkets | Remote | $80k - $100k
Douro Labs | North America | $112k - $156k
Polygon Labs | LATAM | $84k - $100k
Zetachain | Remote | $157k - $171k

Kraken
$119k - $131k estimated
London, United Kingdom

Building the Future of Crypto 

Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.

What makes us different?

Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.

Before you apply, please read the Kraken Culture page to learn more about our internal culture, values, and mission. We also expect candidates to familiarize themselves with the Kraken app. Learn how to create a Kraken account here.

As a fully remote company, we have Krakenites in 70+ countries who speak over 50 languages. Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space. Kraken is committed to industry-leading security, crypto education, and world-class client support through our products like Kraken Pro, Desktop, Wallet, and Kraken Futures.

Become a Krakenite and build the future of crypto!

Proof of work

The team

This role is fully remote, with a strong preference for candidates in EU timezones. The Payward Services (PWS) business unit powers Kraken's B2B and institutional product suite, serving external partners and institutional clients under contractual SLAs.

As a Senior SRE, you will partner with PWS development and operations teams to manage infrastructure, improve CI/CD pipelines, and support operational excellence. You will bring expertise in infrastructure, monitoring, and automation to ensure performant, resilient, and continuously improving services.

The opportunity

  • Manage and support infrastructure for Payward Services, including Nomad, Kubernetes, databases, and 3rd party system integration

  • Provide operational support across multiple teams, helping debug issues in staging and production environments

  • Participate in incident response and post-incident reviews to improve system resilience

  • Consult with teams on performance, monitoring, and alerting best practices — with awareness of partner-facing SLA commitments

  • Build tooling, automation, and dashboards to improve observability and empower development teams

  • Maintain and troubleshoot CI pipelines, ensuring reliable and fast build, test, and deployment cycles

  • Collaborate with developers, QA, and product managers to streamline development and release cycles

  • Support a fully distributed team operating across multiple timezones

Skills you should HODL

  • 5+ years in a DevOps or SRE role

  • Proficiency with hybrid-cloud infrastructure environments

  • Proficiency with Git version control and CI/CD configuration

  • Deep understanding of monitoring and alerting systems, preferably Prometheus and Grafana

  • Ability to debug complex distributed systems, networks, and Linux operating system issues

  • Containerization and orchestration experience (Docker, Nomad, Kubernetes a plus)

  • Strong scripting skills (Bash, Python, or Go)

  • Self-starter capable of thriving independently and remotely in fast-paced environments

Nice to haves

  • Background working with distributed systems and technologies (Kafka, gRPC, Redis, etc.)

  • Experience operating services with external SLAs or in a B2B/enterprise context

  • Experience with benchmarking, performance tuning, and identifying system bottlenecks

  • Proficiency with databases (SQL and NoSQL) and production operations experience

  • Interest in lower-level programming languages such as Rust

  • Experience integrating with APIs (GitLab, Jira, Slack)

Unless a specific application deadline is stated in the job posting, applications are accepted on an ongoing basis.

Please note, applicants are permitted to redact or remove information on their resume that identifies age, date of birth, or dates of attendance at or graduation from an educational institution.

We consider qualified applicants with criminal histories for employment on our team, assessing candidates in a manner consistent with the requirements of the San Francisco Fair Chance Ordinance.

Kraken is powered by people from around the world and we celebrate all Krakenites for their diverse talents, backgrounds, contributions and unique perspectives. We hire strictly based on merit, meaning we seek out the candidates with the right abilities, knowledge, and skills considered the most suitable for the job. We encourage you to apply for roles where you don't fully meet the listed requirements, especially if you're passionate or knowledgeable about crypto!

We may ask candidates to complete job-related skills or work->

As an equal opportunity employer, we don’t tolerate discrimination or harassment of any kind, whether that’s based on race, ethnicity, age, gender identity, citizenship, religion, sexual orientation, disability, pregnancy, veteran status or any other protected characteristic as outlined by federal, state or local laws.


What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
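
As a rough illustration of the data analysis described in the list above, the sketch below computes two common reliability figures from maintenance history: mean time between failures (MTBF) and mean time to repair (MTTR), and from them a steady-state availability estimate. The function names and the sample numbers are hypothetical and chosen only for illustration; real analyses usually work from much larger datasets and richer failure models.

```python
from statistics import mean


def mtbf_hours(uptimes_h):
    """Mean time between failures: average length of the uptime periods (hours)."""
    return mean(uptimes_h)


def mttr_hours(repairs_h):
    """Mean time to repair: average length of the repair periods (hours)."""
    return mean(repairs_h)


def availability(mtbf_h, mttr_h):
    """Steady-state availability, MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


# Hypothetical maintenance history: hours of operation before each failure,
# and hours spent repairing after each failure.
uptimes = [240.0, 180.0, 380.0]
repairs = [2.0, 4.0, 3.0]

mtbf = mtbf_hours(uptimes)
mttr = mttr_hours(repairs)
print(f"MTBF: {mtbf:.1f} h, MTTR: {mttr:.1f} h, "
      f"availability: {availability(mtbf, mttr):.4f}")
```

Tracking these figures over time is one simple way to check whether a maintenance or reliability strategy is having the intended effect.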