Reliability Engineer

481 jobs found

web3.career is now part of the Bondex Ecosystem

Company          Location            Salary
Kraken           Remote              $88k - $101k
Copperco         Remote              $140k - $180k
Zscaler          Remote              $115k - $165k
Zscaler          Remote              $161k - $230k
Layerzerolabs    Remote              $86k - $110k
Zinnia           Remote              $126k - $127k
Gsrmarkets       Remote              $80k - $100k
Douro Labs       North America       $112k - $156k
Polygon Labs     LATAM               $84k - $100k
Zetachain        Remote              $157k - $171k
Parity           Remote              $80k - $120k
D3               Remote              $112k - $156k
Binance          Dublin, Ireland     (not listed)
Autumn Compass   Sydney, Australia   $120k - $150k
Chainlink Labs   United States       $98k - $112k

Kraken
$88k - $101k estimated
Remote

Building the Future of Crypto 

Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.

What makes us different?

Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.

Before you apply, please read the Kraken Culture page to learn more about our internal culture, values, and mission. We also expect candidates to familiarize themselves with the Kraken app. Learn how to create a Kraken account here.

As a fully remote company, we have Krakenites in 70+ countries who speak over 50 languages. Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space. Kraken is committed to industry-leading security, crypto education, and world-class client support through our products like Kraken Pro, Desktop, Wallet, and Kraken Futures.

Become a Krakenite and build the future of crypto!

Proof of work

The Team

This is a fully remote role; we will consider applicants based in LATAM. Our Engineering team is having a blast while delivering the most sophisticated crypto-trading platform out there. Help us continue to define and lead the industry.
As part of Kraken's Core Infrastructure Team, you will work within a world-class team of engineers building and maintaining Kraken's infrastructure. As a Site Reliability Engineer, you will be keeping one of the fastest-growing companies in the world up and available in a 24/7 environment. You will bring your own technical expertise to monitor and support staging and production environments; build tooling, CI/CD pipelines, and deployment specifications; and generally automate internal processes to empower developers and improve team efficiency.

The Opportunity

  • Own and operate Kraken’s Core Infrastructure Platform, supporting a highly resilient, low-latency, distributed exchange environment across on-prem and cloud.

  • Deliver, monitor, and support staging and production systems in a 24/7 environment.

  • Participate in an on-call rotation to ensure the reliability and availability of mission-critical systems.

  • Design, build, and maintain infrastructure tooling, CI/CD pipelines, and self-service platforms to improve developer velocity and autonomy.

  • Lead migrations across cloud and on-prem environments and drive adoption of Infrastructure as Code (Terraform).

  • Manage container orchestration platforms (Kubernetes, Nomad) and service deployments.

  • Implement infrastructure security best practices to ensure availability, integrity, and compliance.

  • Operate and optimize distributed systems (Kafka, Redis, Elasticsearch, MariaDB) and ingress infrastructure (proxies, CDNs), troubleshooting performance and reliability issues in collaboration with cross-functional teams.

Skills you should HODL

  • 5+ years in SRE, DevOps, or backend infrastructure engineering roles.

  • 3+ years programming in Rust, Golang, or Python.

  • Strong expertise in AWS and cloud infrastructure architecture.

  • Deep experience with Docker and orchestration platforms (Kubernetes or Nomad).

  • Hands-on experience with Infrastructure as Code (Terraform) and related tooling (e.g., Vault, Consul, Salt).

  • Strong monitoring and observability experience (Grafana, VictoriaMetrics, Splunk, AlertManager).

  • Solid Linux systems knowledge, networking fundamentals, and advanced debugging skills.

  • Experience implementing secure, scalable CI/CD pipelines and Git-based workflows.

  • Strong security mindset with experience applying best practices in complex environments.

  • Excellent problem-solving abilities and the ability to operate independently in a fully remote, distributed team environment.

Nice to haves

  • Strong background in distributed systems and advanced networking concepts.

  • Experience with Cloudflare (WAF, caching, Workers) and edge infrastructure.

Unless a specific application deadline is stated in the job posting, applications are accepted on an ongoing basis.

Please note, applicants are permitted to redact or remove information on their resume that identifies age, date of birth, or dates of attendance at or graduation from an educational institution.

We consider qualified applicants with criminal histories for employment on our team, assessing candidates in a manner consistent with the requirements of the San Francisco Fair Chance Ordinance.

Kraken is powered by people from around the world and we celebrate all Krakenites for their diverse talents, backgrounds, contributions and unique perspectives. We hire strictly based on merit, meaning we seek out the candidates with the right abilities, knowledge, and skills considered the most suitable for the job. We encourage you to apply for roles where you don't fully meet the listed requirements, especially if you're passionate or knowledgeable about crypto!

As an equal opportunity employer, we don’t tolerate discrimination or harassment of any kind, whether that’s based on race, ethnicity, age, gender identity, citizenship, religion, sexual orientation, disability, pregnancy, veteran status or any other protected characteristic as outlined by federal, state or local laws.


What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
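
As a rough illustration of the data analysis described in item 1, here is a minimal Python sketch that turns a maintenance log into three common reliability figures: mean time between failures (MTBF), mean time to repair (MTTR), and availability. The `Failure` record, the `reliability_summary` helper, and the sample log are all hypothetical and made up for this example; a real analysis would use an organization's own failure data and likely a richer statistical model.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """One outage: when it started and how long the repair took, in hours."""
    start_hour: float
    repair_hours: float


def reliability_summary(failures, observed_hours):
    """Return (MTBF, MTTR, availability) for one observation window."""
    if not failures:
        raise ValueError("need at least one failure to estimate MTBF")
    downtime = sum(f.repair_hours for f in failures)
    uptime = observed_hours - downtime
    mtbf = uptime / len(failures)      # mean operating time between failures
    mttr = downtime / len(failures)    # mean time to repair
    availability = mtbf / (mtbf + mttr)
    return mtbf, mttr, availability


# Hypothetical maintenance log: 3 failures over 1,000 observed hours.
log = [Failure(120.0, 4.0), Failure(480.0, 10.0), Failure(910.0, 6.0)]
mtbf, mttr, availability = reliability_summary(log, observed_hours=1000.0)
print(f"MTBF={mtbf:.1f}h MTTR={mttr:.1f}h availability={availability:.3f}")
```

Tracking these figures over successive windows is one simple way to check whether a reliability strategy, such as a preventive maintenance program, is having the intended effect.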