Reliability Engineer

485 jobs found

web3.career is now part of the Bondex Ecosystem

Company             Location                      Salary
Kraken              United States                 $92k - $101k
Kraken              European Union                $36k - $54k
Circle - Referrals  Remote                        $157k - $175k
Token Metrics       Manila, Philippines           $73k - $95k
Token Metrics       Lisbon, Portugal              $73k - $95k
Token Metrics       Cape Town, South Africa       $73k - $95k
Gemini              Remote                        $172k - $215k
Stellar             New York, NY, United States   $150k - $200k
Uniswaplabs         Remote                        (not listed)
Osmosis             Remote                        (not listed)
Myshell             Remote                        $105k - $150k
Blockdaemon         EMEA                          $200k
alchemy             New York, NY, United States   $135k - $350k
Helius              Remote                        $225k - $350k
Gemini              Remote                        $172k - $215k

Kraken
$92k - $101k estimated
United States

Building the Future of Crypto 

Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.

What makes us different?

Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.

Before you apply, please read the Kraken Culture page to learn more about our internal culture, values, and mission. We also expect candidates to familiarize themselves with the Kraken app. Learn how to create a Kraken account here.

As a fully remote company, we have Krakenites in 70+ countries who speak over 50 languages. Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space. Kraken is committed to industry-leading security, crypto education, and world-class client support through our products like Kraken Pro, Kraken NFT, and Kraken Futures.

Become a Krakenite and build the future of crypto!

Proof of work

The team

Join our Trading Technology team at the forefront of revolutionizing financial technology and help build the internet of money!

As a key contributor, you'll work independently, collaborating with stakeholders beyond formal management, ensuring the seamless operation, support, and security of our Core Trading production infrastructure. From monitoring environments to managing releases with HashiCorp Nomad and implementing robust metrics, alerts, and monitoring systems, you'll play a crucial role in the team's success. Your expertise in improving developer tooling, building Docker images, and managing CI pipelines will contribute to the automation of quality testing, while your analytical skills will be essential in identifying and mitigating potential risks of downtime. Join us in shaping the future of financial technology and be part of a team that has played a critical role in scaling Kraken's trading infrastructure globally.

This role is fully remote. We are specifically looking for candidates in the EST time zone (+/- 1 hour) to cover current needs.

The opportunity

  • Work highly independently, with multiple stakeholders outside of the formal management structure

  • Take responsibility for the operation, support, and security of production infrastructure for Core Trading Services

  • Monitor and support Staging and Production environments

  • Manage releases using HashiCorp Nomad

  • Implement robust metrics, alerts and monitoring of Trading infrastructure

  • Improve developer tooling, help build Docker images, and manage our Continuous Integration (CI) pipelines that automate quality testing

  • Analyze potential risks of downtime and develop systems that eliminate them

  • Support a fully distributed team operating across numerous timezones

Skills you should HODL

  • 4+ years of experience in an SRE role (DevOps, SRE, etc.)

  • Experience with high performance, low latency distributed systems (particularly financial)

  • Experience with HashiCorp Consul, Nomad, and Vault, including Vault's PKI features

  • Experience with monitoring/alerting (primarily with Prometheus/Grafana) and knowledge of best practices in the area

  • Experience with Bash, Python, YAML, Configuration and Secret Management

  • Experience with distributed-systems technologies such as gRPC and Kafka

  • Experience configuring Continuous Integration (CI)

  • Understanding of Unix/Linux operating systems, shell scripting

  • Understanding of DNS, SSL/TLS, and how traffic on IP networks establishes end-to-end security and trust

  • Understanding of networking concepts such as TCP/IP and UDP

  • Experience in logging, monitoring, and tracing, e.g. CloudWatch, Elasticsearch/Kibana (ELK)

Nice to haves

  • Familiarity with the FIX protocol

  • Experience with WebSockets and real-time market data feeds

  • Experience with Terraform, Kubernetes, and Helm charts

  • Understanding of the digital currency trading market


This job is accepting ongoing applications and there is no application deadline.

Please note, applicants are permitted to redact or remove information on their resume that identifies age, date of birth, or dates of attendance at or graduation from an educational institution.

We consider qualified applicants with criminal histories for employment on our team, assessing candidates in a manner consistent with the requirements of the San Francisco Fair Chance Ordinance.

Kraken is powered by people from around the world and we celebrate all Krakenites for their diverse talents, backgrounds, contributions and unique perspectives. We hire strictly based on merit, meaning we seek out the candidates with the right abilities, knowledge, and skills considered the most suitable for the job. We encourage you to apply for roles where you don't fully meet the listed requirements, especially if you're passionate or knowledgeable about crypto!

As an equal opportunity employer, we don’t tolerate discrimination or harassment of any kind, whether based on race, ethnicity, age, gender identity, citizenship, religion, sexual orientation, disability, pregnancy, veteran status, or any other protected characteristic as outlined by federal, state, or local laws.


What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
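
As a rough illustration of the first task in the list above (analyzing failure data), the sketch below computes three common reliability figures from a maintenance history: mean time between failures (MTBF), mean time to repair (MTTR), and steady-state availability, MTBF / (MTBF + MTTR). The `Outage` record, the `reliability_summary` function, and the sample history are all hypothetical and chosen only for illustration; real analyses typically use richer data and models.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One failure event (hypothetical record layout)."""
    start_hour: float    # hours since the observation window began
    repair_hours: float  # how long the repair took


def reliability_summary(outages: list[Outage], observed_hours: float) -> dict[str, float]:
    """Return MTBF, MTTR, and availability for one observation window."""
    if not outages:
        raise ValueError("need at least one outage to estimate MTBF")
    downtime = sum(o.repair_hours for o in outages)
    uptime = observed_hours - downtime
    mtbf = uptime / len(outages)    # mean operating time between failures
    mttr = downtime / len(outages)  # mean time spent repairing a failure
    availability = mtbf / (mtbf + mttr)
    return {"mtbf": mtbf, "mttr": mttr, "availability": availability}


# Made-up history: three outages over a 1000-hour window.
history = [Outage(120.0, 2.0), Outage(480.0, 5.0), Outage(900.0, 3.0)]
summary = reliability_summary(history, observed_hours=1000.0)
print(summary)
```

With this sample history the window contains 10 hours of downtime, so the function reports an MTBF of 330 hours and an availability of about 99%.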