Reliability Engineer

Company       Location                       Salary
Swan          Remote                         $105k - $113k
Ava Labs      New York, NY, United States    $175k - $218k
Ava Labs      New York, NY, United States    $151k - $189k
MoonPay       Remote                         $90k - $100k
CoinGecko     Malaysia                       $72k - $78k
Pintu         Setiabudi, Indonesia           $112k - $156k
Triton        United States                  $72k - $75k
Triton        United States                  $72k - $75k
Paxos         Remote                         $105k - $120k
Paxos         New York, NY, United States    $105k - $120k
Paxos         New York, NY, United States    $105k - $120k
Paxos         Remote                         $105k - $120k
Paxos         Remote                         $116k - $189k
SFOX          Denver, CO, United States      $72k - $90k
Improbable    London, United Kingdom         $106k - $165k

Swan
$105k - $113k estimated
Remote

The Company

Swan is the leading education-focused, Bitcoin-only on-ramp for retail customers, high-net-worth individuals, and corporations looking to save in Bitcoin for the long term. We hire passionate Bitcoiners who want to work with a self-motivated, fully distributed startup team.

The Role

In this position, you will work closely with our development team, the CTO, and cloud/infrastructure engineers to develop and operate a robust, scalable platform that supports Swan’s business lines. You’ll cover a wide range of activities, from day-to-day operations, error monitoring, and proactive communication to engineering, bug fixes, and database analysis to improve query performance. While this position focuses on operational expertise, we always encourage experience with, and a desire for, building software and systems.

Skills and experience that will help you succeed:

  • Experience with Datadog or a similar tool: setting up monitors, alerting systems, anomaly management, and forecasting. A desire to drive a proactive approach to scalability.
  • Intermediate to advanced understanding of Postgres, including experience with databases at scale, tuning parameters, and optimizing SQL queries, plus knowledge of AWS RDS in particular.
  • Excellent understanding of HA architectures built in AWS.
  • At least mid-level knowledge of DNS, SSL, AWS networking, Docker, and ECS.
  • Working knowledge of security principles in the cloud and familiarity with the AWS Well-Architected Framework.
  • Cool under pressure: able to manage incidents involving multiple systems, communicate effectively internally and externally using tools like Statuspage and PagerDuty, marshal resources, and drive issues to resolution, including writing blameless postmortems.
  • Comfortable taking (very occasional) pager alerts during working hours and sometimes on weekends (we generally try to avoid nighttime pages, since we have staff in Europe and can split pager duty across time zones). You will not be the only on-call engineer, but you will own primary incident response and lead and train other developers in response and mitigation.
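The "HA architectures built in AWS" requirement usually comes down to availability arithmetic like the following. This is a minimal sketch, assuming independent component failures; the availability figures (99% per app instance, 99.9% for the database) and the helper names `series`, `parallel`, and `downtime_minutes_per_month` are hypothetical and do not describe Swan's actual setup.

```python
def series(*availabilities: float) -> float:
    """Availability of components that must ALL be up (e.g. app -> database)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """Availability of redundant replicas where ANY one being up is enough.

    Assumes failures are independent, which real deployments only approximate.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


def downtime_minutes_per_month(availability: float, days: int = 30) -> float:
    """Expected downtime in minutes over a month of `days` days."""
    return (1.0 - availability) * days * 24 * 60


if __name__ == "__main__":
    # Hypothetical figures: one app instance vs. two instances in separate AZs.
    app = 0.99
    db = 0.999
    single = series(app, db)
    redundant = series(parallel(app, app), db)
    print(f"single instance: {single:.5f} ({downtime_minutes_per_month(single):.0f} min/month)")
    print(f"two instances:   {redundant:.5f} ({downtime_minutes_per_month(redundant):.0f} min/month)")
```

With these made-up numbers, a second app instance raises composite availability from about 98.9% to about 99.89%, cutting expected downtime from roughly 475 to roughly 48 minutes per 30-day month. The single database then dominates, which is why HA designs typically replicate it too (for example, RDS Multi-AZ); correlated failures such as a region-wide outage break the independence assumption.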

Here's a bit about our culture:

  • We’re a growing team: We’re fully distributed across the world, so Slack and video conferencing are huge here.
  • We’re very flat: Leadership is desired and encouraged; we hire people who care about the product they are working on.
  • We’re Bitcoiners: We find solutions that encourage Bitcoin principles. Many of us pull double duty alongside our main jobs, producing content for Bitcoin newsletters, podcasts, social audio platforms, and YouTube shows, and spend some of the day on Twitter educating the masses. We love Bitcoin, and it comes through in our daily chats, meetings, and actions.

Join us, become a Swan!


What does a Reliability Engineer do?

A Reliability Engineer is a professional responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.
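A common first step in the statistical side of the job is estimating mean time between failures (MTBF) from field data and turning it into a survival probability. The sketch below assumes a constant failure rate (the exponential model, R(t) = e^(-t / MTBF)), which ignores early-life and wear-out effects that models such as Weibull capture; the fleet data and the helper names `mtbf_hours` and `reliability` are hypothetical.

```python
import math


def mtbf_hours(total_operating_hours: float, failures: int) -> float:
    """Point estimate of mean time between failures: operating time / failures."""
    if failures <= 0:
        raise ValueError("need at least one failure to estimate MTBF")
    return total_operating_hours / failures


def reliability(t_hours: float, mtbf: float) -> float:
    """Probability of running t_hours without failure, assuming a constant
    failure rate lambda = 1 / MTBF (exponential model): R(t) = exp(-t / MTBF).
    """
    return math.exp(-t_hours / mtbf)


if __name__ == "__main__":
    # Hypothetical fleet: 10 pumps, each run 8,760 h (one year), 4 failures logged.
    mtbf = mtbf_hours(10 * 8760, 4)
    print(f"MTBF = {mtbf:.0f} h")
    print(f"P(no failure in a 720 h month) = {reliability(720, mtbf):.3f}")
```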

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up to date with the latest technologies and best practices in the field and identify opportunities for improvement.
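Failure mode and effects analysis (FMEA) results are often ranked by a risk priority number, RPN = severity × occurrence × detection, with each factor commonly scored from 1 to 10. The sketch below shows that ranking step only; the failure modes, scores, and the `FailureMode` / `rank_by_rpn` names are invented for illustration, and scoring scales vary between organizations.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible effect) .. 10 (hazardous)
    occurrence: int  # 1 (rare) .. 10 (almost certain)
    detection: int   # 1 (almost certainly caught) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number: product of the three 1-10 scores (1-1000)."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError(f"{self.name}: scores must be between 1 and 10")
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes: list[FailureMode]) -> list[FailureMode]:
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    modes = [
        FailureMode("bearing wear", severity=7, occurrence=5, detection=4),
        FailureMode("seal leak", severity=5, occurrence=6, detection=3),
        FailureMode("motor overheating", severity=8, occurrence=3, detection=6),
    ]
    for m in rank_by_rpn(modes):
        print(f"{m.name:20s} RPN={m.rpn}")
```

Because very different score combinations can yield the same product, teams often review high-severity modes regardless of RPN, and some newer guidance (such as the AIAG & VDA FMEA handbook) replaces RPN with action-priority tables.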