Reliability Engineer

458 jobs found

Company | Location | Salary
Alchemy | New York, NY, United States | $135k - $350k
Helius | Remote | $225k - $350k
Gemini | Remote | $172k - $215k
Crypto.com | Shenzhen, China | $185k
Alchemy | Remote | $135k - $350k
Uniswap Labs | New York, NY, United States | $243k - $269k
Genies, Inc. | San Mateo, CA, United States | $160k - $190k
SwissBorg | Krakow, Poland | $112k - $156k
Flow Traders | Amsterdam, Netherlands | $77k - $106k
Bitso | Latin America | $112k - $156k
Ava Labs | New York, NY, United States | $89k - $94k
Circle | Chicago, IL, United States | $147k - $195k
Circle | Toronto, Canada | $103k - $117k
Circle | Chicago, IL, United States | $147k - $195k
Circle | Los Angeles, CA, United States | $147k - $195k

Alchemy
$135k - $350k
New York, New York, United States; San Francisco, California, United States

Our mission is to bring blockchain to a billion people. The Alchemy Platform is a world-class developer platform designed to make building on the blockchain easy. We've built leading infrastructure in the space, powering over $105 billion in transactions for tens of millions of users in 99% of countries worldwide.

The Alchemy team draws on decades of deep expertise in massively scalable infrastructure, AI, and blockchain, gained in leadership roles at top companies and universities such as Google, Microsoft, Facebook, Stanford, and MIT.

Alchemy recently raised a Series C1 at a $10.2B valuation led by Lightspeed and Silver Lake. Previously, Alchemy raised from a16z, Coatue, Addition, Stanford University, Coinbase, the Chairman of Google, Charles Schwab, and the founders and executives of leading organizations.

Alchemy powers the top blockchain companies globally and has been featured in TechCrunch, Forbes, Bloomberg, and elsewhere.

The Role

As an engineer in the Infrastructure department at Alchemy, you will collaborate with our engineering team to design, deploy, and continuously improve the infrastructure supporting our globally used developer platform. Your focus will be on enhancing developer productivity and ensuring product reliability as we scale.

The Infrastructure team’s mission is to provide the infrastructure, tooling, and expertise Alchemy engineers need to ship, scale, and operate high-quality products for our customers in a fast, safe, and cost-efficient manner. Come help us build, maintain, and scale the underlying infrastructure behind products that delight our customers on reliability, latency, and cost.

What You'll Do:

- Set high standards for reliability at Alchemy
- Develop and own company-wide reliability best practices such as SLO definition, incident management, postmortem reviews, launch readiness reviews, and change management
- Architect production infrastructure and tools that encourage and enforce high reliability
- Inspire the broader engineering organization to make reliability a first-class citizen in the products we build
- Collaborate with, partner with, advise, review the work of, and mentor engineering teams on reliability topics such as high-reliability architecture, observability, and safe change management
- Improve the critical infrastructure and systems used to operate infrastructure at scale (e.g., compute, networking, deployment, observability, code tooling/libraries)
- Develop and own best practices for managing production infrastructure: provisioning, application scaling, configuration management, capacity planning, monitoring, etc.
- Develop and own best practices for developer processes: CI/CD, dev and staging environments, etc.
- Provide input into long-term platform requirements and operational guidelines with a focus on reliability
- Continuously raise our standard of engineering excellence by implementing best practices for coding, testing, and deployment
- Build and maintain documentation around processes and workflows
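SLO definition, one of the practices listed above, usually comes paired with an error budget: the amount of unreliability a target permits over a window. The sketch below shows only that arithmetic, assuming a simple time-based availability SLO over a rolling 30-day window. The function names are hypothetical illustrations and do not reflect Alchemy's actual tooling.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO allows over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    # The error budget is the fraction of the window that is allowed to be "bad".
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Return the fraction of the error budget left; a negative value means the SLO is breached."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After 10.8 minutes of outages, about 75% of that budget remains.
    print(round(budget_remaining(0.999, 10.8), 2))
```

Teams commonly tie change-management decisions to this number, for example slowing risky launches once most of the budget for the window is spent.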

What We're Looking For:

- 6+ years of experience as an Infrastructure Engineer focused on reliability (e.g., Site Reliability Engineer, Production Engineer, Platform Engineer)
- Experience leading and driving company-wide reliability efforts and engineering initiatives
- Experience with observability best practices and tooling such as Prometheus, Grafana, and Datadog
- Experience designing and operating large-scale, multi-region production systems
- Experience working with AWS or other cloud infrastructure
- Experience with container schedulers and runtimes such as Docker and Kubernetes
- Experience building deployment pipelines with common CI/CD tools and GitOps workflows (e.g., Argo, Flux)
- Experience with Infrastructure-as-Code (e.g., Terraform, Pulumi, Chef, Puppet)
- Strong communication and collaboration skills, which the cross-functional nature of this role requires
- (Preferred) Experience running production services on bare metal
- (Preferred) Experience with TypeScript and Python
- (Preferred) Excellent understanding of web applications and architecture

More on the Role

Alchemy is committed to offering competitive compensation, including base salary as well as equity. Additionally, Alchemy offers comprehensive medical, dental, and vision coverage, as well as other benefits such as a 401(k) and unlimited flexible time off. The base salary range for this position is estimated to be between $135,000 and $350,000 annually. Please note that this range reflects base salary only and does not include bonus, equity, or benefits. Your salary will be determined by various factors, including relevant experience, skill set, qualifications, and other business needs.

What does a Reliability Engineer do?

A Reliability Engineer is responsible for ensuring the reliability and availability of an organization's systems and equipment.

They apply engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
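Tasks 1 and 2 can be made concrete with a small sketch. For the statistical part it assumes a constant failure rate (the exponential model, a common but not universal choice), so reliability over a period t is R(t) = exp(-t / MTBF). For prioritization it uses the traditional FMEA Risk Priority Number (severity × occurrence × detection, each scored 1 to 10). The function names, scoring scales, and sample data are hypothetical illustrations, not a prescribed method.

```python
import math
from dataclasses import dataclass


def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures, estimated from total operating hours and observed failures."""
    if failures <= 0:
        raise ValueError("need at least one observed failure to estimate MTBF")
    return operating_hours / failures


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of running t_hours without failure under a constant-failure-rate
    (exponential) model: R(t) = exp(-t / MTBF)."""
    return math.exp(-t_hours / mtbf_hours)


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible effect) to 10 (catastrophic)
    occurrence: int  # 1 (very unlikely) to 10 (almost certain)
    detection: int   # 1 (almost always caught) to 10 (very unlikely to be caught)

    @property
    def rpn(self) -> int:
        """Traditional FMEA Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: list[FailureMode]) -> list[FailureMode]:
    """Order failure modes from highest to lowest RPN so mitigation starts with the riskiest."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    # Hypothetical fleet data: 10,000 operating hours with 4 failures gives an MTBF of 2500 h.
    m = mtbf(10_000, 4)
    # Estimated probability of surviving a 30-day (720 h) window without failure.
    print(m, round(reliability(720, m), 3))
    modes = [
        FailureMode("disk full", severity=7, occurrence=4, detection=3),
        FailureMode("bad deploy", severity=8, occurrence=5, detection=2),
        FailureMode("certificate expiry", severity=9, occurrence=2, detection=6),
    ]
    for fm in prioritize(modes):
        print(fm.name, fm.rpn)
```

The exponential model is a deliberate simplification: it ignores infant-mortality and wear-out effects, which is where a Weibull fit is usually preferred.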