Reliability Engineer

485 jobs found

web3.career is now part of the Bondex Ecosystem

Company        Location                       Salary

MoonPay        Remote                         $90k - $100k
CoinGecko      Malaysia                       $72k - $78k
Pintu          Setiabudi, Indonesia           $112k - $156k
Triton         United States                  $72k - $75k
Triton         United States                  $72k - $75k
Paxos          Remote                         $105k - $120k
Paxos          New York, NY, United States    $105k - $120k
Paxos          New York, NY, United States    $105k - $120k
Paxos          Remote                         $105k - $120k
Paxos          Remote                         $116k - $189k
SFOX           Denver, CO, United States      $72k - $90k
Improbable     London, United Kingdom         $106k - $165k
Ramp Network   Warsaw, Poland                 $18k - $100k
Aptos          Palo Alto, CA, United States   $160k - $260k
MoonPay        Remote                         $90k - $100k

Staff Site Reliability Engineer

MoonPay
$90k - $100k estimated
Remote

This job is closed

The Discipline ❤️

Site Reliability Engineering at MoonPay is responsible for providing a resilient, secure, production-ready platform that enables MoonPay to safely deploy applications and services in a self-serve, repeatable manner. We believe that SRE should support both our product delivery and operational teams by surfacing data from our production environment and driving meaningful change based upon what we learn from it.

MoonPay is seeking a dedicated Staff Site Reliability Engineer to lead the charge in enhancing our platform's resiliency. As a Staff SRE and technical leader, you will play a pivotal role in shaping the future of MoonPay’s production infrastructure and development platform.

Current Tech Stack 💻

  • TypeScript as our programming language of choice
  • Node.js as our backend platform
  • TypeORM, TypeDI, TypeGraphQL and routing-controllers as our backend libraries
  • React and Next.js hosted on Vercel as our frontend
  • Google Cloud Platform to host our services
  • Postgres as our core database
  • Redis for caching
  • Bull to manage background tasks
  • Datadog for logging and monitoring
  • ArgoCD for continuous deployment on Kubernetes
  • GitHub to manage our source code
  • Jest to run our tests ✅

What you’ll do 👀

  • Architectural Excellence: Lead the design, implementation, and evolution of a resilient, secure, and production-ready platform that empowers MoonPay to safely deploy applications and services in a self-serve, repeatable manner.
  • Cutting-Edge Technologies: Identify and integrate new technologies within the JavaScript ecosystem to ensure MoonPay's tech stack scales seamlessly with our rapidly growing business demands.
  • Automation and Innovation: Develop and implement automated solutions that enhance reliability, streamline processes, and facilitate rapid recovery, contributing to MoonPay's commitment to operational excellence.
  • Performance and Uptime: Design and track essential metrics for site uptime and performance, driving high levels of visibility to ensure MoonPay's services consistently meet and exceed user expectations.
  • Cross-Functional Collaboration: Collaborate closely with various engineering functions, including Security, Data, and Engineering teams, to provide insights and ensure a cohesive approach to achieving our reliability goals.
  • Mentorship and Leadership: Mentor junior team members, foster a culture of learning and growth, and contribute to MoonPay's technology roadmap through strategic recommendations and planning objectives.

You should apply if ✅

Here are some key attributes and experiences that make you an ideal candidate for the Staff Site Reliability Engineer role:

  • Systems Excellence: Your deep systems administration skills, familiarity with containers and virtual machines, and proficiency in navigating a Linux terminal set you apart.
  • Startup & Tech Acumen: Your platform engineering or SRE experience at leading startups or fast-growing tech companies demonstrates your ability to excel in a dynamic, rapidly evolving environment.
  • Tech Enthusiast: Whether you're an expert in parts of our tech stack or confident in your ability to cross-train quickly, your passion for technology is unmistakable.
  • Regulated Industry Insight: Your experience in regulated industries equips you to excel in MoonPay's compliance-focused landscape.
  • Guiding Developers: Your ability to collaborate effectively with developers and provide guidance on monitoring and logging complex systems at scale showcases your expertise.
  • Project Mastery: Complex projects are your playground, showcasing your ability to innovate and conquer intricate challenges.
  • Collaborative Spirit: Your talent for working seamlessly with diverse teams, including Security, Data, and Engineering, reflects your commitment to collective success.
  • Reliability Ownership: You're ready to take ownership of MoonPay's reliability and recovery processes, shaping them into world-class systems.
  • Technical Insight: Your grasp of complex reliability structures, theories, principles, and best practices speaks to your commitment to technical excellence.
  • JavaScript Proficiency: Your experience with JavaScript codebases and frameworks, including TypeScript, Node.js, and React, positions you as a tech-savvy contributor.
  • Diversity Advocate: You share our belief that diversity drives innovation and are excited to contribute your unique perspective to our team.

Research has shown that women are less likely than men to apply for a role if they do not have solid experience in 100% of the listed areas. Please know that this list is indicative and that we would still love to hear from you even if you feel you are only a 75% match. Skills can be learnt, diversity cannot.

We promote a diverse and inclusive culture at MoonPay.

Logistics 🛠

Unfortunately, we are unable to offer visas of any kind at this time!

Our interview process takes place on Google Hangouts and tends to consist of the following stages:

  • Recruiter call (20-30 minutes)
  • Hiring Manager Screen (30-45 minutes)
  • System Design (45 minutes)
  • Technical Deep Dive (45 minutes)
  • Values Interview (30 minutes)

Please let us know if you require any accommodations for the interview process, and we’ll do our best to provide assistance.

What does a Reliability Engineer do?

A Reliability Engineer is responsible for ensuring the reliability and availability of an organization's systems and equipment.

They use engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up to date with the latest technologies and best practices in the field and identify opportunities for improvement.
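
As a rough illustration of the data analysis described in item 1, the sketch below computes two common reliability metrics from a maintenance history: mean time between failures (MTBF) and mean time to repair (MTTR). It then derives steady-state availability as MTBF / (MTBF + MTTR). The `FailureRecord` schema and the sample figures are hypothetical, chosen only for illustration; real maintenance data will usually need cleaning and a more careful statistical model.

```python
from dataclasses import dataclass


@dataclass
class FailureRecord:
    """One failure event from a maintenance history (hypothetical schema)."""
    uptime_hours: float  # operating time before this failure occurred
    repair_hours: float  # time taken to restore service afterwards


def mtbf(records):
    """Mean time between failures: average operating time per failure."""
    if not records:
        raise ValueError("at least one failure record is required")
    return sum(r.uptime_hours for r in records) / len(records)


def mttr(records):
    """Mean time to repair: average restoration time per failure."""
    if not records:
        raise ValueError("at least one failure record is required")
    return sum(r.repair_hours for r in records) / len(records)


def steady_state_availability(records):
    """Fraction of time the system is expected to be up: MTBF / (MTBF + MTTR)."""
    between, repair = mtbf(records), mttr(records)
    return between / (between + repair)


# Made-up sample history, for illustration only.
history = [
    FailureRecord(uptime_hours=400.0, repair_hours=2.0),
    FailureRecord(uptime_hours=600.0, repair_hours=4.0),
    FailureRecord(uptime_hours=500.0, repair_hours=3.0),
]

print(f"MTBF: {mtbf(history):.1f} h")
print(f"MTTR: {mttr(history):.1f} h")
print(f"Availability: {steady_state_availability(history):.4%}")
```

Tracking these figures over time is one simple way to check whether a preventive maintenance program or other reliability strategy is having the intended effect.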