Reliability Engineer

458 jobs found


Company | Location | Salary
Launchpadtechnologiesinc | Remote | $185k
Fireblocks | not listed | $98k - $150k
Ledger | Paris, France | $120k - $156k
Gemini | Remote | $172k - $215k
Nethermind | Remote | $112k - $156k
Fmr | Bangalore, India | $105k - $120k
Coinbase | Remote | $211k - $249k
Alchemy | Bucharest, Romania | $80k - $85k
Bitso | Latin America | $112k - $156k
Bitso | European Economic Area | $112k - $156k
Kraken | United States | $92k - $101k
Asymmetric Research | Remote | $105k - $180k
Limit Break | Tokyo, Japan | $90k - $145k
Asymmetric Research | Remote | $105k - $180k
Gemini | Remote | $136k - $170k

Launchpad, a people-first technology company, is a leader in North America's rapidly growing tech sector. Through two solutions, Launchpad supports its clients with digital transformation:

Paasport™, our iPaaS solution, streamlines software integration and automates workflows. Nearshore Staff Augmentation, our managed IT staffing service, connects top IT talent across various geographical regions, bringing industry expertise to leading clients.

We are based in Vancouver, Canada, and our operational footprint spans North and South America, with a second headquarters in Santiago, Chile. In 2023, Launchpad was recognized as a Deloitte Technology Fast 50™ Program Company for its unwavering dedication to innovation. Our clients include industry leaders such as Walmart, GM, TIME Magazine, Salesforce, Tableau, Splunk, Bolt.com, Freedom House, and more. At Launchpad, we genuinely care about our people as individuals. If you are looking for a team that values growth, drive, and passion for your craft, and a place to achieve your goals and dreams with fairness and integrity, we'd love to hear from you.

About the Role

We are seeking a Senior Site Reliability Engineer (SRE) to play a pivotal role in ensuring the reliability, scalability, and performance of our infrastructure. This is a mission-critical role that requires someone who can address both external product reliability and internal platform demands while contributing strategically to organizational objectives. You will balance hands-on technical work with leadership of reliability initiatives, driving improvements across our platform and collaborating with stakeholders at all levels. This position is crucial to maintaining operational excellence as we navigate complex compliance standards and evolving business needs.

Responsibilities

  1. Develop, maintain, and improve our automated deployment, certification, and validation pipelines.
  2. Define, implement, and monitor service level objectives (SLOs), service level agreements (SLAs), and service level indicators (SLIs).
  3. Lead efforts to optimize, improve, and maintain the reliability and performance of the SaaS platform.
  4. Manage third-party services and technologies used to support the SRE discipline.
  5. Collaborate with senior management and the engineering team to lead SRE initiatives and provide updates.
  6. Define and implement an observability framework to provide insights into system performance and behavior.
  7. Implement proactive monitors and alerts to ensure system reliability and performance meet customer expectations.
  8. Own operational incident management, providing support to related teams and individuals during incident resolution.
  9. Identify and implement best practices for system reliability, security, scalability, and performance.
  10. Participate in on-call rotations for system support, troubleshooting, and resolution.
  11. Conduct post-mortem reviews of incidents, identify root causes, and implement remediation steps.
  12. Develop and maintain documentation for systems, processes, and procedures.
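The SLO, SLA, and SLI responsibilities above are commonly tied together through an error budget: the amount of failure an SLO permits over a measurement window. As a minimal sketch (not part of Launchpad's posting), assuming a request-based availability SLI (good requests divided by total requests in one window) and a hypothetical 99.9% SLO, the budget is the number of requests allowed to fail. The function and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    sli: float             # measured availability: good / total
    slo: float             # availability target, e.g. 0.999 (hypothetical)
    allowed_failures: int  # failures the SLO permits in this window
    remaining: int         # allowed_failures minus observed failures (negative = overspent)

    @property
    def slo_met(self) -> bool:
        # The SLO holds as long as the budget has not been overspent.
        return self.remaining >= 0


def error_budget(good: int, total: int, slo: float) -> ErrorBudget:
    """Compute a request-based availability SLI and its error budget.

    Assumes the SLI is good / total over a single window; latency or
    freshness SLIs would need a different definition.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1")
    allowed = round(total * (1 - slo))  # requests allowed to fail
    failed = total - good
    return ErrorBudget(good / total, slo, allowed, allowed - failed)


# Example window: 1,000,000 requests, 600 failed, 99.9% SLO.
budget = error_budget(good=999_400, total=1_000_000, slo=0.999)
print(budget.allowed_failures, budget.remaining, budget.slo_met)  # 1000 400 True
```

Defining `slo_met` from the remaining budget, rather than comparing floats, keeps the pass/fail decision consistent with the integer budget that alerting or release policies would typically act on.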

Requirements

  1. Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience.
  2. Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or similar roles.
  3. Familiarity with monitoring tools and systems.
  4. Proficiency in scripting languages such as Python, Bash, or Ruby.
  5. Experience with infrastructure automation tools such as Terraform, Ansible, or Chef.
  6. Familiarity with containerization technologies such as Docker and orchestration tools such as Kubernetes.
  7. Strong knowledge of cloud platforms such as AWS, GCP, or Azure.
  8. Excellent troubleshooting and analytical skills.
  9. Strong communication skills and the ability to work effectively within a team.

Nice to Have

  1. Certifications in AWS, GCP, or Azure.
  2. Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
  3. Familiarity with database technologies, both SQL and NoSQL.

Why work for Launchpad?

  1. 100% remote
  2. People-first culture
  3. Excellent compensation in US dollars
  4. Hardware setup for working from home
  5. Work with global teams and prominent brands based in North America, Europe, and Asia
  6. Training allowances
  7. Personal time off (PTO) for vacations, study leave, personal time, and more

...and more!

At Launchpad, we genuinely care about our people as individuals. If you are looking for a team that values growth, drive, and passion for your craft, and a place to achieve your goals and dreams with fairness and integrity, then you are the future of Launchpad.

Launchpad is committed to fostering a diverse and representative workforce and an inclusive work environment where all employees are respected and treated equally.

Are you ready to elevate your career at Launchpad? We want to hear your story! Contact us today.

What does a Reliability Engineer do?

A Reliability Engineer is a professional responsible for ensuring the reliability and availability of an organization's systems and equipment.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up to date with the latest technologies and best practices in the field and identify opportunities for improvement.
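Item 1 above mentions statistical modeling to predict future failures. One of the simplest such models assumes a constant failure rate, so the time between failures is exponentially distributed and the probability of running t hours without failure is R(t) = exp(-t / MTBF), where MTBF is the mean time between failures. The sketch below is illustrative only: the function names and the sample failure log are hypothetical, and real analyses often need other models (for example Weibull) when failure rates change with equipment age.

```python
import math
from statistics import mean
from typing import Sequence


def estimate_mtbf(intervals_hours: Sequence[float]) -> float:
    """Estimate mean time between failures (MTBF) as the average of the
    observed operating intervals between consecutive failures."""
    if not intervals_hours:
        raise ValueError("need at least one observed interval")
    return mean(intervals_hours)


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of operating t_hours without failure under the
    constant-failure-rate (exponential) assumption: R(t) = exp(-t / MTBF)."""
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return math.exp(-t_hours / mtbf_hours)


def expected_failures(window_hours: float, mtbf_hours: float) -> float:
    """Expected number of failures in a window at constant rate 1 / MTBF,
    which can help when planning maintenance activities."""
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return window_hours / mtbf_hours


# Hypothetical failure log for one pump: hours run between failures.
intervals = [400.0, 600.0, 500.0, 500.0]
m = estimate_mtbf(intervals)
print(m, round(reliability(100, m), 4))  # 500.0 0.8187
print(expected_failures(8760, m))        # 17.52 failures expected per year
```

The exponential model is chosen here because it needs only one parameter; if the intervals show a clear trend (wear-out or infant mortality), that assumption no longer holds and the estimate should not be used for maintenance planning.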