Reliability Engineer

400 jobs found

Company                   Location             Salary
Scrollio                  Remote               $133k - $135k
Zinnia                    Remote               $126k - $127k
Anagram                   Remote               $112k - $156k
Crypto Finance AG         Zurich, Switzerland  $80k - $115k
Zinnia                    Remote               $126k - $127k
Ledger                    Paris, France        $120k - $156k
Ledger                    Paris, France        $185k
Crossmint                 Spain                $87k - $145k
xLabs                     Argentina            $72k - $100k
xLabs                     Argentina            $72k - $100k
Shardeum Foundation       Remote               $90k - $112k
Launchpadtechnologiesinc  Remote               $103k - $117k
IO Global                 Remote               $126k - $132k
Impossible Cloud          Hamburg, Germany     $103k - $117k
ZetaChain                 Remote

Scrollio
$133k - $135k estimated
Remote

Scroll is a Layer 2 scaling solution for Ethereum, specifically focusing on zkRollups. Key aspects of Scroll are zkRollup technology, scalability, efficiency, security, and developer-friendliness. Overall, Scroll plays a crucial role in addressing Ethereum's scalability challenges and facilitating the growth of decentralized finance (DeFi) and other blockchain-based applications by providing a scalable and efficient Layer 2 solution.

Position Overview

We are looking for a Senior or Staff Site Reliability Engineer to lead the design, implementation, and management of our infrastructure and development operations to ensure the best reliability, security, and scalability. You will work closely with our development team to build and maintain automated deployment pipelines, monitor and analyze system performance, and identify and resolve issues before they impact our users. You are expected to become the SRE team lead after a 3-month probation period. This is a dynamic role in a fast-paced blockchain environment, ideal for someone who embraces ownership and autonomy and wants to grow with us.

Responsibilities

Platform Engineering & Developer Enablement

Design, build, and maintain internal developer tools that improve the developer lifecycle, including building, testing, and deploying. Create tools that streamline developer workflows, including monitoring, logging, and debugging utilities.

Infrastructure & System Architecture

Design, provision, and maintain cloud environments focused on scalability, reliability, and security. Automate deployment and maintenance processes ensuring seamless integration and rapid iteration.

Reliability, Monitoring & Security

Implement observability solutions to gain actionable insights, enhance performance, and ensure high availability of blockchain services. Work closely with the security team to harden infrastructure and mitigate potential threats. Operate and maintain a fleet of hundreds of GPU-based zk provers. Track prover health, detect failures, and optimize performance in real time.

Requirements

5+ years of experience as a DevOps, Infrastructure, Site Reliability, or Cloud Engineer
3+ years of experience as a Backend Developer
Familiarity with hybrid cloud environments (AWS, Azure, GCP, etc.) and the ability to design, provision, and maintain them securely and efficiently
Proficiency in a modern programming language (Go, Rust, Python); you need to be a good programmer to build custom tooling
Linux administration experience, from hardware optimizations to advanced OS-level configurations
Experience working with configuration management tools like Terraform and Ansible
Experience working with containers and using them in production systems
Self-motivated individual with enthusiasm for learning and building things
Collaborative, communicative, and confident in their abilities to work well with all team members at all seniority and skill levels

Preferred Qualifications

Understanding of system architecture and the business
Previous experience as a platform engineer
Previous experience as a tech lead
Previous experience with Kubernetes, microservices, and GitOps tooling
Previous experience in a blockchain company
Previous experience in optimizing blockchain-specific infrastructure

What We Offer

Mission-Driven, Collaborative, and Innovative Environment: Join a team united by a shared vision, working with like-minded individuals and cutting-edge technology to advance Ethereum and blockchain innovation.
Comprehensive Compensation and Remote Flexibility: Benefit from a competitive salary package and generous discretionary benefits, while enjoying remote work from anywhere with flexible hours. Additionally, receive support for your workspace with a home office setup allowance and a monthly co-working membership stipend.
Remote Hiring: For team members outside the US, UK, Canada, and Hong Kong, we engage under an independent consulting arrangement, offering flexibility of payment (in fiat, USDC, etc.).
Private Healthcare Benefits: Private healthcare benefits through the Employer of Record (EoR) are only available in the US, UK, Canada, and Hong Kong.

Scroll is proud to be an equal opportunity workplace. We are committed to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. If you have a disability or special need, please let us know and we'll do our best to accommodate.  


What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization. They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.