Reliability Engineer

468 jobs found


Company         Location                Salary
Gemini          Gurgaon, India          $63k - $70k
NFT Now         Remote                  $54k - $100k
Consensys       Remote                  $72k - $100k
Consensys       Remote                  $72k - $100k
Pinax           Canada                  $54k - $100k
Binance         Taipei, Taiwan          Not listed
Nethermind      London, United Kingdom  $90k - $100k
OKX             Singapore, Singapore    $105k - $120k
Coinmarketcap   Remote                  $129k - $149k
OKX             Singapore, Singapore    $105k - $120k
bloXroute Labs  Tel Aviv, Israel        $77k - $87k
Solana US       United States           $120k
TRM Labs        Remote                  $54k - $80k
TRM Labs        Remote                  Not listed
TRM Labs        Remote                  $54k - $90k

Site Reliability Engineer

Gemini
$63k - $70k estimated

This job is closed

About the Company

Gemini is a global crypto and Web3 platform founded by Tyler Winklevoss and Cameron Winklevoss in 2014. Gemini offers a wide range of crypto products and services for individuals and institutions in over 70 countries.

Our flagship product, the Gemini Exchange, was built to be a compliant and secure platform to buy, sell, and store crypto. Our suite of retail products includes ActiveTrader, a high-performance platform for advanced traders. Gemini also offers the Gemini Credit Card providing real-time crypto rewards, the Gemini dollar (GUSD), a U.S. dollar-backed stablecoin, and Gemini Staking, allowing users to securely stake their tokens on-chain and receive rewards. Nifty Gateway, Gemini's NFT platform, is the world's premier marketplace for NFTs and digital art.

Gemini customers also have access to a wide range of institutional products tailor-made for high-net-worth individuals, asset and wealth managers, and hedge funds and liquidity providers seeking exposure to crypto. Customers looking to place large orders can use Gemini eOTC, a fully electronic over-the-counter trading platform built for high-value bulk orders. For wealth management professionals, we offer a unique destination for their clients’ crypto portfolios from a single platform, and we enable fully electronic clearing and settlement of off-exchange crypto trades.

The Department: Platform

Our Platform organization’s purpose is to enable Gemini to scale effectively and empower our engineering teams to focus on building innovative financial products and experiences for individuals around the world. Within Platform, the Site Reliability Engineering team is responsible for partnering with Gemini’s other engineering teams to ensure all our systems are architected, engineered and deployed to be resilient, reliable and performant.

The Embedded SRE team is part of Site Reliability Engineering and engages directly with our other engineering teams. It onboards them onto our platform systems, reviews and recommends design and architectural decisions, and guides them in implementing the tooling the larger Platform organization provides so that systems can scale and react to changing conditions, with continuous improvement loops.

The Role: Site Reliability Engineer

Gemini is where innovation, speed to delivery, and big ideas are celebrated. We are builders who are inventing the future of finance via crypto and web3-driven innovation. We are in the process of establishing our Gemini development center in Gurgaon (Gurugram), India, with plans to begin recruitment by the end of April 2023. We are hiring software engineers and technical product managers in this location, along with other key roles including human resources and talent acquisition, finance, support, and compliance.

You will be an integral part of leading Gemini’s engineering teams towards modern DevOps practices, both by developing and providing modern automation and operational tooling and by working cross-functionally across Gemini’s engineering teams to influence and shape our development practices and culture.

Responsibilities:

  • Guiding engineering teams onto the various supported services provided by Platform
  • Running ongoing performance evaluations and improvements for Gemini systems
  • Providing architecture recommendations and engagement as part of the SDLC
  • Creating “Production-ready Scorecards” to evaluate the health of systems pre-launch
  • Implementing and teaching monitoring, alerting and automated resolution best practices
  • Defining SLIs and SLOs with Engineering teams
  • Educating and guiding Engineering teams on reliability and resiliency best practices, such as statelessness and chaos engineering
  • Building operational tooling and automations
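Defining SLIs and SLOs usually goes together with tracking an error budget: the share of requests an SLO allows to fail within a window. The following is a minimal sketch, assuming a request-based availability SLI measured over a fixed window; the names `SLOWindow`, `availability_sli`, and `error_budget_remaining` are hypothetical and do not describe Gemini's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLOWindow:
    """Request counts observed over one SLO window (for example, 30 days)."""
    total_requests: int
    good_requests: int  # requests that met the SLI definition (assumed: succeeded within a latency threshold)


def availability_sli(window: SLOWindow) -> float:
    """SLI: the fraction of requests in the window that were good."""
    if window.total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return window.good_requests / window.total_requests


def error_budget_remaining(window: SLOWindow, slo_target: float) -> float:
    """Fraction of the error budget still unspent; negative once the SLO is breached."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total_requests == 0:
        return 1.0  # no traffic, so none of the budget has been spent
    allowed_bad = (1.0 - slo_target) * window.total_requests  # failures the SLO tolerates
    actual_bad = window.total_requests - window.good_requests
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    window = SLOWindow(total_requests=1_000_000, good_requests=999_500)
    print(availability_sli(window))               # about 0.9995
    print(error_budget_remaining(window, 0.999))  # about 0.5: 500 of ~1,000 allowed failures used
```

Expressing the budget as a remaining fraction makes it easy to alert on, or to gate risky deploys when it approaches zero.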

Qualifications:

  • 7+ years using monitoring, alerting, and automation tooling to understand and remediate performance and health issues in systems at scale
  • Experience in a code-first environment, developing automated solutions to solve support and operational issues
  • Experience as a Technical Leader within a team, helping evaluate and make tech decisions for the team
  • Experience working with containerization and orchestration tools such as Nomad, EKS (k8s), Docker, etc.
  • Experience working with Configuration Management such as Ansible, Chef, Puppet
  • Experience writing scripts or CLI tools that help increase Developer Productivity
  • Experience in analyzing system and application performance, identifying bottlenecks, and recommending architectural or systemic improvements
  • Experience working with Engineering teams, teaching, training, and mentoring on how to implement best-practice technical solutions
  • Experience working in a code-driven, automation-first public cloud infrastructure

It Pays to Work Here
The compensation & benefits package for this role includes:
  • Competitive starting salary
  • A discretionary annual bonus
  • Equity

At Gemini, we strive to build diverse teams that reflect the people we want to empower through our products, and we are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. Equal Opportunity is the Law, and Gemini is proud to be an equal opportunity workplace. If you have a specific need that requires accommodation, please let a member of the People Team know.

What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up to date with the latest technologies and best practices in the field and identify opportunities for improvement.
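Task 1 mentions using statistical modeling to predict failures and plan maintenance. One of the simplest such models estimates the mean time between failures (MTBF) from observed intervals and, assuming a constant failure rate, treats reliability as R(t) = exp(-t / MTBF). The sketch below is illustrative only: the function names are hypothetical, and the constant-rate (exponential) assumption often does not hold for equipment that wears out, where a Weibull model is a common alternative.

```python
import math
from statistics import mean
from typing import Sequence


def mtbf_hours(intervals_hours: Sequence[float]) -> float:
    """Estimate MTBF as the average of observed intervals between failures (hours)."""
    if not intervals_hours:
        raise ValueError("need at least one observed interval between failures")
    return float(mean(intervals_hours))


def reliability(t_hours: float, mtbf: float) -> float:
    """P(no failure within t_hours), assuming a constant failure rate: R(t) = exp(-t / MTBF)."""
    return math.exp(-t_hours / mtbf)


def failure_probability(t_hours: float, mtbf: float) -> float:
    """P(at least one failure within t_hours) = 1 - R(t)."""
    return 1.0 - reliability(t_hours, mtbf)


def max_interval_for_target(target_reliability: float, mtbf: float) -> float:
    """Longest interval (hours) for which R(t) stays at or above the target.

    Solves exp(-t / MTBF) = target for t, giving t = -MTBF * ln(target).
    """
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must be strictly between 0 and 1")
    return -mtbf * math.log(target_reliability)


if __name__ == "__main__":
    m = mtbf_hours([400.0, 600.0, 500.0])   # hypothetical failure log, MTBF = 500 h
    print(failure_probability(100.0, m))     # about 0.181
    print(max_interval_for_target(0.9, m))   # about 52.7 h between maintenance checks
```

Under these assumptions, the last call suggests how often to intervene if the goal is to keep the chance of a failure between checks at or below 10%.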
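Task 2 mentions failure mode and effects analysis (FMEA). In its classic form, each failure mode is rated for severity, occurrence, and detection, typically on 1 to 10 scales, and the product of the three ratings, the Risk Priority Number (RPN), is used to rank which failure modes to address first. Some newer FMEA guidance replaces RPN with action-priority tables, so treat this as one common scoring scheme, not the only one. The component names and ratings below are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FailureMode:
    component: str
    mode: str
    severity: int    # 1 (negligible effect) .. 10 (hazardous effect)
    occurrence: int  # 1 (remote likelihood) .. 10 (very frequent)
    detection: int   # 1 (almost certain to be detected) .. 10 (cannot be detected)

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1..10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: Iterable[FailureMode]) -> List[FailureMode]:
    """Order failure modes from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    modes = [
        FailureMode("load balancer", "health check misconfigured", 7, 3, 4),  # RPN 84
        FailureMode("database", "disk fills up", 9, 4, 2),                    # RPN 72
        FailureMode("cache", "eviction storm", 5, 5, 6),                      # RPN 150
    ]
    for m in prioritize(modes):
        print(m.rpn, m.component, m.mode)
```

Note that a hard-to-detect, moderate-severity mode (the cache here) can outrank a more severe but easily detected one, which is exactly the trade-off the detection rating is meant to surface.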