Reliability Engineer

412 jobs found

Company         Location              Salary
Crypto.com      Hong Kong, Hong Kong  $185k
Zscaler         Remote                $161k - $230k
Wehrtyou        Remote                $200k - $275k
Wehrtyou        Remote                $150k - $250k
Scrollio        Remote                $133k - $135k
Blockchain      Remote                $120k - $144k
Aurosglobal     Remote                $87k
Coinbase        Remote                $186k - $218k
Auros           Remote                $87k
Chainlink Labs  United States         $126k - $135k
Argus Labs      Toronto, Canada       $90k - $145k
Avalabs         Remote                $85k - $107k
Kraken          European Union        $112k - $156k
Zinnia          Remote                $126k - $127k
Anagram         Remote                $112k - $156k

Crypto.com
$185k estimated
Hong Kong, Hong Kong SAR

Senior Software Engineer, Site Reliability Engineering

Hong Kong, Hong Kong SAR
Engineering / Hybrid

We are a team that designs, develops, maintains, and improves software for various venture projects, i.e., projects that are adjacent to our core businesses and are bootstrapped quickly with a lean team. You will be actively involved in the design of various components behind scalable applications, from frontend UI to backend infrastructure.

What you’ll be doing

    • Ensure the entire stack is healthy: hardware, software, applications, and networks operate at optimal performance
    • Perform deep dives into both systemic and latent reliability issues, partnering with other software and DevOps engineers across the organization to design, implement, and roll out fixes
    • Continuously improve availability, reliability, and observability, and reduce human toil through tooling and automation
    • Lead and drive SRE initiatives to improve operational efficiency
    • Represent the SRE team in system design reviews and operational readiness exercises for new and existing services

What you need

    • Experience coding in Ruby and/or Go
    • Familiarity with GitOps principles and tools (GitHub Actions, Docker, Kubernetes)
    • Experience in designing, analyzing, and troubleshooting large-scale distributed systems
    • Curiosity about finding root causes in incidents and outages
    • Ability to build alignment, cultivate relationships, and drive impact
    • A mindset for designing fault-tolerant system architecture
    • Comfort with being uncomfortable in ambiguous situations
    • Involvement in incident management and response
    • Desire to grow expertise, inform, and educate others
    • Ability to pick up various technologies quickly, with a “get things done” mentality
    • Humility to embrace better ideas from others, eagerness to make things better, and openness to challenges and possibilities

Desirable

    • Familiarity with cloud platforms and microservice-based architecture (AWS is a big plus)
    • Familiarity with monitoring tools (e.g. Datadog, OpenTelemetry)
    • Familiarity with CI/CD tools (e.g. GitHub Actions)
    • Familiarity with IaC tools (e.g. Terraform, Spacelift)
    • Experience in designing resilient system architecture
    • Experience in optimizing the performance of large-scale production systems
Life @ Crypto.com

Empowered to think big. Try new opportunities while working with a talented, ambitious and supportive team.
Transformational and proactive working environment. We empower employees to find thoughtful and innovative solutions.
Growth from within. We help you develop new skill sets that shape your personal and professional growth.
Work Culture. Our colleagues are some of the best in the industry; we are all here to help and support one another.
One cohesive team. Engage stakeholders to achieve our ultimate goal - Cryptocurrency in every wallet.
Work Flexibility Adoption. Flexible working hours and a hybrid or remote setup.
Aspire to new career paths with us. Our internal mobility program offers employees a diverse scope.
Work Perks: a Crypto.com Visa card provided upon joining.

Are you ready to kickstart your future with us?

Benefits

Competitive salary
Attractive annual leave entitlement, including leave for your birthday and work anniversary

Crypto.com benefits packages vary by region; you can learn more from our talent acquisition team.


About Crypto.com:

Founded in 2016, Crypto.com serves more than 80 million customers and is the world's fastest-growing global cryptocurrency platform. Our vision is simple: Cryptocurrency in Every Wallet™. Built on a foundation of security, privacy, and compliance, Crypto.com is committed to accelerating the adoption of cryptocurrency through innovation and empowering the next generation of builders, creators, and entrepreneurs to develop a fairer and more equitable digital ecosystem.

Learn more at https://crypto.com.

Crypto.com is an equal opportunities employer and we are committed to creating an environment where opportunities are presented to everyone in a fair and transparent way. Crypto.com values diversity and inclusion, seeking candidates with a variety of backgrounds, perspectives, and skills that complement and strengthen our team.

Personal data provided by applicants will be used for recruitment purposes only.

Please note that only shortlisted candidates will be contacted.

What does a Reliability Engineer do?

A Reliability Engineer is a professional responsible for ensuring the reliability and availability of an organization's systems and equipment.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.
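As a rough illustration of the statistical side of the role, the Python sketch below turns a simple failure log into the basic figures a Reliability Engineer tracks: mean time between failures (MTBF), mean time to repair (MTTR), failure rate, and steady-state availability, MTBF / (MTBF + MTTR). It assumes a constant failure rate (the exponential model), under which the probability of running t hours without a failure is exp(-λt); real equipment often calls for other models, such as Weibull. The function names and the sample data are hypothetical.

```python
import math


def failure_stats(uptimes_hours, repair_hours):
    """Estimate MTBF, MTTR, failure rate and steady-state availability.

    uptimes_hours: operating hours before each failure (one entry per failure)
    repair_hours: hours needed to restore service after each failure
    """
    if not uptimes_hours or len(uptimes_hours) != len(repair_hours):
        raise ValueError("need one uptime and one repair duration per failure")
    mtbf = sum(uptimes_hours) / len(uptimes_hours)
    mttr = sum(repair_hours) / len(repair_hours)
    # Constant-rate (exponential) assumption: lambda = failures / operating hours = 1 / MTBF
    failure_rate = 1.0 / mtbf
    availability = mtbf / (mtbf + mttr)
    return mtbf, mttr, failure_rate, availability


def reliability(t_hours, failure_rate):
    """Probability of running t_hours with no failure under the exponential model."""
    return math.exp(-failure_rate * t_hours)


if __name__ == "__main__":
    # Hypothetical failure log for one system: four failures observed
    uptimes = [400, 520, 610, 470]  # hours of operation before each failure
    repairs = [2, 4, 3, 3]          # hours to restore service after each failure
    mtbf, mttr, rate, avail = failure_stats(uptimes, repairs)
    print(f"MTBF={mtbf:.0f} h, MTTR={mttr:.0f} h, availability={avail:.4f}")
    print(f"P(no failure in next 100 h)={reliability(100, rate):.3f}")
```

With the sample data this gives an MTBF of 500 hours, an MTTR of 3 hours, availability of about 0.994, and roughly an 82% chance of getting through the next 100 hours without a failure.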

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up to date with the latest technologies and best practices in the field and identify opportunities for improvement.
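Item 2 mentions failure mode and effects analysis (FMEA). One common way to rank the rows of an FMEA worksheet is the Risk Priority Number, RPN = severity × occurrence × detection, with each factor scored from 1 to 10. The Python sketch below applies that convention; the class, the wording of the scales, and the example failure modes and scores are illustrative assumptions, and some current FMEA guidance prefers other prioritization schemes (such as action priority tables) over raw RPN.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a simplified, hypothetical FMEA worksheet."""
    name: str
    severity: int    # 1 = negligible effect ... 10 = hazardous or catastrophic
    occurrence: int  # 1 = very unlikely ... 10 = almost certain
    detection: int   # 1 = almost certain to be detected ... 10 = undetectable

    def __post_init__(self):
        # Reject scores outside the assumed 1-10 scale
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    # Hypothetical failure modes with made-up scores
    modes = [
        FailureMode("log volume fills up", severity=6, occurrence=5, detection=3),
        FailureMode("TLS certificate expires", severity=9, occurrence=2, detection=7),
        FailureMode("DNS record misconfigured", severity=8, occurrence=3, detection=4),
    ]
    for mode in prioritize(modes):
        print(f"{mode.rpn:4d}  {mode.name}")
```

With these made-up scores the expiring certificate ranks first (RPN 126) even though it rarely occurs, because it is both severe and hard to detect; the misconfigured DNS record (96) and the full log volume (90) follow.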