Reliability Engineer

458 jobs found

web3.career is now part of the Bondex Ecosystem

Company          Location                            Salary
CoinGecko        Selangor, Malaysia                  $126k - $131k
Alchemy          Remote                              $135k - $240k
Blockdaemon      New York, NY, United States         $90k - $145k
Blockdaemon      India                               $90k - $145k
Dfinity          Remote                              $122k - $141k
Seedify          Remote                              $36k - $48k
Gsrmarkets       Remote                              $80k - $100k
Chainlink Labs   San Francisco, CA, United States    $98k - $112k
Coinbase         Remote                              $186k - $218k
Zscaler          Remote                              $115k - $165k
Tenderly         Remote
Avalabs          Remote                              $90k - $100k
CleanSpark       Las Vegas, NV, United States        $90k - $110k
Zamp             Bangalore, India                    $77k - $85k

CoinGecko / Selangor / $126k - $131k (estimated)

Senior Site Reliability Engineer, Compliance (L3)

Selangor / Engineering – Site Reliability Engineering / Full-time / Remote

CoinGecko is a global leader in tracking cryptocurrency data. Operating since 2014, CoinGecko has built the world's largest cryptocurrency data platform, tracking over 10,000 tokens across more than 400 exchanges, serving over 300 million page views in more than 100 countries. We are proud to have played a major part in mainstream awareness, adoption, and education of cryptocurrency globally.

We at CoinGecko believe that cryptocurrency and blockchain will define the future of finance, bringing greater financial and economic freedom around the world. In anticipation of that future, CoinGecko is building the foundation to scale cryptocurrency market data to serve billions.

We practice transparent salaries and a level structure at CoinGecko:
• The salary range for the L3 position is RM16,407 - RM18,047.
• For more junior candidates, we may evaluate you as a Mid-Level (L2), with a salary range of RM11,702 - RM12,872.
• Learn more about our level structure at CoinGecko's Career Progression. 

Job Responsibilities

    • System Architecture: Review architecture and software components with software engineers. Ensure best practices are consistent across all teams.
    • Operational Excellence: Own and ensure SLOs and SLAs are met. Monitor operational metrics and lead improvement plans. Develop and maintain tools including infra-as-code resources to scale operations and allow other teams to be autonomous.
    • Security and Compliance: Manage and audit security controls to meet enterprise requirements. Implement and maintain best practices and compliance standards. Collaborate with legal and compliance to assess overall risk management.
    • Release Planning: Lead strategic release plans (e.g., canary or blue-green deployments) to reduce blast radius and allow for faster reversal during release failures. Work closely with developers for pre-release requirements including provisioning test environments. Conduct ad hoc performance tests based on requirements.
    • Incident Management: Lead incident response and post-mortems to resolve production issues, identify root causes and prevent future occurrences.
    • Disaster Recovery: Develop and implement DR plans and procedures, including data recovery and fault injection simulations on a production replica.
    • Daily Operations: Perform and improve day-to-day tasks including access onboarding and offboarding, and config and patch management. Plan capacity so that our systems can handle peak demand while optimizing cost.
    • Documentation: Develop and extend runbooks, documentation and other technical assets. Support periodic technical audits as required. 
    • Sharpen the Saw: Stay up-to-date with emerging trends and technologies in software development and contribute to knowledge sharing. Learn advanced architecture standards and new tools that improve the team’s code base and productivity. Demonstrate thorough understanding of a subject matter and how to apply it effectively.
    • Team player: Collaborate with cross-functional teams to ensure smooth deployment and operation of software releases. Answer technical questions from other teams or from outside the organization.
    • Coaching: Provide feedback on the performance of junior staff and participate in people development initiatives.
    • Support any ad hoc tasks as required by the company.

Job Requirements

    • Proven track record: 3 to 5 years managing software deployments and instrumentation in production environments with defined SLAs and SLOs. Strong knowledge of software delivery and DevOps principles.
    • Cloud Operations: Experience with cloud platforms (e.g., AWS, Cloudflare, GCP) and infrastructure-as-code tools (e.g., Terraform, CloudFormation). Strong programming and scripting skills, preferably in languages such as Python, Go, or Ruby.
    • Accreditation: Bachelor’s degree in Comp Sci., InfoSec or similar fields, or professional certificates (e.g., Certified DevOps Professional, Certified Solutions Architect Professional in AWS or GCP).
    • Scope of Work: Fully capable of taking substantial features from concept to shipping as a sole contributor. Works effectively in open-ended projects and is self-sufficient to deep dive and evaluate multiple solutions to a problem.
    • Problem Solving: Solve hard problems with many constraints, using sound judgment to assess risks and present arguments in a well-structured, data-backed, written narrative. Have passion, creativity and empathy for users.
    • Quick Thinking: Able to derive information, think critically and make snap judgements based on measured data in high-pressure situations.
    • People Skills: Strong communicator who is able to build positive working relationships between teams and form relationships with key customers. You must have experience supporting on-call rotations for 24x7 services to troubleshoot, execute runbooks or escalate incidents.
    • Nice to have:
        • Experience working in a growth-stage startup.
        • Experience building applications in different tech stacks.
        • Keen interest in decentralized technologies and their applications, including cryptocurrencies.
Perks at CoinGecko:
• Remote Work Flexibility: Work wherever you feel most productive. We also provide office space in 1Powerhouse (Malaysia) and WeWork (Singapore) if you ever feel like meeting your colleagues in person.
• Flexible Working Hours: No 9-5 structure, work the hours you need to get your tasks done.
• Comprehensive Insurance Coverage: We provide life and hospitalization coverage for you, along with hospitalization coverage for your dependents.
• Virtual Share Options: You'll be entitled to virtual options, with terms and conditions.
• Bonus: You’ll be entitled to a bonus, with terms and conditions.
• Parking Allowance: You will be given a monthly fixed allowance of RM150 or SGD100 to ease the cost of traveling.
• Meal Allowance: You will be given a monthly fixed allowance of RM600 or SGD400 to subsidize the cost of your meals.
• Learning Allowance: You will be allocated an annual budget of USD500 (claim basis) to help you continuously learn in the pursuit of your professional and personal development.
• Social Activity Allowance: Want to set a date to watch a movie or play futsal with your colleagues? Get it organized and we subsidize a portion (claim basis) of the cost.
• Annual Company Offsite: We gather once a year to meet each other in person, reflect on the year, and partake in social activities!

CoinGecko is an equal employment opportunity employer. Qualified candidates are considered for employment without regard to race, religion, gender, gender identity, sexual orientation, national origin, age, military or veteran status, disability, or any other characteristic protected by applicable law.

Interested in being a Gecko? Hit the apply button to get started on your application!

What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
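
As a rough illustration of the data analysis described in the list above, the sketch below computes three commonly used reliability figures from an incident history: mean time between failures (MTBF), mean time to repair (MTTR), and steady-state availability, taken here as MTBF / (MTBF + MTTR). The function name, the data class, and the sample numbers are hypothetical and chosen only for the example; real analyses typically use richer failure models.

```python
from dataclasses import dataclass


@dataclass
class ReliabilityStats:
    mtbf_hours: float    # mean time between failures
    mttr_hours: float    # mean time to repair
    availability: float  # fraction of time the system is up


def summarize(uptime_hours: list[float], repair_hours: list[float]) -> ReliabilityStats:
    """Compute MTBF, MTTR and steady-state availability.

    uptime_hours[i] is the running time before failure i;
    repair_hours[i] is the time taken to restore service after failure i.
    """
    if not uptime_hours or len(uptime_hours) != len(repair_hours):
        raise ValueError("need one repair duration per failure")
    mtbf = sum(uptime_hours) / len(uptime_hours)
    mttr = sum(repair_hours) / len(repair_hours)
    return ReliabilityStats(mtbf, mttr, mtbf / (mtbf + mttr))


# Hypothetical incident history with three failures (illustrative data only).
stats = summarize([700.0, 820.0, 880.0], [2.0, 4.0, 6.0])
print(f"MTBF={stats.mtbf_hours:.1f}h  MTTR={stats.mttr_hours:.1f}h  "
      f"availability={stats.availability:.4%}")
```

Tracking these figures over time is one simple way to check whether a preventive maintenance program or other reliability strategy is having the intended effect.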