Reliability Engineer

474 jobs found

web3.career is now part of the Bondex Ecosystem

Company             Location            Salary
Binance             Dublin, Ireland
Layerzerolabs       Remote              $86k - $110k
Gsrmarkets          Remote              $80k - $100k
Autumn Compass      Sydney, Australia   $120k - $150k
Chainlink Labs      United States       $98k - $112k
Wormholefoundation  Remote              $112k - $156k
Kiln                Paris, France       $112k - $156k
Kiln                Paris, France       $84k - $112k
Zscaler             Remote              $140k - $200k
Zscaler             Remote              $130k - $131k
Zscaler             Remote              $115k - $165k
Zenith              Remote
Kraken              United States       $127k - $203k
Alpaca              Remote              $120k - $149k
InfStones           Texas               $36k - $54k

Binance
Ireland, Dublin

Senior Site Reliability Engineer (Node.js & QA), Trading Technologies

Argentina, Buenos Aires / Ireland, Dublin / Czech Republic, Prague / Croatia, Zagreb / Estonia, Tallinn / Greece, Athens / Hungary, Budapest / Latvia, Riga / Moldova, Chișinău / Portugal, Lisbon / Romania, Iași / Romania, Bucharest / Türkiye, Ankara / Türkiye, Istanbul / Bulgaria, Sofia
Engineering – DevOps/SRE / Full-time: Remote / Remote

Binance is a leading global blockchain ecosystem behind the world’s largest cryptocurrency exchange by trading volume and registered users. We are trusted by over 280 million people in 100+ countries for our industry-leading security, user fund transparency, trading engine speed, deep liquidity, and an unmatched portfolio of digital-asset products. Binance offerings range from trading and finance to education, research, payments, institutional services, Web3 features, and more. We leverage the power of digital assets and blockchain to build an inclusive financial ecosystem to advance the freedom of money and improve financial access for people around the world.

About the Role

We are looking for a Senior Site Reliability Engineer (SRE) with solid Node.js experience to help maintain and improve our internal testing and validation systems. You will work on the reliability and performance of our distributed test environment, which supports web, API, and Android testing.

A key part of the role is to build and refine our internal monitoring and alerting, making sure our high-load, real-time test runs stay stable and easy to understand when issues occur. This includes improving how we track failures, measure system health, and respond to problems across the different components of the test stack.

If you enjoy working with distributed systems, improving tooling, and making test environments easier to operate and debug, this role will suit you well.

Responsibilities

    • Maintain, operate, and improve distributed test execution systems across web, API, and Android platforms.
    • Ensure the reliability, performance, and scalability of test environments including Selenium grids, Appium setups, VPN containers, and supporting microservices.
    • Expand testing capabilities by enabling new jurisdictions, product flows, and environment configurations.
    • Develop and refine internal tooling, diagnostics, and automation to reduce operational overhead and improve system visibility.
    • Troubleshoot complex issues across distributed systems, containers, networked services, mobile emulators, and API integrations.
    • Support environment provisioning, CI/CD integration, and orchestration of test execution pipelines.

Requirements

    • 5+ years of professional experience with Node.js, ideally in SRE, DevOps, test automation, or platform engineering.
    • Strong understanding of software engineering fundamentals and testing methodologies.
    • Hands-on experience with modern testing frameworks such as Playwright, Puppeteer, or WebDriver/WDIO.
    • Experience with Android development or mobile automation (Appium, emulators, ADB, debugging).
    • Strong analytical and troubleshooting skills across distributed and containerized systems.
    • Ability to design, maintain, and debug reliable test environments across local, staging, and production-like setups.

Bonus Skills

    • Experience with Docker and Kubernetes for orchestrating multi-service test stacks.
    • Experience with analytical databases such as ClickHouse for time-series metrics, analytics, or results storage.
    • Familiarity with observability tooling (Prometheus, Grafana, OpenTelemetry).
    • Background in CI/CD pipelines, cluster deployments, or environment automation.
    • Understanding of VPN routing, network debugging, and environment-dependent test scenarios.

Why Binance

    • Shape the future with the world’s leading blockchain ecosystem
    • Collaborate with world-class talent in a user-centric global organization with a flat structure
    • Tackle unique, fast-paced projects with autonomy in an innovative environment
    • Thrive in a results-driven workplace with opportunities for career growth and continuous learning
    • Competitive salary and company benefits
    • Work-from-home arrangement (the arrangement may vary depending on the work nature of the business team)

Binance is committed to being an equal opportunity employer. We believe that having a diverse workforce is fundamental to our success.
By submitting a job application, you confirm that you have read and agree to our Candidate Privacy Notice.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

What does a Reliability Engineer do?

A Reliability Engineer is responsible for ensuring the reliability and availability of an organization's systems and equipment.

They use engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up to date with the latest technologies and best practices in the field and identify opportunities for improvement.
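
As a rough illustration of the data analysis and statistical modeling described in the list above, here is a minimal Python sketch that derives a few common reliability metrics from a maintenance history. The `Incident` record, its field names, and the sample figures are hypothetical and not taken from any employer's tooling. The survival estimate assumes a constant failure rate (an exponential model), a simplification that may not hold for real equipment.

```python
import math
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical maintenance record for one failure."""
    uptime_hours: float   # hours the system ran before this failure
    repair_hours: float   # hours needed to restore service afterwards


def mtbf(incidents):
    """Mean time between failures: average uptime per failure."""
    if not incidents:
        raise ValueError("at least one incident is required")
    return sum(i.uptime_hours for i in incidents) / len(incidents)


def mttr(incidents):
    """Mean time to repair: average repair duration per failure."""
    if not incidents:
        raise ValueError("at least one incident is required")
    return sum(i.repair_hours for i in incidents) / len(incidents)


def availability(incidents):
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    up = mtbf(incidents)
    down = mttr(incidents)
    return up / (up + down)


def survival_probability(incidents, hours):
    """Probability of running `hours` without failure.

    Assumes a constant failure rate of 1 / MTBF (exponential model).
    """
    return math.exp(-hours / mtbf(incidents))


if __name__ == "__main__":
    # Illustrative numbers only.
    history = [Incident(100.0, 2.0), Incident(200.0, 4.0), Incident(300.0, 6.0)]
    print(f"MTBF: {mtbf(history):.1f} h")
    print(f"MTTR: {mttr(history):.1f} h")
    print(f"Availability: {availability(history):.4f}")
    print(f"P(no failure in 200 h): {survival_probability(history, 200.0):.4f}")
```

In practice, metrics like these would feed the maintenance planning and strategy evaluation mentioned in items 1 and 4, and a more careful analysis would fit a failure distribution to the data rather than assume one.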