Reliability Engineer

475 jobs found

web3.career is now part of the Bondex Ecosystem

Company | Location | Salary
D3 | Remote | $112k - $156k
Layerzerolabs | Remote | $86k - $110k
Gsrmarkets | Remote | $80k - $100k
Binance | Dublin, Ireland | not listed
Autumn Compass | Sydney, Australia | $120k - $150k
Chainlink Labs | United States | $98k - $112k
Wormholefoundation | Remote | $112k - $156k
Kiln | Paris, France | $112k - $156k
Kiln | Paris, France | $84k - $112k
Zscaler | Remote | $140k - $200k
Zscaler | Remote | $130k - $131k
Zscaler | Remote | $115k - $165k
Zenith | Remote | not listed
Kraken | United States | $127k - $203k
Alpaca | Remote | $120k - $149k

D3
$112k - $156k estimated
Remote

About D3:

D3 is building the world’s first purpose-built blockchain for DomainFi, bringing domain tokenization and DeFi primitives to a massive, rapidly growing $350B+ real-world asset class. We’re revolutionizing how existing and future domain names are owned, traded, and leveraged in the digital economy.

Our elite team is stacked with industry veterans who have spent the last three decades shaping the internet, from pioneering domain name monetization to architecting key internet protocols to launching and running major TLDs like .xyz, .inc, .tv, and .link. With a proven track record of innovation and success, we’re now redefining what’s possible in the domain space.

We recently closed a $25M Series A led by Paradigm, one of the best investors in the industry. This will help fuel our mission to bring domains fully on-chain and unlock new financial possibilities for one of the internet’s most valuable asset classes.

We’re based in Los Angeles, with team members all over the world. We’re looking for driven, talented builders to help build a trillion-dollar DomainFi economy. Join us!

Job Overview:

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) with strong DevOps expertise to join our team. In this role, you will be responsible for maintaining the reliability, performance, and security of our systems and infrastructure. You will work closely with Development and Product teams to ensure high system uptime and support the business’s growth by proactively identifying and solving operational challenges.

This role requires a hands-on, detail-oriented engineer who is comfortable troubleshooting complex issues and implementing robust solutions. You must also be willing to provide on-call support up to 12 hours per day until additional team members are hired.

Key Responsibilities:

- Maintain, troubleshoot, and optimize Kubernetes environments using Helm and kubectl.
- Perform system-level troubleshooting and administration on Linux-based systems.
- Manage and troubleshoot networking, DNS, routing, and VPN configurations.
- Configure and manage firewall rules across cloud and on-premises environments.
- Debug and resolve infrastructure and application issues with strong analytical skills.
- Monitor system performance and reliability by implementing and managing monitoring solutions.
- Identify and address database performance issues, including query optimization and caching improvements.
- Continuously improve security procedures to prevent downtime and data loss.
- Collaborate closely with Development and Product teams to support system scalability and uptime.

Qualifications: 

- 3+ years of experience as an SRE or DevOps Engineer.
- 3+ years of experience with Kubernetes (and tools like Helm, kubectl, etc.).
- Experience with Linux administration and troubleshooting.
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent practical experience).

Preferred Qualities: 

- Exceptional attention to detail, especially in system configurations and deployment processes.
- Willingness to dig into a sometimes poorly defined problem until it is fully understood.
- Ability to propose solutions and implement them in order to move the business forward.
- Ability to work with Development and Product to ensure high uptime.

Why D3, Why Now?

Ground-Floor Growth, Learning, and Impact: D3 is your chance to dive headfirst into an ultra-early-stage company where every move you make truly matters. You’ll have the opportunity to sharpen your skills, expand your expertise, and shape the foundation of something groundbreaking. Almost everything we’re building today at D3 is “zero-to-one,” meaning you’ll be among the first to craft, refine, and launch key initiatives that define our future success.

Strong, Proven Leadership: At D3, you’ll work alongside industry visionaries who have been there, done that, and are ready to do it again, only bigger. Our leadership team brings veteran industry experience, sharp insights, and a relentless drive to do big things across every function at D3. You’ll gain invaluable mentorship, develop a high-impact mindset, and be challenged to grow in ways you never imagined.

Unique Market Positioning: We’re pioneering at the intersection of internet infrastructure, real-world assets, and blockchain communities, creating solutions that redefine what’s possible in Web3. If you want to push boundaries, solve complex problems, and be part of a team that’s shaping the future of the Internet, D3 is the place to do it.

What does a Reliability Engineer do?

A Reliability Engineer is responsible for ensuring the reliability and availability of an organization's systems and equipment.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up to date with the latest technologies and best practices in the field and identify opportunities for improvement.
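
Items 1 and 4 in the list above mention statistical modeling and performance monitoring. As a rough illustration only, the Python sketch below computes three commonly used reliability figures from a list of outage durations: mean time between failures (MTBF), mean time to repair (MTTR), and availability. The function names, the 30-day window, and the outage durations are hypothetical, and the survival estimate assumes a constant failure rate (an exponential model), which real systems and equipment may not follow.

```python
import math


def reliability_summary(window_hours, outage_hours):
    """Summarize an observation window given the duration of each outage.

    window_hours: total length of the observation window, in hours.
    outage_hours: list of outage durations, in hours (illustrative data).
    """
    downtime = sum(outage_hours)
    uptime = window_hours - downtime
    failures = len(outage_hours)

    # MTBF: average operating time between failures.
    mtbf = uptime / failures if failures else math.inf
    # MTTR: average time taken to restore service after a failure.
    mttr = downtime / failures if failures else 0.0
    # Availability: fraction of the window during which the system was up.
    availability = uptime / window_hours

    return {"mtbf_hours": mtbf, "mttr_hours": mttr, "availability": availability}


def survival_probability(mtbf_hours, horizon_hours):
    """Probability of running horizon_hours without a failure.

    Assumes a constant failure rate (exponential model): R(t) = exp(-t / MTBF).
    """
    return math.exp(-horizon_hours / mtbf_hours)


# Hypothetical example: a 30-day window (720 hours) with three outages.
summary = reliability_summary(720.0, [1.0, 0.5, 2.5])
print(f"MTBF: {summary['mtbf_hours']:.1f} h")
print(f"MTTR: {summary['mttr_hours']:.2f} h")
print(f"Availability: {summary['availability']:.4%}")
print(f"P(no failure in next 24 h): {survival_probability(summary['mtbf_hours'], 24.0):.3f}")
```

In practice the outage durations would come from incident or maintenance records, and a different failure model may fit the data better than the exponential one assumed here.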