Reliability Engineer
420 jobs found
Company | Location | Salary |
---|---|---|
Valigator | Remote | $50k - $90k |
Layerzerolabs | Remote | $86k - $110k |
Scrollio | Remote | $133k - $135k |
Gsrmarkets | Remote | $80k - $100k |
Blockchain | Remote | $120k - $144k |
Argus Labs | Jakarta, Indonesia | $90k - $145k |
Argus Labs | San Francisco, CA, United States | $90k - $145k |
Argus Labs | San Francisco, CA, United States | $90k - $145k |
Zscaler | Remote | $126k - $193k |
Zscaler | Remote | $112k - $156k |
Crypto.com | Hong Kong, Hong Kong | $185k |
Zscaler | Remote | $161k - $230k |
Wehrtyou | Remote | $200k - $275k |
Wehrtyou | Remote | $150k - $250k |
Aurosglobal | Remote | $87k - $87k |
Description
Are you excited to work on the cutting edge of blockchain infrastructure? Valigator is looking for an entry-level Site Reliability Engineer (SRE) to help shape the foundation of our growing platform. This role is ideal for recent graduates or engineers with up to one year of experience who are eager to dive into real-world systems at scale.
As an early team member, you’ll work directly with the founder to design and implement systems that ensure scalability, reliability, and performance across our infrastructure. You’ll have the opportunity to define processes, contribute to best practices, and build the operational backbone that supports our high-performance services on the Solana blockchain. If you’re motivated to learn, solve complex problems, and make a meaningful impact from day one, we want to hear from you.
Company
Valigator is a high-performance infrastructure service provider operating on the Solana blockchain. As an early-stage startup, we move fast, build with intention, and thrive on solving hard problems. We are committed to delivering reliable, secure, and efficient services that maximize performance and rewards for our clients. Our reputation is built on trust, technical excellence, and a relentless focus on uptime and optimization.
Requirements
In this role, you will:
- Take ownership of our production bare-metal infrastructure by implementing scalable, sustainable solutions that improve system availability and performance.
- Investigate and identify root causes of incidents, and contribute to long-term fixes that prevent recurrence.
- Develop and refine alerting systems that focus on symptoms, not just outages — ensuring alerts are actionable and ideally paired with self-healing automation.
- Document operational tasks clearly, turning one-off actions into repeatable processes and automated workflows.
- Focus on the observability, reliability, availability, and performance of systems, with a strong emphasis on proactive monitoring and optimization.
- Take part in periodic on-call rotations with guidance and support from the team.
What We’re Looking For
We’re seeking a motivated and curious individual with a foundational understanding of systems and a desire to grow. You don’t need to meet every qualification below, but experience in some of these areas will help you succeed in the role:
- Some hands-on experience in DevOps, SRE, System Administration, or a related academic background (degree or certification)
- Familiarity with scripting languages such as Python, JavaScript, or Bash
- Exposure to alerting and monitoring tools like DataDog, Zabbix, Nagios, or similar
- Comfort working in Linux environments, including using the command line for system-level tasks
- Basic experience with configuration management tools, preferably Ansible
- Strong analytical and troubleshooting skills, with the ability to break down complex problems
- Clear verbal and written communication skills, and a collaborative, team-oriented mindset
- The ability to zoom into technical detail while maintaining awareness of broader system goals
Bonus Points For
These aren’t required, but experience in any of the following areas will help you stand out:
- Familiarity with blockchain technologies — especially hands-on experience with Solana
- Background in developer enablement, DevRel (Developer Relations), or technical community engagement
- Contributions to open-source projects or participation in technical communities
Benefits
This is a contract position with the following compensation structure:
- Base salary equivalent to $50,000–$90,000 USD annually, depending on experience and qualifications. Payment
- Annual performance-based bonus of 25% of base salary, contingent on individual performance and overall company results
What does a Reliability Engineer do?
A Reliability Engineer is responsible for ensuring the reliability and availability of systems and equipment in an organization.
They use engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.
Here are some of the typical tasks and responsibilities of a Reliability Engineer:
- Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
- Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
- Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
- Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
- Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
- Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
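To make the data-analysis responsibility above more concrete, here is a minimal Python sketch of the kind of calculation a Reliability Engineer might run on maintenance history: mean time between failures (MTBF), mean time to repair (MTTR), and the resulting steady-state availability, MTBF / (MTBF + MTTR). The `Outage` record, the `reliability_summary` function, and the sample figures are all hypothetical and chosen only for illustration; real failure data usually needs more careful modeling.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One failure record, in hours (hypothetical schema for illustration)."""
    start_hour: float    # hours since the observation window began
    repair_hours: float  # time taken to restore service


def reliability_summary(outages: list[Outage], observed_hours: float) -> dict[str, float]:
    """Return MTBF, MTTR, and availability for one observation window."""
    if not outages:
        raise ValueError("need at least one outage to estimate MTBF and MTTR")
    downtime = sum(o.repair_hours for o in outages)
    uptime = observed_hours - downtime
    mtbf = uptime / len(outages)    # mean operating time between failures
    mttr = downtime / len(outages)  # mean time to repair
    availability = mtbf / (mtbf + mttr)
    return {"mtbf": mtbf, "mttr": mttr, "availability": availability}


# Made-up sample: three outages observed over a 30-day (720-hour) window.
summary = reliability_summary(
    [Outage(100.0, 2.0), Outage(400.0, 1.0), Outage(650.0, 3.0)],
    observed_hours=720.0,
)
print(summary)
```

With figures like these, the summary gives a quick baseline against which a change in maintenance strategy can later be compared.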