| Company | Location | Salary |
|---|---|---|
| IO Global | Remote | $126k - $132k |
| Impossible Cloud | Hamburg, Germany | $103k - $117k |
| ZetaChain | Remote | |
| xLabs | Buenos Aires, Argentina | $72k - $100k |
| Ledger | Paris, France | $94k - $148k |
| Launchpadtechnologiesinc | Remote | $185k |
| Fireblocks | | $98k - $150k |
| Ledger | Paris, France | $120k - $156k |
| Gemini | Remote | $172k - $215k |
| Nethermind | Remote | $112k - $156k |
| Fmr | Bangalore, India | $105k - $120k |
| Coinbase | Remote | $211k - $249k |
| Alchemy | Bucharest, Romania | $80k - $85k |
| Bitso | Latin America | $112k - $156k |
| Bitso | European Economic Area | $112k - $156k |
Who are we?
IOHK is a technology company focused on blockchain research and development. We are renowned for our scientific approach to blockchain development, emphasizing peer-reviewed research and formal methods to ensure security, scalability, and sustainability. Our projects include decentralized finance (DeFi), governance, and identity management, aiming to advance the capabilities and adoption of blockchain technology globally.
We invest in the unknown, applying our curiosity and desire for positive change to everything we do. By fueling creativity, innovation, and progress within our teams, our products and services are designed for people to be fearless, to be changemakers.
What the role involves:
As a Senior Site Reliability Engineer (SRE), you are an integral part of our open-source project, ensuring the reliability, availability, and performance of our production systems. This role combines service operation, systems engineering, and software engineering principles to operate and monitor services as well as create or maintain tools, automations, and infrastructure code that bolster the efficiency and resilience of our platform. As a senior member of the team, you will lead initiatives, mentor junior engineers, and drive strategic improvements across the organization.
- Break down large tasks into manageable work packages and ensure timely project completion.
- Coach and mentor more junior engineers, fostering a collaborative team environment.
- Take ownership of projects in the area of responsibility to ensure prompt delivery, leading cross-functional initiatives.
- Design, write, and deliver tools and software primarily using Python, Bash, Terraform or Nix to improve the availability, scalability, and efficiency of our services.
- Engage in and refine the whole lifecycle of services, from inception and design, through deployment, operation, and continuous improvement.
- Practice sustainable incident response and promote blameless postmortems.
- Collaborate with the development teams to ensure that solutions are designed with customer experience, scalability, and performance in mind.
- Analyze system performance and reliability, offering recommendations for enhancement.
- Develop and uphold service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for our services.
- Participate in on-call rotations, responding to and mitigating service interruptions and technical challenges.
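For readers unfamiliar with the SLO, SLI, and error budget terms in the list above, the arithmetic can be sketched as follows. This is a minimal illustration, not IOG's tooling: the function names and the example numbers (a 99.9% availability target over a 30-day window) are assumptions chosen for the example.

```python
def sli(good_events: int, total_events: int) -> float:
    """Service-level indicator: the fraction of events that were good."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO tolerates over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo_target) * total_events  # failures the SLO permits
    actual_bad = total_events - good_events          # failures observed
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: 99.9% target, one million requests, 500 failures.
    print(f"SLI: {sli(999_500, 1_000_000):.4f}")
    print(f"Budget per 30 days: {error_budget_minutes(0.999):.1f} minutes")
    print(f"Budget remaining: {budget_remaining(0.999, 999_500, 1_000_000):.0%}")
```

Under these assumed numbers, a 99.9% target allows roughly 43 minutes of downtime per 30 days, and 500 failed requests out of one million spend about half of the budget.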
Who you are:
- A minimum of 10 years of experience in DevOps and a minimum of 3 years in blockchain.
- Proficiency in Python, Bash, Terraform, and Nix for DevOps services.
- Excellent understanding of and experience with infrastructure as code (e.g. Terraform, Helm).
- Extensive experience with AWS, specifically with services like EKS and RDS.
- Familiarity with container orchestration (e.g. Kubernetes) is essential.
- Hands-on experience with PostgreSQL and its deployment on RDS.
- Knowledge of monitoring tools (e.g., Prometheus, Grafana, Loki).
- Solid troubleshooting and performance tuning capabilities.
- Understanding of the needs of real-time critical systems.
- Exceptional communication skills and team collaboration ethic.
- Experience with CI/CD (e.g. GitHub Actions, Hydra, Earthly).
- Strong analytical and troubleshooting skills.
- Excellent communication skills to collaborate with development teams, operations, and other stakeholders.
- Hands-on DevOps experience using Infrastructure as Code, CI/CD and infrastructure automation.
- Familiar with on-prem and cloud infrastructure and multi-tenant application deployment.
- Ability to quickly learn new technologies and adapt to changing environments.
- High attention to detail to ensure system reliability and performance.
Are you an IOGer?
Do you find yourself questioning the status quo? Do you tinker with ideas and long to turn those ideas into solutions? Are you able to spark thoughtful debates, bringing out the inquisitiveness in others? Does the promise of continuously growing excite you? Then get ready to reimagine everything you thought wasn’t possible because that’s what it means to be an IOGer - we don’t set limits, we break them.
- Remote work
- Laptop reimbursement
- New starter package to buy hardware essentials (headphones, monitor, etc.)
- Learning & Development opportunities
- Competitive PTO
At IOG, we value diversity and always treat all employees and job applicants based on merit, qualifications, competence, and talent. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
What does a Reliability Engineer do?
A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.
They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.
Here are some of the typical tasks and responsibilities of a Reliability Engineer:
- Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
- Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
- Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
- Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
- Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
- Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
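As a rough illustration of the statistical modeling mentioned above, the sketch below computes three common reliability figures: mean time between failures (MTBF), steady-state availability from MTBF and mean time to repair (MTTR), and the probability of running failure-free for a given time. The last one assumes a constant failure rate (an exponential model), which is a simplification and not something the text above specifies. All function names and numbers are hypothetical.

```python
import math


def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating time divided by failure count."""
    if failures <= 0:
        raise ValueError("failures must be positive")
    return operating_hours / failures


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime share given MTBF and MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def survival_probability(t_hours: float, mtbf_hours: float) -> float:
    """Chance of no failure within t_hours, assuming a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical history: 4 failures over 1,000 operating hours,
    # each taking 2.5 hours on average to repair.
    m = mtbf(1000, 4)
    print(f"MTBF: {m:.0f} hours")
    print(f"Availability: {availability(m, 2.5):.4f}")
    print(f"P(no failure in 100 h): {survival_probability(100, m):.3f}")
```

Figures like these are typically used to compare equipment, schedule preventive maintenance, or check whether a reliability strategy is having an effect; real analyses often fit richer models than the constant-rate assumption used here.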