Reliability Engineer

485 jobs found

Company                            Location                       Salary
Metaco                             Bangalore, India               $105k - $109k
Triton                             Malaysia                       $72k - $75k
Ramp Network                       Poland                         $90k - $100k
Ripple                             Bangalore, India               $90k - $115k
Luno                               Cape Town, South Africa        $112k - $156k
Chainlink Labs                     Remote
Aragon Association                 Remote                         $90k - $100k
Pintu                              Remote                         $90k - $100k
Web 3.0 Technologies Foundation    Zug, Switzerland               $112k - $131k
Kraken Digital Asset Exchange      Remote                         $63k - $88k
Chainlink Labs                     Remote
Pintu                              Remote                         $76k - $81k
Swan                               Remote                         $105k - $113k
Ava Labs                           New York, NY, United States    $175k - $218k
Ava Labs                           New York, NY, United States    $151k - $189k
Site Reliability Engineer

Metaco
$105k - $109k estimated
Bangalore, Karnataka, India

This job is closed

THE WORK:

We are seeking a Site Reliability Engineer (SRE) to join our team in India.

WHAT YOU’LL DO:

  • Keep your assigned site or service functioning, and get it back up and running quickly when a failure occurs
  • Actively troubleshoot any issues that arise during testing and production, catching and solving issues before launch
  • Automate work including infrastructure needs, testing, failover solutions, failure mitigation, and software maintenance processes
  • Monitor and troubleshoot highly scalable, distributed server clusters that perform various functions, from web servers to machine learning processing
  • Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers with customer incidents
  • Participate in and help establish best practices in Site Reliability Engineering
  • Manage code deployments, fixes, updates, and related processes
  • Work with a close-knit team and brainstorm on the best ways to solve complex problems in infrastructure, security and monitoring
  • Provide technical guidance and educate team members and coworkers on monitoring and logging. (Have an interesting idea or solution? Present it!)

WHAT WE’RE LOOKING FOR:

  • 3+ years of experience with software engineering, software development, or system operations in highly available, high-traffic environments
  • Strong experience with Linux-based infrastructures, Linux/Unix administration, and Azure
  • Experience with databases such as PostgreSQL
  • Experience administering Linux servers as well as Docker-based infrastructure (like Kubernetes, AKS, etc.) in a highly available environment
  • Experience with scripting languages such as Python and Bash
  • Experience with message broker/queue technologies such as RabbitMQ
  • Experience with modern monitoring, logging, and observability tools for complex distributed systems, such as Application Insights, Grafana, New Relic, Splunk, Elastic Stack, Datadog, Prometheus, etc.
  • Practical experience with infrastructure-as-code (with tools like Terraform, Chef, Ansible, etc.)
  • Good understanding of cybersecurity fundamentals and best practices
  • Experience with containerization and clustering (Dockerfiles, docker-compose, Helm, Kubernetes, etc.)
  • Stellar problem-solving and troubleshooting skills with the ability to spot issues before they become problems
  • Excellent oral and written communication skills
  • Process-oriented with great documentation skills

What does a Reliability Engineer do?

A Reliability Engineer is a professional who is responsible for ensuring the reliability and availability of systems and equipment in an organization.

They use their knowledge of engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
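
As a rough illustration of the failure-rate analysis described in the first task above, the sketch below computes three common reliability metrics: mean time between failures (MTBF), mean time to repair (MTTR), and steady-state availability, using the standard formula availability = MTBF / (MTBF + MTTR). The function names and the sample figures are hypothetical and are not taken from any listing on this page.

```python
from statistics import mean


def mtbf(uptime_hours):
    """Mean time between failures: average operating time between failures."""
    return mean(uptime_hours)


def mttr(repair_hours):
    """Mean time to repair: average time taken to restore service."""
    return mean(repair_hours)


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical sample data: hours of operation before each failure,
# and hours spent repairing after each failure.
uptimes = [400.0, 350.0, 450.0]
repairs = [2.0, 4.0, 6.0]

m = mtbf(uptimes)
r = mttr(repairs)
a = availability(m, r)

print(f"MTBF={m:.1f} h, MTTR={r:.1f} h, availability={a:.4f}")
# prints "MTBF=400.0 h, MTTR=4.0 h, availability=0.9901"
```

In practice a Reliability Engineer would likely feed in real incident records and fit a statistical model rather than take simple averages; this is only a minimal starting point.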