Reliability Engineer

474 jobs found

web3.career is now part of the Bondex Ecosystem

Company        | Location                | Salary
Alpaca         | Remote                  | $112k - $144k
Zscaler        | Remote                  | $119k - $170k
Triton One     | Remote                  | $54k - $90k
Zscaler        | Remote                  | $130k - $131k
Cryptio        | United Kingdom          | $133k - $135k
Zscaler        | Remote                  | $119k - $170k
Zscaler        | Remote                  | $164k - $235k
Zscaler        | Remote                  | $119k - $170k
MoonPay        | Madrid, Spain           | $115k - $117k
xLabs          | Buenos Aires, Argentina | $72k - $75k
Binance        | Eastern Europe          |
Blockdaemon    | Denmark                 | $90k - $145k
Scrollio       | Remote                  | $133k - $135k
Chainlink Labs | United States           | $98k - $112k
Zscaler        | Remote                  | $112k - $156k

Alpaca
$112k - $144k estimated
Remote

Who We Are:

Alpaca is a US-headquartered self-clearing broker-dealer and brokerage infrastructure for stocks, ETFs, options, crypto, fixed income, 24/5 trading, and more. Our recent Series C funding round brought our total investment to over $170 million, fueling our ambitious vision. Amongst our subsidiaries, Alpaca is a licensed financial services company, serving hundreds of financial institutions across 40 countries with our institutional-grade APIs. This includes broker-dealers, investment advisors, wealth managers, hedge funds, and crypto exchanges, totalling over 6 million brokerage accounts. Our global team is a diverse group of experienced engineers, traders, and brokerage professionals who are working to achieve our mission of opening financial services to everyone on the planet. We're deeply committed to open-source contributions and fostering a vibrant community, continuously enhancing our award-winning, developer-friendly API and the robust infrastructure behind it. Alpaca is proudly backed by top-tier global investors, including Portage Ventures, Spark Capital, Tribe Capital, Social Leverage, Horizons Ventures, Unbound, SBI Group, Derayah Financial, Elefund, and Y Combinator.

Our Team Members:

We're a dynamic team of 230+ globally distributed members who thrive working from our favorite places around the world, with teammates spanning the USA, Canada, Japan, Hungary, Nigeria, Brazil, the UK, and beyond!

We're searching for passionate individuals eager to contribute to Alpaca's rapid growth. If you align with our core values (Stay Curious, Have Empathy, and Be Accountable) and are ready to make a significant impact, we encourage you to apply.

Your Role:

As a Site Reliability Engineer (SRE) at Alpaca, you will ensure the reliability, scalability, and performance of our systems and services. You will work closely with development, operations and devops teams to build and maintain robust applications, ensuring they run smoothly and efficiently. This role requires a blend of software engineering and operations skills, with a strong ability to troubleshoot technical issues and resolve problems before they impact our users.

Things You Get To Do:

- Triage difficult technical problems and implement solutions
- Improve our observability stack (monitoring, logging, profiling)
- Incident Management: Respond to and resolve incidents in a timely manner, conducting post-incident reviews to identify and implement improvements.
- Collaboration: Work closely with development teams to ensure new features and services are designed with reliability and scalability in mind.
- Capacity Planning: Monitor system capacity and performance, making recommendations and implementing changes to handle future growth.

Who you are (must-haves):

- 5+ years of experience in Site Reliability Engineering, Performance Engineering, or similar roles.
- 5+ years of experience with multi-terabyte scale PostgreSQL clusters.
- Proven track record of managing and maintaining large-scale, high-availability, and high-performance PostgreSQL databases.
- Experience designing and implementing SLIs, SLOs, and SLAs for internal systems and databases.
- Experience with troubleshooting PostgreSQL performance problems and slow queries.
- Extensive experience with efficient schema design and efficient query design.
- Experience migrating multi-terabyte tables into more efficient schemas.
- Proficient with Go.
- Proficient with Prometheus.
- Proficient with Linux.
- Knowledgeable in trading/fintech domains.
- Experience with low-latency systems.
- Experience with distributed tracing.
- Experience scaling PostgreSQL clusters rapidly.
- Experience with pgx, gorm, or sqlc.

How We Take Care of You:

- Competitive Salary & Stock Options
- Health Benefits
- New Hire Home-Office Setup: One-time USD $500
- Monthly Stipend: USD $150 per month via a Brex Card

Alpaca is proud to be an equal opportunity workplace dedicated to pursuing and hiring a diverse workforce.

What does a Reliability Engineer do?

A Reliability Engineer is responsible for ensuring the reliability and availability of an organization's systems and equipment.

They use engineering principles, statistical analysis, and data science to identify and mitigate risks, prevent failures, and optimize system performance.

Here are some of the typical tasks and responsibilities of a Reliability Engineer:

  1. Analyze data and perform statistical modeling: Reliability Engineers analyze data related to equipment performance, failure rates, and maintenance history to identify trends and patterns. They use statistical modeling to predict future failures and plan maintenance activities accordingly.
  2. Develop and implement reliability strategies: Reliability Engineers develop and implement strategies to improve the reliability and availability of equipment and systems. This may include performing root cause analysis, implementing preventive maintenance programs, and conducting failure mode and effects analysis (FMEA).
  3. Collaborate with other teams: Reliability Engineers collaborate with other teams such as operations, maintenance, and engineering to identify and address reliability issues. They may also work with suppliers to ensure the reliability of equipment and materials.
  4. Monitor and evaluate performance: Reliability Engineers monitor the performance of systems and equipment to identify areas for improvement. They use data to evaluate the effectiveness of reliability strategies and make adjustments as necessary.
  5. Provide technical support: Reliability Engineers provide technical support to other teams and stakeholders, answering questions and providing guidance on reliability-related issues.
  6. Continuously improve processes: Reliability Engineers are responsible for continuously improving reliability processes and methodologies. They stay up-to-date with the latest technologies and best practices in the field and identify opportunities for improvement.
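
The failure-rate analysis described in item 1 can be sketched with three common reliability metrics: mean time between failures (MTBF), mean time to repair (MTTR), and availability. This is a minimal illustration, not a method taken from the listing above. The `Outage` record, the `reliability_summary` function, and the sample data are all hypothetical, and the sketch assumes outages do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One failure: when it started and how long the repair took, in hours."""
    start_hour: float
    repair_hours: float


def reliability_summary(outages: list[Outage], observed_hours: float) -> dict[str, float]:
    """Return MTBF, MTTR, and availability for one observation window."""
    if not outages:
        raise ValueError("need at least one outage to estimate MTBF and MTTR")
    downtime = sum(o.repair_hours for o in outages)
    uptime = observed_hours - downtime
    mtbf = uptime / len(outages)    # mean operating time between failures
    mttr = downtime / len(outages)  # mean time to repair
    availability = mtbf / (mtbf + mttr)
    return {"mtbf": mtbf, "mttr": mttr, "availability": availability}


# Hypothetical data: three outages over a 1,000-hour observation window.
history = [Outage(120.0, 2.0), Outage(480.0, 5.0), Outage(910.0, 3.0)]
summary = reliability_summary(history, observed_hours=1000.0)
print(summary)
```

In practice these figures would be tracked per system over time, so that a falling MTBF or a rising MTTR can prompt the root cause analysis and preventive maintenance mentioned in item 2.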