Prime Trust is hiring a Web3 Senior Site Reliability Engineer
Compensation: $105k - $120k estimated
Location: NV Las Vegas, Nevada, United States
Job Summary
The SRE team at Prime Trust partners with the DevOps, Release Engineering, and Engineering organizations to improve site reliability and availability. We are growing the team and looking for an experienced and talented Senior SRE to join us. The primary focus of the SRE is knowing the availability of all environments (production and non-production). The SRE leverages deep knowledge of the services to determine the signals and thresholds that best assess and maintain site health.
We are a highly professional and friendly team that enjoys working together in a collaborative environment. We have a close bond with the Engineering team and work together to create the next financial solution. We loathe technical debt and toil, and seek to have tooling and systems in place that ease the burden for everyone. We want to maximize team knowledge and professional growth by setting aside time to do research and find ways to continually improve our areas of responsibility.
Job Responsibilities
- Build, improve, and maintain a comprehensive multi-environment monitoring solution
- Leverage our monitoring infrastructure to obtain maximum observability into our environments
- Work with DevOps, Release Engineering, and Engineering to improve reporting of the applications the business needs
- Test environment resiliency by leveraging chaos engineering (cluster degradation, network latency, pod/region failure) and application fuzz testing
- Responsible for load testing to determine our capacity limits
- Maintain a comprehensive understanding of site availability, the availability of its dependent services, and the impact of both
- Create and maintain dashboards intended to provide relevant information to outside teams (Engineering, C-Staff, etc.) and internally within the Platform Operations team
- Responsible for right-sizing resources for all workloads
- Work with the NOC to continuously improve auto-remediation to reduce MTTR, and predictive analytics to reduce MTTD to zero
- Provide cost reporting
- Maintain an authoritative inventory database
- Responsible for all alerting and its effectiveness
- Responsible for synthetic testing
- Work with Engineering to troubleshoot and resolve application issues that reduce availability
- Work with Product and Engineering to determine KPIs and formulate SLAs and SLOs
Experience & Skills Requirements
- Experience with instrumentation, logging and alerting methods and best practices
- Proactive approach to problem solving
- Be an example to others through demonstrated professionalism, discipline, humility, and collaboration
- A passion for making data-driven decisions
- 5+ years of experience working in AWS
- 5+ years of experience working as an SRE
Education
Bachelor's degree in computer science, engineering, or related field is required
Benefits - Flexible PTO; paid holidays; 401(k); health, dental, and vision insurance for employee and dependents, currently 100% paid for by the company and starting on the first day of the month following the date of employment; and connectivity service reimbursement of up to $100 per month (which includes work-related cell phone, Wi-Fi, etc.)
This job is closed