
Senior Site Reliability Engineer
iDen2 - San Francisco, United States
Job Details
Job Description
We’re seeking a Senior Site Reliability Engineer (SRE) to join iDen2 Inc.’s mission to transform digital identity through secure, scalable, and innovative solutions. You’ll design, build, and maintain highly available, resilient systems to support our Self-Sovereign Identity (SSI) and Decentralized Identity (DID) platforms, ensuring seamless performance for global users across industries like banking, healthcare, and education.
Job Requirements
- Architect and implement robust infrastructure to ensure 99.99% uptime for our identity platform.
- Develop automation tools and scripts (e.g., Python, Go, or Bash) to streamline deployment, monitoring, and incident response.
- Optimize cloud-based systems (AWS, Azure, or GCP) for scalability, security, and cost-efficiency.
- Lead incident management, perform root cause analysis, and drive continuous improvement through postmortems.
- Collaborate with cross-functional teams to integrate SRE best practices into CI/CD pipelines and microservices architectures.
- Mentor junior engineers and contribute to a culture of operational excellence.
- Design, build, and maintain a secure, scalable, and highly available cloud infrastructure on AWS using Kubernetes and Terraform.
- Champion and implement security best practices across all layers of the infrastructure, including network, compute, storage, and CI/CD pipelines.
- Develop and enforce Infrastructure as Code (IaC) principles to ensure consistent, repeatable, and auditable environments.
- Proactively identify and articulate infrastructure requirements, potential risks, and design considerations by asking critical and insightful questions.
- Collaborate with development and operations teams to streamline deployment processes and ensure seamless integration.
- Implement comprehensive monitoring, logging, and alerting solutions to ensure optimal performance and rapid issue resolution.
- Drive continuous improvement of infrastructure security, reliability, and cost-effectiveness.