
IBM

Site Reliability Engineering (SRE) Manager

San Diego, CA | Full-time | Mid-level | Competitive | May 7, 2026

Job Description

Introduction

We are seeking a Site Reliability Engineering (SRE) Manager who is excited to make significant contributions to our IBM CISO Platform team. The SRE Manager will lead a high‑performing SRE team. We value innovative thinkers, strong leaders, and people passionate about continuous improvement in a dynamic security environment. You will drive team execution, partner with cross‑functional teams, and ensure that IBM’s internal security platforms maintain world‑class performance, resilience, and compliance.

Your Role And Responsibilities

In this role, you will design, develop, test, and deliver offerings using leading‑edge and/or proven technologies. You will work in an Agile, collaborative environment to understand stakeholder requirements and contribute to the development of innovative software solutions.

Your Primary Responsibilities Will Include

  • Develop Component‑Level Solutions: Design, code, and unit test innovative component‑level software solutions so that they are ready to be integrated into the product.

  • Contribute to the CI/CD Pipeline: Contribute to the automated CI/CD pipeline that moves code through various quality stages, ensuring seamless integration and delivery.

  • Debug Customer-Reported Problems: Design, develop, and unit test code fixes for customer‑reported problems, collaborating with stakeholders to resolve issues efficiently.

  • Deliver Offerings: Deliver high‑quality offerings using leading‑edge and/or proven technologies, meeting stakeholder requirements and expectations.

  • Collaborate in Agile Environment: Work collaboratively in an Agile environment to understand stakeholder requirements, aligning solutions with business needs and goals.

Required Technical And Professional Expertise

  • Proven experience managing or leading engineering, SRE, DevOps, or operations teams.

  • Oversee implementation and automation of operational processes, infrastructure, monitoring, incident response and runbooks.

  • Own end‑to‑end service reliability, including SLIs/SLOs, capacity planning, performance optimization, and operational health.

  • Ensure platforms meet IBM CISO and enterprise security standards, regulatory requirements and risk policies.

  • Communicate strategy, risks, operational status and metrics to leadership and stakeholders.

  • Influence technology roadmaps and operational readiness for new internal solutions.

  • Strong background in delivering reliable, highly available services.

  • Deep understanding of security, compliance, and risk management frameworks.

  • Demonstrated success driving automation of infrastructure, monitoring, and operational tasks.

  • Lead, develop, and mentor a team of Site Reliability Engineers; provide coaching, career development, and performance management.

  • Foster a high‑performing engineering culture centered around accountability, innovation, and continuous improvement.

  • Align team objectives with the strategic direction of the IBM CISO organization and broader Enterprise & Technology Services.

  • Plan staffing, manage workload distribution, and ensure on‑call readiness and 24/7 service support coverage.

  • Excellent written and verbal communication skills, with the ability to influence and drive alignment across teams.

  • Ability to balance support of current systems while leading modernization and future‑state design.

  • Experience with Release/Change Management processes.

  • Ability to handle critical issues outside of business hours.

Preferred Technical And Professional Experience

  • Experience with Kubernetes, OpenShift, or similar container orchestration platforms.

  • Experience building or operating cloud‑native (AWS, Azure, GCP, IBM Cloud), hybrid cloud, and on‑prem infrastructure environments.

  • Familiarity with observability tools.

  • Understanding of networking fundamentals and modern networking architectures.

  • Knowledge of Infrastructure as Code (Terraform, Ansible, etc.).

  • Exposure to Agile methodologies and tools (e.g., Scrum, Kanban, Jira).

  • Working knowledge of scripting or programming languages (e.g., Python).

  • Professional cloud and/or security certifications (e.g., AWS, CISSP).

