Jobs via Dice
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Fidel Softech Ltd., is seeking the following. Apply via Dice today!
Hybrid role: Must be within a one-hour drive of one of the following cities: New York, NY; San Francisco, CA; Philadelphia, PA; Boston, MA; Richmond, VA; St. Louis, MO; Minneapolis, MN; Dallas, TX; Cleveland, OH; Charlotte, NC; Kansas City, KS; Atlanta, GA
As a Senior Cloud Engineer in the Cloud SRE team, you will be responsible for designing and developing cloud solutions and engineering reliability tools for the Cloud Foundation Services (CFS) platform in the Infrastructure, Platforms, & Operations organization.
You will apply software engineering practices to build scalable, reusable solutions and utilities that enhance platform reliability across the Federal Reserve System.
Design, develop, and maintain reliability solutions and SRE utilities to reduce toil, improve cloud platform reliability, and industrialize SRE practices across the system
Build and optimize Infrastructure as Code (IaC) using Terraform to manage AWS resources related to SRE solutions, incorporating cost-efficient design principles
Develop CI/CD pipelines and automated testing to ensure code quality, reliability, and rapid delivery of the solutions
Define SRE standards, best practices, and guidelines for adoption across teams; establish SRE metrics such as SLIs and SLOs
Apply software engineering best practices, including version control, code reviews, test-driven development, and documentation, to all development work
Participate in incident management and on-call rotation, providing technical support for SRE tools, troubleshooting production issues, and collaborating with teams to reduce incident recurrence through proactive detection and pattern analysis
Stay current with emerging AWS services, SRE methodologies, and cloud-native development technologies, and drive adoption of innovative solutions
Collaborate within Agile and Scaled Agile frameworks with cross-functional teams to deliver integrated cloud automation solutions
Produce clear, blameless postmortems with actionable items and documented failure scenarios
Seven years of experience in software development, with a focus on reliability and platform engineering
Five years of Python development experience, with proven experience building enterprise-grade, highly available tools, APIs, and utilities
A minimum of three years of hands-on experience developing solutions in AWS environments, with a deep understanding of core services (EC2, VPC, S3, Lambda, IAM, CloudFormation, EventBridge, Step Functions, etc.) and resource cost optimization
Three years of experience applying SRE principles, including observability, toil automation, SLIs/SLOs, and reliability engineering
Expert-level proficiency with Infrastructure as Code (IaC) using Terraform, including module development and state management
Strong experience with CI/CD pipelines, automated testing frameworks, and DevOps practices
Experience with observability tools and practices, including Grafana, AWS CloudWatch, and AWS Canary
Experience defining, implementing, and managing SLOs/SLIs and error budgets; familiarity with conducting RCAs and producing postmortem documentation
Working experience in Agile and Scaled Agile environments, and familiarity with ITSM processes (incident, change, and problem management), resilience testing, and chaos engineering practices
Experience with Go (Golang) or additional programming languages is a plus