Create Digital Solutions
Hybrid remote, 1 day per week in Victoria, London
Role Overview
We are looking for a senior SRE/DevOps practitioner to design, standardise, and operate cloud platforms that support multiple AI-driven products and services.
This role focuses on building opinionated, reusable infrastructure patterns that enable teams to rapidly deliver AI workloads while maintaining high standards for reliability, security, and cost control.
You will develop platform architecture across multiple concurrent projects, ensuring consistency in how services are deployed, integrated, and operated. This includes shaping how AI/ML workloads are built, deployed, and monitored, as well as defining clear patterns for service communication, API exposure, and infrastructure provisioning.
This is a hands-on role for someone who is comfortable making strong architectural decisions, reducing variability across teams, and balancing flexibility with standardisation in a fast-moving environment.
Key Responsibilities
Platform Architecture & Standardisation
Define and implement opinionated architecture patterns for cloud-native and AI-enabled services on AWS
Establish reusable blueprints for deploying and operating these services
Drive consistency across multiple projects through shared modules, templates, and platform tooling
Infrastructure as Code & Automation
Build and maintain Terraform-based infrastructure, using modular and reusable design
Define CI/CD patterns for:
Infrastructure deployment
Application and model delivery
Enforce best practices through pipelines and automation rather than relying on documentation alone
Reliability, Observability & Operations
Embed SRE principles across all services:
Monitoring, logging, tracing
SLIs/SLOs and alerting
Continuously improve reliability, performance, and cost efficiency
Operate API gateway/data plane technologies (e.g. Kong)
Required Skills & Experience
Strong experience operating AWS-based platforms in production
Proven experience with Terraform, including module design and CI/CD integration
Hands-on experience with container platforms (ECS preferred; EKS experience is acceptable if you can adapt to ECS)
Experience operating API gateways (Kong or equivalent)
Solid understanding of cloud networking and service discovery patterns
Experience supporting multiple teams or projects on a shared platform
Strong troubleshooting and production operations experience
AI / Data Platform Experience (Required)
Practical experience running or supporting AI/ML workloads in production, such as:
Model inference services
Batch processing pipelines
Integration with LLM APIs or hosted models
Understanding of:
Scaling characteristics of AI workloads
Cost considerations (compute-heavy workloads, GPU usage, etc.)
Familiarity with tooling such as:
Model serving frameworks
Data processing pipelines
Managed AI services on AWS
Desirable Skills
Experience with GPU workloads or specialised compute environments
Familiarity with feature stores, vector databases, or embedding pipelines
Knowledge of event-driven architectures
Experience with security best practices (IAM, secrets management, Zero Trust)
Exposure to platform engineering or internal developer platforms
Wider Skills
Ability to make and defend clear architectural decisions
Comfortable operating across multiple concurrent workstreams
Strong communication and stakeholder management skills
Detail-oriented with a bias toward automation and standardisation over ad hoc solutions
Pay: £60,000.00-£80,000.00 per year
Work Location: Hybrid remote in London W2 2UH