OVERVIEW

We are looking for a highly technical ML Systems Engineer to architect and build scalable AI inference capabilities across heterogeneous environments.

This role focuses on solving real-world challenges in AI model execution, runtime interoperability, and performance optimisation.

You will operate at the intersection of machine learning, systems engineering, and software engineering, building platforms and tooling that standardise and simplify AI model serving in production environments.

Key Responsibilities

Inference Systems Engineering

Design and develop abstractions, middleware, and system components to support model inference across Traditional and Generative AI
Build integration layers across different model formats, execution engines, and deployment environments
Ensure consistency, portability, reliability, and scalability of model execution

Model Handling

Support diverse model architectures, including:
  Large Language Models (LLMs)
  Computer vision models
  NLP models
  Multi-modal models
Optimise models for latency, throughput and resource efficiency
Optimise model loading strategies
Implement robust mechanisms for model lifecycle management

Benchmarking & Evaluation

Develop and execute benchmarking methodologies to evaluate:
  Latency vs throughput trade-offs
  Runtime and hardware performance characteristics
  Use-case performance characteristics
Support data-driven deployment decisions through profiling and performance analysis

Platform Integration & Developer Experience

Develop APIs, libraries, and platform services that enable:
  Simplified model deployment and serving
  Runtime backend selection
  Model observability
  Model scaling
Improve the experience of developers and platform operators while preserving operational flexibility and low-level control

Technical Experience

Must-Have

Hands-on experience with at least one inference stack:
  Traditional AI: NVIDIA Triton Inference Server
  Generative AI: vLLM, SGLang, Dynamo/LLM-D
Strong ability to profile, diagnose, and optimise performance bottlenecks
Strong proficiency in at least one programming language (e.g. Python, C++, Go, Rust)
Good understanding of Linux systems, distributed systems concepts, and system-level debugging
Familiarity with containers and orchestration platforms such as Docker and Kubernetes/OpenShift

Preferred Experience

Experience working in air-gapped or restricted environments with enterprise GPUs (e.g. A100, H200, B200)
Experience with LLM inference, including an understanding of concepts such as KV cache management, prefill vs decode phases, continuous batching, and token-level scheduling
Experience with model optimisation, including an understanding of concepts such as quantisation (FP16, INT8, INT4), graph optimisation, and compilation

JOB REQUIREMENTS

Degree in Computer Science, Computer Engineering, or a related discipline.
Minimum 2–3 years of relevant experience in ML systems, inference engineering, platform engineering, or performance-critical software systems.

Experience

2–5 years

Job Type

Full-Time

Qualification

Bachelor's degree or equivalent

Working Hours

Standard Hours

Programme Centre / Entity

Digital Hub

ML Systems Senior Engineer
