CLPS Global
Duration: 1-year contract
● Develop Test Strategy: Create a comprehensive test plan for the Lakehouse, focusing on Data Integrity, Accuracy, and Consistency.
● Automate Validation: Replace manual "spot-checking" with automated Python test suites that run as part of the CI/CD pipeline.
● Defect Analysis: Identify and document data anomalies, working closely with Data Engineers to perform root-cause analysis on Spark job failures.
● Regression Testing: Ensure that new PySpark code deployments do not impact existing Gold layer business logic or dashboard reporting.
Experience Requirements
● Total QA/Testing Experience: 5+ years.
● Data Testing Experience: 3+ years specifically in Big Data, Hadoop, or Cloud Data Warehouse environments.
● Databricks Experience: 1+ year of experience testing pipelines within a Databricks environment.
● Automation Focus: Proven track record of moving from manual SQL checks to automated Python-based testing frameworks.
● Mandatory Certification: Databricks Certified Data Engineer Associate (at minimum).
● Preferred Certification: ISTQB Foundation or Advanced Level (Test Automation Engineer).
Core Technical Skills
● Great Expectations / Pandera: Proficiency in using Python-based libraries to define data "contracts" and automated validation suites.
● DLT Expectations: Deep understanding of Delta Live Tables (DLT) expectations (failing on, dropping, and quarantining bad records).
● Advanced SQL: Expert-level SQL for complex data reconciliation, identifying duplicates, and null-value analysis across billions of records.
● Pytest-Spark: Experience using pytest to write unit tests for PySpark transformations and logic.
● Notebook Testing: Ability to write automated test notebooks that validate Medallion Architecture transitions (Bronze to Silver, Silver to Gold).
● Data Reconciliation: Building Python scripts to perform "source-to-target" counts and checksums across distributed file systems.
● Scalability Testing: Ability to validate that data pipelines meet performance SLAs when data volume spikes.
● End-to-End Orchestration Testing: Testing the reliability of Databricks Workflows and handling of job failures/retries.
● Schema Evolution: Testing how pipelines handle upstream schema changes without breaking downstream Gold tables.
● Unity Catalog Validation: Testing Row-Level Security (RLS) and Column-Level Masking to ensure unauthorized users cannot see sensitive data.
● Data Lineage: Validating that data lineage in Unity Catalog correctly reflects the movement of data across the Lakehouse.
Preferred Candidate Background
● "Data-First" Mindset: Understanding that testing a Lakehouse is about testing the data and its behavior, not just the "UI" or "API."
● Software Engineering Foundation: Ability to use Git (branching/merging) to manage test code alongside the engineering team.
● Distributed Systems Knowledge: Basic knowledge of Spark (shuffling, partitioning), sufficient to explain why data might be missing or duplicated in a distributed environment.
Salary: up to SGD 8,200
About CLPS RiDiK
RiDiK is a global technology solutions provider and a subsidiary of CLPS Incorporation (NASDAQ: CLPS), delivering cutting-edge end-to-end services across banking, wealth management, and e-commerce. With deep expertise in AI, cloud, big data, and blockchain, we support clients across Asia, North America, and the Middle East in driving digital transformation and achieving sustainable growth.
Operating from regional hubs in 10 countries and backed by a global delivery network, we combine local insight with technical excellence to deliver real, measurable impact. Join RiDiK and be part of an innovative, fast-growing team shaping the future of technology across industries.
We will review applications on a rolling basis until 10 Jun 2026, and early submissions are encouraged. Please note that only shortlisted candidates will be contacted. Thank you for your understanding.