KEY RESPONSIBILITIES AND SKILL SET
- Design and develop highly scalable, real-time systems using Hadoop ecosystem components (Iceberg, Spark, Ozone, Trino, Hive, Ranger, Kafka, Flink, and NiFi).
- Build robust data ingestion and transformation frameworks using Java, Spark, Python, and shell scripting for ingesting multimodal data (image, audio, video, unstructured documents) in both batch and real-time modes.
- Develop full-stack applications and internal engineering tools using Python, shell scripting, and modern web frameworks (e.g., Flask, React).
- Collaborate closely with data scientists to operationalize machine learning models using Cloudera Machine Learning (CML).
- Perform performance tuning and optimization of data applications on Hadoop to ensure optimal resource utilization.
- Experience working with ML platforms such as CML, Spark MLlib, and Python ML libraries (scikit-learn, XGBoost), including model deployment.
KEY SKILLS
- Experience with Python, Java, Scala, or C++
- ML Frameworks & Libraries: XGBoost, scikit-learn, TensorFlow/Keras, Hugging Face (NLP/NLQ/GenAI use cases)
- Full-Stack Development
- Performance Optimization
- Data Engineering & Ingestion Frameworks
- Collaboration with Data Science Teams