We are looking for a talented Data Engineer with strong PySpark experience to build, optimize, and maintain scalable data pipelines and data platforms. The ideal candidate will have expertise in data engineering concepts, distributed data processing, ETL/ELT development, and big data technologies.
The role involves working with large datasets, transforming raw data into actionable insights, and ensuring high data quality and reliability.
Requirements
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using PySpark.
- Build and optimize ETL/ELT processes for batch and real-time data ingestion.
- Process and transform large-scale structured and unstructured datasets.
- Develop data solutions that support analytics, reporting, and machine learning initiatives.
- Optimize Spark jobs for performance, scalability, and resource utilization.
- Work with data architects and business stakeholders to understand data requirements.
- Implement data quality checks, validation rules, and monitoring processes.
- Collaborate with cross-functional teams to integrate data from multiple source systems.
- Troubleshoot and resolve data pipeline failures and performance bottlenecks.
- Ensure adherence to data governance, security, and compliance standards.
- Participate in code reviews and promote engineering best practices.
Must Have Skills
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- 5+ years of experience in data engineering.
- Strong hands-on experience with Python and PySpark.
- Excellent SQL development and query optimization skills.
- Experience designing and implementing ETL/ELT pipelines.
- Strong understanding of distributed computing and big data concepts.
- Experience working with data warehouses and data lakes.
- Familiarity with Git-based version control systems.