Job Summary
We are seeking an experienced Python Data Engineer with strong expertise in building scalable data pipelines and processing large datasets using Python, PySpark, SQL, and Databricks. The ideal candidate has hands-on experience with Azure cloud services, including Azure Data Lake Storage (ADLS), as well as ETL development and DevOps practices.
The role involves designing, developing, and optimizing data solutions that support business intelligence, analytics, and data-driven decision-making.
Key Responsibilities
- Design, develop, and maintain robust and scalable data pipelines using Python and PySpark.
- Build and optimize ETL/ELT processes for ingesting, transforming, and loading large volumes of structured and unstructured data.
- Develop data processing solutions using Python libraries such as Pandas, NumPy, and PySpark.
- Leverage Databricks to implement and manage data engineering workflows and solve complex business problems.
- Work with Azure Data Lake Storage (ADLS) and other Azure cloud services to manage and process data efficiently.
- Write complex SQL queries, stored procedures, and scripts for data extraction, transformation, validation, and reporting.
- Collaborate with cross-functional teams including Data Analysts, Data Scientists, Architects, and Business Stakeholders to understand data requirements.
- Monitor, troubleshoot, and optimize data pipelines for performance, scalability, and reliability.
- Implement data quality checks, governance standards, and security best practices.
- Participate in code reviews and ensure adherence to development standards and best practices.
- Support CI/CD implementation and deployment activities following DevOps methodologies.
Required Skills & Qualifications
- 5 to 8 years or more of hands-on experience in Python development and data engineering.
- Strong programming expertise in Python, including libraries such as Pandas, NumPy, and PySpark.
- Strong knowledge of SQL and database concepts.
- Hands-on experience with Databricks and Spark-based data processing.
- Experience with Azure cloud services and Azure Data Lake Storage (ADLS).
- Solid understanding of ETL/ELT concepts and data warehousing principles.
- Experience working with large-scale datasets and distributed computing frameworks.
- Working knowledge of DevOps practices, CI/CD pipelines, Git, and deployment automation.
- Strong analytical, problem-solving, and debugging skills.
- Excellent communication and stakeholder management skills.
Preferred Skills
- Experience with Azure Data Factory (ADF), Azure Synapse Analytics, or other Azure data services.
- Knowledge of Delta Lake and Medallion Architecture.
- Exposure to data governance, data security, and cloud-native architectures.
- Experience in Agile/Scrum development environments.
Education
- Bachelor’s or Master’s degree in Computer Science, Information Technology, Engineering, or a related field.
Skills
PySpark, Python, Databricks