Hi everyone!!!
Greetings from #PwC
PwC India is actively looking for talented candidates who are based in Mumbai, as well as candidates who are willing to relocate!
Education: B.Tech, M.Tech, B.E., MBA
Interested candidates, please fill in the form below:
https://forms.cloud.microsoft/r/RUNtUFcs1D
Please send your resume to kirthana.xx.tpr@pwc.com
Please use the subject line "Job Application - Skillset"
Big Data Technical
We are looking for a suitable candidate for the Big Data Engineer role, with 4-12 years of relevant experience in Data Engineering / Big Data platforms. Experience in the Banking and Financial Services domain is good to have. The candidate will work closely with Business, Risk, and Technology stakeholders to deliver scalable data products on Cloudera (CDH/CDP).
Key Roles & Responsibilities
- Develop robust Spark jobs (batch and streaming) with unit tests and observability
- Implement ingestion patterns (Kafka/NiFi), data quality checks, and job scheduling
- Analyze and tune SQL/Spark for large-scale datasets
Must have / Primary Skills / Mandatory
- Hands-on experience building data pipelines on Hadoop/Spark ecosystem
- Strong Spark (Scala and/or PySpark), Hive/Impala SQL, and performance tuning
- Working knowledge of Kafka for streaming ingestion and NiFi (or StreamSets) for batch/near-real-time flows
- Experience with Cloudera Manager, YARN/Tez, HDFS, and job orchestration using Oozie/Airflow
- Good understanding of data warehousing concepts (dimensional modelling, partitioning, bucketing)
- Proficiency in Linux/Unix, Shell scripting, Git, and CI/CD (Jenkins/GitLab CI)
- Strong SQL and data modelling for BFSI use cases (lending, liabilities, risk, regulatory reporting)
- Experience in writing technical design documents (HLD/LLD) and unit/integration testing
- Exposure to SDLC/Agile and working in an onsite-offshore model
- Location: Mumbai
Good to have / Secondary Skills / Desired
- Experience with Cloudera Data Platform (CDP) Private Cloud Base/Public Cloud
- Security and governance: Apache Ranger, Atlas; Kerberos; Sentry (legacy)
- Cloud data services: AWS (EMR, Glue, S3), Azure (HDInsight, Synapse, ADLS), or GCP (Dataproc, BigQuery)
- Databricks experience (Spark, Delta Lake) for select workloads
- Containerization and orchestration (Docker/Kubernetes) for micro-batch/ML workloads
- Python for data processing and utilities; familiarity with Scala build tools (sbt/Maven)
- Monitoring/observability: Cloudera Manager metrics, Grafana/Prometheus, log aggregation
- Experience with BI consumption patterns and semantic layers for risk/regulatory dashboards
Educational Qualifications
B.E./B.Tech or equivalent
Experience Range
4-12 years
Certifications (Preferred)
- Cloudera Certified Associate/Professional (CCA/CCP) or CDP certifications
- AWS/Azure/GCP data certifications (nice to have)
- Databricks Lakehouse Fundamentals or Associate (nice to have)