Consultant - PySpark

  • IT
  • Bangalore
  • 3 months ago
  • Wage Agreement

About the job

Position Summary

AI & Data

In this age of disruption, organizations need to navigate the future with confidence, embracing decision making with clear, data-driven choices that deliver enterprise value in a dynamic business environment.

The AI & Data team leverages the power of data, analytics, robotics, science, and cognitive technologies to uncover hidden relationships in vast troves of data, generate insights, and inform decision-making. The offering portfolio helps clients transform their business by architecting organizational intelligence programs and differentiated strategies to win in their chosen markets.

AI & Data will work with our clients to:

  • Implement large-scale data ecosystems, including data management, governance, and the integration of structured and unstructured data, to generate insights leveraging cloud-based platforms
  • Leverage automation, cognitive, and science-based techniques to manage data, predict scenarios, and prescribe actions
  • Drive operational efficiency by maintaining their data ecosystems, sourcing analytics expertise, and providing as-a-service offerings for continuous insights and improvements

PySpark Consultant

The position is suited for individuals who have a demonstrated ability to work effectively in a fast-paced, high-volume, deadline-driven environment.

Education and Experience

Education:

  • B.Tech/M.Tech/MCA/MS

Experience:

  • 3-6 years of experience in designing and implementing the migration of enterprise legacy systems to a big data ecosystem for data warehousing projects

Required Skills

  • Excellent knowledge of Apache Spark and strong Python programming experience
  • Deep technical understanding of distributed computing and broader awareness of different Spark versions
  • Strong UNIX operating system concepts and shell scripting knowledge
  • Hands-on experience using Spark and Python
  • Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations (see the first sketch after this list)
  • Experience deploying and operationalizing code; knowledge of scheduling tools such as Airflow or Control-M is preferred
  • Working experience with the AWS ecosystem, Google Cloud, BigQuery, etc. is an added advantage
  • Hands-on experience with AWS S3 filesystem operations
  • Good knowledge of Hadoop, Hive, and the Cloudera/Hortonworks Data Platform
  • Exposure to Jenkins or an equivalent CI/CD tool and a Git repository
  • Experience handling CDC (change data capture) operations on large volumes of data (see the second sketch after this list)
  • Understanding of and operating experience with the Agile delivery model
  • Experience in Spark-related performance tuning
  • Well versed in design documents such as HLD, TDD, etc.
  • Well versed in historical data loads and overall framework concepts
  • Participation in different kinds of testing, such as Unit Testing, System Testing, and User Acceptance Testing
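
As a concrete illustration of the read-merge-enrich-load flow named above, here is a minimal PySpark sketch. The bucket names, paths, and column names are hypothetical placeholders, not details taken from this posting.

```python
# Minimal PySpark ETL sketch: read, merge, enrich, load.
# All paths and column names below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Read: pull raw data from external sources (hypothetical S3 locations).
orders = spark.read.parquet("s3a://example-bucket/raw/orders/")
customers = spark.read.csv(
    "s3a://example-bucket/raw/customers/", header=True, inferSchema=True
)

# Merge: join the two sources on a shared key.
merged = orders.join(customers, on="customer_id", how="left")

# Enrich: derive new columns from the merged data.
enriched = merged.withColumn(
    "order_value_band",
    F.when(F.col("order_total") >= 1000, "high").otherwise("standard"),
).withColumn("load_date", F.current_date())

# Load: write to the target destination, partitioned for downstream reads.
enriched.write.mode("overwrite").partitionBy("load_date").parquet(
    "s3a://example-bucket/curated/orders_enriched/"
)
```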
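
And a hedged sketch of one common CDC pattern: collapsing a feed of insert/update/delete change records down to the latest state per key. The table locations, business key, timestamp, and op-code columns are again illustrative assumptions.

```python
# CDC compaction sketch: keep only the newest change record per key,
# then drop keys whose latest operation is a delete.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-sketch").getOrCreate()

changes = spark.read.parquet("s3a://example-bucket/cdc/customer_changes/")

# Rank change records per business key, newest first.
w = Window.partitionBy("customer_id").orderBy(F.col("change_ts").desc())
latest = (
    changes.withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Remove deleted keys and write the compacted snapshot.
snapshot = latest.filter(F.col("op") != "D")
snapshot.write.mode("overwrite").parquet(
    "s3a://example-bucket/curated/customer_snapshot/"
)
```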

 

Preferred Skills

  • Exposure to PySpark, Cloudera/Hortonworks, Hadoop, and Hive
  • Exposure to AWS S3/EC2 and Apache Airflow (a scheduling sketch follows this list)
  • Participation in client interactions/meetings is desirable
  • Participation in code tuning is desirable
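
Since Apache Airflow appears in both skill lists, here is a small scheduling sketch: a daily DAG that submits a PySpark job. The DAG id, schedule, and script path are assumptions for illustration; on Airflow versions before 2.4 the `schedule` argument is spelled `schedule_interval`.

```python
# Airflow DAG sketch: submit a PySpark job every night at 02:00.
# DAG id, schedule, and script path are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_pyspark_etl",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # daily at 02:00
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="spark_submit_etl",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "/opt/jobs/etl_sketch.py"
        ),
    )
```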