Posted 3mo ago

Lead / Sr. PySpark Data Engineer

@ Dataeconomy
Hyderabad, Telangana, India
Onsite, Full Time
Responsibilities: Design ETL pipelines, orchestrate workflows, collaborate with teams
Requirements Summary: 6+ years in data engineering; strong PySpark, AWS Glue, Airflow, and SQL skills; experience with large data pipelines.
Technical Tools Mentioned: PySpark, AWS Glue, Apache Airflow, SQL, S3, Athena, Lambda
Job Description

Job Title: PySpark Data Engineer
Experience: 6+ Years
Location: Hyderabad / Pune
Employment Type: Full-Time

 

Job Summary:

We are looking for a skilled and experienced PySpark Data Engineer to join our growing data engineering team. The ideal candidate will have 6+ years of experience in designing and implementing data pipelines using PySpark, AWS Glue, and Apache Airflow, with strong proficiency in SQL. You will be responsible for building scalable data processing solutions, optimizing data workflows, and collaborating with cross-functional teams to deliver high-quality data assets.




Key Responsibilities:

  • Design, develop, and maintain large-scale ETL pipelines using PySpark and AWS Glue.
  • Orchestrate and schedule data workflows using Apache Airflow.
  • Optimize data processing jobs for performance and cost-efficiency.
  • Work with large datasets from various sources, ensuring data quality and consistency.
  • Collaborate with Data Scientists, Analysts, and other Engineers to understand data requirements and deliver solutions.
  • Write efficient, reusable, and well-documented code following best practices.
  • Monitor data pipeline health and performance; resolve data-related issues proactively.
  • Participate in code reviews, architecture discussions, and performance tuning.



Requirements


  • 6+ years of experience in data engineering roles.
  • Strong expertise in PySpark for distributed data processing.
  • Hands-on experience with AWS Glue and other AWS data services (S3, Athena, Lambda, etc.).
  • Experience with Apache Airflow for workflow orchestration.
  • Strong proficiency in SQL for data extraction, transformation, and analysis.
  • Familiarity with data modeling concepts and data lake/data warehouse architectures.
  • Experience with version control systems (e.g., Git) and CI/CD processes.
  • Ability to write clean, scalable, and production-grade code.



Benefits

As per company standards.