Posted 2mo ago

Software Engineer (AI)

@ Gresham Technologies
Gurugram, Haryana, India
Hybrid, Full Time
Responsibilities: Design pipelines, build data platforms, enable LLM-based workflows
Requirements Summary: 4–9 years in Data Engineering / AI Data Engineering; strong Python (Pandas, NumPy); advanced SQL; Airflow, DBT, Spark; AWS data stack; knowledge of LLMs, embeddings, and RAG; experience with vector databases; CI/CD, Docker, IaC
Technical Tools Mentioned: Python, Pandas, NumPy, SQL, Airflow, DBT, Spark, AWS, S3, Glue, Lambda, EMR, Embedding, FAISS, Pinecone, LangChain, CrewAI, AutoGen, Iceberg, Delta Lake, Snowflake, Oracle, SQL Server, Docker, CI/CD, Infrastructure-as-Code
Job Description

We are seeking an AI Data Engineer to build scalable data platforms that power analytics, machine learning, and Generative AI (LLM/RAG) use cases. This role combines data engineering, cloud, and AI/ML capabilities to enable intelligent data pipelines, agentic workflows, and real-time data processing.

Job Responsibilities
  • Design and build scalable ETL/ELT pipelines using Python, SQL, Airflow, DBT, and Spark.
  • Develop data platforms on AWS (S3, Glue, EMR, Lambda, SQS, EventBridge).
  • Build and optimize RAG pipelines (embeddings, vector DBs like FAISS/Pinecone).
  • Enable LLM-based and agentic workflows (LangChain, CrewAI, AutoGen).
  • Implement event-driven and real-time data pipelines.
  • Design data lake/lakehouse architectures (Iceberg/Delta Lake).
  • Ensure data quality, lineage, and observability (OpenMetadata or similar).
  • Support ML pipelines, feature engineering, and model retraining workflows.
  • Implement CI/CD and containerized deployments (Docker).
  • Optimize and productionize existing data workflows.
Job Requirements
  • 4–9 years of experience in Data Engineering / AI Data Engineering
  • Strong Python skills (Pandas, NumPy) and advanced SQL
  • Hands-on experience with Airflow, DBT, and Spark (EMR/Glue)
  • Experience with AWS data stack (S3, Glue, Lambda, EMR, etc.)
  • Understanding of LLMs, embeddings, and RAG architectures
  • Experience with vector databases (FAISS, Pinecone, etc.)
  • Knowledge of data lakes/lakehouse (Iceberg/Delta)
  • Experience with relational/analytical DBs (Snowflake, Oracle, SQL Server)
  • Familiarity with CI/CD, Docker, Infrastructure-as-Code, and DevOps practices and automation tooling
Preferred Skills
  • Experience with Trino/Presto
  • Exposure to OpenMetadata or other data governance tools
  • AWS certifications
  • Experience in real-time/streaming pipelines
  • Exposure to product engineering environments
Equal Opportunities Statement
At Gresham, we are committed to building a diverse and inclusive workforce that reflects the communities we serve. We actively encourage applications from individuals of all backgrounds and are dedicated to providing a workplace where everyone feels valued, respected and supported.

We make employment decisions based on merit, skills and potential, and do not discriminate based on any protected characteristic. We are also committed to making reasonable adjustments throughout the recruitment process and employment lifecycle.