Posted 3mo ago

Senior Data Engineer: Data Lake (Remote)

@ Constructor
Spain
$80k-$120k/yr, Remote, Full Time
Responsibilities: Maintain data pipelines, develop the data quality framework, provide support to platform users
Requirements Summary: 4+ years building production data pipelines and services; fluent English; strong Python; experience with at least one MPP system; cloud (AWS preferred).
Technical Tools Mentioned: Python, Spark/Databricks, ClickHouse, AWS Lambda, Kinesis, FastAPI, Prometheus, OpenTelemetry, Sentry, Terraform, CloudFormation
Job Description

About Us

Constructor is the next-generation platform for search and discovery in ecommerce, built to explicitly optimize for metrics like revenue, conversion rate, and profit. Our search engine is entirely invented in-house utilizing transformers and generative LLMs, and we use its core and personalization capabilities to power everything from search itself to recommendations to shopping agents. Engineering is by far our largest department, and we’ve built our proprietary engine to be the best on the market, having never lost an A/B test to a competitive technology. We’re passionate about maintaining this and work on the bleeding edge of AI to do so.

Out of necessity, our engine is built for extreme scale and powers over 1 billion queries every day across 150 languages and roughly 100 countries. It is used by some of the biggest ecommerce companies in the world like Sephora, Under Armour, and Petco.

We’re a passionate team who love solving problems and want to make our customers’ and coworkers’ lives better. We value empathy, openness, curiosity, and continuous improvement, and we are excited by metrics that matter. We believe that empowering everyone in a company to do what they do best can lead to great things.

Constructor is a U.S.-based company that has been in the market since 2019. It was founded by Eli Finkelshteyn and Dan McCormick, who still lead the company today.

Job Description

The Constructor Data Platform is a foundational component for all internal data and ML teams. It handles the ingestion of over 2 TB of compressed events daily and manages over 6 PB of data in our data lake. 

The Data Platform:

  • Is a comprehensive set of tools and infrastructure used daily by every data scientist and ML engineer in our company.
  • Implements public-facing APIs for event ingestion (FastAPI) and real-time analytics (ClickHouse, Cube).
  • Manages data storage in appropriate formats (S3, ClickHouse, Delta).
  • Facilitates data processing using technologies such as Python, Spark/Databricks, ClickHouse, AWS Lambda, and Kinesis.
  • Includes robust monitoring solutions (Prometheus, OpenTelemetry, PagerDuty, Sentry).
  • Ensures automated testing of pipelines and data quality.
  • Provides cost observability and optimization capabilities.
  • Offers comprehensive tools for developers to develop, run, test, and schedule data pipelines, along with all necessary support and documentation.

Our platform is developed by the Data Lake Team and the Data Infrastructure Team.

About the Data Lake Team

We're hiring a Senior Data Engineer to work on our Data Lake Team. Here is what we do day to day:

  • Maintain the data pipeline job framework.
  • Develop the Data Quality framework (an internal set of tools for validating internal and external data sources).
  • Maintain and develop a public-facing data ingestion service handling 17,000+ RPS.
  • Maintain and develop core data pipelines, both batch and streaming.
  • Be the last line of support for our internal platform users.
  • Take part in an on-call rotation for data platform incidents (shared across the team).