Posted 1mo ago

Data Engineer

@ Truthset
Oakland or San Francisco or North America
$150k-$180k/yr, Hybrid, Full Time
Responsibilities: Design pipelines, Automate ingestion, Ingest data
Requirements Summary: 3+ years of data engineering experience; Python/Scala/Java; cloud tools (Spark, AWS EMR, Snowflake/Databricks/Redshift); ETL pipelines; SQL and data modeling; strong communication.
Technical Tools Mentioned: AWS EMR, AWS S3, AWS EC2, AWS Athena, AWS SageMaker, Spark, DBT, Snowflake, Databricks, Airflow, Terraform, GitHub, Tableau, Scala, Python, SQL, Bash, Redshift
Job Description

Seeking a Senior Data Engineer to join a growing startup!


Job Title: Data Engineer


Location: San Francisco Bay Area / Remote US


Who We Are: Truthset is a venture-backed SaaS startup solving the multibillion-dollar problem of data quality for the entire marketing industry. Our platform enables brands and publishers such as Paramount, Procter & Gamble, and TransUnion to optimize consumer data quality, improving marketing ROI. In a fast-paced and collaborative environment, we are committed to excellence and innovation.


Our Tech Stack: AWS (EMR, EC2, S3, Athena, SageMaker), Spark, DBT, Snowflake, Databricks, Airflow, Terraform, GitHub, Tableau.


Our Programming Languages: Scala, Python, SQL, and Bash. 


Who You Are:

A driven individual excited about joining a small but growing Data Science and Engineering team. You’ll report to the Head of Data Science and work alongside a Data Scientist and a Principal ML Engineer. You have a deep understanding of data engineering principles and experience designing, building, and maintaining data pipelines in cloud environments.


Responsibilities:

  • Design, build, and maintain scalable data pipelines that supply big data to internal and external teams.
  • Automate the delivery of terabytes of structured data to a growing group of enterprise clients.
  • Automate the ingestion of terabytes of external data sources into internal data warehouses in different environments (e.g., AWS, Snowflake, Databricks). 
  • Write, test, debug, and optimize custom Scala code for ETL workflows and other one-off tasks. 
  • Deploy ETL code in the cloud (using batch orchestration tools, like Airflow).
  • Work closely with the Head of Data Science and Principal ML Engineer to test and deploy new infrastructure for data processing.
  • Create an internal toolkit (KPIs, testing programs, dashboards) to monitor the health of data pipelines.
  • Maintain documentation about generated datasets (data dictionaries, feed specs, etc.) for internal and external use.
  • Advise the Head of Data Science on future tooling upgrades.



Core Qualifications:      

  • Bachelor's degree in Computer Science, Mathematics, Statistics, or another related field.
  • 3+ years of relevant work experience.
  • Proficiency in one or more programming languages such as Python, Scala, Java, or other languages commonly used in data engineering.
  • Experience with cloud/distributed computing tools, including Spark, AWS EMR, and cloud-based data warehouse platforms such as Snowflake, Databricks, or Redshift.
  • A strong background in at least one of the following: distributed data processing, software engineering of data services, or data modeling.
  • Experience with relational (SQL) databases and graph databases.
  • Experience with version control software, such as GitHub.
  • Excellent communication and collaboration skills.
  • Strong problem-solving skills and attention to detail.



Ideal Qualifications:

  • Industry experience programming in Scala.
  • Familiarity with a scripting language like Python or R. 
  • Familiarity with Terraform and Airflow.
  • Familiarity with DBT.



Compensation:

The compensation package will include full health benefits, a 401(k), and the potential for an equity stake.


Contact: 

To apply, please email a CV and (optional) cover letter to [email protected]


About the Company


Truthset is a dynamic and innovative data intelligence company that specializes in validating the accuracy of the world's consumer data to empower data-driven decision-making and marketing success.