Posted 1mo ago

Lead Data Engineer

@ C the Signs
United States
Remote, Full Time
Responsibilities: Architecting data, developing pipelines, ensuring compliance
Requirements Summary: 7+ years data engineering; 2–3+ years in lead or senior role; strong GCP experience; healthcare data standards; HIPAA/PHI/compliance knowledge.
Technical Tools Mentioned: BigQuery, GCP, Cloud Storage, Pub/Sub, Cloud Run/Functions, dbt, Airflow, Python, SQL, FHIR, HL7, DICOM, C-CDA, X12, Dataflow, Beam, AWS, Redshift, Lambda, S3, Glue, Kinesis, Athena, API Gateway, Step Functions
Job Description

We are seeking a Lead Data Engineer to architect, build, and scale our next-generation healthcare data platform. In this role, you will lead the effort to design robust pipelines, modernize data architecture, and ensure high-quality ingestion and transformation of clinical and operational data. You’ll collaborate closely with product, analytics, clinical informatics, machine learning, and engineering teams to deliver trusted, timely, and compliant insights.

This is a hands-on leadership role ideal for someone who enjoys setting technical direction while still contributing code and guiding stakeholders through complex healthcare data challenges.

Responsibilities

Architecture & Strategy

  • Lead design and evolution of our cloud-native data platform built primarily on Google Cloud Platform, including BigQuery, Cloud Storage, Pub/Sub, Cloud Run, Airflow (Cloud Composer), and Healthcare API.
  • Inform strategic decisions around multi-cloud or AWS interoperability when needed.
  • Establish data engineering best practices, coding standards, and architectural patterns.

Pipeline Development

  • Build scalable ETL/ELT pipelines using dbt for transformations and Airflow for orchestration.
  • Develop ingestion pipelines for clinical and administrative data in HL7, FHIR, DICOM, and custom formats.
  • Develop ingestion and transformation pipelines to be used for AI/ML development and model training.
  • Implement streaming and batch dataflows using Pub/Sub, Dataflow, and serverless compute.
  • Support or guide integrations with AWS-based partner systems or AWS-hosted data sources when applicable.

Data Modeling & Warehousing

  • Design and maintain BigQuery datasets, semantic layers, and warehouse structures.
  • Leverage industry standards such as FHIR resources for canonical healthcare models.
  • Provide guidance on data modeling and warehouse best practices across both GCP and AWS ecosystems.

Data Quality, Observability & Governance

  • Implement data quality frameworks, automated testing, and monitoring.
  • Ensure HIPAA compliance and proper handling of PHI/PII across all pipelines and cloud environments.
  • Drive lineage, documentation, metadata governance, and dbt docs adoption.

Leadership & Collaboration

  • Partner with analytics, product, clinical informatics, and security teams to deliver high-quality, trustworthy data products.
  • Provide oversight and technical direction for multi-cloud data integrations with AWS-based systems or partners.
  • Assist in the recruitment and development of junior data engineers.