About the Role
We are seeking a highly skilled and experienced Data Lake Cloud Engineer with a proven track record of designing, implementing, and maintaining large-scale cloud-based data lake platforms. This role requires a professional who can take ownership of our current data lake ecosystem, optimize its performance, and drive future enhancements with minimal oversight. The ideal candidate will have at least 5 years of hands-on experience in building enterprise-grade data lakes, strong cloud architecture expertise, and the ability to work with cutting-edge data ingestion, processing, and analytics tools.
Key Responsibilities
- Take ownership of the existing enterprise data lake platform, ensuring scalability, reliability, and performance.
- Lead the design, architecture, and implementation of cloud-native data lake solutions and integrations.
- Manage and optimize data ingestion pipelines on Oracle Cloud Infrastructure (OCI), using tools and methods such as Apache NiFi, Kafka, batch processing, data capture, and CSV file loads.
- Design and implement pipelines for network data ingestion and for file formats such as Parquet, Avro, and ORC, ensuring efficient storage, processing, and retrieval.
- Build, configure, and tune query engines such as Trino (Presto), Spark, and Hive for efficient analytics and reporting.
- Implement and maintain metadata management, data governance, and security frameworks.
- Monitor and troubleshoot system performance, ensuring SLAs are met for ingestion, processing, and query workloads.
- Automate platform deployment, monitoring, and maintenance with Infrastructure-as-Code tools (e.g., Terraform, CloudFormation).
- Collaborate with data engineers, analysts, and business teams to understand data requirements and deliver solutions that maximize data accessibility and usability.
- Keep the data platform up to date with the latest open-source and cloud-agnostic technologies, implementing upgrades and enhancements where needed.