Posted 1w ago

Senior Machine Learning Engineer

@ C the Signs
United States
Remote, Full Time
Responsibilities: Data preprocessing, Model training, Deployment pipeline
Requirements Summary: Bachelor's or Master's in CS/ML/AI; 5+ years ML engineering; large-scale data preprocessing, LLM training/fine-tuning; distributed training (PyTorch Distributed, DeepSpeed, Ray, Hugging Face Accelerate); GPU/TPU optimization; healthcare data experience; Python and ML libraries; cloud platforms; MLOps; strong communication and collaboration; US work authorization.
Technical Tools Mentioned: Python, TensorFlow, PyTorch, Scikit-learn, Pandas, NumPy, GCP, AWS, Hugging Face, DeepSpeed, Ray, Spark, MLOps, CUDA
Job Description

Position Summary

The Machine Learning Engineer will be responsible for the end-to-end development and deployment of large language models (LLMs) and machine learning models, with a primary focus on data preprocessing, model training, and fine-tuning on large-scale healthcare datasets. This role requires a strong understanding of LLMs, machine learning principles, and data engineering, as well as experience working with sensitive healthcare data.

Key Responsibilities

  • Data Preprocessing: Clean, transform, and prepare large, complex healthcare datasets for machine learning model development. This includes handling missing values, outlier detection, feature engineering, and data normalization. Identify, collect, and curate relevant, industry-specific datasets for model retraining. Format data appropriately for the chosen LLM and training pipeline.
  • Model Training & Fine-Tuning: Design, train, and fine-tune various LLMs on extensive healthcare data to solve specific clinical or operational problems. Set up and manage the training environment, including GPU instances and required software. Train and fine-tune pre-trained LLMs on the custom dataset to achieve specific goals. Experiment with and tune hyperparameters such as learning rate, batch size, and number of training epochs to optimize model performance. Integrate structured and unstructured data (multi-modal/multi-input models).
  • Model Evaluation & Optimization: Evaluate model performance using appropriate metrics, identify areas for improvement, and implement optimization strategies.
  • Pipeline Development: Develop and maintain robust and scalable data and ML pipelines for model training, inference, and deployment.
  • Collaboration: Work closely with data scientists, clinicians, and software engineers to understand requirements, integrate models into production systems, and ensure data privacy and security compliance.
  • Research & Development: Stay up-to-date with the latest advancements in machine learning and healthcare AI, and explore new technologies and methodologies to enhance our solutions.
  • Documentation: Maintain clear and comprehensive documentation of models, data pipelines, and experimental results.