Posted 3w ago

Senior AI Systems Engineer

@ Applied Research Associates
Raleigh or Albuquerque
Remote, Full Time
Responsibilities: leading deployment, designing infrastructure, operationalizing models
Requirements Summary: 8-10 years engineering experience; 2+ years AI/ML support; Python/Bash scripting; PyTorch/Hugging Face; DevOps/MLOps; containerization and Kubernetes.
Technical Tools Mentioned: Python, Bash, PyTorch, Hugging Face, Kubernetes, Docker, Git, CI/CD, MLflow, Kubeflow, LangChain, OpenAI
Job Description

Essential Functions:

  • Lead the deployment, integration, and operational support of AI platforms, tools, and services, ensuring compatibility with existing systems and enterprise processes.
  • Design, implement, monitor, and optimize AI infrastructure, working with server, cloud, and platform engineering teams.
  • Operationalize machine learning workflows and support AI-enabled applications from development through production deployment and sustainment.
  • Build and maintain CI/CD and MLOps pipelines for model packaging, testing, deployment, rollback, and lifecycle management.
  • Implement infrastructure automation using scripting, Infrastructure as Code, and configuration management practices.
  • Provide ongoing technical support, troubleshooting, root cause analysis, and documentation for AI platforms and user-facing AI services.
  • Maintain observability across AI systems through logging, metrics, performance monitoring, alerting, and incident response practices.
  • Ensure security, compliance, and governance requirements are met, including participation in audits, vulnerability management, and secure architecture reviews.
  • Assess and implement system enhancements to improve performance, scalability, reliability, and cost efficiency.
  • Collaborate across divisions to support diverse AI initiatives and align technical implementations with mission and business objectives.
  • Evaluate emerging AI tools, frameworks, and infrastructure approaches for operational fit, supportability, and long-term value.
  • Develop and maintain technical documentation, runbooks, architecture diagrams, and operational procedures.

Experience and Skills Required:

  • Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related STEM field, with 8-10 years of engineering experience.
  • 2+ years of experience supporting AI/ML platforms, MLOps workflows, model deployment, or AI-enabled infrastructure.
  • Strong coding and automation skills in Python, Bash, or similar scripting languages.
  • Experience with AI/ML frameworks and tooling such as PyTorch, Hugging Face, or similar ecosystems.
  • Proficiency with DevOps and MLOps practices, including CI/CD pipelines, Git-based workflows, containerization, and Kubernetes.
  • Experience deploying AI/ML models or AI services into operational environments, including containerized, cloud, or high-performance computing environments.
  • Familiarity with security frameworks and compliance standards such as NIST and CMMC.
  • Familiarity with AI security functionality in enterprise environments, including OAuth.
  • Strong communication skills and the ability to collaborate effectively across technical and non-technical teams.

Preferred:

  • Advanced degree or certifications related to AI or machine learning.
  • Experience integrating AI models into scientific workflows.
  • Familiarity with large language model (LLM) APIs and orchestration frameworks, such as the OpenAI and Hugging Face APIs, LangGraph, or LangChain.
  • Experience with model serving, inference optimization, or AI platform tools such as MLflow, Kubeflow, vLLM, or similar.
  • Experience with simulations for scientific or engineering projects, particularly physical systems simulations.
  • Experience with GPU-based systems or running AI models in HPC environments.
  • Experience writing and deploying MCP servers on Kubernetes.
  • DoD experience.
  • Secret security clearance (active or inactive).

Education:

  • Bachelor’s degree in CS, Software Engineering, or another IT-related field, or equivalent experience.

REMOTE WORK NOTICE: This position may be performed fully remote, hybrid, or onsite at an ARA office. Preference will be given to candidates located in the Albuquerque, NM, or Raleigh, NC, areas.