Talent Job Seeker

Machine Learning Infrastructure Engineer, Model Inference

About the position

Why join

We transform healthcare with AI that turns patient-clinician conversations into structured clinical notes in real time. We are the only company mapping AI summaries to ground truth, with 250+ customers and millions of conversations processed monthly. You'll join a mission-driven team of MDs and AI scientists building the "ChatGPT for doctors," with agent technology on the horizon. We are backed by strong funding and offer competitive packages for the right candidates. This ML Infrastructure Engineer role focuses on model inference at scale: you'll build Kubernetes clusters and optimize GPU utilization for our core AI models. The role is hybrid in San Francisco (3 days/week), with direct impact on healthcare delivery.

The role

We are looking for a Machine Learning Infrastructure Engineer with strong experience building and deploying machine learning models in production environments. You will play a pivotal role in building and optimizing the core inference infrastructure that powers our machine learning models. Your work will be instrumental in enhancing the scalability, efficiency, and performance of our AI-driven solutions, working closely with our Infrastructure and Research teams to build, deploy, optimize, and orchestrate across our AI models.

What will you be doing?

- Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training.
- Develop, optimize, and maintain ML model serving infrastructure, ensuring high performance and low latency.
- Collaborate with ML and product teams to scale backend infrastructure for AI-driven products, focusing on model deployment, throughput optimization, and compute efficiency.
- Optimize compute-heavy workflows and enhance GPU utilization for ML workloads.
- Build a robust model API orchestration system.
Tech stack

NVIDIA Triton Server, vLLM, TRT-LLM, PyTorch, TensorFlow, CUDA, Terraform, Ansible, Python

Seniority

5-10 years of experience in software engineering, with at least one year in model inference and ML infrastructure.

Work experience

- Experience at a high-growth big tech company (e.g., Meta) or a well-known AI company (e.g., Anthropic, Perplexity, Hippocratic AI) focused on model inference, developing and optimizing ML model serving infrastructure for performance.
- Strong experience building and deploying ML models in production environments.
- Experience orchestrating ASR or LLM models for GenAI applications.
- Experience developing APIs and managing distributed systems for both batch and real-time workloads.
- Experience building, deploying, and maintaining scalable Kubernetes clusters for AI/ML.

Education

BS, MS, or PhD in Computer Science or a related field.

Hard skills

- Proficient in Python and PyTorch/TensorFlow.
- Deep understanding of distributed systems.
- Expertise with model serving frameworks such as NVIDIA Triton Server, vLLM, and TRT-LLM.
- Familiarity with GPU cluster management and CUDA optimization.
- Knowledge of Infrastructure as Code (Terraform, Ansible) and GitOps practices.

Soft skills

Excellent communication skills to interface between research and product.

Miscellaneous

Must work 3 days/week from the downtown San Francisco office.

Traits to avoid

- Candidates with a purely research or academic background.
- Jumpy job history with no role lasting more than two years.
- Experience limited to traditional software infrastructure without ML focus.
- Experience primarily in general backend or data infrastructure, not ML/AI infrastructure.

Place of work

Talent Job Seeker
California
United States

About the company

Identify the best talent with Talent Job Seeker



Job ID: 10539612 / Ref: a542cd42dc690cc08d61da41a9eb1c99

Open application
