About the role

Software Engineer at Meta

Required Skills

C/C++, PyTorch, CUDA, OpenCL, AI accelerators, model optimization, parallel computing, performance profiling

About the Role

Meta is seeking a Software Engineer to maximize AI model performance on GPUs or custom silicon, focusing on optimizing training and inference for Generative AI and recommendation models. The role involves applying state-of-the-art optimization techniques and working cross-functionally to enhance large-scale AI workloads on Meta's accelerators.

Key Responsibilities

  • Work cross-functionally to co-design models for pre-training and inference efficiency
  • Apply and drive state-of-the-art optimization techniques to large-scale AI workloads on Meta's accelerators
  • Profile, analyze, debug, and optimize large-scale workloads on training superclusters
  • Optimize across the vertical stack, from kernels, frameworks, communication, and firmware up to model layers and hyperparameters
  • Set direction and goals for the team related to project impact, capacity, and developer efficiency

Required Skills & Qualifications

Must Have:

  • Bachelor's degree in computer science or a related STEM field
  • Experience programming AI accelerators (e.g., GPUs, custom silicon) using AI frameworks such as PyTorch or similar
  • Experience developing custom kernels and compiler infrastructure using low-level programming models like CUDA or OpenCL
  • Minimum 6 years of experience developing and optimizing performance in modern C/C++

Nice to Have:

  • Experience with training and validating large-scale AI models, including parallelizing across accelerators
  • Understanding of multiprocessing, including race conditions and communication between processes
  • Experience evaluating model performance with profilers and tuning hyperparameters
  • Thorough understanding of model and data parallelism techniques such as FSDP and tensor parallelism
  • Demonstrated experience with the model life cycle, from pre-training to inference, for large language models