Software Engineer at Meta
Required Skills
C/C++, PyTorch, CUDA, OpenCL, AI accelerators, model optimization, parallel computing, performance profiling
About the Role
Meta is seeking a Software Engineer to maximize AI model performance on GPUs or custom silicon, focusing on optimizing training and inference for Generative AI and recommendation models. The role involves applying state-of-the-art optimization techniques and working cross-functionally to enhance large-scale AI workloads on Meta's accelerators.
Key Responsibilities
- Work cross-functionally to co-design models for pre-training and inference efficiency
- Apply and drive state-of-the-art optimization techniques to large-scale AI workloads on Meta's accelerators
- Profile, analyze, debug, and optimize large-scale workloads on training superclusters
- Optimize the full vertical stack, from kernels, frameworks, communication, and firmware up to model layers and hyperparameters
- Set direction and goals for the team related to project impact, capacity, and developer efficiency
Required Skills & Qualifications
Must Have:
- Bachelor's degree in computer science or a related STEM field
- Experience programming AI accelerators (e.g., GPUs, custom silicon) using AI frameworks such as PyTorch or similar
- Experience developing custom kernels and compiler infrastructure using low-level programming models like CUDA or OpenCL
- At least 6 years of experience developing and optimizing performance in modern C/C++
Nice to Have:
- Experience with training and validating large-scale AI models, including parallelizing across accelerators
- Understanding of multiprocessing, including race conditions and communications between processes
- Experience evaluating model performance with profilers and tuning hyperparameters
- Thorough understanding of model and data parallelism techniques such as FSDP and tensor parallelism
- Demonstrated experience with the full model life cycle for large language models, from pre-training to inference