
About the role

Software Engineer - AI/ML, AWS Neuron at Annapurna Labs (U.S.) Inc.

Required Skills

Python, PyTorch, TensorFlow, distributed training, machine learning, AWS, large language models, performance tuning

About the Role

This role is for a Software Engineer on the Machine Learning Applications team for AWS Neuron, responsible for developing, enabling, and performance-tuning a range of ML model families, including large language models and Stable Diffusion. The engineer will build distributed training and inference support into frameworks such as PyTorch and TensorFlow and optimize models for AWS Trainium and Inferentia silicon.

Key Responsibilities

  • Develop, enable, and performance-tune a wide variety of ML model families, including large language models and Stable Diffusion
  • Build distributed training and inference support into PyTorch, TensorFlow, and JAX using XLA and the Neuron compiler and runtime stacks
  • Tune models to achieve the highest performance and efficiency on AWS Trainium and Inferentia silicon
  • Work with chip architects, compiler engineers, and runtime engineers to create distributed training solutions
  • Extend distributed training libraries such as FSDP and DeepSpeed for Neuron-based systems

Required Skills & Qualifications

Must Have:

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience for new and existing systems
  • Experience programming with at least one software programming language
  • Experience training large ML models using Python

Nice to Have:

  • 3+ years of full software development life cycle experience
  • Bachelor's degree in computer science or equivalent

Benefits & Perks

  • Inclusive team culture with employee-led affinity groups
  • Work-life balance with flexible working hours
  • Mentorship and career growth opportunities
  • Comprehensive compensation package including medical and financial benefits