
About the Role

This role is for a Software Engineer on the Machine Learning Applications team for AWS Neuron. The engineer develops, enables, and performance-tunes a variety of ML model families, including large language models and Stable Diffusion. The engineer will also build distributed training and inference support into frameworks such as PyTorch, TensorFlow, and JAX, and optimize models for AWS Trainium and Inferentia silicon.

Key Responsibilities

Develop, enable, and performance-tune a wide variety of ML model families, including large language models and Stable Diffusion
Build distributed training and inference support into PyTorch, TensorFlow, and JAX using XLA and the Neuron compiler and runtime stacks
Tune models to achieve the highest performance and maximum efficiency on AWS Trainium and Inferentia silicon
Work with chip architects, compiler engineers, and runtime engineers to create distributed training solutions
Extend distributed training libraries such as FSDP and DeepSpeed for Neuron-based systems

Required Skills & Qualifications

Must Have:

3+ years of non-internship professional software development experience
2+ years of non-internship design or architecture experience for new and existing systems
Experience programming in at least one programming language
Experience training large ML models using Python

Nice to Have:

3+ years of full software development life cycle experience
Bachelor's degree in computer science or equivalent

Benefits & Perks

Inclusive team culture with employee-led affinity groups
Work-life balance with flexible working hours
Mentorship and career growth opportunities
Comprehensive compensation package including medical and financial benefits

Software Engineer - AI/ML, AWS Neuron at Annapurna Labs (U.S.) Inc.