About the role

Sr. Research Engineer, Machine Learning, AGI Foundations at Amazon.com Services LLC

Required Skills

machine learning, LLMs, PyTorch, CUDA, distributed training, AWS, software architecture, AGI

About the Role

The Senior Research Engineer will lead the development of multimodal large language foundation models (LLMs) and novel algorithms to advance the state of the art in AGI. Responsibilities include scaling training on large GPU clusters, optimizing training workflows, and building efficient models. The role combines hands-on machine learning and distributed systems work with influence over overall strategy at the intersection of engineering and applied science.

Key Responsibilities

  • Lead development of novel algorithms and modeling techniques for multimodal LLMs
  • Scale training of models on very large GPU and AWS Trainium clusters
  • Optimize training workflows using distributed training/parallelism techniques
  • Optimize low-level details like CUDA kernels, communication collectives, network I/O
  • Use and extend industry frameworks (NeMo, Megatron Core, PyTorch, JAX, vLLM, TRT)

Required Skills & Qualifications

Must Have:

  • 5+ years of professional software development experience
  • 5+ years of programming experience with at least one programming language
  • 5+ years of leading design or architecture of systems
  • 2+ years of practical machine learning experience

Nice to Have:

  • 5+ years of full software development life cycle experience
  • Master's degree in machine learning or equivalent
  • Hands-on experience training foundation models/LLMs or with low-level optimization

Benefits & Perks

  • Medical, financial, and other benefits
  • Equity and sign-on payments may be provided
  • Inclusive culture with workplace accommodations