Principal AI Architect at Microsoft
Required Skills
AI/ML, GPU, PyTorch, CUDA, Triton, hardware/software co-design, performance engineering, LLM, distributed systems
About the Role
Principal AI Architect role at Microsoft's Azure Hardware Systems and Infrastructure team, focusing on AI accelerator and GPU architecture optimization. Responsibilities include hardware/software co-design, performance tuning of large-scale AI workloads, and cross-functional collaboration across hardware, software, and ML model teams.
Key Responsibilities
- Lead bring-up and functional validation of LLMs on custom AI accelerators and GPUs
- Develop and maintain detailed performance characterizations across compute, memory, and interconnect domains
- Partner with silicon and system architects, compiler/runtime engineers, and model researchers for co-design strategies
- Analyze kernel- and system-level traces to identify performance-limiting factors
- Collaborate with teams across Azure ML, DeepSpeed, and Maia hardware programs
Required Skills & Qualifications
Must Have:
- Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 9+ years of technical engineering experience OR Bachelor's Degree AND 11+ years of experience
- 10+ years of experience in AI systems, hardware/software co-design, or performance engineering
- 5+ years of experience in AI accelerator and GPU architectures, including compute pipelines, memory hierarchies, and interconnects
- 5+ years of experience with PyTorch, CUDA, Triton, or similar frameworks for performance tuning and kernel development
Nice to Have:
- Experience with compiler and runtime frameworks (e.g., MLIR, TVM, XLA, or custom code generation flows)
- Familiarity with DeepSpeed, Megatron-LM, SGLang, or vLLM training and inference pipelines
- Deep understanding of transformer-based model architectures and scaling behaviors
Benefits & Perks
- Industry-leading healthcare