Principal Software Engineer at Microsoft
Required Skills
Python, C++, AI/ML, LLMs, GPU optimization, PyTorch, CUDA, performance debugging, software engineering
About the Role
Principal Software Engineer role on Microsoft's AI Performance team, focusing on optimizing inference performance for large language models across a range of hardware. Responsibilities include performance benchmarking, debugging, and tooling development to enable efficient deployment of state-of-the-art LLMs. The role involves hands-on software design and collaboration with internal and external partners.
Key Responsibilities
- Identify and drive improvements to end-to-end inference performance of OpenAI and other state-of-the-art LLMs
- Measure and benchmark performance on Nvidia/AMD GPUs and Microsoft silicon
- Optimize and monitor performance of LLMs and build SW tooling for performance insights
- Enable fast time-to-market of LLMs by building tools for porting models on new hardware
- Design, implement, and test functions or components for AI/DNN/LLM frameworks and tools
Required Skills & Qualifications
Must Have:
- Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python OR equivalent experience
- 4+ years of experience working on high-performance applications, including performance debugging and optimization on CPUs/GPUs
- Ability to meet Microsoft security screening requirements, including Microsoft Cloud Background Check
- Hands-on technical role requiring software design and development skills
Nice to Have:
- Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience OR Bachelor's Degree AND 15+ years experience
- Technical background in software engineering principles, computer architecture, GPU architecture, HW neural net acceleration
- Experience in end-to-end performance analysis and optimization of state-of-the-art LLMs, HPC applications, and proficiency using GPU profiling tools
- Experience in DNN/LLM inference and one or more DL frameworks such as PyTorch, TensorFlow, or ONNX Runtime, plus familiarity with CUDA, ROCm, or Triton
Benefits & Perks
- Industry-leading healthcare