About the role

Principal Platform System Engineer at Microsoft

Required Skills

hardware design, high-speed interfaces, accelerator systems, AI/ML infrastructure, datacenter deployment, system validation, power and thermal, PCIe, FPGA

About the Role

The Principal Platform System Engineer joins Microsoft's Azure Cloud Hardware and Infrastructure Engineering team and focuses on designing and deploying accelerator-based hardware systems for AI/ML infrastructure. The role involves collaborating across disciplines to develop system requirements, validation, and deployment processes for high-speed interfaces and datacenter solutions.

Key Responsibilities

  • Collaborate with architecture, silicon engineering, firmware, hardware design, and customer teams to build accelerator hardware solutions
  • Participate in architectural discussions; design and evaluate system behavior and power, thermal, and cooling solutions for AI/ML workloads
  • Analyze new interfaces and subsystems, develop integration plans, debug issues, and provide recommendations
  • Define system behavior and concept of operations for platform compatibility with Azure datacenter software and customer expectations
  • Perform NUDD (New, Unique, Different, Difficult) technology analysis, provide risk assessments and mitigations, and drive technical requirements across the HW/FW/SW stack

Required Skills & Qualifications

Must Have:

  • Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or a related field AND 7+ years of technical engineering experience, OR Bachelor's Degree in a related field AND 8+ years of technical engineering experience, OR equivalent experience
  • 8+ years of relevant experience in accelerator-based system design and/or implementation across the product development lifecycle
  • 5+ years of hands-on experience developing high-speed interfaces for accelerator-based or compute-based HW systems
  • Ability to pass Microsoft Cloud Background Check upon hire/transfer and every two years thereafter

Nice to Have:

  • Experience developing, integrating, deploying, and managing GPU- or FPGA-based accelerator platforms for AI/ML use cases throughout the product lifecycle
  • Experience designing and developing system behavior across hardware, firmware, management firmware, and software, collaborating across disciplines to assess system tradeoffs
  • Knowledge of high-volume silicon (SoCs, GPUs, or FPGAs), compute, storage, and/or networking design, manufacturing, and deployment, including expertise in high-speed interfaces (PCIe, DDR, Ethernet), power characterization, and thermal/cooling systems
  • Knowledge of datacenter operations at scale
  • Demonstrated ability to communicate technical concepts clearly in verbal and written formats

Benefits & Perks

  • Industry-leading healthcare