Principal Platform System Engineer at Microsoft
Required Skills
hardware design, high-speed interfaces, accelerator systems, AI/ML infrastructure, datacenter deployment, system validation, power and thermal, PCIe, FPGA
About the Role
Principal Platform System Engineer at Microsoft's Azure Cloud Hardware and Infrastructure Engineering team, focusing on designing and deploying accelerator-based hardware systems for AI/ML infrastructure. Responsibilities include collaborating across disciplines to develop system requirements, validation, and deployment processes for high-speed interfaces and datacenter solutions.
Key Responsibilities
- Collaborate with architecture, silicon engineering, firmware, hardware design, and customer teams to build accelerator hardware solutions
- Participate in architectural discussions, design and evaluate system behavior, power, thermal and cooling solutions for AI/ML workloads
- Analyze new interfaces and subsystems, develop integration plans, debug issues, and provide recommendations
- Define system behavior and concept of operations for platform compatibility with Azure datacenter software and customer expectations
- Perform NUDD (New, Unique, Different, Difficult) technology analysis, provide risk assessments and mitigations, and drive technical requirements across the HW/FW/SW stack
Required Skills & Qualifications
Must Have:
- Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 7+ years technical engineering experience OR Bachelor's Degree in related field AND 8+ years technical engineering experience OR equivalent experience
- 8+ years of relevant experience in accelerator-based system design and/or implementation across the product development lifecycle
- 5+ years of hands-on experience developing high-speed interfaces for accelerator-based or compute-based HW systems
- Ability to pass Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Nice to Have:
- Experience developing, integrating, deploying, and managing GPU- or FPGA-based accelerator platforms for AI/ML use cases throughout the product lifecycle
- Experience designing and developing system behavior across hardware, firmware, management firmware, and software, collaborating across disciplines to assess system tradeoffs
- Knowledge of high-volume silicon (SoCs, GPUs, or FPGAs), compute, storage, and/or networking design, manufacturing, and deployment, including expertise in high-speed interfaces (PCIe, DDR, Ethernet), power characterization, and thermal/cooling systems
- Knowledge of datacenter operations at scale
- Demonstrated ability to communicate technical concepts clearly in verbal and written formats
Benefits & Perks
- Industry-leading healthcare