About the Role

Principal Platform System Engineer on Microsoft's Azure Cloud Hardware and Infrastructure Engineering team, focused on designing and deploying accelerator-based hardware systems for AI/ML infrastructure. The role involves collaborating across disciplines to develop system requirements and the validation and deployment processes for high-speed interfaces and datacenter solutions.

Key Responsibilities

Collaborate with architecture, silicon engineering, firmware, hardware design, and customer teams to build accelerator hardware solutions
Participate in architectural discussions, and design and evaluate system behavior, power, thermal, and cooling solutions for AI/ML workloads
Analyze new interfaces and subsystems, develop integration plans, debug issues, and provide recommendations
Define system behavior and concept of operations for platform compatibility with Azure datacenter software and customer expectations
Perform NUDD (New, Unique, Different, Difficult) technology analysis, provide risk assessments and mitigations, and drive technical requirements across the HW/FW/SW stack

Required Skills & Qualifications

Must Have:

Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 7+ years technical engineering experience OR Bachelor's Degree in related field AND 8+ years technical engineering experience OR equivalent experience
8+ years of relevant experience in accelerator-based system design and/or implementation across the product development lifecycle
5+ years of hands-on experience developing high-speed interfaces for accelerator-based or compute-based HW systems
Ability to pass Microsoft Cloud Background Check upon hire/transfer and every two years thereafter

Nice to Have:

Experience developing, integrating, deploying, and managing GPU- or FPGA-based accelerator platforms for AI/ML use cases throughout the product lifecycle
Experience designing and developing system behavior across hardware, firmware, management firmware, and software, collaborating across disciplines to assess system tradeoffs
Knowledge of high-volume silicon (SoCs, GPUs, or FPGAs), compute, storage, and/or networking design, manufacturing, and deployment, including expertise in high-speed interfaces (PCIe, DDR, Ethernet), power characterization, and thermal/cooling systems
Knowledge of datacenter operations at scale
Demonstrated ability to communicate technical concepts clearly in verbal and written formats

Benefits & Perks

Industry-leading healthcare