Principal AI Network Architect at Microsoft
Required Skills
AI network architecture, GPU systems, RDMA protocols, optical interconnects, signal integrity, high-radix switches, AI training workloads, hyperscale deployments
About the Role
Microsoft seeks a Principal AI Network Architect to design ultra-high bandwidth, low-latency backend networks for next-generation GPU and AI accelerator platforms. The role involves driving system-level integration for scalable AI training workloads and collaborating with cross-functional teams and industry partners to shape hyperscale AI infrastructure.
Key Responsibilities
- Spearhead architectural definition and innovation for next-generation GPU and AI accelerator platforms
- Drive system-level integration across compute, storage, and interconnect domains
- Partner with silicon, firmware, and datacenter engineering teams to co-design infrastructure
- Cultivate deep technical relationships with silicon vendors, optics suppliers, and switch fabric providers
- Evaluate and articulate tradeoffs across electrical, mechanical, thermal, and signal integrity domains
Required Skills & Qualifications
Must Have:
- Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 8+ years of technical engineering experience, OR Master's Degree AND 7+ years of technical engineering experience, OR equivalent experience
- 5+ years of experience in designing AI backend networks and integrating them into large-scale GPU systems
- Ability to meet Microsoft, customer, and/or government security screening requirements, including the Microsoft Cloud Background Check
Nice to Have:
- Proven expertise in system architecture across compute, networking, and accelerator domains
- Deep understanding of RDMA protocols (RoCE, InfiniBand), congestion control (DCQCN), and Layer 2/3 routing
- Experience with optical interconnects (e.g., PSM, WDM), link budget analysis, and transceiver integration
- Familiarity with signal integrity modeling, link training, and physical layer optimization
- Experience architecting backend networks for AI training and inference workloads, including Hamiltonian cycle traffic and collective operations
Benefits & Perks
- Industry-leading healthcare