DeepAware AI (YC S25) builds AI and robotics solutions to make data centers more secure, efficient, and autonomous. Our platform combines AI-powered infrastructure management with autonomous robotics to protect critical AI systems, prevent downtime, and cut energy waste by up to 30%. We work with data center operators, AI infrastructure providers, and industry leaders to deliver hyperscaler-grade automation to everyone.

As second-time founders with experience building industrial energy optimization systems for Siemens and Sumitomo, and convening global experts from Meta AI, Oracle, and the U.S. Senate, we’re tackling one of the fastest-growing and most impactful challenges of the AI era. If you’re excited about solving real-world problems at the intersection of AI, energy, and robotics, and about building technology that keeps the world’s AI running securely and sustainably, you’ll fit right in at DeepAware.

About the Role

DeepAware AI (YC S25) is building secure, efficient, and autonomous infrastructure for the AI era. As an AI/ML Engineer, you’ll design, build, and deploy machine learning models that power our next-generation Data Center Infrastructure Management (DCIM) platform. Your work will focus on reinforcement learning for intelligent workload scheduling, optimization algorithms for energy and cost savings, and anomaly detection to prevent downtime and enhance security. You’ll be joining a fast-moving, technically ambitious team tackling some of the most complex real-world AI problems at the intersection of computing, energy, and robotics.

Responsibilities

Develop and refine reinforcement learning models for GPU workload placement and power optimization
Implement anomaly detection pipelines for real-time threat detection and failure alerts
Collaborate with data engineers to ensure high-quality, production-ready datasets
Benchmark models against industry baselines and integrate them into our production systems
Contribute to overall architecture and deployment strategies for large-scale AI infrastructure

Requirements

Strong background in machine learning; hands-on experience with reinforcement learning techniques
Proficiency in Python and PyTorch or TensorFlow
Experience with distributed training and deployment in production environments
Familiarity with energy systems, scheduling algorithms, or operations research is a plus
Ability to thrive in a startup environment: ownership mindset, adaptability, and collaborative spirit

Nice-to-Have

Experience with NVIDIA CUDA/cuDNN, Triton Inference Server, or ROS2 for robotics integration
Knowledge of data center operations or AI infrastructure optimization

Location: San Francisco Bay Area preferred; remote considered for exceptional candidates

Why DeepAware?

You’ll be working on problems that directly impact the sustainability and reliability of the world’s AI infrastructure, with a team that values technical excellence, creativity, and impact. We’re building a Data Center Infrastructure Management (DCIM) platform that uses reinforcement learning for workload scheduling, real-time market integration for energy cost optimization, and AI-driven monitoring for security and stability. Our upcoming robotics integration supports remote inspections, cable swaps, and maintenance, enabling true 24/7 autonomous operations. You’ll work on hard, high-impact technical problems: designing scalable real-time control systems, optimizing GPU workloads across multi-tenant environments, and building the “robotic hands” of AI infrastructure.

Full-time
$125K–$157K
San Francisco, CA, US, Remote