HappyRobot builds AI agents to automate phone calls in the logistics industry.

From simple check calls with truck drivers to contract price negotiations between enterprises, our AI agents provide and gather information more efficiently than the alternatives.

We believe that voice will become a much more prevalent interface for digital systems, and we're building the tools to make that possible.

**About HappyRobot**

HappyRobot is a platform to build and deploy AI workers that automate communication. [See a demo](https://www.loom.com/share/031bd86fc6ca4084b11fa8d745e493ed?sid=aa1c06a2-2cb7-4c9a-86d9-acd0f42927ec). Our AI workers connect to any system or data source to handle phone calls, email, messages…

We target the logistics industry, which relies heavily on communication to book, check on, and pay for freight. We primarily work with freight brokers, 3PLs, freight forwarders, shippers, warehouses, and other supply chain enterprises and tech startups.

We raised a Series A round from a16z and YC, and we’re growing very fast.

We're looking for rockstars with relentless drive, unstoppable energy, and a true passion for building something great: people ready to embrace the challenge, push limits, and thrive in a fast-paced, high-intensity environment.

**About the Role**

We're looking for a Site Reliability Engineer to take the lead on scaling our operational resilience as we grow. You’ll own the stability, observability, and debugging workflows that keep our systems running smoothly. You'll be the go-to person for untangling complex failures in real time, designing tools that turn chaos into clarity, and helping us shift from reactive to proactive operations.

This is a high-impact, high-trust role where you’ll shape how reliability is done: reducing incident load, building internal tooling, and directly improving developer focus and system uptime. If you love getting to the root of hard problems and making systems (and teams) stronger, this is your moment.

**Must-Have**

- 1+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)
- Strong problem-solving skills and the ability to dive into unfamiliar backend codebases
- Comfort with Python and Go for reading code and writing small tools/utilities
- Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry)
- Clear, calm communication under pressure, especially during live incidents

**Nice-to-Have**

- Experience working with distributed systems or services at scale
- Experience building or maintaining internal tooling for on-call teams or reliability workflows
- Familiarity with deployment pipelines, CI/CD, or infra-as-code
- Experience improving system observability (e.g., custom metrics, traces, log pipelines)

**Why join us?**

- Opportunity to work at a high-growth AI startup, backed by top investors.
- Fast Growth: Backed by a16z and YC, on track for double-digit ARR.
- Top-Tier Compensation: Competitive salary + equity in a high-growth startup.
- Ownership & Autonomy: Take full ownership of projects and ship fast.
- Work With the Best: Join a world-class team of engineers and builders.

**Required Skills**

JavaScript, TypeScript, Node.js, Kubernetes, Rust, Python, PyTorch, LLM training & finetuning.

Site Reliability Engineer at HappyRobot

