
Site Reliability Engineer

HappyRobot
San Francisco, CA, US
Contract
$125K–$157K (estimated)

Required Skills

LLM
Python
R
Java
JavaScript
TypeScript
Go
Rust
PyTorch
React
Node.js
Kubernetes
Git
Communication

Job Description

HappyRobot builds AI agents to automate phone calls in the logistics industry. From simple check calls with truck drivers to contract price negotiations between enterprises, our AI agents provide and gather information more efficiently than the alternatives. We believe voice will become a much more prevalent interface for digital systems, and we're building the tools to make that possible.

**About HappyRobot**

HappyRobot is a platform to build and deploy AI workers that automate communication ([see a demo](https://www.loom.com/share/031bd86fc6ca4084b11fa8d745e493ed?sid=aa1c06a2-2cb7-4c9a-86d9-acd0f42927ec)). Our AI workers connect to any system or data source to handle phone calls, email, messages…

We target the logistics industry, which relies heavily on communication to book, check on, and pay for freight. We primarily work with freight brokers, 3PLs, freight forwarders, shippers, warehouses, and other supply chain enterprises and tech startups.

We raised a Series A round from a16z and YC, and we're growing very fast.

We're looking for rockstars with relentless drive, unstoppable energy, and a true passion for building something great: people ready to embrace the challenge, push limits, and thrive in a fast-paced, high-intensity environment.

**About The Role**

We're looking for a Site Reliability Engineer to take the lead on scaling our operational resilience as we grow. You'll own the stability, observability, and debugging workflows that keep our systems running smoothly. You'll be the go-to person for untangling complex failures in real time, designing tools that turn chaos into clarity, and helping us shift from reactive to proactive operations.

This is a high-impact, high-trust role where you'll shape how reliability is done: reducing incident load, building internal tooling, and directly improving developer focus and system uptime. If you love getting to the root of hard problems and making systems (and teams) stronger, this is your moment.

**Must-Have**

- 1+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)
- Strong problem-solving skills and the ability to dive into unfamiliar backend codebases
- Comfort with Python and Go for reading code and writing small tools and utilities
- Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry)
- Clear, calm communication under pressure, especially during live incidents

**Nice-to-Have**

- Experience working with distributed systems or services at scale
- Built or maintained internal tooling for on-call teams or reliability workflows
- Familiarity with deployment pipelines, CI/CD, or infrastructure-as-code
- Experience improving system observability (e.g., custom metrics, traces, log pipelines)

**Why Join Us?**

- Opportunity to work at a high-growth AI startup backed by top investors.
- Fast Growth: Backed by a16z and YC, on track for double-digit ARR.
- Top-Tier Compensation: Competitive salary + equity in a high-growth startup.
- Ownership & Autonomy: Take full ownership of projects and ship fast.
- Work With the Best: Join a world-class team of engineers and builders.

JavaScript, TypeScript, Node.js, Kubernetes, Rust, Python, PyTorch, LLM training & fine-tuning.
