Confident AI is the leading LLM evaluation platform that helps teams evaluate, test, benchmark, optimize, monitor, and red-team LLM applications. Powered by DeepEval, the go-to LLM evaluation framework with over 600k monthly downloads, 5.3k GitHub stars, and over 40 million evaluations conducted, Confident AI is trusted by hundreds of companies, from leading startups to international corporations.

What is Confident AI?

Confident AI is an open-source company building 1) an open-source package called DeepEval to unit-test LLM applications such as chatbots, agents, and RAG pipelines, and 2) the cloud platform for DeepEval. It's like Next.js and Vercel. The founding team is a small group of exceptional engineers and researchers from top colleges and companies such as Google, Microsoft, and Princeton.

Our Values and Morals

Things we value:

- No excuses or BS: if something is wrong, surface it so someone can help.
- Openness and transparency: hiding a problem won't make it go away.
- No politics, micromanagement, or bureaucracy, even in controversial discussions.
- Autonomy, ownership, and responsibility, just as expected from any grown adult.
- No ghosting: respect others' time and effort.
- Doers, not yappers; function over form. This means we're OK with remote work as long as you deliver.

What you'll be doing:
- Work on DeepEval (the most-used package for LLM evaluation in the world), covering both LLM evaluation features and LLM red-teaming features.
- Write high-quality content about what you've built, in the form of documentation and blog articles for the open-source community.
- Maintain and support integrations with other open-source projects.
- Support our open-source community with any questions and help they might need.

You should be able to:

- Read papers, with a natural curiosity for new research.
- Write clearly, and read avidly.
- Code proficiently and quickly in Python and TypeScript.
- Work 6 days a week. We're not hiding that we expect a lot from you.

Your work will:

- Be used by hundreds of thousands of open-source users, all the way from individual hobbyists to AI leaders at Fortune 500 companies.
- Educate hundreds of thousands of people who wouldn't otherwise know how to quality-assure their LLM applications.
- Be respected and appreciated by the community.

By joining us, you will:

- Shape the future of LLM testing and evaluation.
- Learn how to run and do startups, in a relatively safe environment.
- Work closely with the founders, with the possibility of promotion to an executive role in the future.
- Be compensated highly, with generous founding equity. This also means that we expect a lot from you.

Confident AI is building an open-source LLM evaluation framework called DeepEval to help companies evaluate their LLM applications. While we provide the algorithms, companies are free to use their own LLMs for evaluation, and our job is to make sure they get accurate evaluation results and a good user experience while using our framework.

Confident AI's commercial product brings DeepEval to the cloud. While DeepEval is great, it can only do so much as a testing framework that runs locally in notebooks or CI/CD pipelines. With Confident AI, companies can get instant access to benchmark and LLM testing reports, catch regressions at scale, and monitor LLM applications in production.

Our Hiring Process

The entire process is usually remote, and most communication happens over email or via video chat in Google Meet. We know that you may be interviewing elsewhere as well, so we are respectful of your time and will get back to you no later than 2 days after each step in the process.

The entire process has 4 steps and takes around 1.5 weeks in total:

1. Initial 15-30 minute phone screening interview.
2. One 30-45 minute technical interview.
3. One-week, fully paid work trial.
4. Full-time offer.

You'll be working with the founders directly throughout the entire process.

Founding Open-Source Growth Engineer
