Research Scientist Intern, Vision-Language and Embodied AI (PhD) at Meta

Required Skills

Python, PyTorch, reinforcement learning, computer vision, vision-language models, embodied AI, research, multimodal learning

About the Role

This is a PhD-level research scientist internship focused on developing next-generation assistance systems using embodied AI, vision-language models, and world model learning. The intern will conduct cutting-edge research, implement algorithms, and collaborate to produce publishable results in top-tier venues. The role involves working with simulators, reinforcement learning, and multimodal methods to solve complex real-world interaction tasks.

Key Responsibilities

  • Plan and execute cutting-edge research on embodied AI algorithms, assistance policies, vision-language models, and world model learning
  • Develop, implement, and evaluate methods for improving performance and interpretability of VLMs and AI/ML models
  • Leverage state-of-the-art simulators and methods from reinforcement learning (RL/DRL), neuro-symbolic AI, AI planning, robotics, and multimodal learning
  • Write modular, reusable research code and utilize Meta's large infrastructure to scale experimentation
  • Collaborate cross-functionally with researchers and engineers to prototype and test models at scale

Required Skills & Qualifications

Must Have:

  • Currently holds, or is in the process of obtaining, a PhD in Machine Learning, AI, Computer Vision, Robotics, or a related field
  • Proven research skills: problem definition, solution exploration, analysis, and presentation of results
  • 2+ years of experience with Python and machine learning libraries (NumPy, scikit-learn, SciPy, pandas, Matplotlib, TensorFlow, PyTorch)
  • Understanding of at least one of: embodied AI, reinforcement learning, planning, vision-language models, LLM interpretability, world model learning, or pose estimation

Nice to Have:

  • Proven track record of significant results: grants, fellowships, patents, and first-author publications at leading conferences
  • Experience with VLM/LLM training/fine-tuning and solving traditional CV problems (e.g., pose estimation, image classification)
  • Experience working and communicating cross-functionally in a team environment