About the Team
The Post-Training - Core Algorithms team is responsible for researching and developing the next generation of algorithms to power our RLHF stack (reinforcement learning from human feedback). The algorithms we develop are used in ChatGPT consumer product and the OpenAI API.
About the Role
As a Member of Technical Staff on our team, you will research and develop improvements to all components of our RLHF stack, including data collection, supervised finetuning, reward modeling, off- and on-policy learning, active learning, and evaluations. The ultimate test for our algorithms is how useful they are to our users, and we often deploy our algorithms into new ChatGPT models.
We’re looking for people who have extensive background in reinforcement learning research, are able to iterate quickly, and are proficient at coding.
This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.
In this role, you will:
- Come up with improvements to RLHF
- Prototype and evaluate these ideas
- Scale up your innovations to ChatGPT scale
You might thrive in this role if you:
- Love being on the cutting edge of RL and language model research
- Can iterate fast on lots of ideas
- Like doing research that has real-world impact