Rewards are one of the most common ways people think of to get other people to do stuff. In the talks I give on the topic, I tell people that rewards are actually one of the least effective ...
The ReWiND method, which consists of three phases: learning a reward function, pre-training, and using the reward function ...