To train a model, follow the example in train.py. The supported reward types are "default", "wrapper", and "sparse".
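
As a rough illustration of how a reward type might be selected and validated before training, here is a minimal sketch. It is not taken from train.py: the `--reward-type` flag and the `parse_args` helper are hypothetical names introduced for this example, and only the three reward type strings come from this README. Check train.py for the actual way the reward type is passed.

```python
import argparse

# The three reward types named in this README.
REWARD_TYPES = ("default", "wrapper", "sparse")


def parse_args(argv=None):
    """Parse a hypothetical --reward-type flag (train.py may differ)."""
    parser = argparse.ArgumentParser(
        description="Select a reward type for training."
    )
    parser.add_argument(
        "--reward-type",
        choices=REWARD_TYPES,  # argparse rejects any other value
        default="default",
        help="Reward type to train with.",
    )
    return parser.parse_args(argv)


# Example: pass the arguments explicitly instead of reading sys.argv.
args = parse_args(["--reward-type", "sparse"])
print(f"Training with reward type: {args.reward_type}")
```

Restricting the value with `choices` means a misspelled reward type fails immediately with a usage error instead of partway through a training run.
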
The model weight files are too large to include in this repository. To get the full repository with the models, download it from https://drive.google.com/file/d/14qNBTIYYh7gpnagyCgUZyVXO77mfp-jw/view?usp=sharing