v0.5
Major updates since initial release
- Fix double critic initialization; the target critic is now always used
- Fix MC return calculation in RWR (following RLPD)
- Switch to using `terminated` and `truncated` instead of `done`
- Add SAC, RLPD, Cal-QL, and IBRL implementations, tested with halfcheetah results
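
For context on the `terminated`/`truncated` switch, here is a minimal sketch of why the two flags are kept separate. It is not the code in this repository: `td_target` and `episode_over` are hypothetical helpers, and the flags are assumed to follow the Gymnasium step convention, where `terminated` marks a true terminal state and `truncated` marks a cutoff such as a time limit.

```python
def td_target(reward, next_value, terminated, gamma=0.99):
    """One-step TD target (hypothetical helper, for illustration only)."""
    # Only a true terminal state removes the bootstrapped value.
    # A truncated step (e.g. time limit) still bootstraps from next_value.
    bootstrap = 0.0 if terminated else next_value
    return reward + gamma * bootstrap


def episode_over(terminated, truncated):
    """The rollout ends (and the env is reset) on either flag."""
    return terminated or truncated


# Truncated step: the episode ends, but the target still bootstraps.
target = td_target(reward=1.0, next_value=10.0, terminated=False, gamma=0.5)
reset_needed = episode_over(terminated=False, truncated=True)
```

With a single `done` flag, a time-limit cutoff would be treated like a terminal state and the target above would drop the `next_value` term.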
Minor updates
- Log training steps
- Rename `transition_dim` to `action_dim`
- Fix robomimic lowdim rendering issue
In progress (v1.0)
- Updating baseline results
- Modifications to DPPO updates with potential performance improvement