I am an independent researcher interested in Deep Reinforcement Learning.
My research focuses on increasing the optimization stability of off-policy gradient-based deep reinforcement learning algorithms.
I've written two works in this research direction:
- Stabilizing Q-Learning for Continuous Control
  David Yu-Tung Hui
  MSc Thesis, University of Montreal, 2022
  I presented empirical evidence that LayerNorm prevents off-policy $Q$-learning from diverging in the MuJoCo and DeepMind Control continuous-control environments. I also showed that adding LayerNorm to DDPG enables learning non-trivial behaviors on the dog-run task of DeepMind Control (see the first code sketch below).
  [.pdf] [Errata]
- Double Gumbel Q-Learning
  David Yu-Tung Hui, Aaron Courville, Pierre-Luc Bacon
  Spotlight at NeurIPS 2023
  In this conference paper, we model the noise introduced by a function approximator in $Q$-learning as a heteroscedastic Gumbel distribution and derive a loss function from this noise model that is effective in off-policy continuous control: the resulting algorithm achieved roughly twice the aggregate performance of SAC after 1M training timesteps (see the second code sketch below).
  [.pdf] [Reviews] [Poster (.png)] [5-min talk] [1-hour seminar] [Code (GitHub)]
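
To make the first result concrete, here is a minimal PyTorch sketch of a DDPG-style critic with LayerNorm inserted after each hidden linear layer. The hidden width, activation, and exact placement of LayerNorm are illustrative assumptions, not the precise architecture from the thesis.

```python
import torch
import torch.nn as nn

class LayerNormCritic(nn.Module):
    """DDPG-style Q-network with LayerNorm after each hidden linear layer."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        # Hidden width of 256 and ReLU activations are illustrative choices;
        # the key idea is normalizing the critic's hidden activations.
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q(s, a)
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))
```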
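
For the second paper, the exact objective is given in the paper and code linked above. As a generic illustration of fitting a heteroscedastic Gumbel noise model, the sketch below computes the Gumbel negative log-likelihood from a per-sample location `mu` and log-scale `log_beta` (which, in a Q-learning setting, could be two heads of the critic predicting the TD target); it is not the paper's exact loss.

```python
import torch

def gumbel_nll(target: torch.Tensor, mu: torch.Tensor, log_beta: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of `target` under a Gumbel(mu, beta) noise model.

    Heteroscedastic: `log_beta` is predicted per sample, so the loss
    automatically down-weights targets the model believes are noisier.
    This is a generic Gumbel NLL, not the exact objective from the paper.
    """
    beta = log_beta.exp()                      # scale beta > 0
    z = (target - mu) / beta                   # standardized residual
    return (log_beta + z + torch.exp(-z)).mean()
```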
In 2023, I graduated with an MSc from Mila, University of Montreal. I'm looking for opportunities where I can continue my research.
For more information about me, see my Google Scholar profile.