Hey there,
I am training the q5 problem, but the training is not going well. I wondered if you have any clue what is wrong.
Here is a copy of the training log:
122501/5000000 [..............................] - ETA: 34518s - Loss: 0.1562 - Avg_R: -21.0000 - Max_R: -21.0000 - eps: -55.3107 - Grads: 0.0000 - Max_Q: 0.0000 - lr: -0.0048
The gradient (Grads) and Max_Q are both 0, and the reward has not improved at all: Avg_R and Max_R are both still -21 after 122,501 steps.
The test (nature.py) seems to work, so I'm guessing this is a hyperparameter issue.
Average reward: 0.00 +/- 0.00
301/1000 [========>.....................] - ETA: 4s - Loss: 0.1427 - Avg_R: 0.3400 - Max_R: 2.0000 - eps: -0.6500 - Grads: 0.0453 - Max_Q: 0.0628 - lr: -0.0000
Evaluating...
Average reward: -1.00 +/- 0.00
401/1000 [===========>..................] - ETA: 4s - Loss: 0.1008 - Avg_R: -0.0150 - Max_R: 0.5000 - eps: -1.8875 - Grads: 0.0585 - Max_Q: 0.0698 - lr: -0.0002
Evaluating...
Average reward: 0.50 +/- 0.00
501/1000 [==============>...............] - ETA: 4s - Loss: 0.1244 - Avg_R: 0.4850 - Max_R: 0.5000 - eps: 0.0100 - Grads: 0.0362 - Max_Q: 0.0775 - lr: 0.0001
Evaluating...
Average reward: 0.50 +/- 0.00
601/1000 [=================>............] - ETA: 3s - Loss: 0.1770 - Avg_R: 0.4900 - Max_R: 0.5000 - eps: 0.0100 - Grads: 0.0461 - Max_Q: 0.0948 - lr: 0.0001
Evaluating...
Average reward: 0.50 +/- 0.00
701/1000 [====================>.........] - ETA: 2s - Loss: 0.0341 - Avg_R: 0.4150 - Max_R: 0.5000 - eps: 0.0100 - Grads: 0.0200 - Max_Q: 0.1068 - lr: 0.0001
Evaluating...
Average reward: 0.50 +/- 0.00
801/1000 [=======================>......] - ETA: 1s - Loss: 0.0636 - Avg_R: 0.5000 - Max_R: 0.5000 - eps: 0.0100 - Grads: 0.0292 - Max_Q: 0.1167 - lr: 0.0001
Evaluating...
Average reward: 0.50 +/- 0.00
901/1000 [==========================>...] - ETA: 0s - Loss: 0.0008 - Avg_R: 0.4600 - Max_R: 0.5000 - eps: 0.0100 - Grads: 0.0606 - Max_Q: 0.1244 - lr: 0.0001
Evaluating...
Average reward: 0.50 +/- 0.00
1001/1000 [==============================] - 10s - Loss: 0.0010 - Avg_R: 0.4500 - Max_R: 0.5000 - eps: 0.0100 - Grads: 0.0434 - Max_Q: 0.1302 - lr: 0.0001
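One detail in the logs may be worth checking: eps and lr are reported as negative (eps: -55.3107 and lr: -0.0048 in the q5 run, and eps: -0.6500 and -1.8875 early in the test run). If these are linearly annealed values, a negative reading could mean the schedule keeps extrapolating past its final value instead of holding it, although it could also be an artifact of how the progress bar reports metrics. The thread does not show the schedule code, so the sketch below is only an illustration of a clamped linear schedule; the function name, parameters, and example values are all hypothetical and not taken from the project.

```python
def linear_schedule(t, start, end, nsteps):
    """Hypothetical helper: interpolate linearly from `start` to `end`
    over `nsteps` steps, then hold at `end` instead of extrapolating."""
    if nsteps <= 0:
        return end
    # Clamp the progress fraction to [0, 1] so the value never overshoots `end`.
    frac = min(max(t / nsteps, 0.0), 1.0)
    return start + frac * (end - start)


if __name__ == "__main__":
    # Assumed example values (not from the thread): anneal 1.0 -> 0.01 over 1000 steps.
    for step in (0, 500, 1000, 122501):
        print(step, linear_schedule(step, 1.0, 0.01, 1000))
```

With clamping, any step beyond `nsteps` returns `end`, so neither an exploration rate nor a learning rate built this way can become negative. If the real schedule already clamps, the negative readings would have to come from somewhere else, such as the logging itself.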