Hello,
Thanks for a great project, it's very useful. I have a question about the model code for the Dueling architecture, for example in Pong-v0_DQN_CNN_TF2.py.

Here is the relevant line:

```python
action_advantage = Lambda(lambda a: a[:, :] - K.mean(a[:, :], keepdims=True), output_shape=(action_space,))(action_advantage)
```
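For reference, this Lambda layer appears to implement the mean-subtracted aggregation from the dueling network architecture (Wang et al., 2016). In that formulation the average runs over the actions $a'$ of a single state $s$, so for a batched tensor of shape `(batch, action_space)` it corresponds to a reduction over the action axis only:

$$
Q(s, a) = V(s) + \Big( A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a') \Big)
$$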
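Let's say our batch looks like this:

```python
# Imports assumed for a TF2 setup
import tensorflow as tf
from tensorflow.keras import backend as K

a = tf.constant([[1.0, 2.0], [-2.0, 3.0], [3.0, -4.0]])
print('a=', a)
```

```
a= tf.Tensor(
[[ 1. 2.]
[-2. 3.]
[ 3. -4.]], shape=(3, 2), dtype=float32)
```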
The result of `K.mean` here is a tensor of shape (1, 1), i.e. a single mean over every element of the batch:

```python
print('Kmean=', K.mean(a[:, :], keepdims=True))
```

```
Kmean= tf.Tensor([[0.5]], shape=(1, 1), dtype=float32)
```
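The same reduction can be reproduced without TensorFlow. In this sketch NumPy's `np.mean` stands in for `K.mean`; for a plain reduction like this one both follow the same `axis`/`keepdims` rules. With no `axis`, all six values are averaged: (1 + 2 - 2 + 3 + 3 - 4) / 6 = 0.5.

```python
import numpy as np

# Same batch as above: 3 samples (rows) x 2 actions (columns).
a = np.array([[1.0, 2.0], [-2.0, 3.0], [3.0, -4.0]], dtype=np.float32)

# No axis given: the mean runs over every element, batch axis included.
global_mean = np.mean(a, keepdims=True)
print(global_mean.shape, global_mean)  # (1, 1) [[0.5]]
```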
Shouldn't it be a tensor of shape (3, 1), with one mean per batch element?

```python
print('Kmean=', K.mean(a[:, :], axis=1, keepdims=True))
```

```
Kmean= tf.Tensor(
[[ 1.5]
[ 0.5]
[-0.5]], shape=(3, 1), dtype=float32)
```
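Using the same NumPy stand-in with `axis=1`, the reduction runs over the action dimension only, giving one mean per row. Because `keepdims=True` keeps the result at shape `(3, 1)`, it broadcasts back against the `(3, 2)` batch when subtracted, so each sample is centered on its own mean:

```python
import numpy as np

# Same batch: 3 samples (rows) x 2 actions (columns).
a = np.array([[1.0, 2.0], [-2.0, 3.0], [3.0, -4.0]], dtype=np.float32)

# axis=1 reduces over actions only: one mean per sample.
row_means = np.mean(a, axis=1, keepdims=True)
print(row_means.shape)  # (3, 1)

# (3, 2) - (3, 1) broadcasts row-wise, so every sample is centered on its own mean.
centered = a - row_means
print(centered.sum(axis=1))  # each row now sums to zero
```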
If the batch contains 3 elements, then the mean should be computed for each element separately (over its actions), not over the whole batch. Or am I missing something?
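Here is a NumPy sketch of the whole dueling head, Q = V + (A - mean(A)), that shows why the per-sample mean matters. The helper names `dueling_q` and `dueling_q_batch_mean` and the state values in `value` are hypothetical and are used only for illustration. With the per-sample mean, each row of Q averages back to its V(s). With the batch-wide mean it does not, and a sample's Q-values also change depending on which other samples share its batch:

```python
import numpy as np


def dueling_q(value, advantage):
    """Hypothetical helper: combine V(s) (batch, 1) and A(s, a) (batch, n_actions),
    subtracting each sample's own mean advantage."""
    per_sample_mean = np.mean(advantage, axis=1, keepdims=True)  # (batch, 1)
    return value + (advantage - per_sample_mean)                 # (batch, n_actions)


def dueling_q_batch_mean(value, advantage):
    """Hypothetical helper mirroring the line in question: one mean over the whole batch."""
    return value + (advantage - np.mean(advantage, keepdims=True))


advantage = np.array([[1.0, 2.0], [-2.0, 3.0], [3.0, -4.0]])
value = np.array([[10.0], [20.0], [30.0]])  # hypothetical state values V(s)

q = dueling_q(value, advantage)
q_bad = dueling_q_batch_mean(value, advantage)

# Per-sample mean: each row of Q averages back to V(s).
print(q.mean(axis=1))
# Batch-wide mean: the row averages drift away from V(s).
print(q_bad.mean(axis=1))

# Batch-wide mean: sample 0 gets different Q-values alone vs. inside the batch.
print(dueling_q_batch_mean(value[:1], advantage[:1]), q_bad[:1])
```

In the Keras model, the per-sample version would correspond to the `axis=1` reduction shown in the question, i.e. `a - K.mean(a, axis=1, keepdims=True)` inside the `Lambda`.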