add policy_utils #279

salaast · 2023-07-18T22:50:28Z

No description provided.

mtrofin · 2023-07-18T23:28:20Z

compiler_opt/es/policy_utils.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+###############################################################################


I don't think this specific file needs this - these are general-purpose TF utilities.

The extra parts have been removed now

mtrofin · 2023-07-18T23:29:08Z

compiler_opt/es/policy_utils.py

+  return policy
+
+
+def get_vectorized_parameters_from_policy(


doc strings please (for all of them)

Doc strings have been added

mtrofin · 2023-07-18T23:29:51Z

compiler_opt/es/policy_utils.py

+    policy: Union[tf_policy.TFPolicy, tf.Module]) -> npt.NDArray[np.float32]:
+  if isinstance(policy, tf_policy.TFPolicy):
+    variables = policy.variables()
+  elif policy.model_variables:


I'd argue for else: and assert the policy has a model_variables. IIUC it's a bug otherwise (API user error: they either pass in a TFPolicy or a Module)

mtrofin · 2023-07-18T23:31:21Z

compiler_opt/es/policy_utils.py

+  if isinstance(policy, tf_policy.TFPolicy):
+    variables = policy.variables()
+  else:
+    try:


for consistency, whatever you do here should match whatever we do on line 91. Come to think of it, I think the python preference is to raise ValueError (i.e. not assert - that's my C++ speaking)

The checks have been changed to be the same now--check for TFPolicy, check for model_variables, else raise ValueError

mtrofin · 2023-07-18T23:32:21Z

compiler_opt/es/policy_utils.py

+  param_pos = 0
+  for variable in variables:
+    shape = tf.shape(variable).numpy()
+    num_ele = np.prod(shape)


num_elems?

Yeah it is a bit awkward, I changed it to num_elems now
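The loop under review walks the policy's variables and uses each variable's element count to carve its slice out of the flat parameter vector. A minimal, dependency-free sketch of that bookkeeping follows. It assumes plain shape tuples stand in for TensorFlow variables; the function name `split_flat_parameters` is hypothetical, and the real code would also reshape each chunk and assign it to the variable.

```python
import math
from typing import List, Sequence, Tuple


def split_flat_parameters(
    parameters: Sequence[float],
    shapes: Sequence[Tuple[int, ...]]) -> List[List[float]]:
  """Splits a flat parameter vector into one chunk per variable shape."""
  chunks = []
  param_pos = 0
  for shape in shapes:
    num_elems = math.prod(shape)  # number of scalars in this variable
    chunks.append(list(parameters[param_pos:param_pos + num_elems]))
    param_pos += num_elems
  # Reject vectors that are too short or too long for the given shapes.
  if param_pos != len(parameters):
    raise ValueError(
        f'expected {param_pos} parameters, but got {len(parameters)}')
  return chunks
```

Keeping a running `param_pos` means each variable's slice starts where the previous one ended, so the variable order fully determines the layout of the vector.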

mtrofin · 2023-07-18T23:32:54Z

compiler_opt/es/policy_utils_test.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+###############################################################################


same comment re. this bit of the docstring


mtrofin · 2023-07-18T23:40:45Z

compiler_opt/es/policy_utils_test.py

+class VectorTest(absltest.TestCase):
+
+  def test_set_vectorized_parameters_for_policy(self):
+    # create a policy


2 high level questions:

can we decouple these tests from registry and all that

can we test the 2 supported scenarios: TFAgent and tf.Module.

I will have to look into other ways of creating a policy in order to allow decoupling. In regards to the tests, I have added sections to test loaded policies now. Debugging has revealed that the loaded policy is not an instance of tf.Module but rather one of AutoTrackable.

ok - could you also add a reference to #280 over each test, easier to avoid forgetting

ebrevdo · 2023-07-19T18:19:58Z

compiler_opt/es/policy_utils.py

+  elif hasattr(policy, 'model_variables'):
+    variables = policy.model_variables
+  else:
+    raise ValueError('policy must be a TFPolicy or a loaded SavedModel')


Include the policy object in the ValueError message so we know what was passed.

I updated the message now
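For reference, the check order agreed on in this thread (TFPolicy first, then anything exposing `model_variables`, otherwise a `ValueError` that names the offending object) can be sketched without TensorFlow. The two stand-in classes and the function name `get_policy_variables` are hypothetical, and the exact error wording is an assumption rather than the merged message.

```python
from typing import Any, List


class StandInTFPolicy:
  """Hypothetical stand-in for a TFPolicy: exposes variables() as a method."""

  def __init__(self, values: List[Any]):
    self._values = list(values)

  def variables(self) -> List[Any]:
    return self._values


class StandInLoadedPolicy:
  """Hypothetical stand-in for a loaded SavedModel: has model_variables."""

  def __init__(self, values: List[Any]):
    self.model_variables = list(values)


def get_policy_variables(policy: Any) -> List[Any]:
  """Returns the policy's variables, or raises ValueError naming the object."""
  if isinstance(policy, StandInTFPolicy):
    return policy.variables()
  if hasattr(policy, 'model_variables'):
    return policy.model_variables
  # Including repr(policy) tells the caller what was actually passed in.
  raise ValueError(
      f'policy must be a TFPolicy or a loaded SavedModel, got {policy!r}')
```

Using the same three-way check in both the getter and the setter keeps the two functions consistent, which was the point of the earlier comment about matching line 91.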

ebrevdo · 2023-07-19T18:20:56Z

compiler_opt/es/policy_utils.py

+  else:
+    raise ValueError('policy must be a TFPolicy or a loaded SavedModel')
+
+  parameters = [var.numpy().flatten() for var in variables]


can you have a unit test to make sure that a TFPolicy and its loaded SavedModel have identical ordering of variables? (it's sufficient to check that the float values in parameters are approximately identical using np.testing.assert_allclose or similar)

I added a new test for this. Please check to make sure I understood correctly. Thanks
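A dependency-free sketch of the ordering check requested above: if the TFPolicy and its loaded SavedModel list their variables in the same order, their flattened parameter vectors match element by element. The helper name `assert_flat_parameters_close` and the tolerances are assumptions; in the actual test, `np.testing.assert_allclose` (as suggested) would do the same job on NumPy arrays.

```python
import math
from typing import Sequence


def assert_flat_parameters_close(actual: Sequence[float],
                                 expected: Sequence[float],
                                 rel_tol: float = 1e-6,
                                 abs_tol: float = 1e-8) -> None:
  """Raises AssertionError unless both vectors match element by element."""
  if len(actual) != len(expected):
    raise AssertionError(
        f'length mismatch: {len(actual)} != {len(expected)}')
  for index, (a, e) in enumerate(zip(actual, expected)):
    if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
      raise AssertionError(f'mismatch at index {index}: {a} != {e}')
```

A swapped pair of variables would shift values to different positions in the flat vector, so a mismatch at some index signals a different variable order (unless the swapped variables happen to hold identical values).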


mtrofin

some interim comments - I know you were going to look at further decoupling the "value" tests from specific problem solvers ("registry"), but they may be applicable.

mtrofin · 2023-07-20T22:28:29Z

compiler_opt/es/policy_utils_test.py

+    saver.save(policy_save_path)
+
+    # set the values of the policy variables
+    length_of_a_perturbation = 17218


why 17218 - it's the sum of the shapes on line 129, correct? could you move that line above, then calculate length_of_a_perturbation from it, and maybe rename length_of_a... to expected_length_of_a_perturbation - then it's (I'd argue) more clear what's going on.
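A short sketch of the suggested derivation, using the shape list from the test. The one-element shapes are written as tuples such as `(64,)`, which is an assumption about the intended shapes, since `(64)` in Python is just the integer 64.

```python
import math

# Shapes of the policy variables, taken from the test under review.
expected_variable_shapes = [(71, 64), (64,), (64, 64), (64,), (64, 64), (64,),
                            (64, 64), (64,), (64, 2), (2,)]

# Total number of scalars across all variables; this replaces the literal.
expected_length_of_a_perturbation = sum(
    math.prod(shape) for shape in expected_variable_shapes)
```

The sum works out to 17218, which confirms the guess in the comment above and removes the magic number from the test.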

mtrofin · 2023-07-20T22:30:04Z

compiler_opt/es/policy_utils_test.py

+    idx = 0
+    for i, variable in enumerate(policy.variables()):  # pylint: disable=not-callable
+      self.assertEqual(variable.shape, expected_variable_shapes[i])
+      nums = variable.numpy().flatten()


nit: s/nums/variable_values

mtrofin · 2023-07-20T22:30:19Z

compiler_opt/es/policy_utils_test.py

+    for i, variable in enumerate(policy.variables()):  # pylint: disable=not-callable
+      self.assertEqual(variable.shape, expected_variable_shapes[i])
+      nums = variable.numpy().flatten()
+      for num in nums:


nit: s/num/variable_value

mtrofin · 2023-07-20T22:34:29Z

compiler_opt/es/policy_utils_test.py

+    expected_variable_shapes = [(71, 64), (64), (64, 64), (64), (64, 64), (64),
+                                (64, 64), (64), (64, 2), (2)]
+    # iterate through variables and check their shapes and values
+    idx = 0


you could say expected_values = list(range(expected_length_of_a_perturbation)), then you don't need idx, you can just check on line 136 something like:

self.assertListEqual(expected_values[:len(variable_values)], variable_values)
expected_values = expected_values[len(variable_values):]

then at the end expected_values should be empty.
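The idea above, sketched as a standalone helper rather than a test method. The name `consume_expected_values` is hypothetical, and plain lists stand in for the flattened variable values.

```python
from typing import List, Sequence


def consume_expected_values(
    flattened_variables: Sequence[Sequence[float]],
    params: Sequence[float]) -> List[float]:
  """Checks each variable against the head of params; returns the leftover."""
  # Work on a copy, because the list is consumed destructively below.
  expected_values = list(params)
  for variable_values in flattened_variables:
    variable_values = list(variable_values)
    count = len(variable_values)
    if expected_values[:count] != variable_values:
      raise AssertionError(
          f'expected {expected_values[:count]}, got {variable_values}')
    # Drop the values that were just checked.
    expected_values = expected_values[count:]
  return expected_values
```

An empty return value means every parameter was matched to exactly one variable element, in order, with nothing left over.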

mtrofin · 2023-07-20T22:35:21Z

compiler_opt/es/policy_utils_test.py

+    sm = tf.saved_model.load(policy_save_path + '/policy')
+    self.assertNotIsInstance(sm, tf_policy.TFPolicy)
+    policy_utils.set_vectorized_parameters_for_policy(sm, params)
+    idx = 0


same idea with idx... and same comment further below about naming.

mtrofin

lgtm, some comments before landing

mtrofin · 2023-07-25T17:41:09Z

compiler_opt/es/policy_utils_test.py

+class VectorTest(absltest.TestCase):
+
+  def test_set_vectorized_parameters_for_policy(self):
+    # create a policy


ok - could you also add a reference to #280 over each test, easier to avoid forgetting

mtrofin · 2023-07-25T17:42:19Z

compiler_opt/es/policy_utils_test.py

+    # set the values of the policy variables
+    policy_utils.set_vectorized_parameters_for_policy(policy, VectorTest.params)
+    # iterate through variables and check their shapes and values
+    expected_values = [*VectorTest.params]


nit: add a comment that we want to destructively go over the expected values, hence the deep copy.

mtrofin · 2023-07-25T17:43:50Z

compiler_opt/es/policy_utils_test.py

+    # save the policy
+    saver = policy_saver.PolicySaver({'policy': policy})
+    testing_path = self.create_tempdir()
+    policy_save_path = os.path.join(testing_path, 'temp_output/policy')


`os.path.join(testing_path, 'temp_output', 'policy')`

i.e. don't assume '/' is the separator.

also, can we call 'policy' something else? It's a bit confusing that we then add 'policy' again on line 144

Ok, I made a variable POLICY_NAME and used it for the name in the dict on lines like 126 here for clarity. Should I also change lines with quantile_file_dir='compiler_opt/rl/inlining/vocab/' to use join since the separator is hardcoded?

That's fine, we'll remove it later because of #280 anyway.

mtrofin · 2023-07-25T17:44:01Z

compiler_opt/es/policy_utils_test.py

+    self.assertEmpty(expected_values)
+
+    # get saved model to test a loaded policy
+    sm = tf.saved_model.load(policy_save_path + '/policy')


os.path.join instead of +
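A small sketch of both path comments together. It assumes the saver writes each named policy into a subdirectory of the save path, which is what the `policy_save_path + '/policy'` load in the diff suggests. The value of `POLICY_NAME` and the helper `build_policy_paths` are hypothetical.

```python
import os
from typing import Tuple

# Hypothetical value; the thread only says a POLICY_NAME variable was added.
POLICY_NAME = 'policy_under_test'


def build_policy_paths(testing_path: str,
                       policy_name: str = POLICY_NAME) -> Tuple[str, str]:
  """Returns (save_root, loaded_policy_dir) without hardcoding a separator."""
  policy_save_path = os.path.join(testing_path, 'temp_output', 'policy')
  # The same name is used as the saver's dict key and as the load subdirectory.
  loaded_policy_path = os.path.join(policy_save_path, policy_name)
  return policy_save_path, loaded_policy_path
```

Giving the dict key its own distinct name makes it clear which path component comes from the saver and which from the test's directory layout.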

mtrofin · 2023-07-25T17:44:28Z

compiler_opt/es/policy_utils_test.py

+    # save the policy
+    saver = policy_saver.PolicySaver({'policy': policy})
+    testing_path = self.create_tempdir()
+    policy_save_path = os.path.join(testing_path, 'temp_output/policy')


same comment about path and names

mtrofin · 2023-07-25T17:44:38Z

compiler_opt/es/policy_utils_test.py

+    np.testing.assert_array_almost_equal(output, VectorTest.params)
+
+    # get saved model to test a loaded policy
+    sm = tf.saved_model.load(policy_save_path + '/policy')


same comment as before

mtrofin · 2023-07-25T17:44:51Z

compiler_opt/es/policy_utils_test.py

+    # save the policy
+    saver = policy_saver.PolicySaver({'policy': policy})
+    testing_path = self.create_tempdir()
+    policy_save_path = os.path.join(testing_path, 'temp_output/policy')


mtrofin · 2023-07-25T17:44:58Z

compiler_opt/es/policy_utils_test.py

+    tf_params = policy_utils.get_vectorized_parameters_from_policy(policy)
+
+    # get loaded policy
+    sm = tf.saved_model.load(policy_save_path + '/policy')


salaast requested review from ebrevdo and mtrofin July 18, 2023 22:50

add policy_utils

c0c3e4d

salaast force-pushed the policy_utils branch from caadaec to c0c3e4d Compare July 18, 2023 22:55

mtrofin reviewed Jul 18, 2023


ebrevdo reviewed Jul 19, 2023


add tests for loaded policies, revise error handling, add docstrings, edit type annotations, remove credit message

35425e8

salaast force-pushed the policy_utils branch from 3bc4fe6 to 35425e8 Compare July 19, 2023 18:22

salaast added 4 commits July 19, 2023 20:07

include passed object in ValueError message, add test to check that tfpolicy and loaded policy variable orders match

abe2201

replace AutoTrackable type annotation with HasModelVariables Protocol

96dcfe2

use Any type as placeholder for Protocol

f9d098e

implement HasModelVariables Protocol

9763737

mtrofin reviewed Jul 21, 2023


Restructure value tests and rename variables for clarity

485b8b1

mtrofin approved these changes Jul 25, 2023


Use os.path.join to form paths

74b7605

mtrofin merged commit 79d7049 into google:main Jul 25, 2023
15 checks passed
