Categorical search spaces #863

uri-granta · 2024-07-31T12:55:28Z

Related issue(s)/PRs:

Summary

Add support for Categorical search spaces.

Modelling support will come in a future PR (see #862).

Fully backwards compatible: yes

PR checklist

The quality checks are all passing
The bug case / new feature is covered by tests
Any new features are well-documented (in docstrings or notebooks)

khurram-ghani

Some comments on the main changes. I haven't looked at the unit testing yet.

khurram-ghani · 2024-08-01T14:21:28Z

trieste/space.py

+            return tf.stack(
+                [tf.gather(tf.constant(self._tags[i]), row[i]) for i in range(len(row))]
+            )


I think you can probably use tf.gather_nd instead of tf.gather and a for loop. Not sure, but you might be able to also get rid of tf.map_fn.

uri-granta · 2024-08-07

Not sure that's possible, as self._tags is a non-rectangular Sequence[Sequence[str]], not a tensor.

khurram-ghani · 2024-08-07

Okay 👍.

There might be a way (see example) to use ragged tensors with the batch_dims argument of tf.gather, but that might not be more efficient than a for loop. Happy for you to ignore this :-)
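For context, the lookup under discussion can be sketched in plain Python. This is only an illustration of the shape problem, not the code in trieste/space.py: the function name `indices_to_tags` and the sample tags are made up, and the TensorFlow version in the diff works on tensors rather than lists. Each categorical dimension has its own tag list, and the lists can have different lengths, which is why a single rectangular gather does not apply directly.

```python
from typing import List, Sequence


def indices_to_tags(
    tags: Sequence[Sequence[str]], rows: Sequence[Sequence[int]]
) -> List[List[str]]:
    """Map each row of per-dimension indices to the matching tags.

    tags[i] is the tag list for dimension i; the lists may differ in length.
    """
    return [[tags[i][index] for i, index in enumerate(row)] for row in rows]


# Hypothetical example: dimension 0 has three tags, dimension 1 has two.
tags = [["red", "green", "blue"], ["yes", "no"]]
rows = [[0, 1], [2, 0]]
print(indices_to_tags(tags, rows))  # [['red', 'no'], ['blue', 'yes']]
```

Because `tags` is jagged, any vectorised version has to either pad the tag lists or use a ragged representation, which is the trade-off mentioned in the comment above.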

hstojic

done a quick pass, looks ok to me, no major comments
I'll let @khurram-ghani approve it

hstojic · 2024-08-06T21:17:05Z

trieste/space.py

+
+    def __repr__(self) -> str:
+        """"""
+        return f"DiscreteSearchSpace({self._points!r})"


self.__class__.__name__ instead of a string?
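A minimal sketch of that suggestion, using toy classes rather than the real trieste ones (`ToyDiscreteSpace` and `ToyTaggedSpace` are hypothetical names): deriving the name from the instance means subclasses get a correct repr without overriding `__repr__`.

```python
class ToyDiscreteSpace:
    """Toy stand-in for a discrete search space; not the trieste class."""

    def __init__(self, points):
        self._points = points

    def __repr__(self) -> str:
        # Use the runtime class name instead of a hard-coded string.
        return f"{self.__class__.__name__}({self._points!r})"


class ToyTaggedSpace(ToyDiscreteSpace):
    """Hypothetical subclass that inherits __repr__ unchanged."""


print(repr(ToyDiscreteSpace([1, 2])))  # ToyDiscreteSpace([1, 2])
print(repr(ToyTaggedSpace([1, 2])))    # ToyTaggedSpace([1, 2])
```

With a hard-coded string, the subclass would still print the parent's name.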

khurram-ghani

Looks good.

However, my personal preference is to not put the one_hot_encoder in the categorical and product search space classes. This could be a separate class that takes a search space as an __init__ argument.

I realise that these are just default implementations and users can always write separate implementations. As we know, there would be cases where we want to encode in different ways, especially for the product space (perhaps the categorical-only case is fine). I just think it is better to have the implied flexibility and not burn a default implementation in the spaces. I guess it just depends on our opinion on what is the most likely use case.
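The alternative design described above could look roughly like the following. This is a sketch under assumptions, not a proposal for the actual API: `ToyCategoricalSpace` and `OneHotEncoder` are hypothetical names, and plain lists stand in for tensors.

```python
from typing import List, Sequence


class ToyCategoricalSpace:
    """Toy single-dimension categorical space; holds tags only, no encoding logic."""

    def __init__(self, tags: Sequence[str]):
        self.tags = tuple(tags)


class OneHotEncoder:
    """Hypothetical encoder that takes a search space as an __init__ argument."""

    def __init__(self, space: ToyCategoricalSpace):
        self._size = len(space.tags)

    def __call__(self, indices: Sequence[int]) -> List[List[float]]:
        encoded = []
        for index in indices:
            if not 0 <= index < self._size:
                raise ValueError(f"index {index} is outside [0, {self._size})")
            row = [0.0] * self._size
            row[index] = 1.0
            encoded.append(row)
        return encoded


space = ToyCategoricalSpace(["red", "green", "blue"])
encoder = OneHotEncoder(space)
print(encoder([2, 0]))  # [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```

Keeping the encoder outside the space means a different encoding (for example, for a product space) is another small class rather than a change to the space itself.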

khurram-ghani · 2024-08-07T12:15:04Z

trieste/space.py

+            encoded = tf.concat(
+                [encoder(column) for encoder, column in zip(encoders, columns)], axis=1
+            )


Given x is now a float type, do we need to do some checking/casting similar to to_tags? I'm not sure if tf.keras.layers.CategoryEncoding allows non-integer inputs.

uri-granta · 2024-08-07

It's only the underlying self.points in GeneralDiscreteSearchSpace that are now floats. The one hot encoding uses self.tags directly so isn't affected. (And this is all tested.)

khurram-ghani · 2024-08-07

The x passed in by the user are the underlying indices, which can be floats. We then call the encoder (after flattening) with these. From what I can see, the tests use integer indices only, but I could easily have missed something. In any case, it doesn't matter too much: I just tried tf.keras.layers.CategoryEncoding with float inputs and it silently converts them to integers (in a weird way, see below). But perhaps we want to be explicit with our own checks/assertions rather than rely on TensorFlow.

> e = tf.keras.layers.CategoryEncoding(4, output_mode="one_hot")
> e(tf.constant([3.0, 2.999999], dtype=tf.float32))
<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[0., 0., 0., 1.],
       [0., 0., 1., 0.]], dtype=float32)>
> e(tf.constant([3.0, 2.9999999], dtype=tf.float32))
<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[0., 0., 0., 1.],
       [0., 0., 0., 1.]], dtype=float32)>
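The output above is consistent with two separate effects: float32 rounding of the literal, followed by a truncating conversion to an integer index. The sketch below reproduces both in plain Python and shows one possible explicit check. It does not call TensorFlow; `as_float32` and `checked_indices` are hypothetical helpers, and a real implementation would need the equivalent tensor operations.

```python
import struct
from typing import List, Sequence


def as_float32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def checked_indices(values: Sequence[float]) -> List[int]:
    """Convert float indices to ints, rejecting any with a fractional part."""
    indices = []
    for value in values:
        if not float(value).is_integer():
            raise ValueError(f"expected an integer-valued index, got {value!r}")
        indices.append(int(value))
    return indices


# 2.9999999 is closer to 3.0 than to the next float32 below it, so it becomes 3.0.
print(as_float32(2.9999999) == 3.0)  # True
# 2.999999 stays below 3.0 in float32, so truncation yields index 2.
print(int(as_float32(2.999999)))     # 2
```

An explicit check like `checked_indices` would reject the second case instead of silently mapping it to index 2. It cannot catch the first case, because by the time the value is a float32 it is exactly 3.0.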

Categorical search spaces

69e7c0b

uri-granta requested a review from khurram-ghani July 31, 2024 12:59

khurram-ghani reviewed Aug 1, 2024

hstojic reviewed Aug 6, 2024

Review comments

1209ff4

uri-granta mentioned this pull request Aug 7, 2024

Encoded models #864

Merged

3 tasks

khurram-ghani approved these changes Aug 7, 2024

Uri Granta added 2 commits August 7, 2024 12:45

Use float indices to support product search spaces

b13c147

Test non-integer indices

63b62b1

khurram-ghani reviewed Aug 7, 2024

Docstring example

fc8b5df

uri-granta merged commit 43e6f01 into develop Aug 8, 2024
12 checks passed

uri-granta deleted the uri/categorical_search_spaces branch August 8, 2024 09:30
