
New model interface #516

Open
wants to merge 30 commits into master
Conversation

luv-bansal
Contributor

Why

How

Tests

Notes


class BaseDataHandler:
    def to_proto(self, output):
Contributor

Why does it need an 'output' parameter here? I'd think it should convert whatever it already holds into the output proto?

@deigen
Contributor

deigen commented Feb 12, 2025

Nice --- really on the right track. Things from our discussion for next changes:

  • Named outputs, e.g. def predict(x: Image) -> Output(y: str, z: str)
  • This is always using parts, which is different from using data.image directly (now, we'll use data.parts[argname].data.image). That needs to be discussed more broadly, as it can make things around monitoring, UI, etc. different.
    • A possible heuristic that might address this most of the way, is serialize the first parameter in the top-level data, then all subsequent params in the parts.
  • Need to add name/id field to the Parts proto in the protos repo.
  • Provide some example models (either in examples or here) for testing
  • We might be better off not supporting unlimited dict nesting levels for now. However, a json dict might be good to support for params and other definitions. That may not need to be in this PR though.
  • We should support the following python types as "atomics" --- some of these are not yet there, e.g. bytes or int:
    • str, bytes, int, float, bool, np.ndarray, PIL.Image.Image
    • for types without corresponding fields in the proto now, we need to either add fields or make correspondences to existing ones (e.g. int might use ndarray, though I don't particularly like that, it would be simpler to use int64 in protobuf. or possibly json.)
  • Make sure to test with invalid client calls with the wrong types --- What is the error provided to the user in these cases? It should be along the lines of what they would get calling a function with the wrong args.
  • Test with the case where the server defines def predict(x: str) -> str: return x and the client calls with model.predict(x=Image.open('test.jpg')). What happens? Right now I think it will return the empty string (there is nothing in parts[x].text.raw), but it would be better to error on mismatched types. However, if the user actually calls with an empty string, it should still return the empty string for model.predict(x='')
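The "atomics" bullet above could be sketched as a simple type-to-field lookup. This is a hypothetical illustration only — the field names (string_value, int_value, etc.) and the helper field_for are assumptions, not the actual resources_pb2 schema:

```python
# Hypothetical sketch: mapping the "atomic" Python types listed above to
# candidate proto field names. Names are illustrative, not the real schema.
ATOMIC_FIELD_MAP = {
    bool: "bool_value",    # check bool before int: bool is a subclass of int
    str: "string_value",
    bytes: "bytes_value",
    int: "int_value",      # assumed int64 proto field, per the discussion above
    float: "float_value",
    # np.ndarray and PIL.Image.Image would map to message fields (ndarray, image)
}

def field_for(value):
    """Return the proto field name an atomic value would serialize into."""
    for py_type, field in ATOMIC_FIELD_MAP.items():
        if isinstance(value, py_type):
            return field
    raise TypeError(f"Unsupported atomic type: {type(value).__name__}")
```

The bool-before-int ordering matters because isinstance(True, int) is True in Python; a naive mapping would serialize booleans into the integer field.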

outputs = []
inputs = self._convert_proto_to_python(request.inputs)
if len(inputs) == 1:
    inputs = inputs[0]
Member

oh this will be nice --- in this wrapper we can handle things that the user should not have to deal with, like including the input.id in each output, which some of our APIs / code expect

Contributor

@deigen deigen left a comment

Looks like this is making progress. I'm still reading over this revision, but have some comments so far on generate and stream:

  • There are a few issues with batched_predict() and batched_generate(), which I mention in my other comments. Because of these, I think it would be best to remove these functions from this PR and implement them in a subsequent one.
  • stream() should be called once with an iterator, not once per input in the stream (see inline comment on this)

"""Batch generate method for multiple inputs."""
with ThreadPoolExecutor() as executor:
futures = [executor.submit(self.generate, **input) for input in inputs]
return [future.result() for future in futures]
Contributor

This will return a list of generators, not one generator that zips all the outputs. Most simply, this should use zip_longest. Also, this won't run the generators in different threads -- only the generate call that produces the generator will run in another thread. All the calls to next() will be in the zip call in the main thread. That's inconsistent with the multithreaded batch_predict() behavior, which does run them in different threads.
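The zip_longest fix the reviewer describes might look like the sketch below. This is a hedged illustration, not the PR's code: batch_generate, generate, and the input shapes are assumptions, and (as the reviewer notes) all the next() calls still run in the caller's thread.

```python
from itertools import zip_longest

def batch_generate(generate, inputs):
    """Zip per-input generators into one stream of output batches.

    Sketch only: `generate` stands in for the model's generate method; each
    call returns a generator of outputs for one input. Generators may yield
    different lengths, so exhausted ones pad with None via zip_longest.
    """
    gens = [generate(**inp) for inp in inputs]
    yield from zip_longest(*gens, fillvalue=None)
```

Note that this merges the outputs but does not parallelize generation; running each generator's next() calls in their own threads would need extra machinery, which is part of why deferring batched_generate to a later PR makes sense.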

Contributor Author

@luv-bansal luv-bansal Feb 20, 2025

For now I'm not implementing the batch_generate function; as we discussed on the call, we'll deal with it in a separate PR.

"""Batch predict method for multiple inputs."""
with ThreadPoolExecutor() as executor:
futures = [executor.submit(self.predict, **input) for input in inputs]
return [future.result() for future in futures]
Contributor

I'm not sure whether to use multiple threads by default; multithreading for a batched call should be optional/configurable, but I don't know what the default number of threads should be. My inclination is to use the safest route, which is no multithreading unless enabled. There can be examples in the examples repo that have it enabled, so if you start by copying an example it will be enabled to start with for those.
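The opt-in threading the reviewer suggests could be sketched like this. It is a hypothetical shape, not the PR's implementation: batch_predict, predict, and num_threads are assumed names, with sequential execution as the safe default.

```python
from concurrent.futures import ThreadPoolExecutor

def batch_predict(predict, inputs, num_threads=1):
    """Batched predict: sequential by default, threaded only when enabled.

    Sketch of the reviewer's suggestion that multithreading be opt-in.
    `predict` stands in for the model's predict method; `inputs` is a list
    of kwargs dicts, one per batched input.
    """
    if num_threads <= 1:
        # Safe default: plain loop, no thread pool.
        return [predict(**inp) for inp in inputs]
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(predict, **inp) for inp in inputs]
        return [f.result() for f in futures]
```

An example model in the examples repo could then simply pass num_threads explicitly, so users who copy it get threading enabled from the start.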

Contributor Author

For now, I've updated this to a simple for-loop implementation as per your suggestion; we can deal with optimising batching in a separate PR.

inputs = self._convert_proto_to_python(request.inputs)
if len(inputs) == 1:
    inputs = inputs[0]
for output in self.stream(**inputs):
Contributor

A call to stream() should call the function once, with an iterator of inputs that we get from the request stream, so the stream() function impl can take inputs by reading off the stream iterator. This will call stream() once for each input in the stream instead. That will make it more difficult for the user to maintain state (and currently they can't distinguish which stream source is which).

The python input types make this a little difficult, as we'd like to be able to pass in a stream of inputs while still using the converters in this PR. One decent option would be to use an InputStream type or Stream[Input(...)] type, mirroring the Output type. All parts names that we put into the stream are streamed in. Parts names typed directly in the function kwargs (not as a stream) just get the first value; subsequent values that are nonempty but different result in a warning log.

For example:

  def stream(self, stream: InputStream(img=Image, text=str), drop: bool = False, param1: str = "value1", param2: int = 2):
     input_q = queue.Queue(2)
     output_q = queue.Queue(2)
     def _read():
       try:
         for input in stream:
           try:
             input_q.put(input, block=not drop)  # user passes in drop for whether to drop or block input reading when not keeping up, as a function param (in this example)
           except queue.Full:
             pass   # drop when not keeping up
       finally:
         input_q.put(None)
     def _work():
       try:
         for input in iter(input_q.get, None):
           # process the stream input --- param1, param2 are values from the first request, input is values for the current input stream request
           output = _process(input.img, input.text, param1, param2) 
           output_q.put(output)  # blocking
       finally:
         output_q.put(None)
     threading.Thread(target=_read).start()
     threading.Thread(target=_work).start()
     yield from iter(output_q.get, None)

At some point (not this PR) we should also add util functions for this streaming pattern.

Might be good to add this discussion back to the design doc RFC as well.
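A reusable util for this pattern might look like the self-contained sketch below. It generalizes the reader/worker queue approach from the example above; stream_process and all its parameter names are hypothetical, not an existing API.

```python
import queue
import threading

def stream_process(inputs, process, maxsize=2, drop=False):
    """Reader/worker queue pattern for processing a stream of inputs.

    Sketch of a generic util: `inputs` is any iterable of stream items and
    `process` is a per-item function. With drop=True the reader drops items
    instead of blocking when the worker falls behind.
    """
    input_q = queue.Queue(maxsize)
    output_q = queue.Queue(maxsize)

    def _read():
        try:
            for item in inputs:
                try:
                    input_q.put(item, block=not drop)
                except queue.Full:
                    pass  # drop when not keeping up
        finally:
            input_q.put(None)  # sentinel: end of stream

    def _work():
        try:
            for item in iter(input_q.get, None):
                output_q.put(process(item))  # blocking put
        finally:
            output_q.put(None)

    threading.Thread(target=_read).start()
    threading.Thread(target=_work).start()
    yield from iter(output_q.get, None)
```

With a single worker thread, output order matches input order; a multi-worker variant would need to track ordering explicitly.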

Contributor Author

Oh yeah, the Stream_wrapper function is currently implemented incorrectly. I'm not sure how to implement it correctly.

    part.data.image.CopyFrom(image.to_proto())
elif isinstance(part_value, list):
    if len(part_value) == 0:
        raise ValueError("List must have at least one element")
Contributor

Should be OK with empty list.

@@ -116,7 +117,7 @@ def _convert_proto_to_python(self, inputs: List[resources_pb2.Input]) -> List[Di

 def _convert_part_data(self, data: resources_pb2.Data, param_type: type) -> Any:
     if param_type == str:
-        return data.text.value
+        return data.text.raw
Member

I think we should use the new string_value field

Contributor Author

Should we use data.string_value or data.text.raw? It should be consistent everywhere. For now, I have used data.text.raw in all places.

    list_output.append(Audio(part.data.audio))
elif part.data.HasField("video"):
    list_output.append(Video(part.data.video))
elif part.data.HasField("bytes_value"):
Member

I don't think HasField works on the built-in scalar types, only message fields

Member

So checking for zero values is, I think, the right approach --- which is unfortunately not great, as you can't tell whether the value is 0 or simply not provided.

Member

This may end up biting us, and we might need to wrap each of the new fields in a message so we can do something like Bytes.value, as David was saying. But let's see --- the zero convention in protobufs is well known, and the MessageToDict type of methods handle it properly. We may have to validate that an arg can only have zero values as its default, i.e. users shouldn't be allowed to set defaults in python (though that kind of sucks too).

Contributor Author

Yeah, you're correct --- HasField won't work on built-in types. And when checking for zero values, we can't determine whether a field was not provided or the user explicitly set it to 0. This is the main concern.

Contributor Author

For now I'm checking for zero values, but I think we can wrap each of the new fields in a message for proper validation.
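As a toy illustration of why wrapping helps, here is a pure-Python analogue of a scalar wrapped in a message with an explicit presence bit. Field and _UNSET are hypothetical names for the sketch, not part of the protos:

```python
_UNSET = object()  # sentinel distinct from any user value, including 0

class Field:
    """Toy analogue of wrapping a scalar proto field in a message.

    With a bare proto3 scalar, 0 and "not provided" are indistinguishable
    (zero-value semantics). Wrapping the scalar in a message adds presence
    information, which is what wrapping bytes_value etc. would give us.
    """
    def __init__(self, value=_UNSET):
        self._value = value

    def has_value(self):
        # Presence check: True only if a value was explicitly set.
        return self._value is not _UNSET

    @property
    def value(self):
        return None if self._value is _UNSET else self._value
```

This mirrors what protobuf's well-known wrapper types (e.g. google.protobuf.BytesValue) provide: an unset wrapper message is detectable with HasField, even when the wrapped scalar would be zero.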

@luv-bansal luv-bansal marked this pull request as ready for review February 21, 2025 11:08
@luv-bansal luv-bansal requested a review from deigen February 21, 2025 11:08
@luv-bansal
Contributor Author

@deigen, I've addressed all the points you mentioned in your comment and everything we discussed on the call.

You specifically asked me to focus on these two points:

  • Make sure to test with invalid client calls with the wrong types --- What is the error provided to the user in these cases? It should be along the lines of what they would get calling a function with the wrong args.
  • Test with the case where the server defines def predict(x: str) -> str: return x and the client calls with model.predict(x=Image.open('test.jpg')). What happens? Right now I think it will return the empty string (there is nothing in parts[x].text.raw) but it would be better to error on mismatched types. However, if the user actually calls with an empty string, it should still return the empty string for model.predict(x='')

For both cases, the model now throws errors with clear messages:

Case 1 (Invalid parameter in predict method):

Exception: Model Predict failed with response code: FAILURE
details: "Unknown parameter: `text3` in predict method, available parameters: odict_keys([\'text1\', \'text2\'])"
req_id: "sdk-python-11.1.5-dec9f9995bac4d53ba30cbadac651ae2"

Case 2 (Incorrect data type)

Exception: Model Predict failed with response code: FAILURE
details: "expected str datatype but the provided input is not a str"
req_id: "sdk-python-11.1.5-f79bd7b9739a4879b9f2abcd774012f8"
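A minimal sketch of the kind of validation behind Case 1 might look like this. validate_kwargs is a hypothetical helper, not the PR's actual code; it only shows how the "Unknown parameter" message could be produced from the method's declared parameters:

```python
from collections import OrderedDict

def validate_kwargs(kwargs, params):
    """Reject unknown parameter names before dispatching to predict.

    `params` is assumed to be an OrderedDict of the predict method's
    declared parameters (name -> annotation), matching the odict_keys
    shown in the Case 1 error message above.
    """
    for name in kwargs:
        if name not in params:
            raise ValueError(
                f"Unknown parameter: `{name}` in predict method, "
                f"available parameters: {params.keys()}"
            )
```

Raising before any proto conversion keeps the failure close to what a user would see calling a plain Python function with the wrong keyword argument.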
