-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor model class and runners to be more independent #494
Conversation
ebd1ca8
to
19e18db
Compare
19e18db
to
e851890
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!! Just commented few doubts.
I like the idea of creating Model from ModelClass instead of ModelRunner and separating it from the server.
) | ||
model_class = classes[0] | ||
|
||
model_args = class_config.get("args", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, what these args could be for initialising model class?
because we are not using any __init__
method in ModelClass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should allow init args for specifying checkpoint files, paths to config files or other params. This would allow us to more easily init models for testing to point them at different paths. The other option would be to hardcode paths in a top-level main python file, and use those to init another model that has init args, but I like being able to specify them along with the rest of the config. That way we can write those configs with the rest of the model config, and not require to write a templated python file as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's a model specific thing why not just put it in model.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or these args are the inspected arts of the python function and not something we put into config.yaml right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to be able to pass in some config args to the model, like checkpoint_path
or potentially preprocessing params or a path to another config file. That way when you're developing, you can init the model directly with different settings, and deploy it with a particular set of args for the model upload.
The alternative, would be to essentially make model.py a config file written in python with some extra boilerplate, rather than have the model definition. That would turn model.py into something like the following, which I think is worse than putting args in the config.yaml.
import yaml
from my_model import MyModel
model_config_file = 'model_config.yaml'
model_config = yaml.safe_load(open(model_config_file))
class MyModelClass(MyModel):
def __init__(self):
super().__init__(**model_config)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, it seems we're a little unsure of how to handle these sorts of things now. I think allowing args or a path to another config file for init would be best. Otherwise we'll end up with boilerplate python files that are config files. But we can try to do that in a separate PR, and I can limit this one to just the ModelClass inheritance.
class_config = self.config.get("class_info", {}) | ||
|
||
model_file = class_config.get("file_path") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These parameters would provide much greater flexibility in defining the model. However, I'm uncertain whether we should include these additional parameters in the config.yaml file or establish a fixed structure for defining the model
Could this potentially confuse the users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main reason not to allow this, would be if we don't want to support and maintain these options. However if we allow init args in the config, it feels odd and incomplete without also being able to specify module and class. I found it convenient to be able to specify args in particular, but also the model file for tests. Note the defaults are what we have now, where it looks for a ModelClass in model.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm less stuff to customize in config.yaml is better I think. Right now we expect 1/model.py as the model file, it hink that can atleast be a default here and then let people override it.
We also need to badly get this config spec into a proto so it's defined what the fields are and we can parse it from the proto. Otherwise we'll have what we had before with config maps which got out of hand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed them for now. We can figure out a way to better include init args in a separate PR.
clarifai/runners/server.py
Outdated
parser.add_argument( | ||
'--download_checkpoints', | ||
action='store_true', | ||
help='Downloads any remote checkpoints specified in the config', | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure when this would be use. I think when local testing download_checkpoints
should always be called
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this in because the old switch --start_dev_server
called download_checkpoints. I replaced that with --grpc
which just enables grpc server instead of the runner, but doesn't download checkpoints on its own. So now --start_dev_server
would be equivalent to --grpc --download_checkpoints
. I don't know if that's useful or needed, but kept it for that reason. However, I deleted this download_checkpoints option now, since now there is a new cli command for that. If you think --grpc --download_checkpoints
in this one command would still be useful to correspond to the old --start_dev_server
we can put it back in.
3a75123
to
ed00d68
Compare
@@ -43,7 +41,7 @@ def __init__( | |||
num_parallel_polls, | |||
**kwargs, | |||
) | |||
self.load_model() | |||
self.model = model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where does model.load_model() get called now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That happens here, when the model is constructed. I'm actually not sure if we need to separate __init__
from load_model
now, we could also have it load in its init. To me it looks like reason we didn't do that in the old class structure, was because the init function was for the Runner, not the Model, and so had all the params for runner ids etc. as its init args, which we didn't want the user to have to deal with in their implementation. Now that the model is not inheriting from runner, we can use init instead and remove load_model. The only advantage to keeping it would be if we wanted to allow for construction before checkpoint loading, e.g. for examining config without loading the checkpoint; there isn't a place we do that yet, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i think we should keep load_model to make it more clear what you have to implement as a user.
def load_model(self): | ||
raise NotImplementedError("load_model() not implemented") | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we want to guide people into always have to implement this, even if their implementation is "pass"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above. I don't know if we need this now. However, I'll change back to NotImplementedError for now, and the examples can keep the checkpoint loads there to guide users to that, as you mention.
f6baf39
to
c0716ba
Compare
Minimum allowed line rate is |
Currently, the ModelRunner is the primary object of instantiation for a model in a server. This has several drawbacks:
Instead, we should define the model implementation using ModelClass directly, and create Runner and gRPC servers that can each call the Model object. This separates the model class from the server method, while still maintaining the same interfaces.
NOTE: This change also requires us to change the base class of all models in the examples repo, to use ModelClass instead of ModelRunner as their base class.