This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Improve inference performance with loaded TransformerChain ML.NET model #371

Open · wants to merge 4 commits into master

Conversation

najeeb-kazmi
Member

Fix #370

PR #230 introduced the ability to load and score ML.NET models trained in the new ML.NET TransformerChain serialization format. This is done by checking whether "TransformerChain" exists in the archive members. Currently, this check runs every time the test, predict, predict_proba, and decision_function methods call _predict. This PR improves performance by checking for "TransformerChain" only once, when the model is loaded.
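The caching idea can be sketched outside NimbusML: inspect the archive once at load time and store the result on the model object, instead of reopening the archive on every scoring call. The class, member names, and return values below are illustrative, not the actual NimbusML internals.

```python
import zipfile

class LoadedModel:
    """Minimal sketch: detect the TransformerChain format once, at load time."""

    def __init__(self, path):
        self.path = path
        # One-time check, cached for the lifetime of the model object.
        with zipfile.ZipFile(path) as archive:
            self._is_transformer_chain = any(
                'TransformerChain' in name for name in archive.namelist())

    def predict(self, data):
        # Reuse the cached flag; no archive I/O per call.
        if self._is_transformer_chain:
            return 'score via TransformerChain graph'
        return 'score via legacy PredictorModel graph'
```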

@najeeb-kazmi
Member Author

@ganik should I add a small ML.NET model file to the project and write a test for inferencing with a loaded ML.NET model?

@ganik
Member

ganik commented Nov 22, 2019

yes, that would be great



 all_nodes = []
-if is_transformer_chain:
+if (hasattr(self, '_is_transformer_chain') and
+        self._is_transformer_chain):
     inputs = dict([('data', ''), ('transform_model', self.model)])
Member


How much is the perf gain? If it's not much, I would like to leave this as it is. The reason is that we should actually move to the new ML.NET format, and then this will be broken with your new fix.

Member Author


A 6% gain, measured by calling .predict on 100 rows of UCI Adult test data in a 100-repetition loop, repeated 5 times.

I think we should take the change. (1) If we leave it as it is, it will still break when NimbusML moves to the new ML.NET format, and how the graph is constructed will need to change anyway. (2) Moving to the new ML.NET format will likely take a long time, since it requires non-trivial changes to how model loading is handled in the entrypoints infrastructure on the ML.NET side, i.e. a new model implementation in addition to PredictorModel and TransformModel. (3) The changes directly below this address an issue where the PredictedLabel column is converted from bool to int32 only if it exists (i.e. it is not regression or ranking) and if the dtype is bool (i.e. it is binary classification).
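The guard described in point (3) can be sketched as follows. This is a hedged illustration using pandas; the column name PredictedLabel and the dtypes mirror the comment, but the helper function itself is hypothetical, not NimbusML code.

```python
import numpy as np
import pandas as pd

def convert_predicted_label(df):
    """Cast PredictedLabel from bool to int32, but only when the column
    exists (skipping regression/ranking output) and is actually boolean
    (i.e. binary classification)."""
    if 'PredictedLabel' in df.columns and df['PredictedLabel'].dtype == bool:
        df['PredictedLabel'] = df['PredictedLabel'].astype(np.int32)
    return df
```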

@@ -148,6 +150,12 @@ def test_model_datastream(self):

         os.remove(model_filename)

+    def test_mlnet_model_can_be_scored(self):
+        data = FileDataStream.read_csv(test_file, sep=',',
+                                       numeric_dtype=np.float32)
Member Author


The ML.NET model is trained on a file with a label, and it expects the label to be present in the schema of the data being passed to predict.
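That constraint can be illustrated loosely: the data handed to predict must carry every column of the training schema, label included. The helper below is hypothetical and not part of the NimbusML API.

```python
def check_scoring_schema(training_columns, scoring_columns):
    """Illustrative check: fail if the scoring data is missing any column
    the model was trained on, including the label column."""
    missing = [c for c in training_columns if c not in scoring_columns]
    if missing:
        raise ValueError('scoring data is missing columns: %s' % missing)
```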
