Error when saving model using model.save #1246

Open
le-tan-phuc opened this issue May 14, 2024 · 13 comments

Comments

@le-tan-phuc

le-tan-phuc commented May 14, 2024

Description

Hi, I'm new to TensorFlow.NET. I was trying out the "Toy version of ResNet in Keras" example from the main page and got an error at model.save("./toy_resnet_model"):

C# threw an exception: System.InvalidOperationException: 'Collection was modified; enumeration operation may not execute.'

I tried to debug and trace the problem and it seems like the exception was thrown at some point within this function:

(MetaGraphDef, Graph, TrackableSaver, AssetInfo, IList<Trackable>, IDictionary<Trackable, IEnumerable<TrackableReference>>) tuple = _build_meta_graph(obj, signatures, options, metaGraphDef);

which is part of

(saved_nodes, node_paths) = SavedModelUtils.save_and_return_nodes(model, filepath, signatures, options);

which is part of

KerasSavedModelUtils.save_model(this, filepath, overwrite, include_optimizer, signatures, options, save_traces);

Packages installed:

  • TensorFlow.NET 0.150.0
  • TensorFlow.Keras 0.15.0
  • SciSharp.TensorFlow.Redist 2.16.0

Any help would be appreciated! Thank you.
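
For readers unfamiliar with this message: .NET collection enumerators raise it when a collection is changed while a foreach loop is still running over it. The sketch below is not TensorFlow.NET code and does not claim to show where the save path fails. EnumerationDemo and its dictionary are hypothetical names, used only to reproduce the message and to show the common fix of iterating over a snapshot.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of TensorFlow.NET.
public static class EnumerationDemo
{
    // Made-up stand-in for any collection that is walked and mutated at the same time.
    public static Dictionary<string, int> MakeNodes() =>
        new Dictionary<string, int> { ["conv"] = 1, ["dense"] = 2 };

    // Adds new keys while enumerating the same dictionary.
    // Returns true if InvalidOperationException was thrown.
    public static bool MutateWhileEnumerating()
    {
        var nodes = MakeNodes();
        try
        {
            foreach (var pair in nodes)
            {
                // Inserting a new key invalidates the running enumerator.
                nodes[pair.Key + "_copy"] = pair.Value;
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Same logic, but the loop runs over a snapshot (ToList), so the
    // dictionary itself can be changed safely. Returns the final entry count.
    public static int MutateViaSnapshot()
    {
        var nodes = MakeNodes();
        foreach (var pair in nodes.ToList())
        {
            nodes[pair.Key + "_copy"] = pair.Value;
        }
        return nodes.Count;
    }
}
```

MutateWhileEnumerating reports the exception, while MutateViaSnapshot completes and leaves four entries in the dictionary.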

@AdrienDeverin

AdrienDeverin commented May 14, 2024

I encountered the same problem months ago. #1017
My conclusion was that some layers (notably the Cropping layer in my case) weren't managed properly. Could you provide the model that caused the problem?

@le-tan-phuc
Author

le-tan-phuc commented May 14, 2024

Hi @AdrienDeverin, I used the exact example from the TensorFlow.NET GitHub page and ran into this problem. I've attached the code again here for easy reference; the error occurs at the last line, the model.save call. I have also tried changing save_format from "tf" to "h5"; that runs without error, but nothing is saved:

using static Tensorflow.Binding;
using static Tensorflow.KerasApi;
using Tensorflow;
using Tensorflow.NumPy;

var layers = keras.layers;
// input layer
var inputs = keras.Input(shape: (32, 32, 3), name: "img");
// convolutional layer
var x = layers.Conv2D(32, 3, activation: "relu").Apply(inputs);
x = layers.Conv2D(64, 3, activation: "relu").Apply(x);
var block_1_output = layers.MaxPooling2D(3).Apply(x);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(block_1_output);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(x);
var block_2_output = layers.Add().Apply(new Tensors(x, block_1_output));
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(block_2_output);
x = layers.Conv2D(64, 3, activation: "relu", padding: "same").Apply(x);
var block_3_output = layers.Add().Apply(new Tensors(x, block_2_output));
x = layers.Conv2D(64, 3, activation: "relu").Apply(block_3_output);
x = layers.GlobalAveragePooling2D().Apply(x);
x = layers.Dense(256, activation: "relu").Apply(x);
x = layers.Dropout(0.5f).Apply(x);
// output layer
var outputs = layers.Dense(10).Apply(x);
// build keras model
var model = keras.Model(inputs, outputs, name: "toy_resnet");
model.summary();
// compile keras model in tensorflow static graph
model.compile(optimizer: keras.optimizers.RMSprop(1e-3f),
    loss: keras.losses.SparseCategoricalCrossentropy(from_logits: true),
    metrics: new[] { "acc" });
// prepare dataset
var ((x_train, y_train), (x_test, y_test)) = keras.datasets.cifar10.load_data();
// normalize the input
x_train = x_train / 255.0f;
// training
model.fit(x_train[new Slice(0, 2000)], y_train[new Slice(0, 2000)],
            batch_size: 64,
            epochs: 10,
            validation_split: 0.2f);
// save the model
model.save("./toy_resnet_model");

@AdrienDeverin

AdrienDeverin commented May 14, 2024 via email

@le-tan-phuc
Author

Thanks for your prompt response. I've tried to put a full path folder, but the problem remains.
This is what is shown in the output log:

Exception thrown: 'System.InvalidOperationException' in mscorlib.dll
An unhandled exception of type 'System.InvalidOperationException' occurred in mscorlib.dll
Collection was modified; enumeration operation may not execute.

@AdrienDeverin

AdrienDeverin commented May 14, 2024

It's really strange. I tried it myself and no problem appeared. Everything went well.

Try with this config (normally it shouldn't matter, yours seems fine):

  • Dependency: SciSharp.TensorFlow.Redist (2.16.0)
  • Directly download the Tensorflow.Net-master repo and add the Tensorflow.Keras and Tensorflow.Binding (Tensorflow.Core.sln) projects to your solution. Add them as references in your test program.

My imports:

using Tensorflow;               
using Tensorflow.NumPy;   
using Tensorflow.Keras;
using Tensorflow.Keras.Layers;
using Tensorflow.Keras.Saving;
using Tensorflow.Keras.Engine; 
using Tensorflow.Keras.Losses;
using Tensorflow.Keras.Utils;
using Tensorflow.Keras.ArgsDefinition;
using Tensorflow.Keras.ArgsDefinition.Reshaping;
using Tensorflow.Operations.Activation;
using Tensorflow.Operations.Initializers;
using Tensorflow.Common.Types;
using static Tensorflow.KerasApi;
using static Tensorflow.Binding; 
using static Tensorflow.ops;
using static Tensorflow.ApiDef.Types;

@AsakusaRinne
Collaborator

Hi, I have tried your code but I failed to reproduce it. I ran it and everything seemed to go well.

The only difference between our code is that I changed epoch to 1 and batch to 4 to make it faster to complete the training. I guess that doesn't matter.

P.S. I was using the CPU redist package.

@le-tan-phuc
Author

le-tan-phuc commented May 15, 2024

Hi @AdrienDeverin and @AsakusaRinne, thank you both for your help. Let me explain what exactly happened: I have 2 computers, A and B. I started trying the example code on computer A, where I created a new project with .NET Framework 4.8. Then I ran into the problem described at the beginning. I tried the different things suggested by @AdrienDeverin, but they didn't solve the problem. Later on, I created a new project based on .NET 8.0 on computer A, and it worked like a charm. I tried to replicate the solution on computer B, also with .NET 8 and all the same NuGet packages installed. It now gives me a different error at model.save:

System.NotImplementedException: ''

Have you experienced this before? This is pretty confusing for me.

Ps: this is the project properties for your reference. I'm using VS2022 V17.9.6
[Screenshot "Capture": project properties]

@AsakusaRinne
Collaborator

I didn't manage to reproduce it on my PC. Could you please clone the repo and add project reference to it, so that a detailed trace back will be shown?
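
A low-effort way to get more detail, even without stepping through a debugger, is to catch the exception and print ex.ToString(), which includes the exception type, message, inner exceptions and stack trace. A minimal sketch, where TraceHelper is a hypothetical helper and not part of TensorFlow.NET:

```csharp
using System;

// Hypothetical helper, not part of TensorFlow.NET.
public static class TraceHelper
{
    // Runs the action and returns the full exception text
    // (type, message, inner exceptions, stack trace),
    // or an empty string if the action completed without throwing.
    public static string CaptureTrace(Action action)
    {
        try
        {
            action();
            return string.Empty;
        }
        catch (Exception ex)
        {
            return ex.ToString();
        }
    }
}
```

In the example from this thread it could wrap the failing call, for instance Console.WriteLine(TraceHelper.CaptureTrace(() => model.save("./toy_resnet_model")));. File names and line numbers appear in the trace only when debug symbols are available for the assemblies involved, which is typically the case with a project reference to the repo.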

@AdrienDeverin

Me too, I tried to reproduce your bug, but it's working correctly on my computer... :/
(To add more, I was testing it in .NET 6.0)

Another idea for finding where the problem lies: since you have set things up as I described earlier, you could run in debug mode, step through the code, and compare with what happens on computer A...

@le-tan-phuc
Author

Hi all, thank you for your support. I've tried again with a fresh project on computer B, based on .NET 8.0 and placed in a local folder, and it works perfectly now. My guess about the earlier System.NotImplementedException: the project folder was in OneDrive on computer A and got synced to computer B, and some files were probably missing when it ran on computer B.
I also tried with .NET 6.0, as @AdrienDeverin did, and it works as well now.
Nevertheless, a new project based on .NET Framework 4.5 still has the original problem; maybe it's not supported.
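
One possible contributor to the difference between runtimes, not confirmed anywhere in this thread: according to the Dictionary<TKey,TValue> documentation, on .NET Core 3.0 and later Remove and Clear no longer invalidate enumerators, while on .NET Framework they do. Code that removes entries inside a foreach can therefore throw on .NET Framework only. The sketch below illustrates that runtime difference with a hypothetical RemoveDuringEnumeration class and a made-up cache; it is not the TensorFlow.NET code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of TensorFlow.NET.
public static class RemoveDuringEnumeration
{
    private static Dictionary<int, string> MakeCache() =>
        new Dictionary<int, string> { [1] = "a", [2] = "b", [3] = "c" };

    // Removes entries while enumerating the key collection.
    // Returns true if InvalidOperationException was thrown; the result
    // depends on the runtime (see the paragraph above).
    public static bool RemoveInsideForeachThrows()
    {
        var cache = MakeCache();
        try
        {
            foreach (var key in cache.Keys)
            {
                cache.Remove(key);
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Portable alternative: enumerate a snapshot of the keys, which
    // behaves the same on every runtime. Returns the remaining count.
    public static int RemoveViaSnapshot()
    {
        var cache = MakeCache();
        foreach (var key in cache.Keys.ToList())
        {
            cache.Remove(key);
        }
        return cache.Count;
    }
}
```

Running RemoveInsideForeachThrows in a .NET Framework project and in a .NET 6/8 project is a quick way to see the difference on a given machine; RemoveViaSnapshot empties the dictionary on both.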

@AsakusaRinne
Collaborator

> Nevertheless, a new project based on .NET Framework 4.5 still has the original problem, maybe it's not supported.

It's expected to support .NET Framework 4.5. If you'd like to dig into it, could you please run with the tf.net repo and post the detailed traceback here?

@le-tan-phuc
Author

I found a way to make a .NET Framework-based app work, but I'm not sure whether this is a bug. Let me detail the process so that anyone facing the same issue knows how to get past it.
The original problem was:

  1. Created a new C# WinForms app using .NET Framework 4.8 and set it to build for x64 only.
  2. Installed the NuGet packages TensorFlow.NET, SciSharp.TensorFlow.Redist, and TensorFlow.Keras. To install TensorFlow.Keras successfully, I had to install PureHDF separately first (with "Include prerelease" checked).
  3. Copied the example from the main SciSharp GitHub page.
  4. Copied tensorflow.dll from the SciSharp.TensorFlow.Redist.2.16.0 package into the debug folder (to clear the "backend not found" exception).
  5. Got the System.InvalidOperationException: 'Collection was modified; enumeration operation may not execute.' at model.save.

How I got it to work:

  1. Downloaded the entire TensorFlow.NET repo and created a new .NET Framework 4.8 project within the TensorFlow.NET solution.
  2. Installed the NuGet packages TensorFlow.NET, SciSharp.TensorFlow.Redist, and TensorFlow.Keras so that all the dependencies get installed, then uninstalled TensorFlow.NET and TensorFlow.Keras.
  3. Copied tensorflow.dll into the debug folder.
  4. In my project, added references to Tensorflow.Binding and Tensorflow.Keras from the repo.
  5. The application now runs without error.

When I checked the debug output folder, I noticed that Tensorflow.Binding.dll and Tensorflow.Keras.dll differ in size between the original project and the working one. Copying these two files from the working solution's folder into the original project's folder also fixes the error. So I guess there are some differences in Tensorflow.Binding.dll and Tensorflow.Keras.dll between the released NuGet packages and the repo. Do you have any idea about this, @AsakusaRinne?
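
One way to check that guess is to print the size and embedded version of each DLL from both folders and compare them. A minimal sketch using only System.IO and System.Diagnostics; AssemblyInfoDump is a hypothetical helper, and which paths to pass in depends on the local project layout:

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of TensorFlow.NET.
public static class AssemblyInfoDump
{
    // Returns a one-line summary of a DLL: file name, size in bytes,
    // and the file/product versions stored in its version resource.
    public static string Describe(string path)
    {
        var file = new FileInfo(path);
        var version = FileVersionInfo.GetVersionInfo(path);
        return $"{file.Name}: {file.Length} bytes, " +
               $"file version {version.FileVersion}, " +
               $"product version {version.ProductVersion}";
    }
}
```

Calling Describe on Tensorflow.Binding.dll and Tensorflow.Keras.dll in the debug folder of each project and comparing the lines would show whether the NuGet build and the repo build carry different versions, or only differ in size.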

@Oceania2018
Member

It might be that the NuGet package is not up to date.
