memory leak (core dumped) problem in tfjs-node #8312
Comments
@gaikwadrahul8 Hi, it seems that I've found a solution. After examining the core dump, I suspected it was related to the oneDNN source, so I explicitly enabled the option by setting TF_ENABLE_ONEDNN_OPTS=1. As a result, I saw a log message I hadn't encountered before: "oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0." Since then, the application has not crashed. Although I haven't fully understood the exact reason, I would appreciate your thoughts on this.
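For reference, a minimal sketch of how the workaround can be applied. It assumes the variable is read when the native TensorFlow binding loads, so it must be set before requiring @tensorflow/tfjs-node; setting it in the Dockerfile or shell is the safer option.

```js
// TF_ENABLE_ONEDNN_OPTS must be set before the native TensorFlow binding
// is loaded, so it goes before requiring @tensorflow/tfjs-node.
// Equivalent shell form: TF_ENABLE_ONEDNN_OPTS=1 node server.js
process.env.TF_ENABLE_ONEDNN_OPTS = '1';

const tf = require('@tensorflow/tfjs-node');
// On startup, the "oneDNN custom operations are on" message quoted above
// should appear in the logs.
```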
Hi, @Hyodori04 I apologize for the delayed response, and it's good to hear that your application is no longer crashing after enabling the TF_ENABLE_ONEDNN_OPTS=1 flag. The log message warns that enabling oneDNN optimizations can lead to slightly different numerical results compared to TensorFlow's default CPU implementation or other libraries. This is due to variations in computation order and floating-point round-off errors that arise from how oneDNN optimizes and parallelizes computations. Disabling oneDNN optimizations (TF_ENABLE_ONEDNN_OPTS=0) avoids this nondeterminism but falls back to the default CPU kernels.

For memory leak diagnosis:
Double-check your code for any tensors created outside the tf.tidy block, or during intermediate computations within it, that are never disposed. Thank you for your cooperation and patience.
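A minimal sketch of that check (model and input are placeholders): tf.tidy disposes intermediates, but the tensor returned from it survives and must be disposed explicitly, and tf.memory().numTensors should stay flat across repeated calls if nothing leaks.

```js
const tf = require('@tensorflow/tfjs-node');

async function predictOnce(model, input) {
  // tf.tidy disposes every intermediate tensor created in the callback;
  // the returned tensor survives and must be disposed by the caller.
  const output = tf.tidy(() => model.predict(input));
  const data = await output.data(); // copy results out before disposing
  output.dispose();
  return data;
}

// If nothing leaks, numTensors should stay flat across repeated calls.
console.log('tensors before:', tf.memory().numTensors);
// await predictOnce(model, input);  // placeholder model and input
console.log('tensors after:', tf.memory().numTensors);
```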
I have already checked the memory leak diagnosis steps and tf.tidy. I think it's a kind of bug that TF crashes when oneDNN is not used, since oneDNN is only meant for optimization. I want to know what part of the code causes the crash when oneDNN is disabled, but that's not easy for me. Maybe later you or I can confirm where the faulty code is.
System information
Describe the current behavior
We serve our service in a Docker Node container.
If there are several sequential requests that use model.predict, our Node server is killed.
I think there is some kind of memory leak, because the error logs look like:

Aborted (core dumped)

and the Docker memory metrics climb to a similar size before the crash.
Describe the expected behavior
The memory leak error doesn't happen.
Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/CodePen/any notebook.
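For illustration, a hedged sketch of the kind of server described above: an Express endpoint running model.predict per request. The route, model path, and input shape are placeholders, not the actual application code.

```js
const tf = require('@tensorflow/tfjs-node');
const express = require('express');

const app = express();
let model;

app.post('/predict', express.json(), (req, res) => {
  // Wrap predict in tf.tidy so intermediates are freed; the returned
  // tensor still has to be disposed after reading its values.
  const result = tf.tidy(() => {
    const input = tf.tensor(req.body.input, [1, 224, 224, 3]); // placeholder shape
    return model.predict(input);
  });
  result.array().then((data) => {
    result.dispose();
    res.json({ prediction: data, numTensors: tf.memory().numTensors });
  });
});

// Placeholder model path in layers format.
tf.loadLayersModel('file://./model/model.json').then((m) => {
  model = m;
  app.listen(3000);
});
```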
Other info / logs
lldb trace