We are using onnxruntime's Python module in google/magika, and we just noticed that inference results can vary significantly (beyond rounding errors) between Windows and Linux/macOS machines. It seems this has always been the case, but we only noticed now because the result on a given input changed so much that it caused a misclassification on our test set.
I've created a proof of concept via GitHub Actions runners (details below).
In essence, you can see how the same code and model lead to a very different prediction score depending on the OS.
The inference scores on the different platforms are:
ubuntu: 0.75302654504776
macos: 0.7530270218849182
windows: 0.928835928440094
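For context, the gap can be quantified directly from the scores above: the Linux/macOS delta is well within float32 rounding noise, while the Windows delta is not. A quick sanity check:

```python
import math

# Scores reported by the same model on the same input, per platform.
scores = {
    "ubuntu": 0.75302654504776,
    "macos": 0.7530270218849182,
    "windows": 0.928835928440094,
}

# Linux vs. macOS agree to well under 0.1% -- consistent with rounding.
print(math.isclose(scores["ubuntu"], scores["macos"], rel_tol=1e-3))    # True

# Windows differs by roughly 23% -- far beyond any rounding explanation.
print(math.isclose(scores["ubuntu"], scores["windows"], rel_tol=1e-3))  # False
```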
As you can see, the Windows score differs beyond what rounding errors can explain (Linux vs. macOS are fine). We checked the results on many files for Linux vs. macOS: they all match (within 0.x% rounding errors). The problem seems very specific to Windows.
Note that we also have a client written in Rust (which does not use the onnxruntime Python module), and we see the same very significant discrepancies (the inference scores from the Python module seem to match the ones from the Rust client).
It seems that the input blob data differs between Windows and Linux.
While waiting for the experts to chime in, please verify that the libraries and their versions are consistent across Windows and Linux/macOS, and check whether the input data is identical right before it is fed into ONNX Runtime.
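One way to run that check is to fingerprint the exact bytes handed to the session on each platform; matching digests rule out a preprocessing difference. A minimal sketch (`input_fingerprint` and the sample array are illustrative, not Magika's actual code):

```python
import hashlib

import numpy as np

def input_fingerprint(features: np.ndarray) -> str:
    """Hash the exact bytes that would be fed to ONNX Runtime.

    Print this on each OS: if the hex digests differ, the bug is in
    preprocessing, not in the runtime itself.
    """
    # Include dtype and shape so a silent int32/int64 or layout
    # difference also shows up in the fingerprint.
    meta = f"{features.dtype}:{features.shape}".encode()
    return hashlib.sha256(meta + features.tobytes()).hexdigest()

# Hypothetical feature vector, standing in for the extracted features.
feats = np.arange(8, dtype=np.float32)
print(input_fingerprint(feats))
```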
I also noticed a warning in your GitHub Actions logs as follows:
UserWarning: Unsupported Windows version (2022server). ONNX Runtime supports Windows 10 and above, only.
About the difference in input: there is a chance you are right... The feature-extraction code is so simple that a bug there didn't even cross my mind. Investigating; will report back...
I believe I found the bug, and it has nothing to do with onnxruntime (nor with Magika's feature-extraction code): it turns out that, on Windows, git's checkout automatically converts "\n" to "\r\n", leading to different extracted features than on Linux/macOS. I'm pretty sure this is it. Closing the bug for now; will reopen if the issue does not go away.
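The effect is easy to reproduce without git: once "\n" becomes "\r\n", the bytes (and therefore any byte-level features) differ, even though the text looks identical. A small demonstration with made-up file content; the usual git-side fix is to pin line endings with a `.gitattributes` entry such as `* -text` on the affected test files:

```python
import hashlib

unix = b"mime: text/plain\nlabel: txt\n"
# What the same file looks like after git's autocrlf conversion on Windows.
windows = unix.replace(b"\n", b"\r\n")

# Lengths differ, so byte-offset-based features shift.
print(len(unix), len(windows))  # 28 30

# And the raw contents no longer hash the same.
print(hashlib.sha256(unix).hexdigest() == hashlib.sha256(windows).hexdigest())  # False
```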
Thanks for the quick reply and for the great project!
/cc @ia0 @invernizzi
Details to reproduce:
Urgency
Due to this bug, we need to halt the release of Magika's Windows packages, as the behavior seems too unpredictable.
Platform
Windows
OS Version
Windows Server 2022 (github's windows-latest runner)
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-1.19.2-cp312-cp312-win_amd64.whl
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response