We are using onnxruntime's Python module in google/magika, and we just noticed that inference results can vary significantly (beyond rounding errors) between Windows and Linux/macOS machines. It seems this has always been the case, but we only noticed now because the result on a given input changed so much that it caused a misclassification on our test set.
I've created a proof of concept via GitHub Actions runners (details below).
In essence, you can see how the same code and model lead to a very different prediction score depending on the OS.
The inference scores on the different platforms are:
ubuntu: 0.75302654504776
macos: 0.7530270218849182
windows: 0.928835928440094
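For context, the gap can be quantified directly from the scores above: the Linux/macOS delta is well within float32 rounding noise, while the Windows delta is not. A quick sanity check:

```python
import math

# Scores reported by the same model on the same input, per platform.
scores = {
    "ubuntu": 0.75302654504776,
    "macos": 0.7530270218849182,
    "windows": 0.928835928440094,
}

# Linux vs. macOS agree to well under 0.1% -- consistent with rounding.
print(math.isclose(scores["ubuntu"], scores["macos"], rel_tol=1e-3))    # True

# Windows differs by roughly 23% -- far beyond any rounding explanation.
print(math.isclose(scores["ubuntu"], scores["windows"], rel_tol=1e-3))  # False
```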
As you can see, the Windows score differs beyond what rounding errors can explain (Linux vs. macOS are fine). We checked the results on many files for Linux vs. macOS: they all match (within 0.x% rounding errors). The problem seems very specific to Windows.
Note that we also have a client written in Rust (which does not use the onnxruntime Python module), and we see the same very significant discrepancies (the inference scores from the Python module seem to match the ones from the Rust client).
It seems that the input blob data differs between Windows and Linux.
While waiting for the experts to chime in, please verify that the libraries and their versions are consistent across Windows and Linux/macOS, and check whether the input data is identical right before it is fed into ONNX Runtime.
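One way to run that check is to fingerprint the exact bytes handed to the session on each platform; matching digests rule out a preprocessing difference. A minimal sketch (`input_fingerprint` and the sample array are illustrative, not Magika's actual code):

```python
import hashlib

import numpy as np

def input_fingerprint(features: np.ndarray) -> str:
    """Hash the exact bytes that would be fed to ONNX Runtime.

    Print this on each OS: if the hex digests differ, the bug is in
    preprocessing, not in the runtime itself.
    """
    # Include dtype and shape so a silent int32/int64 or layout
    # difference also shows up in the fingerprint.
    meta = f"{features.dtype}:{features.shape}".encode()
    return hashlib.sha256(meta + features.tobytes()).hexdigest()

# Hypothetical feature vector, standing in for the extracted features.
feats = np.arange(8, dtype=np.float32)
print(input_fingerprint(feats))
```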
I also noticed a warning in your GitHub Actions logs as follows:
UserWarning: Unsupported Windows version (2022server). ONNX Runtime supports Windows 10 and above, only.
About the difference in input: there is a chance you are right... The feature-extraction code is so simple that a bug there didn't even cross my mind. Investigating; will report back...
I believe I found the bug, and it has nothing to do with onnxruntime (nor with Magika's feature-extraction code): it turns out that, on Windows, git's checkout automatically converts "\n" to "\r\n", leading to different extracted features than on Linux/macOS. I'm pretty sure this is it. Closing the bug for now; will reopen if the issue does not go away.
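The effect is easy to reproduce without git: once "\n" becomes "\r\n", the bytes (and therefore any byte-level features) differ, even though the text looks identical. A small demonstration with made-up file content; the usual git-side fix is to pin line endings with a `.gitattributes` entry such as `* -text` on the affected test files:

```python
import hashlib

unix = b"mime: text/plain\nlabel: txt\n"
# What the same file looks like after git's autocrlf conversion on Windows.
windows = unix.replace(b"\n", b"\r\n")

# Lengths differ, so byte-offset-based features shift.
print(len(unix), len(windows))  # 28 30

# And the raw contents no longer hash the same.
print(hashlib.sha256(unix).hexdigest() == hashlib.sha256(windows).hexdigest())  # False
```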
Thanks for the quick reply and for the great project!
/cc @ia0 @invernizzi
Details to reproduce:
Urgency
Due to this bug, we need to halt the release of Magika's Windows packages, as the behavior seems too unpredictable.
Platform
Windows
OS Version
Windows Server 2022 (github's windows-latest runner)
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-1.19.2-cp312-cp312-win_amd64.whl
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response