doc: clarify conversions can be impacted by double-rounding #2361

Merged
merged 1 commit into from
Jan 14, 2025
12 changes: 10 additions & 2 deletions doc/programming_model/data_types.md
@@ -78,7 +78,7 @@ post-ops). The following formula governs the datatype dynamics during
a primitive computation:

\f[
\operatorname{convert_{dst\_dt}} ( \operatorname{dst\_zero\_point_{f32}} + \operatorname{postops_{f32}} (\operatorname{oscale_{f32}} * \operatorname{convert_{f32}} (\operatorname{Op}(\operatorname{src_{src\_dt}}, \operatorname{weights_{wei\_dt}}, ...))))
\operatorname{convert_{dst\_dt}} ( \operatorname{zp_{dst}} + 1/\operatorname{scale_{dst}} * \operatorname{postops_{f32}} (\operatorname{convert_{f32}} (\operatorname{Op}(\operatorname{src_{src\_dt}}, \operatorname{weights_{wei\_dt}}, ...))))
\f]
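The following Python sketch illustrates this data flow. It follows the form of the formula that uses \f$zp_{dst}\f$ and \f$scale_{dst}\f$, for a hypothetical primitive whose `Op` produces an s32 result and whose destination is s8. Several parts are assumptions made for illustration and are not oneDNN's implementation: the ReLU post-op, the helper names (`convert_s8`, `relu`, `compute_dst`), and the s8 conversion, which rounds half to even and then saturates to [-128, 127]. Python floats are 64-bit, so the f32 steps are only approximated.

```python
def convert_s8(v: float) -> int:
    """Hypothetical convert_{dst_dt} for an s8 destination: round half to
    even (Python's round), then saturate to [-128, 127]. Assumed behavior."""
    return max(-128, min(127, round(v)))


def relu(v: float) -> float:
    """Stand-in for postops_{f32}; the real post-op chain is user-defined."""
    return max(v, 0.0)


def compute_dst(op_result: int, scale_dst: float, zp_dst: int) -> int:
    """Hypothetical walk through
    convert_dst(zp_dst + 1/scale_dst * postops(convert_f32(Op(...))))."""
    acc = float(op_result)                                 # convert_{f32}(Op(...))
    post = relu(acc)                                       # postops_{f32}(...)
    return convert_s8(zp_dst + (1.0 / scale_dst) * post)   # convert_{dst_dt}(...)


print(compute_dst(101, 2.0, 0))  # prints 50: 50.5 rounds half to even
print(compute_dst(250, 2.0, 3))  # prints 127: 128.0 saturates
print(compute_dst(-40, 2.0, 3))  # prints 3: ReLU zeroes the value, leaving zp_dst
```

In this sketch every intermediate value stays in floating point. Rounding to the destination type happens only once, in the final conversion.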

The `Op` output datatype depends on the datatype of its inputs:
@@ -99,7 +99,15 @@ No downconversions are allowed by default, but can be enabled using
the floating-point math controls described in @ref
dev_guide_attributes_fpmath_mode.


The \f$convert_{dst\_dt}\f$ conversion is guaranteed to be faithfully
rounded but not guaranteed to be correctly rounded: the returned value
is always one of the two representable values that bracket the exact
result, but not necessarily the closer of the two. In particular, some
hardware platforms have no instructions for direct conversion from the
f32 data type to low-precision data types such as fp8 or fp4. These
platforms perform the conversion through an intermediate data type
(for example, f16 or bf16), which may result in
[double rounding](https://en.wikipedia.org/wiki/Rounding#Double_rounding).
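The effect can be reproduced with a small Python model of binary rounding. This sketch relies on a hypothetical helper, `round_to_precision`, which rounds to a given number of significand fraction bits with ties to even. The helper ignores exponent range, overflow, and subnormals. The sketch treats fp8 as the e4m3 format (3 fraction bits), f16 as having 10 fraction bits, and bf16 as having 7. The input \f$1 + 2^{-4} + 2^{-12}\f$ is exactly representable in f32. It lies just above the midpoint between the neighbouring fp8 values 1.0 and 1.125.

```python
import math

E4M3_FRAC_BITS = 3   # fp8 e4m3 keeps 3 fraction bits
BF16_FRAC_BITS = 7
F16_FRAC_BITS = 10


def round_to_precision(x: float, frac_bits: int) -> float:
    """Round x to `frac_bits` significand fraction bits, ties to even.

    Sketch only: assumes x is finite and nonzero and ignores the target
    format's exponent range (no overflow, underflow, or subnormals).
    """
    m, e = math.frexp(x)                   # x == m * 2**e with 0.5 <= |m| < 1
    scaled = math.ldexp(m, frac_bits + 1)  # kept bits now sit left of the point
    return math.ldexp(round(scaled), e - frac_bits - 1)  # round(): ties to even


x = 1.0 + 2.0**-4 + 2.0**-12  # 1.062744140625, exactly representable in f32

direct = round_to_precision(x, E4M3_FRAC_BITS)
via_f16 = round_to_precision(round_to_precision(x, F16_FRAC_BITS), E4M3_FRAC_BITS)
via_bf16 = round_to_precision(round_to_precision(x, BF16_FRAC_BITS), E4M3_FRAC_BITS)

print(direct, via_f16, via_bf16)  # 1.125 1.0 1.0
# Both results bracket x, so every path is faithfully rounded, but only
# the direct conversion returns the closest value.
assert via_f16 < x < direct
```

Rounding first to f16 or bf16 lands exactly on the fp8 midpoint 1.0625. The second rounding then breaks that tie toward the even value 1.0, even though x itself is closer to 1.125.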

### Rounding mode and denormal handling