nzxt-kraken3: PWM setting fixes; status reporting optimization #34
Thanks a lot! I will check today or tomorrow!
nzxt-kraken3.c (outdated)

    */
    spin_lock_bh(&pwm_write_lock);
    priv->channel_info[channel].reported_duty = val;
    spin_unlock_bh(&pwm_write_lock);
I can't find any other place where this spin lock is used.
Isn't the assignment atomic here?
Hm, might be. Will investigate later; this was a first shot at resolving this.
Yes, reported_duty is u16; it should be trivial to do as an atomic.
I also don't understand this spinlock: unless it's also taken when reading, it doesn't do much.
@amezin ,
Regarding 16-bit stores being atomic: they aren't, at least not in the context of either the C or the Linux memory models (well, the one documented in tools/memory-model/Documentation/). By those definitions, the two stores to reported_duty can cause a data race with each other or with the load in kraken3_read.
(For completeness: in the context of the CPU, 16-bit stores might be compiled into a single/simple store, in which case they would be atomic on x86. The same applies to the load, and looking at the surrounding code, this is probably the case here. And I suppose almost no one is connecting these coolers to an ARM machine, even if just for external monitoring of an x86 system.)
But it seems that, at least in the HWMON subsystem (and, to be fair, other parts of the kernel), plain accesses can be used even if they theoretically can cause data races, as long as the compiler (currently) produces sensible code (reminder: a data race is UB), and that the impact of a perhaps stale or garbage value isn't relevant or can't be observed in practice.
I say this based on a discussion I had with Guenter on the mailing list. There's a bit of noise due to some communication issues, but you can find the entire thread here.
TL;DR: @aleksamagicka, removing the spinlock and just keeping the plain store and load is probably fine (like we already do for the rest of the structure). But please test it well.
(Foreshadowing: this is fine meme).
Yes, making the store and load atomic, or even just "marked" (i.e. using the kernel's READ_ONCE/WRITE_ONCE macros, which essentially do volatile accesses), would also be possible.
Still, I think the (usually) cheaper plain accesses are preferred if no issues can be observed when using them.
The spinlock was me copying the nzxt-smart code, thinking about it it probably does not do anything here, I'll test (although I'm no expert on kernel concurrency).
But, judging from Guenter's emails (thanks for the link!), it's not terribly important for this to be synchronized because it's for display/info only. (Do correct me if I got that wrong.)
That's more or less what I understood from his comments too.
The spinlock might not have done much locking, but, as the function names suggest, it does disable bottom-half routines; it remains to be seen/tested whether that applies to kraken3_raw_event() as the actual processing function for interrupts.
In the past I remember looking into which context the raw event handler runs in.
Assuming this old comment of mine is correct, the handler runs in a hard IRQ. And IIRC, that means that spin_lock_irq[save] would be necessary to disable it.
@jonasmalacofilho I know, I have worked with atomics a lot, especially for PowerPC. Just in case: a spinlock internally works based on an atomic operation (CAS or TAS). That means that, where atomics can be used directly, they will never be more costly than a spinlock. But as Guenter mentions, in such a case we shouldn't worry about guarding reported_duty at all.
@jonasmalacofilho Ok, I should've said "effectively atomic", i.e. another thread will read either the old value or the new one, not garbage. Even on ARM. Not in generic C, but in the Linux kernel with supported compilers and flags. Or is this incorrect too?
Even on ARM.
You're right, thanks for the correction. (I had previously implied that a simple 16-bit load/store would only be atomic, at the CPU level, on x86).
Not in generic C, but in the Linux kernel with supported compilers and flags. Or is this incorrect too?
According to the LKMM:
if it [the program] does [contain race candidates which may execute concurrently] then the LKMM says there is a potential data race and makes no predictions about the program's outcome.
Regarding compiler flags used by the kernel to make compilers work more conservatively, I'm only aware of a few of them, related to other types of optimizations. So I can't answer that part.
By the way, there's an interesting (and reasonably recent) article on LWN about this: Who's afraid of a big bad optimizing compiler?
P.S. I have the impression that there are two groups within the kernel: some tend to approach these problems from the point of view of what the compiler is allowed to do; some from the point of view of whether stricter accesses or synchronization can be shown to be necessary.
P.P.S. Personally, I find performance arguments against READ_ONCE/WRITE_ONCE not compelling when they are a viable way to prevent a clear potential data race: if those marked accesses prevent optimizations, that usually suggests that they are indeed necessary; and if they don't, they are free now, and a safeguard against future compiler versions producing different and potentially broken code.
@jonasmalacofilho Thanks, if it runs in hard IRQ then those spinlock bottom half calls really do nothing.
It seems to be working as expected; I also don't have any significant issues with fan2go. Thanks again!
@aleksamagicka @stalkerg By the way, have you tested the driver with lm-sensors (i.e. pwmconfig + fancontrol)? These utilities sometimes fail in surprising ways. As far as I remember, they treat almost any sysfs read error as "oh, the device is completely broken, let's die".
@amezin I didn't test it with fancontrol, but fan2go has exactly the same behavior for read errors (and it's ok).
This should resolve the issues noted in #7 and optimize Z series status reporting.

- Values below PUMP_DUTY_MIN will now be silently set to it without returning error codes
- priv->duty_input was removed in favor of recording the value in kraken3_channel_info for each channel

Seems to work when using fan2go, here's my config: fan2go.txt. I had to set maxRpmDiffForSettledFan: 20 because the X53 varied between 14-20 as far as I've seen, and the default value is 10. Didn't yet test with pwmconfig and fancontrol.