> Isn't this an issue with either the hwmon driver itself or phosphor-hwmon? I don't see it correct to add special code to ipmi providers because we'll also have this same trouble for REST, Redfish, etc.
The 1-wire thermal sensor takes 1 second to perform a measurement because it is very precise. There is no way to make it faster except by losing precision.
> Joel and Cyril suggested polling sensors with a background thread and reading the most recent value out over IPMI to reduce latency, and I agree that this is the correct approach if we can ensure that the polling thread will time out appropriately if the sensor is unresponsive.
Polling this sensor all the time is a bad idea because it is expensive to read (the 1-wire interface is software bitbanged on Aspeed parts).
Additionally, there is no way in the IPMI interface to tell the host when a reading was last valid, so serving cached values would introduce a time uncertainty into every reading.
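To make the trade-off concrete, here is a minimal sketch of one possible middle ground: read the sensor only on demand, but reuse a recent reading if it is younger than some maximum age. This is not something proposed in the thread. `CachedReading`, `read_fn`, `max_age_s` and `clock` are hypothetical names for illustration, and the sketch is not thread-safe.

```python
import time


class CachedReading:
    """Read-through cache for an expensive sensor (hypothetical helper)."""

    def __init__(self, read_fn, max_age_s, clock=time.monotonic):
        self._read_fn = read_fn      # assumed: blocking call that returns a reading
        self._max_age_s = max_age_s  # upper bound on how stale a served value may be
        self._clock = clock          # injectable so the age logic can be tested
        self._value = None
        self._stamp = None           # clock value at the last successful read

    def get(self):
        """Return (value, age_s), re-reading only if the cache is too old."""
        now = self._clock()
        if self._stamp is None or now - self._stamp > self._max_age_s:
            # This call may block for the full measurement time.
            self._value = self._read_fn()
            self._stamp = self._clock()
            now = self._stamp
        return self._value, now - self._stamp
```

The age is returned alongside the value, so the uncertainty is at least bounded by `max_age_s` and visible to the caller, even though a plain IPMI sensor reading has nowhere to carry it to the host. The first read after the cache expires still blocks for the full measurement time.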
@nasamuffin commented on Mon Jun 26 2017
With both the Aspeed fantach sensors and the 1-wire temperature sensors, we encountered long sensor read times that led to timeouts all the way up in btbridge, causing hard-to-diagnose failures in host-side IPMI handling.
These may not be the only very slow sensors we run across. The arbitrary 5-second timeout in btbridge may eventually prove too short for another sensor, and reads appear very slow to the host when they may not need to be.
Joel and Cyril suggested polling sensors with a background thread and reading the most recent value out over IPMI to reduce latency, and I agree that this is the correct approach if we can ensure that the polling thread will time out appropriately if the sensor is unresponsive.
(host-ipmid may not be the right place for this, but it's the area being affected by sneaky failures, so it seemed as good a place as any.)
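As a rough illustration of the background-polling idea, here is a minimal sketch in Python, not taken from host-ipmid or phosphor-hwmon. `PolledSensor`, `read_fn`, `interval_s` and `read_timeout_s` are hypothetical names. The sketch assumes `read_fn` is a blocking call that returns a reading or raises on error.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class PolledSensor:
    """Polls a slow sensor in the background and caches the latest reading."""

    def __init__(self, read_fn, interval_s=1.0, read_timeout_s=2.0):
        self._read_fn = read_fn
        self._interval_s = interval_s
        self._read_timeout_s = read_timeout_s
        self._lock = threading.Lock()
        self._value = None   # last good reading, if any
        self._ok = False     # False while the sensor is unresponsive or failing
        self._stop = threading.Event()
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self._pool.shutdown(wait=False)

    def latest(self):
        """Return (value, ok) immediately, without touching the hardware."""
        with self._lock:
            return self._value, self._ok

    def _run(self):
        future = None
        while not self._stop.is_set():
            if future is None:
                future = self._pool.submit(self._read_fn)
            # Wait a bounded time; a hung read must not stall the poller.
            done, _ = wait([future], timeout=self._read_timeout_s)
            value, ok = None, False
            if done:
                try:
                    value, ok = future.result(), True
                except Exception:
                    pass  # a failed read is reported as not ok
                future = None
            # If the read is still running, keep waiting on the same future
            # next time round instead of queueing more reads behind it.
            with self._lock:
                if ok:
                    self._value = value
                self._ok = ok
            self._stop.wait(self._interval_s)
```

With something like this, the IPMI handler would call `latest()` and return at once, reporting the sensor as unavailable when `ok` is false rather than blocking until btbridge times out. A read that is truly hung still occupies the worker thread. The sketch only keeps that hang from propagating to the caller.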
@williamspatrick commented on Mon Jun 26 2017
Isn't this an issue with either the hwmon driver itself or phosphor-hwmon? I don't see it correct to add special code to ipmi providers because we'll also have this same trouble for REST, Redfish, etc.