Investigate poll and cache for sensor values instead of on-demand reads #9

williamspatrick · 2017-06-26T18:49:22Z

@nasamuffin commented on Mon Jun 26 2017

With both the aspeed fantach sensors and the 1-wire temperature sensors, we encountered long sensor read times leading to timeouts all the way up in btbridge, causing hard-to-diagnose failures from the host side IPMI handling.

These sensors may not be the only very slow sensors we run across. An arbitrary 5-second timeout in btbridge may eventually prove too short for another sensor. Reads also appear very slow to the host when they may not need to be.

Joel and Cyril suggested polling sensors with a background thread and reading the most recent value out over IPMI to reduce latency, and I agree that this is the correct approach if we can ensure that the polling thread will time out appropriately if the sensor is unresponsive.
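A minimal sketch of the poll-and-cache idea, written in Python for brevity (the actual OpenBMC daemons are not Python). `PolledSensor`, `read_fn`, `interval_s`, and `timeout_s` are hypothetical names for illustration, not an existing API. The poller reads on a background thread and stops waiting after `timeout_s`, so an unresponsive sensor leaves the last good value cached instead of stalling the request path. Python cannot forcibly kill a stuck read; the timeout only bounds how long the poller waits, and no new read is issued until the stuck one returns.

```python
import concurrent.futures
import threading
import time
from typing import Callable, Optional, Tuple


class PolledSensor:
    """Polls a slow sensor on a background thread and caches the latest value.

    Hypothetical illustration: `read_fn` stands in for whatever actually reads
    the hardware (e.g. a hwmon sysfs attribute).
    """

    def __init__(self, read_fn: Callable[[], float], interval_s: float, timeout_s: float):
        self._read_fn = read_fn
        self._interval_s = interval_s
        self._timeout_s = timeout_s
        self._lock = threading.Lock()
        self._latest: Optional[Tuple[float, float]] = None  # (value, monotonic timestamp)
        self._stop = threading.Event()
        # Single worker thread performs the actual (possibly slow) reads.
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[concurrent.futures.Future] = None
        self._thread = threading.Thread(target=self._poll_loop, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
        self._executor.shutdown(wait=False)

    def _poll_loop(self) -> None:
        while not self._stop.is_set():
            # Only issue a new read once the previous one has finished, so a
            # hung sensor cannot pile up queued reads behind it.
            if self._pending is None or self._pending.done():
                self._pending = self._executor.submit(self._read_fn)
            try:
                value: Optional[float] = self._pending.result(timeout=self._timeout_s)
            except concurrent.futures.TimeoutError:
                value = None  # unresponsive: stop waiting, keep the old cached value
            except Exception:
                value = None  # read failed: keep the old cached value
            if value is not None:
                with self._lock:
                    self._latest = (value, time.monotonic())
            self._stop.wait(self._interval_s)

    def latest(self) -> Optional[Tuple[float, float]]:
        """Non-blocking accessor for request handlers (IPMI, REST, ...)."""
        with self._lock:
            return self._latest
```

Request handlers call `latest()`, which never touches the hardware; the stored timestamp keeps each value's age known inside the BMC even where the outward protocol has no field for it.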

(host-ipmid maybe isn't the right place for this, but it's the area being affected by sneaky failures, so it seemed like as good a place as any.)

@williamspatrick commented on Mon Jun 26 2017

Isn't this an issue with either the hwmon driver itself or phosphor-hwmon? I don't think it's correct to add special code to IPMI providers, because we'll also have this same trouble for REST, Redfish, etc.

rlippert · 2017-06-26T19:09:12Z

> Isn't this an issue with either the hwmon driver itself or phosphor-hwmon? I don't think it's correct to add special code to IPMI providers, because we'll also have this same trouble for REST, Redfish, etc.

The 1-wire thermal sensor takes 1 second to perform a measurement because it is very precise. There is no way to make it faster except by losing precision.

> Joel and Cyril suggested polling sensors with a background thread and reading the most recent value out over IPMI to reduce latency, and I agree that this is the correct approach if we can ensure that the polling thread will time out appropriately if the sensor is unresponsive.

Polling this sensor all the time is a bad idea because it is expensive to read (the 1-wire interface is software bitbanged on Aspeed parts).
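A rough way to quantify that cost, assuming (as an upper bound) that each measurement ties up the bitbanged 1-wire master for the full measurement time $t_{\text{meas}} \approx 1\,\text{s}$ mentioned above: polling every $T_{\text{poll}}$ seconds keeps the bus busy for the fraction

$$
D = \frac{t_{\text{meas}}}{T_{\text{poll}}}, \qquad
T_{\text{poll}} = 10\,\text{s} \;\Rightarrow\; D = \frac{1\,\text{s}}{10\,\text{s}} = 10\%,
\qquad
T_{\text{poll}} = 1\,\text{s} \;\Rightarrow\; D = 100\%
$$

so the fresher the cached value needs to be, the closer continuous polling comes to occupying the bus full time.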

Additionally, there is no way in the IPMI interface to indicate to the host when the reading was last valid, so it would introduce time uncertainty into every reading.
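One possible middle ground, sketched here as an assumption rather than anything proposed in this thread: a read-through cache that serves a reading only while it is younger than `max_age_s` and otherwise performs a fresh (slow) read. Nothing polls in the background, so the expensive bus is touched only on demand, and the time uncertainty of any reading is capped at `max_age_s`. `ReadThroughCache`, `read_fn`, and `max_age_s` are hypothetical names.

```python
import threading
import time
from typing import Callable, Optional, Tuple


class ReadThroughCache:
    """Return a cached reading if it is younger than max_age_s, else re-read.

    Hypothetical sketch: `read_fn` stands in for the slow sensor read.
    """

    def __init__(self, read_fn: Callable[[], float], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._read_fn = read_fn
        self._max_age_s = max_age_s
        self._clock = clock  # injectable so staleness can be tested deterministically
        self._lock = threading.Lock()
        self._cached: Optional[Tuple[float, float]] = None  # (value, timestamp)

    def get(self) -> float:
        # Holding the lock across the read means concurrent callers that miss
        # the cache wait for, and then share, a single measurement.
        with self._lock:
            now = self._clock()
            if self._cached is not None and now - self._cached[1] < self._max_age_s:
                return self._cached[0]
            value = self._read_fn()
            self._cached = (value, self._clock())
            return value
```

The trade-off is that a cache miss still blocks for the full measurement time, so this bounds staleness but does not by itself remove the latency that trips the btbridge timeout.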

bradbishop · 2017-06-27T03:28:10Z

That's a pretty tight set of constraints. Can anyone think of an option besides making the BT timeout configurable somehow?
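A sketch of what "configurable" could mean, under loudly stated assumptions: the config key `bt_timeout_s`, the per-sensor expected read-time table, and the 2× safety margin are all hypothetical and are not existing btbridge options. An explicit override wins; otherwise the timeout is stretched to cover the slowest known sensor, never dropping below today's fixed 5 seconds.

```python
from typing import Mapping

DEFAULT_TIMEOUT_S = 5.0  # the fixed btbridge timeout described in this issue


def bt_timeout_s(config: Mapping[str, str],
                 sensor_read_times_s: Mapping[str, float],
                 margin: float = 2.0) -> float:
    """Hypothetical policy for choosing the BT response timeout."""
    override = config.get("bt_timeout_s")  # hypothetical config key
    if override is not None:
        value = float(override)
        if value <= 0:
            raise ValueError("bt_timeout_s must be positive")
        return value
    # Cover the slowest known sensor with some headroom.
    slowest = max(sensor_read_times_s.values(), default=0.0)
    return max(DEFAULT_TIMEOUT_S, slowest * margin)
```

This only moves the failure point rather than removing it, which is why the poll-and-cache and bounded-staleness approaches are being weighed against it.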

williamspatrick mentioned this issue Jun 26, 2017: Investigate poll and cache for sensor values instead of on-demand reads openbmc/phosphor-host-ipmid#108 (Closed)
