Investigate poll and cache for sensor values instead of on-demand reads #9

Open
williamspatrick opened this issue Jun 26, 2017 · 2 comments

Comments

@williamspatrick
Member

@nasamuffin commented on Mon Jun 26 2017

With both the Aspeed fantach sensors and the 1-wire temperature sensors, we encountered long sensor read times that led to timeouts all the way up in btbridge, causing hard-to-diagnose failures in host-side IPMI handling.

These sensors may not be the only very slow sensors we run across. An arbitrary 5-second timeout in btbridge may eventually prove too short for another sensor. And reads appear very slow to the host when they may not need to be.

Joel and Cyril suggested polling sensors with a background thread and reading the most recent value out over IPMI to reduce latency, and I agree that this is the correct approach if we can ensure that the polling thread will time out appropriately if the sensor is unresponsive.

(host-ipmid maybe isn't the right place for this, but it's the area that's being affected with sneaky failures, so it seemed like as good a place as any.)
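
As a rough illustration of the poll-and-cache idea, here is a minimal Python sketch, not code from host-ipmid or phosphor-hwmon. The class name `PolledSensor`, the `read_fn` callable, and the interval and timeout parameters are all hypothetical. A background thread refreshes a cached value, and each blocking read is bounded by a timeout so an unresponsive sensor leaves the cache marked unavailable instead of stalling the caller.

```python
import threading
import time


class PolledSensor:
    """Caches the latest reading of a slow sensor, refreshed in the background.

    All names here are illustrative assumptions, not an existing OpenBMC API.
    """

    def __init__(self, read_fn, interval_s, read_timeout_s):
        self._read_fn = read_fn                # hypothetical blocking sensor read
        self._interval_s = interval_s          # pause between polls
        self._read_timeout_s = read_timeout_s  # upper bound for one read
        self._lock = threading.Lock()
        self._value = None                     # None means "no valid reading"
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _read_with_timeout(self):
        # Run the blocking read in a helper thread so that a hung sensor
        # cannot block the polling loop for longer than read_timeout_s.
        result = []
        worker = threading.Thread(
            target=lambda: result.append(self._read_fn()), daemon=True
        )
        worker.start()
        worker.join(self._read_timeout_s)
        return result[0] if result else None

    def _run(self):
        while not self._stop.is_set():
            value = self._read_with_timeout()
            with self._lock:
                self._value = value            # None if the read timed out
            self._stop.wait(self._interval_s)

    def latest(self):
        # Returns immediately; this is what a request handler would call.
        with self._lock:
            return self._value
```

A request handler would call `latest()` and return at once, reporting the sensor as unavailable when it gets `None`. One limitation of this sketch: a read that never returns leaves its helper thread behind, so a real implementation would need a way to cancel or reuse stuck reads.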


@williamspatrick commented on Mon Jun 26 2017

Isn't this an issue with either the hwmon driver itself or phosphor-hwmon? I don't think it is correct to add special code to the IPMI providers, because we'll have this same trouble with REST, Redfish, etc.

@rlippert

> Isn't this an issue with either the hwmon driver itself or phosphor-hwmon? I don't think it is correct to add special code to the IPMI providers, because we'll have this same trouble with REST, Redfish, etc.

The 1-wire thermal sensor takes 1 second to perform a measurement because it is very precise. There is no way to make it faster except by losing precision.

> Joel and Cyril suggested polling sensors with a background thread and reading the most recent value out over IPMI to reduce latency, and I agree that this is the correct approach if we can ensure that the polling thread will time out appropriately if the sensor is unresponsive.

Polling this sensor all the time is a bad idea because it is expensive to read (the 1-wire interface is bit-banged in software on Aspeed parts).

Additionally, there is no way in the IPMI interface to indicate to the host when the reading was last valid, so caching would introduce a time uncertainty into every reading.
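
To make the two constraints above concrete, here is a hedged Python sketch of a read-through cache that records when each value was taken. It is an illustration only: `TimestampedCache`, `read_fn`, and `max_age_s` are hypothetical names. The sensor is read only when the cached value is older than `max_age_s`, which avoids constant polling, and the age is returned alongside the value. Since, as noted, IPMI has no way to carry that age to the host, it would only be usable on the BMC side, for example to decide whether a reading is still fresh enough to report.

```python
import time


class TimestampedCache:
    """Read-through cache that tracks the age of the cached reading.

    Hypothetical helper for illustration; not part of any OpenBMC component.
    """

    def __init__(self, read_fn, max_age_s, clock=time.monotonic):
        self._read_fn = read_fn      # hypothetical blocking sensor read
        self._max_age_s = max_age_s  # oldest reading we are willing to serve
        self._clock = clock          # injectable so the logic can be tested
        self._value = None
        self._read_at = None         # clock value when _value was taken

    def get(self):
        """Return (value, age_s), re-reading only when the cache is too old."""
        now = self._clock()
        if self._read_at is None or now - self._read_at > self._max_age_s:
            self._value = self._read_fn()
            self._read_at = self._clock()
            now = self._read_at
        return self._value, now - self._read_at
```

The trade-off is visible in `get()`: a request that arrives after the cache has expired still pays the full read latency, so this bounds how often the expensive read happens but does not remove the slow path.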

@bradbishop
Member

That's a pretty tight set of constraints. Can anyone think of an option besides making the bt timeout configurable somehow?

3 participants