wordpress container restarts #138
Comments
Based on the log, it appears that the …
I believe the plugin is owned by the Web and Design team, and I've started a thread here: https://chat.canonical.com/canonical/pl/k7quc4dqipnhjgstu5xkz1mmty. I think I've taken this as far as I reasonably can. IS doesn't own the deployment or the site being deployed, and the cloud and the k8s cluster, which we do own, appear to be working correctly. I've silenced this alert for a week in Alertmanager, so there's no rush to progress this from my perspective.
Thanks, we will follow up with the web team if this happens again.
FYI, we get this alert constantly. We limited the alert to fire only when a restart happens 3 times within 10 minutes (which means the liveness probe failed often enough to trigger 3 restarts). It happened twice today, with logs similar to those described in the original bug report. The liveness check looks rather "aggressive":
Indeed, the charm's Pebble layer check 0 results in this:
This is probably a bit too aggressive. Also, the "threshold" setting is misleading: it apparently affects only the "successThreshold" and not the "failureThreshold" (undefined above), which defaults to 3. I'm going to silence the alert until Friday so you have time to look at it.
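The alert condition described above (3 restarts within 10 minutes) amounts to a sliding-window check over container restart timestamps. The following is a minimal Python sketch of that logic, assuming the restart times are available as a list of `datetime` values. The actual alert rule is not shown in this thread, and `should_alert` is a hypothetical helper used only for illustration.

```python
from datetime import datetime, timedelta


def should_alert(restart_times, threshold=3, window=timedelta(minutes=10)):
    """Return True if at least `threshold` restarts fall strictly within `window`.

    Hypothetical illustration of the "3 restarts under 10 minutes" condition;
    not the real alert rule used for this deployment.
    """
    times = sorted(restart_times)
    # After sorting, `threshold` restarts fit inside the window exactly when
    # some run of `threshold` consecutive timestamps spans less than `window`.
    for i in range(len(times) - threshold + 1):
        if times[i + threshold - 1] - times[i] < window:
            return True
    return False


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0)
    burst = [base, base + timedelta(minutes=4), base + timedelta(minutes=9)]
    spread = [base, base + timedelta(minutes=6), base + timedelta(minutes=12)]
    print(should_alert(burst))   # three restarts span 9 minutes
    print(should_alert(spread))  # three restarts span 12 minutes
```

A window like this filters out isolated restarts, so only repeated liveness failures in quick succession raise an alert.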
The health checks in the k8s charms are controlled by Pebble, and the check parameters on the Kubernetes side actually apply to the Pebble server. Therefore, the small failure threshold and timeout apply to requests to the Pebble health API, not to WordPress health check requests. The actual health check parameters for WordPress are the ones you mentioned, with a timeout of 5 seconds and a failure threshold of 3 (the default). Do you have any monitoring information you can share with us? For example, there's a request-duration Prometheus metric that can indicate whether the WordPress server is running slowly. Are there also any WordPress Apache logs in Loki related to the failure?
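Both Kubernetes' `failureThreshold` and a Pebble check's failure threshold count consecutive failures: a single slow request does not fail the check, and any success resets the count. The Python sketch below models only that counting rule over a sequence of check results. `first_down_index` is a hypothetical helper, not part of Pebble or the charm, and it deliberately ignores timing (check period and timeout).

```python
def first_down_index(results, threshold=3):
    """Return the index of the check run at which the check is considered down.

    `results` is a sequence of booleans (True = check passed). The check goes
    down after `threshold` consecutive failures; returns None if it never does.
    """
    consecutive_failures = 0
    for i, ok in enumerate(results):
        if ok:
            consecutive_failures = 0  # any success resets the failure count
        else:
            consecutive_failures += 1
            if consecutive_failures >= threshold:
                return i
    return None


if __name__ == "__main__":
    # Two failures, a recovery, then three failures in a row.
    print(first_down_index([True, False, False, True, False, False, False]))
    # Intermittent failures never reach three in a row.
    print(first_down_index([False, False, True] * 3))
```

Under this rule, intermittent slowness in WordPress would not trigger a restart on its own; the server must fail (or time out) several times in a row.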
Here is an extract of a failure that happened today and the relevant Apache logs around it:
@weiiwang01 can you follow up on this and/or close the issue, please?
I believe this has already been addressed in higher revisions of the … I will close this for now; please reopen the issue if there are other problems after the upgrade.
Bug Description
We get frequent alerts due to the WordPress container restarting. This was addressed by #135 and an upgrade to r46 of the charm, but either that didn't solve the problem or a new one has arisen.
To Reproduce
Deploy the charm.
Environment
prod-is-external-kubernetes@is-bastion-ps5
Relevant log output
Additional context
No response