feat(restartStuckPod):Restart stuck pods #413

Draft
wants to merge 14 commits into
base: main

Conversation

vimystic
Contributor

@vimystic vimystic commented Apr 2, 2024

No description provided.

timeInLog, err := extractTimeFromLog(receivedString)
if err != nil {
	fmt.Println("Error parsing time from log:", err)
	return true
Contributor

@pharr117 pharr117 Apr 17, 2024

We may not want to return true here. If the time parse fails, isPodStuck returns true, which I assume would inadvertently kill the pod.

Can we change this function to return (bool, error) so we can track the errors better?

func extractTimeFromLog(log string) (time.Time, error) {
	parts := strings.Fields(log)

	const timeLayout = "3:04PM"
Contributor

We may need to try the time parse against several layout strings. For example, an Axelar testnet sentry node I am looking at logs this time string:

2024-04-17T17:37:18Z

	return parsedTime, nil
}

func getPodLogsLastLine(clientset *kubernetes.Clientset, pod *corev1.Pod) string {
Contributor

We may also want to change this to return (string, error) so that we can track the errors better.

@vimystic
Contributor Author

vimystic commented Nov 1, 2024

WIP: reworking this to accommodate other time strings.

@nourspace
Member

Speculation: nodes on SDK 50 do not have this issue.
Keeping this under observation to see whether we actually need this feature.

@vimystic
Contributor Author

vimystic commented Nov 7, 2024

Update:
So far the only sentries that get stuck are nibiru, stargaze, and saga (all pre-SDK 50).
The speculation still appears to hold.

4 participants