-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TRON-2210: Add a tool that can sync from pod state to tron w/ tronctl #985
Conversation
job_runs = client.job( | ||
url, | ||
include_action_runs=True, | ||
count=num_runs, # TODO: fetch job run_limit and use that for count ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a bad idea :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out we don't actually include run_limit in job/job run API responses :/ so this would need to pull in from other sources (either config api or reading the configs some other way), so I think I'm going to leave it as is for now (we can always run w/ a higher num_run if we need to).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, we can probably read from /nail/etc/services - but it's probably fine to leave as-is, i can't think of anything getting anywhere near their run_limit
in terms of concurrently running jobs
tools/sync_tron_state_from_k8s.py
Outdated
) | ||
|
||
for job in jobs: | ||
# What am I doing wrong here, why do I have to append /api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not doing anything wrong - i'm not quite sure why this method doesn't automatically add in /api for us
56291a3
to
64db0ab
Compare
Co-authored-by: Luis Pérez <[email protected]>
This adds a pretty simple script that will:
tronctl
if the pod & tron's states do not match.Tron will only allow us to transition from a few known states, so this script only allows using
tronctl fail
ortronctl succeed
if the tron action run was either lost/unknown or 'stuck' atrunning
orstarting
.Right now, I allow passing both a tron-url + tronctl-wrapper so that this could be run from outside the tron server itself, but defaults assume we'll be running on a local tron server.
Still adding some tests but thought I should get feedback now in case I need to shift gears.
Validation
Currently, one reliable way to get an actual tron action run into an
unknown
state is to restart tron w/in a short period of the pod having been started.Forcing a few of these cases in tron-spark-pnw-devc, the script correctly updates the 'unknown' action runs' states: