This repository has been archived by the owner on Jul 22, 2024. It is now read-only.
Ongoing transfers after a job is killed #920
Labels
Comp: Burst Buffer
PhaseFound: Customer
Sev: 3
Status: Open
open for someone to grab and start working on
Type: Documentation
Question:
Some questions that came up while testing the BB API:
Q1:
Let's say I start a job that initiates a big 1TB transfer between the SSD and GPFS. Then the job is killed off. If I knew the transfer handle (let's say I saved it), could I cancel the transfer even though I had a new job ID?
If the answer is yes:
What would happen if I was able to guess the transfer handle of another user? Could I cancel their transfers as well? Or is that not allowed?
If the answer is no:
How would I kill off my old transfer? Assume I'm the same user and don't have root.
Q2:
What happens if User1 requests the entire burst buffer on a node, fills it with a single file, starts transferring the file from the burst buffer to GPFS, and then hits a segfault (so the job dies, but the transfer is still going)? Then User2 gets assigned the same node and burst buffer, and does the same thing. That is, User2 writes a single file to the entire burst buffer, overwriting User1's extents. Does User1's ongoing "zombie transfer" then get corrupted with User2's new data? Or is User1's BB transfer automatically cancelled when a new user gets their node?
Answer:
Place the final answer here.
Approach:
Replace this with a short summary of how you addressed the problem. In the comments, place step-by-step notes of progress as you go.
What is next:
Define the next steps and follow up here.