job-list: access user protected data from job-info #5120

Open
chu11 opened this issue Apr 26, 2023 · 2 comments

Comments

@chu11
Member

chu11 commented Apr 26, 2023

Just brainstorming here a bit. The conversation in flux-framework/flux-docs#229 made me realize it sort of sucks that data sometimes has to be gathered from two locations. That seems to have been the case back in the Slurm days too.

It is sort of a historical side effect of the fact that there is data for everyone to see (jobids, number of nodes, etc.), which eventually gets listed by tools like flux-jobs, and data that is not for everyone to see (the job's full command line, jobspec, etc.), which is only retrievable by individuals who specifically request it.

Because of this historical split, some information has simply never been available in job-list (flux jobs); you have to go to job-info to get it (flux job info).

For example, jobspec and R are read and cached in job-list, so they could conceptually be offered to callers if there were an access control mechanism. The same access controls that job-info performs could possibly be copied into job-list, allowing users to access that not-for-everyone data.
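To make the access-control idea concrete, the check that job-list would need to borrow from job-info might look something like the sketch below. This is a minimal sketch, assuming the rule is "the instance owner, or the user who submitted the job"; `struct request_cred`, `ROLE_OWNER` and `protected_access_allowed()` are hypothetical names, not flux-core API, and the real check in job-info operates on the credentials attached to the request message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical role bit standing in for the instance-owner role
 * carried in a request's credentials. */
#define ROLE_OWNER 0x1u

/* Hypothetical view of the credentials attached to a request. */
struct request_cred {
    uint32_t userid;    /* user who sent the request */
    uint32_t rolemask;  /* roles granted to that user */
};

/* Return true if the requester may see protected data (e.g. jobspec, R)
 * for a job submitted by job_userid: either the requester is the
 * instance owner, or the requester is the job's own user. */
static bool protected_access_allowed (const struct request_cred *cred,
                                      uint32_t job_userid)
{
    if (cred->rolemask & ROLE_OWNER)
        return true;
    return cred->userid == job_userid;
}
```

Under this rule an ordinary user only ever sees protected fields for their own jobs, while the instance owner can see everything, which mirrors the public/private split described above.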

I'm not sure what side effects there could be from this. Off the top of my head:

  • what to display / return when there's data that the user isn't allowed to retrieve

  • we wouldn't want users to output data in flux-jobs and then cut and paste it into some communication channel (e.g. Slack) where it shouldn't go, so maybe this isn't a wise idea. But this could be controlled by simply not supporting this output in flux-jobs, so it is only available via the API or something.
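One possible answer to the first point, and a way to address the second, is to leave protected fields absent (rather than empty) when the caller may not see them, and to fill them only when the caller explicitly asks for them. A minimal sketch, assuming job-list already holds cached jobspec and R strings; `struct job_entry` and `job_entry_fill()` are hypothetical names for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-job listing entry. Public fields are always set;
 * protected fields are left NULL when they are not returned, so
 * "not returned" is distinguishable from "empty". */
struct job_entry {
    unsigned long long id;  /* public: job ID */
    int nnodes;             /* public: node count */
    const char *jobspec;    /* protected: cached jobspec, or NULL */
    const char *R;          /* protected: cached R, or NULL */
};

/* Populate a listing entry from cached data. 'allowed' is the result of
 * an access check (e.g. owner-or-self); 'want_protected' records whether
 * the caller explicitly requested protected fields. */
static void job_entry_fill (struct job_entry *e,
                            unsigned long long id, int nnodes,
                            const char *cached_jobspec, const char *cached_R,
                            bool allowed, bool want_protected)
{
    e->id = id;
    e->nnodes = nnodes;
    if (allowed && want_protected) {
        e->jobspec = cached_jobspec;
        e->R = cached_R;
    }
    else {
        e->jobspec = NULL;
        e->R = NULL;
    }
}
```

With this shape, flux-jobs could simply never set `want_protected`, keeping the protected data available only to API callers who ask for it.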

Alternately, if the "everything goes into a database" work is someday done, everything could be redirected to the database with appropriate access controls (see conversation #4914).

@grondo
Contributor

grondo commented Apr 26, 2023

It doesn't seem like we'd want another copy of everything in the job-list module. At least when sitting in the KVS the content doesn't have to reside in memory for all time.

I'm not sure it is a problem that some detailed information (like the entire submitted environment or job script) has to be obtained by fetching the jobspec directly. You'd also have to cache the signed J in case the user wanted to verify the data had not been modified after submission, plus there is the redacted jobspec, which the instance has indeed modified for its own use...

@chu11
Member Author

chu11 commented Apr 26, 2023

> It doesn't seem like we'd want another copy of everything in the job-list module. At least when sitting in the KVS the content doesn't have to reside in memory for all time.

Yeah, we don't need everything. I was specifically thinking of additional data in R / jobspec (or the eventlog), because that's already read into job-list anyway and, I think (I'd have to verify), already cached there.

And whatever isn't cached is covered by #4336, because at some point it has to be dumped to sqlite.
