
Triggering authenticated workflows #232

Open
DiamondJoseph opened this issue Oct 21, 2024 · 2 comments
DiamondJoseph commented Oct 21, 2024

Assuming:

  • There is a service called blueapi, which requires authenticated requests and creates "raw data"
  • There is a process called analysis, which generically consumes "raw data" and creates "processed data"
  • A user makes a request to blueapi to create "raw data", and knows they want a specific form of analysis to produce "processed data", either while blueapi is acting or afterwards
  • To leverage the workflow system, the user should not need to manually create the analysis instance
  • The analysis instance should write data to the same visit as the raw data and the request that spawned it
  • The analysis should be authorized to read only the raw data that it requires
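The scope-narrowing implied by the last two assumptions can be sketched in a few lines. This is purely illustrative: the scope strings and function names below are invented for the sketch, not an agreed blueapi format.

```python
# Hypothetical sketch of deriving the narrowed scopes a spawned analysis
# should receive. Scope-string format ("read:data:visit=...") is invented.

def scopes_for_visit(visit: str) -> set[str]:
    """Scopes a user session needs to run a scan and its analysis on one visit."""
    return {
        f"read:data:visit={visit}",
        f"write:data:visit={visit}",
        f"run:workflow:visit={visit}",
    }

def narrow_for_analysis(user_scopes: set[str], visit: str) -> set[str]:
    """The analysis instance only needs read+write on the same visit:
    drop the right to spawn further workflows, and refuse to delegate
    anything the user token does not itself hold."""
    wanted = {f"read:data:visit={visit}", f"write:data:visit={visit}"}
    missing = wanted - user_scopes
    if missing:
        raise PermissionError(f"user token cannot delegate: {missing}")
    return wanted
```

The point of the `missing` check is that the manager should never mint an analysis credential broader than the request that spawned it.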
```mermaid
sequenceDiagram
    actor Alice
    Note left of Alice: my_scan uses my_analysis
    Alice ->> +blueapi: run my_scan, visit=a1
    Note over Alice,blueapi: scope read data visit=a1
    Note over Alice,blueapi: scope write data visit=a1
    Note over Alice,blueapi: scope run my_analysis visit=a1

    participant raw as Raw Data Store<br>[via DataAPI]
    blueapi ->> raw: StartDocument runid=a1-1
    Note over blueapi,raw: AuthZ'd to write

    participant manager as Workflow Manager
    blueapi ->> manager: start my_analysis visit=a1 runid=a1-1
    Note over blueapi,manager: AuthZ'd to run

    participant processed as Processed Data Store<br>[via DataAPI]

    create participant Analysis as my_analysis
    manager ->> +Analysis: creates
    Note over manager,Analysis: scope read data visit=a1
    Note over manager,Analysis: scope write data visit=a1

    opt Live Analysis
    Analysis ->> raw: fetch data so far
    raw ->> Analysis: data so far
    Note over Analysis,raw: AuthZ'd to read
    Analysis ->> processed: processed data
    Note over Analysis,processed: AuthZ'd to write

    loop until scan over
    blueapi ->> raw: Documents
    Analysis -->> raw: poll for new data
    Analysis ->> processed: processed data
    end
    blueapi ->> raw: StopDocument
    Analysis -->> raw: poll for new data
    Analysis ->> processed: processed data
    end
    opt Post Processing
    blueapi ->> raw: Documents
    blueapi ->> -raw: StopDocument
    Analysis ->> raw: fetch all data
    raw ->> Analysis: all data
    Note over raw,Analysis: AuthZ'd to read
    end
    deactivate Analysis

    destroy Analysis
    Analysis ->> processed: processed data
    Note over Analysis,processed: AuthZ'd to write

    Alice ->> raw: fetch raw data
    raw ->> Alice: raw data
    Note over Alice,raw: AuthZ'd to read
    Alice ->> processed: fetch processed data
    processed ->> Alice: processed data
    Note over Alice,processed: AuthZ'd to read
```
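The Live Analysis branch above amounts to a poll-until-StopDocument loop. A minimal sketch, where `poll` stands in for a DataAPI call and the document shapes are invented for illustration, not the real event model:

```python
def live_analysis(poll, process):
    """Poll the raw data store for new documents until the run's stop
    document arrives, processing event data as it appears. `poll` returns
    a batch of new documents each time it is called (stand-in for the
    DataAPI); `process` writes derived results to the processed store."""
    done = False
    while not done:
        for doc in poll():
            if doc["type"] == "stop":
                done = True          # scan is over; finish this batch, then exit
            elif doc["type"] == "event":
                process(doc["data"])
```

Note the loop still drains the batch that contains the stop document, matching the final "poll for new data / processed data" exchange in the diagram.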

DiamondJoseph commented Oct 21, 2024

@callumforrester has thoughts about whether the live/at-rest processing should look the same or not:

> DISCLAIMER: I'm not in data analysis and my knowledge may be out of date

> this looks the same regardless of if it's post or live analysis.

Not quite: for post-processing, the code can and should be considerably simpler. There is no need to go through the data as if it is being streamed when it isn't; you want the code to say something like:

```python
detector_data = data_api.get("saxs")[:]
return np.average(detector_data, axis=0)
```

This is especially important because (I believe) most of our use cases are still for post-processing rather than live processing, so we shouldn't introduce unnecessary complexity into the majority use case.
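For comparison, the whole post-processing path stays this small even with the DataAPI stubbed out. A sketch in plain Python, where `data_api` is a dict standing in for the real client; the numpy snippet above does the same reduction with `np.average(..., axis=0)`:

```python
def post_process(data_api, key="saxs"):
    """Read the whole at-rest dataset once and average over the frame
    axis (axis 0), with no streaming machinery at all."""
    frames = data_api[key]           # whole run, already at rest
    n = len(frames)
    # zip(*frames) transposes so each `col` holds one pixel across frames
    return [sum(col) / n for col in zip(*frames)]
```

The contrast with the live loop is the point: one read, one reduction, no cursor or stop-document handling.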

@DiamondJoseph

See bluesky/tiled#437 for a Tiled implementation of the DataAPI informing the client that more data is available for consumption.
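Whatever shape that notification takes, the client side reduces to "give me everything since my cursor". A hypothetical sketch, where the `store` list and `drain_new` helper are stand-ins rather than the Tiled API:

```python
def drain_new(store, cursor):
    """Return (new_items, new_cursor): everything appended to the store
    since the caller's last-seen position. `store` is a plain list here,
    standing in for a growing dataset behind the DataAPI."""
    new = list(store[cursor:])
    return new, cursor + len(new)
```

A live consumer would call this each time the server signals that more data exists, instead of re-reading the whole run.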
