Initial structure for out of the application workflow orchestration #5

Draft: wants to merge 9 commits into base branch develop

Conversation

koparasy
Member

No description provided.

milroy and others added 3 commits August 29, 2023 08:41
* Add wrapper to start flux with proxy
* Add infrastructure of connecting to and receiving from RMQ
* Add initial daemon implementation

```python
import flux
import json
from flux.security import SecurityContext
```

Member Author

@milroy I get an import error. My guess is that it happens because I am running an older Flux version. Can you make the appropriate changes to support both versions?
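One way to support both older and newer Flux installs is to make the `flux.security` import optional and check for it at run time instead of failing at import. A minimal sketch, assuming the older version simply lacks the `flux.security` module; `optional_import` and `HAVE_FLUX_SECURITY` are hypothetical names, not part of Flux or the PR.

```python
import importlib


def optional_import(module_name, attr=None):
    """Import a module (or one attribute of it); return None if it is unavailable.

    Hypothetical helper: lets the driver start on Flux installs that do not
    ship the flux.security bindings instead of crashing at import time.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None
    if attr is None:
        return module
    return getattr(module, attr, None)


# Assumption: older Flux versions lack flux.security, so SecurityContext may be None.
flux = optional_import("flux")
SecurityContext = optional_import("flux.security", "SecurityContext")
HAVE_FLUX_SECURITY = SecurityContext is not None
```

Code paths that need signing can then branch on `HAVE_FLUX_SECURITY` and fall back (or report a clear error) when it is `False`.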

Member Author

@milroy This is a general 'driver' for the code we are going to run outside the compute nodes. I reused and restructured your previous commit and tried to abstract out some of the hard-coded paths.
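Abstracting hard-coded paths typically means reading them from a configuration file and falling back to defaults. A minimal sketch, assuming a JSON config file; the keys (`flux_uri`, `ams_binary`, `output_dir`) and the functions `merge_paths` and `load_paths` are hypothetical, not taken from the PR.

```python
import json
from pathlib import Path

# Hypothetical keys and defaults; the PR's actual settings are not shown here.
DEFAULT_PATHS = {
    "flux_uri": None,            # hypothetical: None means "use the default instance"
    "ams_binary": "ams_app",     # hypothetical executable name
    "output_dir": "./ams_output",
}


def merge_paths(user_paths):
    """Overlay user-supplied paths on DEFAULT_PATHS, rejecting unknown keys."""
    unknown = set(user_paths) - set(DEFAULT_PATHS)
    if unknown:
        raise KeyError(f"unknown path keys: {sorted(unknown)}")
    paths = dict(DEFAULT_PATHS)
    paths.update(user_paths)
    return paths


def load_paths(config_file=None):
    """Read paths from a JSON file if given; otherwise return a copy of the defaults."""
    if config_file is None:
        return dict(DEFAULT_PATHS)
    return merge_paths(json.loads(Path(config_file).read_text()))
```

Rejecting unknown keys makes a typo in the config fail loudly rather than silently falling back to a default path.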

```python
import argparse


def main():
```
Member Author

@milroy I have not tested any of this code. Take a look and let me know of anything I should consider. Effectively, I bundle both the AMSDaemon and the FluxDaemon in the same Python entry point.
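Bundling two daemons behind one Python entry point can be done with `argparse` subcommands, where each subcommand selects the daemon to start. A minimal sketch: `AMSDaemon` and `FluxDaemon` below are placeholder classes with a hypothetical `run()` method, and the `--queue` and `--flux-uri` options are illustrative, not the PR's actual interface.

```python
import argparse


class AMSDaemon:
    """Placeholder for the AMS daemon (hypothetical run() method)."""

    def __init__(self, args):
        self.args = args

    def run(self):
        return f"AMSDaemon listening on queue {self.args.queue}"


class FluxDaemon:
    """Placeholder for the Flux daemon (hypothetical run() method)."""

    def __init__(self, args):
        self.args = args

    def run(self):
        return f"FluxDaemon using Flux URI {self.args.flux_uri}"


def build_parser():
    parser = argparse.ArgumentParser(description="AMS workflow driver (sketch)")
    sub = parser.add_subparsers(dest="daemon", required=True)

    ams = sub.add_parser("ams", help="start the AMS daemon")
    ams.add_argument("--queue", default="ams-daemon", help="hypothetical RMQ queue name")
    ams.set_defaults(daemon_cls=AMSDaemon)

    fl = sub.add_parser("flux", help="start the Flux daemon")
    fl.add_argument("--flux-uri", default=None, help="URI of the Flux instance to attach to")
    fl.set_defaults(daemon_cls=FluxDaemon)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Each subparser stores its daemon class via set_defaults, so no if/elif chain is needed.
    return args.daemon_cls(args).run()
```

Passing `argv` explicitly keeps `main` testable; a console-script entry point would call it with no arguments so that `sys.argv` is parsed.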

```python
import json


class RMQClient:
```
Member Author

@lpottier This is a first re-implementation of the RMQClient from our previous hands-on meeting. Connecting to the server needs to happen through a context-manager Python API, and the same applies to enabling/pulling messages from channels. This is inspired by the orchestrator requirements, but use it as a baseline to support the database client. Ideally, we would like a single API entry point for communicating messages across AMS components.
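A context-manager API of this kind can be sketched with `__enter__`/`__exit__` for the connection and `contextlib.contextmanager` for channels, so that neither outlives its `with` block. This is a sketch, not the PR's RMQClient: the channel methods used (`queue_declare`, `basic_publish`, `basic_get`, `close`) mirror pika's `BlockingChannel`, while `publish_json`, `pull_json` and the in-memory `FakeConnection`/`FakeChannel` classes are hypothetical and exist only so the example runs without a broker.

```python
import json
from contextlib import contextmanager


class RMQClient:
    """Client whose connection only exists inside a `with` block.

    `connect` is a zero-argument factory returning an object with .channel()
    and .close(); pika.BlockingConnection has that shape.
    """

    def __init__(self, connect):
        self._connect = connect
        self._connection = None

    def __enter__(self):
        self._connection = self._connect()
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._connection is not None:
            self._connection.close()
            self._connection = None
        return False  # propagate any exception raised in the block

    @contextmanager
    def channel(self, queue):
        """Open a channel bound to `queue`; it is closed when the block exits."""
        if self._connection is None:
            raise RuntimeError("RMQClient.channel() used outside 'with RMQClient(...)'")
        ch = self._connection.channel()
        ch.queue_declare(queue=queue)
        try:
            yield ch
        finally:
            ch.close()


def publish_json(ch, queue, obj):
    """Send one JSON message through the default exchange to `queue`."""
    ch.basic_publish(exchange="", routing_key=queue, body=json.dumps(obj).encode())


def pull_json(ch, queue):
    """Fetch one message from `queue`, or None if the queue is empty."""
    method, _properties, body = ch.basic_get(queue=queue, auto_ack=True)
    return None if method is None else json.loads(body)


class FakeConnection:
    """In-memory stand-in for a broker connection, for local testing only."""

    def __init__(self, queues):
        self.queues = queues  # shared dict: queue name -> list of message bodies
        self.closed = False

    def channel(self):
        return FakeChannel(self.queues)

    def close(self):
        self.closed = True


class FakeChannel:
    """In-memory channel; the exchange argument is ignored."""

    def __init__(self, queues):
        self.queues = queues

    def queue_declare(self, queue):
        self.queues.setdefault(queue, [])

    def basic_publish(self, exchange, routing_key, body):
        self.queues[routing_key].append(body)

    def basic_get(self, queue, auto_ack):
        if not self.queues[queue]:
            return None, None, None
        return "delivered", None, self.queues[queue].pop(0)

    def close(self):
        pass


broker = {}
conn = FakeConnection(broker)
with RMQClient(lambda: conn) as client:
    with client.channel("ams-daemon") as ch:
        publish_json(ch, "ams-daemon", {"request": "status"})
        reply = pull_json(ch, "ams-daemon")
        empty = pull_json(ch, "ams-daemon")
```

Against a real broker, the factory could be something like `lambda: pika.BlockingConnection(pika.ConnectionParameters(host=...))`, with host, port and credentials supplied by the caller; this wiring is an assumption, not the PR's code. Injecting the factory is what lets the same class serve both the orchestrator and a database client.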

lpottier and others added 5 commits September 5, 2023 21:36
…messages to a RabbitMQ queue

Signed-off-by: Loic Pottier
* Added initial script to bootstrap Flux on a CORAL/IBM machine
* Added support for Slurm-based systems, tested with flux-core 0.49 on Ruby/Lassen
* Added script to launch the AMS miniapp with Flux
* Reverted script to support older versions of Flux; the Lassen bootstrap only works with Flux <= 0.45 (tested with 0.45)
* Added scripts to add secrets on OC
* Added new scripts to launch the entire AMS workflow
* Upgraded all scripts; they are now fully functional (the main script communicates with the AMS daemon via RMQ)
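Since the Lassen bootstrap only works with Flux <= 0.45, a launch script could check the version before choosing a bootstrap path. A minimal sketch; how the version string is obtained is left out, and `parse_version` and `lassen_bootstrap_supported` are hypothetical helpers, not part of the PR's scripts.

```python
def parse_version(text):
    """Turn '0.45.0' (optionally 'v'-prefixed or with a '-suffix') into (0, 45, 0)."""
    core = text.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def lassen_bootstrap_supported(flux_version, max_supported=(0, 45)):
    """True if the version is at most max_supported, compared on major.minor only."""
    return parse_version(flux_version)[: len(max_supported)] <= max_supported
```

Comparing only the major.minor prefix means every 0.45.x patch release passes the check, while 0.49 (the version tested on the Slurm path) does not.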

---------

Signed-off-by: Loic Pottier
* This commit addresses multiple problems in the broker part of AMS:
- inputs/outputs are no longer sent as encoded strings; we send binary blobs
- base64 has been removed
- fixed a bug with (very) old libevent versions (<= 2.0.21-stable)
- offloading inputs/outputs to the thread managing RMQ is now much faster

* Moved to ResourceManager, created AMSMessage structures, moved to smart pointers.

* Complete re-design of the RabbitMQ backend

* Removed EventBuffer, removed pthread and signals. Big cleanup of the code.

* Added documentation and new AMSMsgHeader class + moved from memcpy to ResourceManager::copy
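Sending raw binary blobs instead of base64 text avoids base64's roughly 4/3 size inflation and the encode/decode cost. The Python sketch below only illustrates the idea with `struct`; the broker changes in these commits are in native code, and the four-field header is a hypothetical layout, not the actual `AMSMsgHeader`.

```python
import base64
import struct

# Hypothetical header, NOT the real AMSMsgHeader: element size in bytes,
# number of samples, input dimension, output dimension (little-endian uint32).
HEADER = struct.Struct("<4I")


def pack_message(inputs, outputs, in_dim, out_dim):
    """Pack flat float64 inputs/outputs after a fixed-size binary header."""
    n = len(inputs) // in_dim
    if len(inputs) != n * in_dim or len(outputs) != n * out_dim:
        raise ValueError("inputs/outputs do not match the given dimensions")
    header = HEADER.pack(8, n, in_dim, out_dim)
    body = struct.pack(f"<{len(inputs) + len(outputs)}d", *inputs, *outputs)
    return header + body


def unpack_message(blob):
    """Inverse of pack_message: returns (inputs, outputs) as lists of floats."""
    elem_size, n, in_dim, out_dim = HEADER.unpack_from(blob, 0)
    if elem_size != 8:
        raise ValueError(f"unsupported element size {elem_size}")
    values = struct.unpack_from(f"<{n * (in_dim + out_dim)}d", blob, HEADER.size)
    return list(values[: n * in_dim]), list(values[n * in_dim :])


inputs = [float(i) for i in range(30)]   # 10 samples x 3 inputs
outputs = [float(i) for i in range(20)]  # 10 samples x 2 outputs
blob = pack_message(inputs, outputs, in_dim=3, out_dim=2)
encoded = base64.b64encode(blob)  # what a text-based transport would have sent
```

Here the blob is 416 bytes (a 16-byte header plus 50 float64 values), while its base64 form is 556 bytes; the receiver also skips a decode step before reading the header.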

3 participants