Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify preprocessing functions #1027

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Conversation

mmccrackan
Copy link
Contributor

This branch moves around some of the preprocessing functions into a common preprocessing/preprocess_util.py file. The ones moved specifically are:

Some of these aren't shared among multiple files yet but they will be when #1026 is merged. Note that the inputs of load_and_preprocess are slightly modified from load_preprocess_tod/load_preprocess_obs. It now accepts no_signal which defaults to None and the dets and meta inputs not in load_preprocess_obs are now added. The default config names were removed, though we could re-add them based on the value of no_signal perhaps.

Some additional files have also been moved into site_pipeline/util.py:

  • swap_archive
  • get_preprocess_db (edited to accept an optional logger as input, creates one if not given)

Copy link
Contributor

@davidvng davidvng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mmccrackan, some comments on things that are also done in #1014 which has not been merged yet. How should we rectify this? @msilvafe

@@ -12,35 +12,6 @@

logger = sp_util.init_logger("preprocess")

def _get_preprocess_context(configs, context=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removal is also done in #1014

@@ -56,265 +57,6 @@ def dummy_preproc(obs_id, group_list, logger,
if run_parallel:
return error, dest_file, outputs

def _get_preprocess_context(configs, context=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removal is also done in #1014

)
return configs, context

def _get_groups(obs_id, configs, context):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removal is also done in #1014

@mmccrackan
Copy link
Contributor Author

Thanks @mmccrackan, some comments on things that are also done in #1014 which has not been merged yet. How should we rectify this? @msilvafe

It looks like the only major difference is that I didn't add the get_obslist function that you wrote. I can add that here since it looks useful for the multilayer_preprocessing. Since this pull request includes the preprocess_util function, I guess we can say that this branch supersedes #1014?

TODOs:
We are also going to change things so that stuff in preprocess_util currently imported from site_pipeline util are either copied or moved into preprocess_util.

Finally, we will attempt to merge preprocess_tod, preprocess_obs, and multilayer_preprocess_tod into a single script.

@mmccrackan
Copy link
Contributor Author

Just to comment on the latest pushes, these re-arrange the shared functions such that neither preprocess/preprocess_util.py or site_pipeline/util.py needs to import each other anymore. Both preprocess_obs.py and preprocess_tod.py include both of the util files. preprocess/preprocess_util.py and site_pipeline/util.py now have some copies of functions that are used elsewhere in site_pipeline or preprocess such as init_logger or ArchivePolicy.

Some of the functions were used in update_preprocess_plots.py so that has been updated to use the functions in preprocess/preprocess_util.py instead of site_pipeline/util.py

This approach should mostly maintain the hierarchy of preprocess functions being below site_pipeline.

@@ -523,6 +523,7 @@ def make_demod_map(context, obslist, noise_model, info,
List of outputs from preprocess database. To be used in cleanup_mandb.
"""
from .. import site_pipeline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import here is only for the logger. @mhasself does not want us to import the site pipeline stuff as if it were a library. On the other hand, this logger is needed here because passing a logger from the script side to this function won't work with multiprocessing, as it won't print messages. Hence on this function, we have to create a new logger object to be able to print messages (this is how @msilvafe described it to me). Is there a workaround to this? Any suggestions? Maybe the logger stuff should also be moved to the library side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've copied the logger stuff into preprocess/preprocess_util.py so that the preprocessing module also has its own logger available. The mapmaking module could use that one or it could copy the logger functions into its own util file of some kind maybe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, perfect. Can you change the logger to be created from preprocess_util then, please?

@mmccrackan mmccrackan marked this pull request as ready for review November 15, 2024 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants