Toast 3 Work in Progress #369 (Draft)

wants to merge 649 commits into base: master
Conversation

@tskisner (Member)

Hi @zonca and @keskitalo, this PR is for your feedback on the API changes that we discussed offline. In addition to looking at the source, I have been updating the tutorial/01_Introduction/intro.ipynb notebook as a "look and feel" example. I have attached a rendered version of the notebook:

intro.pdf

Main features are:

  • Observation class as the new data model, with detdata, shared, intervals, and view attributes as the key places through which the contents are accessed and manipulated.

  • Improved processing model with changes to the Operator class and the new Pipeline operator. These classes are configured with traitlets.

  • New distributed map classes (PixelDistribution and PixelData), which split the calculation of the distribution from the actual data storage. These use the new Alltoallv communication pattern.

Only 2-3 operators have been ported to the new API as a demo. I'll continue with the config file work, which needs updating since the switch to traitlets.
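
As a hypothetical sketch of how these pieces fit together, assembled from calls quoted later in this thread (here "obs" stands for an Observation created elsewhere, and anything not quoted below is an assumption rather than settled API):

import numpy as np

# Per-detector, per-sample data lives in obs.detdata.  This allocates a
# buffer of shape (n_detectors, n_sample, 5) from the observation metadata:
obs.detdata.create("foo", shape=(5,), dtype=np.int64)
obs.detdata["foo"][0, :] = 1

# Node-shared data lives in obs.shared; set() is collective, broadcasting
# the buffer provided by one process into shared memory:
data = np.ones((4, 2))
obs.shared["foo"].set(data, offset=(3, 6), fromrank=2)

# obs.intervals holds named spans of samples, and obs.view exposes detdata
# restricted to those spans.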

@tskisner marked this pull request as draft on October 15, 2020 20:50
@zonca (Member) commented Oct 15, 2020

I think the easiest way to provide feedback would be to export the notebook to a Python script in a separate pull request, so we can give line-by-line feedback there. Then the pull request can be closed without merging.

@tskisner (Member Author)

Good idea @zonca, will do that soon.

@tskisner (Member Author)

Ok @zonca, I enabled the reviewnb plugin on the toast repo. I think you can browse here:

https://app.reviewnb.com//pull/369/files/

and comment at the per-cell level of the intro.ipynb file. Since a lot has changed, there is a switch to "hide previous version". Let me know if that works, since I can't tell whether this plugin is usable by everyone.

@tskisner (Member Author)

Note that the notebook output is stripped on GitHub, so refer to the PDF attached to this issue to see it.

@tskisner (Member Author)

Updated notebook output, with config section.
intro.pdf

tutorial/01_Introduction/intro.ipynb (resolved)
@zonca (Member) commented on tutorial/01_Introduction/intro.ipynb, Oct 16, 2020

Mention whether this can be modified in place or is read-only. If it is read-only, how do I modify it?



@tskisner (Member Author)

Good point; some options are fixed by the OS/runtime environment of the Python process, but some can be changed after toast is imported. Will give more details.

@tskisner (Member Author)

Added text about setting log level manually or through environment. Same with threading.
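
For instance, the notebook text could show something along these lines (the environment variable and method names here are my assumptions, not confirmed in this thread):

import os

# Assumed environment variable name; must be set before toast is imported:
os.environ["TOAST_LOGLEVEL"] = "DEBUG"

import toast

# Assumed runtime alternative through an environment singleton:
env = toast.utils.Environment.get()
env.set_log_level("DEBUG")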

@zonca (Member) commented on tutorial/01_Introduction/intro.ipynb, Oct 16, 2020

Better specify that "traditional CPU systems" means a supercomputer; otherwise it seems it is also required on a laptop.



@tskisner (Member Author)

Well, if the user has an AMD Ryzen workstation with 16 cores (for example), then they probably want to use mpi4py if they are doing anything more with toast than running a notebook on a few samples. I will definitely clarify though. I have started an "intro_parallel.ipynb" where I am going to discuss using IPython.parallel with mpi4py. I'll reference that in the serial notebook.

@tskisner (Member Author)

Tried to clarify that toast parallelism is mainly through MPI, so that any system with more than a few cores will benefit from having mpi4py installed.
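
As a sketch of the parallel entry point (the toast.Comm constructor arguments and attributes here are assumptions):

# Run with e.g.:  mpirun -np 4 python script.py
from mpi4py import MPI
import toast

# Wrap the world communicator; toast splits it into process groups, each
# of which is assigned a subset of observations.
comm = toast.Comm(world=MPI.COMM_WORLD)
print("group", comm.group, "of", comm.ngroups)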

@zonca (Member) commented on tutorial/01_Introduction/intro.ipynb, Oct 16, 2020

Do you plan to implement units here?



@tskisner (Member Author)

Good idea. astropy.units is a new addition to toast, currently only used in the new Operator traits. I need to go through the codebase systematically and add support.

@tskisner (Member Author)

I converted the fake focalplane simulation and plotting functions to use units. However, I'll wait on the rest of the instrument classes until we can revisit the overall plan for those.
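
For reference, astropy.units quantities carry their units and require explicit conversion; a minimal illustration (the specific quantities are examples, not toast API):

import astropy.units as u

sample_rate = 19.0 * u.Hz
fwhm = 10.0 * u.arcmin

# Conversions are explicit, so unit mistakes fail loudly instead of silently:
print(fwhm.to(u.deg))               # 0.16666666666666666 deg
print(sample_rate.to_value(u.mHz))  # 19000.0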

@zonca (Member) commented on tutorial/01_Introduction/intro.ipynb, Oct 16, 2020

If detlabels is None, you could use the keys as labels, so we avoid building the trivial dict x:x.

Please use keyword arguments for all inputs, so people don't have to look at the help of plot_focalplane.

For the color, what about using endswith("A") instead of enumerating?
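
Something like this sketch (the detector names are hypothetical):

# Color by polarization pair instead of enumerating detectors:
detectors = ["D0A", "D0B", "D1A", "D1B"]
detector_color = {
    name: ("red" if name.endswith("A") else "blue") for name in detectors
}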



@tskisner (Member Author)

Ok, this is cleaned up.

@zonca (Member) commented on tutorial/01_Introduction/intro.ipynb, Oct 16, 2020

I prefer namespacing; what about import toast.config as tc?



@zonca (Member) commented on tutorial/01_Introduction/intro.ipynb, Oct 16, 2020

Either it is an attribute, so other_simsat.config,

or it is a method, which then needs to have a verb in the name:

other_simsat.get_config()





@tskisner (Member Author)

The methods are now get_config() and get_class_config() 
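
A hypothetical call pattern for the renamed accessors (the instance/class split is my assumption, by analogy with the traitlets methods mentioned below):

conf = other_simsat.get_config()              # trait values of this instance
class_conf = other_simsat.get_class_config()  # defaults defined on the class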

@zonca (Member) commented on tutorial/01_Introduction/intro.ipynb, Oct 16, 2020

See comment above.



@tskisner (Member Author)

I was inspired by the traitlets methods traits() and class_traits(), but I can add a "get_" in there if it is clearer.

@zonca (Member)

Yes, please.

@zonca (Member) commented on tutorial/01_Introduction/intro.ipynb, Oct 16, 2020

Units are beautiful in the file!



tutorial/01_Introduction/intro.ipynb (resolved)
@zonca (Member) commented Oct 16, 2020

I had heard about it, but this is the first time I have used ReviewNB; it is awesome!

@tskisner, I think the Toast interface looks really good, good job! I have a bit more feedback in the notebook.

@tskisner (Member Author)

Thanks @zonca for the detailed review, just what I was looking for! One overall question about objects that act like a dictionary but have other state information: for example, obs.detdata stores DetectorData objects, but also information about the observation instance, such as the number of samples. So the call to

obs.detdata.create("foo", shape=(5,), dtype=np.int64)

actually creates a DetectorData object under the hood using the number of observation samples. I could also do:

obs.detdata["foo"] = DetectorData(obs.detectors, (obs.n_sample, 5), dtype=np.int64)

And then have setitem check that the number of samples matches that in the observation. This did not seem as convenient to me, but I do hate typing :-)

For the MPI shared memory class, I could replace the set method like this:

# Slice of buffer set by offset and shape of data
obs.shared["foo"].set(data, offset=(3, 6), fromrank=2)

or

# Slice of buffer set by explicit range
obs.shared["foo"][3:7, 6:8] = (data, 2)

or something else?

@zonca (Member) commented Oct 16, 2020

Inside __setitem__ you can do:

    def __setitem__(self, key, value):
        if key not in self.d:
            # Key is undefined: create a buffer sized from the observation
            self.d[key] = DetectorData(
                self.detectors, (self.n_sample,) + value.shape, dtype=value.dtype
            )

For the set I don't understand: what do you need the fromrank for?

@tskisner (Member Author)

I'll try to clarify... The detdata attribute holds DetectorData objects. So I can certainly modify __setitem__ to do sanity checks on the input value, make sure that the shape is compatible, and if so just assign to the internal dictionary. However, that requires the user to pre-create the DetectorData object, using information from the observation instance. It seemed like more work for the user this way...

For the MPIShared class, the set() operation is collective: it takes data from one process and broadcasts it to shared memory across multiple nodes. So there is an argument (fromrank) to determine which process has the data. I guess I could figure that out, though, by doing an allreduce first to see which process has a non-None value.

@tskisner (Member Author)

I think I understand now: you want to allow obs.detdata.__setitem__() to take a numpy array and assume that it is the full 2D contents of the underlying DetectorData. So then you could do:

obs.detdata["foo"] = np.zeros((len(obs.detectors), obs.n_sample), dtype=np.int64)

I can implement that, but I'm still not sure it is more convenient. On the other hand, there is no reason not to support multiple ways of assignment!

@tskisner (Member Author)

Ahhh, now I see: you can create the full-size DetectorData object first, and then the slicing notation can be applied when actually assigning from the RHS.

Ok, I will try this out. I agree this would be a more convenient interface. I'll also try to modify the pshmem package so that the MPIShared set() method is optional, at the cost of a precalculation.

@tskisner (Member Author)

I have added __setitem__ support to the upstream pshmem package:

https://github.com/tskisner/pshmem/releases/tag/0.2.5

And this new version is available on PyPI:

https://pypi.org/project/pshmem/0.2.5/

So now I can work on using that syntax in toast.
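
A sketch of the two assignment styles side by side, using the pshmem MPIShared class (the constructor signature is assumed from the package documentation):

from mpi4py import MPI
import numpy as np
from pshmem import MPIShared

# Allocate a shared-memory buffer, replicated once per node:
shm = MPIShared((10, 10), np.float64, MPI.COMM_WORLD)

data = None
if MPI.COMM_WORLD.rank == 2:
    data = np.ones((4, 2))

# Collective set(): rank 2 provides the data, written at the given offset.
shm.set(data, offset=(3, 6), fromrank=2)

# New in 0.2.5: slice assignment, where the offset comes from the slice and
# the contributing rank is determined internally (other ranks pass None).
shm[3:7, 6:8] = data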

@tskisner (Member Author)

Ok, I think I have concluded that the create() methods cannot be replaced in the proposed way. The Observation attributes detdata and shared do not have access to the right-hand side of the assignment, since that is passed to the DetectorData.__setitem__() method after the manager class __getitem__() is called. To simplify the example, look at this snippet of code:

import numpy as np

class DetData:

    def __init__(self, ndet, shape, dtype):
        self.dtype = np.dtype(dtype)
        self.shape = [ndet]
        self.flatshape = ndet
        for s in shape:
            self.shape.append(s)
            self.flatshape *= s
        self.flatdata = np.zeros(self.flatshape, dtype=self.dtype)
        self.data = self.flatdata.reshape(self.shape)

    def __getitem__(self, key):
        print("DetData getitem {}".format(key))
        return self.data[key]

    def __setitem__(self, key, value):
        print("DetData setitem {}".format(key))
        self.data[key] = value

    def __repr__(self):
        return str(self.data)

class Mgr:
    
    def __init__(self, ndet):
        self.ndet = ndet
        self.d = dict()

    def create(self, name, shape, dtype):
        self.d[name] = DetData(self.ndet, shape, dtype)
        
    def __getitem__(self, key):
        print("Calling Mgr getitem")
        if key not in self.d:
            # Cannot guess what shape and dtype the user wants
            pass
        return self.d[key]

    def __setitem__(self, key, value):
        print("Calling Mgr setitem")
        self.d[key] = value


mgr = Mgr(2)

# This works fine, as expected:

mgr.create("A", (3, 4), np.int32)

mgr["A"][1, 0:2, 0:2] = 5

print("mgr['A'] = \n", mgr["A"])

# This works, but it is annoying, since the user has to know the name
# of the DetData class and also has to get information from the Mgr
# class:

mgr["B"] = DetData(mgr.ndet, (3, 4), np.int32)

mgr["B"][1, 0:2, 0:2] = 5

print("mgr['B'] = \n", mgr["B"])

# The code below is actually doing:
#
#   Mgr.__getitem__("C").__setitem__(key, 5)
#
# which means that the DetData class would have to be instantiated in
# Mgr.__getitem__(), where we don't know the correct shape of the buffer
# to create.  Obviously this gives a key error:

mgr["C"][1, 0:2, 0:2] = 5

print("mgr['C'] = \n", mgr["C"])

The output of the above script is:

Calling Mgr getitem
DetData setitem (1, slice(0, 2, None), slice(0, 2, None))
Calling Mgr getitem
mgr['A'] = 
 [[[0 0 0 0]
  [0 0 0 0]
  [0 0 0 0]]

 [[5 5 0 0]
  [5 5 0 0]
  [0 0 0 0]]]
Calling Mgr setitem
Calling Mgr getitem
DetData setitem (1, slice(0, 2, None), slice(0, 2, None))
Calling Mgr getitem
mgr['B'] = 
 [[[0 0 0 0]
  [0 0 0 0]
  [0 0 0 0]]

 [[5 5 0 0]
  [5 5 0 0]
  [0 0 0 0]]]
Calling Mgr getitem
Traceback (most recent call last):
  File "setitem.py", line 75, in <module>
    mgr["C"][1, 0:2, 0:2] = 5
  File "setitem.py", line 40, in __getitem__
    return self.d[key]
KeyError: 'C'

@zonca (Member) commented Oct 17, 2020

You first need to create the thing before slicing it:

mgr = Mgr(2)

mgr["A"] = np.zeros((3,4), dtype=np.int32)

mgr["A"][1, 0:2, 0:2] = 5

It doesn't work now, but it should be implementable.

@tskisner (Member Author)

Hi @zonca, thanks for your patience, and sorry if I am being dense :-) Below I have updated the toy code to be closer to the real case. The central problem is that when assigning data (see the mgr["C"] case), we don't know which detector and sample range is being specified. If the user assigns data for the full array (see the mgr["D"] case), then we can assign it, but otherwise all we can do is create the buffer and then leave it.

import numpy as np

class DetData:
    def __init__(self, ndet, shape, dtype):
        self.dtype = np.dtype(dtype)
        self.shape = [ndet]
        self.flatshape = ndet
        for s in shape:
            self.shape.append(s)
            self.flatshape *= s
        self.flatdata = np.zeros(self.flatshape, dtype=self.dtype)
        self.data = self.flatdata.reshape(self.shape)

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def __repr__(self):
        return str(self.data)


class Mgr:
    def __init__(self, ndet, nsample):
        self.ndet = ndet
        self.nsample = nsample
        self.d = dict()

    def create(self, name, sample_shape, dtype):
        self.d[name] = DetData(self.ndet, (self.nsample,) + sample_shape, dtype)

    def __getitem__(self, key):
        return self.d[key]

    def __setitem__(self, key, value):
        if isinstance(value, DetData):
            self.d[key] = value
        else:
            # This is an array, verify that the number of dimensions match
            sample_shape = None
            if len(value.shape) < 2:
                raise RuntimeError("Assigned array does not have sufficient dimensions")
            elif len(value.shape) == 2:
                # We assume the user meant one scalar value per sample...
                sample_shape = (1,)
            else:
                # The first two dimensions are detector and sample.  The rest of the
                # dimensions are the data shape for every sample and must be fully
                # specified when creating data like this.
                sample_shape = value.shape[2:]
            print(
                "Creating DetData with {} dets, {} samples, {} samp shape".format(
                    self.ndet, self.nsample, sample_shape
                )
            )
            self.d[key] = DetData(
                self.ndet, (self.nsample,) + sample_shape, value.dtype
            )
            # If the value has the full size of the DetData object, then we can do the
            # assignment, if not, then we cannot guess what detector / sample slice
            # the user is trying to assign.
            if (value.shape[0] == self.ndet) and (value.shape[1] == self.nsample):
                # We can do it!
                self.d[key][:] = value


# 2 detectors and 5 samples
mgr = Mgr(2, 5)

# This works fine, as expected:

mgr.create("A", (3, 4), np.int32)
mgr["A"][1, 2:3, 0:2, 0:2] = 5
print("mgr['A'] = \n", mgr["A"])

# This works, but it is annoying, since the user has to know the name
# of the DetData class and also has to get information from the Mgr
# class:

mgr["B"] = DetData(mgr.ndet, (mgr.nsample, 3, 4), np.int32)
mgr["B"][1, 2:3, 0:2, 0:2] = 5
print("mgr['B'] = \n", mgr["B"])

# This creates a buffer with the full number of detectors and samples and uses the
# last dimensions of the RHS to determine the shape of the data per sample.  However,
# we have no information about what LHS slice we are assigning the RHS data to.  UNLESS
# the user gives a RHS data with the full n_detector x n_sample data set:

# mgr["C"] is created but not assigned, since we don't know where to assign
# the data along the first 2 axes (detector and sample).
mgr["C"] = np.ones((1, 1, 3, 4), dtype=np.int32)
mgr["C"][1, 2:3, 0:2, 0:2] = 5
print("mgr['C'] = \n", mgr["C"])

# mgr["D"] is created AND assigned, since we specify data of the full size.
mgr["D"] = np.ones((mgr.ndet, mgr.nsample, 3, 4), dtype=np.int32)
mgr["D"][1, 2:3, 0:2, 0:2] = 5
print("mgr['D'] = \n", mgr["D"])

I think that the mgr["C"] case is actually very confusing, since we are using the right-hand side value just to get dimensions, but not actually placing those values into the new array (since we don't have information about the location of that data in the full array).

How about we support cases A, B, and D:

  • Explicit create method
  • Assignment of a pre-created DetData, with checks for compatible number of detectors and samples
  • Assignment of a numpy array with the full-size data.

Does that seem acceptable?

@zonca (Member) commented Oct 19, 2020

here:

# mgr["C"] is created but not assigned, since we don't know where to assign
# the data along the first 2 axes (detector and sample).
mgr["C"] = np.ones((1, 1, 3, 4), dtype=np.int32)
mgr["C"][1, 2:3, 0:2, 0:2] = 5
print("mgr['C'] = \n", mgr["C"])

This case is not supported; the user needs to initialize the array in one of 2 ways:

  • provide all samples/ all detectors (see case D)
  • provide all samples/1 detector (will be replicated to all detectors)

so the use case is:

# Provide just 1 timeline; it will be copied to all detectors.  We should
# support both 3D and 4D:
mgr["C"] = np.ones((mgr.nsample, 3, 4), dtype=np.int32)
# or
mgr["C"] = np.ones((1, mgr.nsample, 3, 4), dtype=np.int32)
mgr["C"][1, 2:3, 0:2, 0:2] = 5
print("mgr['C'] = \n", mgr["C"])

Inside Mgr there should be an assert that checks that the right axis has length nsample.
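
A sketch of Mgr.__setitem__ from the toy code above, extended to support both initialization modes (the single-detector broadcast follows the proposal above; the method is shown standalone rather than inside the class):

def __setitem__(self, key, value):
    if isinstance(value, DetData):
        self.d[key] = value
        return
    value = np.asarray(value)
    # Accept (nsample, ...) for one detector, or (1 or ndet, nsample, ...):
    if value.shape[0] == self.nsample:
        value = value[np.newaxis, ...]
    assert value.shape[1] == self.nsample, "axis 1 must have length nsample"
    assert value.shape[0] in (1, self.ndet), "axis 0 must be 1 or ndet"
    self.d[key] = DetData(self.ndet, value.shape[1:], value.dtype)
    # numpy broadcasting replicates a single detector across all detectors:
    self.d[key][:] = value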

@tskisner (Member Author)

Ok, no problem, that sounds good. I will work on implementing this and addressing your other feedback.


tskisner and others added 30 commits June 27, 2024 09:00
* Various small fixes

- Fix deadlock in 1D polyfilter if a process has no good detectors

- When flagging noise model outliers, iterate until no further detectors
  are cut.

- In the template classes and Amplitude class, handle the case where
  some processes have no good detectors.

* Fix typos
* Extend scan-synchronous signal to have polarization

* Fix logic error
#776)

* Add option in the detector pointing operator to deflect the pointing based on the HWP angle

* Make toast the first import

* Add HWP phase and unit test
* Various fixes and features for dealing with realistic data.

* Move HWPSS utility functions into a separate source file

* Add a new CalibrateDetectors operator which takes a dictionary of
  factors to apply per observation.

* Change detector timeconstant deconvolution to use serial rather
  than batched FFTs by default.  This reduces memory footprint in
  the common case where most parallelism comes from MPI.

* Add option to AzimuthIntervals to also cut extraneous long intervals

* Support both PDF and PNG image formats for plots.

* When demodulating data, also propagate per-detector flags.

* Fix deadlocks caused by logging barriers in noise estimation.

* Several small fixes

* Re-enable flags in hwpfilter test

* Address review comments
…779)

* Modify deflection code so that the overall detector polarization
  direction stays fixed (i.e. the deflection is a translation)

* Add a unit test which makes plots of the detector pointing

* Small fixes to the plotting utility

* Small fix for case where quaternions are constructed from an
  array of axes.
* Update internal RELEASE file

* Fix bundled package scripts so that libflac build respects the shared and
  static configuration setting
The validation for the nbin_psd trait was assuming an integer, but it could be None
…array. (#785)

Wrap matplotlib.cbook.flatten calls in a list before constructing an array,
since these are generator expressions without a fixed length.
* First functional implementation

* Fix parallelization issues

* Remove save_pointing

* Remove unused variables

* More consistent use of the observation reference

* Remove unused trait
* Even super short observations need at least two SLERP points

* If the user does not want the map, don't bin it.

* One more case when binning is required
* Add method to sort observing schedule by RA

* Replace  and  with
* Various fixes and features for dealing with realistic data.

* Move HWPSS utility functions into a separate source file

* Add a new CalibrateDetectors operator which takes a dictionary of
  factors to apply per observation.

* Change detector timeconstant deconvolution to use serial rather
  than batched FFTs by default.  This reduces memory footprint in
  the common case where most parallelism comes from MPI.

* Add option to AzimuthIntervals to also cut extraneous long intervals

* Add new operator to simultaneously remove a Maxipol-style HWPSS
  template and build relative calibration factors from the 2f magnitude

* New operator for estimating HWP synchronous signal.

This work introduces a new operator that can model HWPSS in various
ways, and optionally estimate the relative gains between detectors
based on the 2f harmonics.   Features include:

- Detection and flagging of samples with a stopped HWP.

- Model coefficients can be estimated on the whole observation, on
  fixed-length chunks, or on pre-defined intervals.  The chunk-wise
  model coefficients are then smoothly interpolated over the
  observation.

- Model can optionally include a time drift separate from the
  chunking.

- The 2f harmonic is used to estimate the relative gain between
  detectors, either as a fixed value per observation or as a
  continuous calibration timestream.  This can be used to flag
  outlier detectors and generate calibration tables / timestreams
  for application.

- Includes proper treatment of flagged samples in normalization of
  model coefficients and covariance.

- Extensive optional debug plots.

* Fix unit tests

* Fix hwpss destriping template unit test

* - Add helper methods for determining if an observation is distributed
  purely by detector or sample.  Use these everywhere in the code that
  currently does this manual check.

- Extend flagging of stopped HWP to also include acceleration / deceleration
  periods.  Add unit test for this functionality.

- Generate an error when all fit chunks fail.

- Handle case where only one chunk fails.

* Fix typo

* Remove support for python-3.8, which has reached end of life.

* Bump requirements to python-3.9.
Fix not using the `det_mask` input value when selecting detectors.
Change default bit mask to `defaults.det_mask_invalid`.
`defaults.det_mask_nonscience` excludes turnarounds; it is valuable to
deglitch or jump-correct the turnarounds as well.
scipy.signal.convolve will choose the fastest convolution method (direct or
fft) based on the scipy.signal.choose_conv_method result.
Add missing while loop that iteratively finds jumps.
Move `len(mytoi)` outside of the loop to avoid repeated calculation.
…efault bit mask (#794)

* Fix not using given `det_mask` bit mask

Fix not using the `det_mask` input value when selecting detectors.

* Change default bit mask

Change default bit mask to `defaults.det_mask_invalid`.
`defaults.det_mask_nonscience` excludes turnarounds; it is valuable to
deglitch or jump-correct the turnarounds as well.
* Update bundled pybind11 and random123 to latest versions for better
  compatibility with clang++ on arm64.

* Add __init__.py files to data directories to silence warnings.

* Use importlib.resources.as_file() instead of pkg_resources for
  compatibility with python-3.12.

* Build wheels with numpy-2.0.x, which is backwards compatible at
  runtime with numpy-1.x and also compatible with numpy-2.1.x.

* When building suitesparse for wheels, remove patch and use the
  cmake system to enable only cholmod and build everything (rather
  than use the archaic Makefiles directly).

* Lift runtime requirements on suitesparse and numpy since we are
  now compatible with the latest versions.

* Bump versions of vendored OpenBLAS and suitesparse.

* For wheels on macos, build our own openblas rather than use
  libscipy_openblas, which has strange symbol name mangling that
  does not seem to work with clang++.

* In the unit test workflow, add python-3.12 tests and also run
  tests on macos arm64.

* In the wheel test and deploy frameworks, add python-3.12 and
  macos arm64 to the build matrix.