In the same profile for which #858 was reported, I also found that ~3k seconds (about 15% of total time, or 20% of total 'read' time) was spent in the `build_datacontainers` method, where it seems to be the `_get_slice` method that takes the time. Looking into this a bit further, I think most of the time goes to copying the `data_array` contents. This is definitely a good default, since you don't want a view of your array hanging around waiting to be inadvertently modified, but in some cases it doesn't matter whether we get a copy or a view, so it might be useful to have the option of making this non-copying.
```
Total time: 3129.79 s
File: /lustre/aoc/projects/hera/heramgr/anaconda3/envs/h6c/lib/python3.10/site-packages/hera_cal/io.py
Function: build_datacontainers at line 685

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   685                                           def build_datacontainers(self):
   686                                               '''Turns the data currently loaded into the HERAData object into DataContainers.
   687                                               Returned DataContainers include useful metadata specific to the data actually
   688                                               in the DataContainers (which may be a subset of the total data). This includes
   689                                               antenna positions, frequencies, all times, all lsts, and times and lsts by baseline.
   690
   691                                               Returns:
   692                                                   data: DataContainer mapping baseline keys to complex visibility waterfalls
   693                                                   flags: DataContainer mapping baseline keys to boolean flag waterfalls
   694                                                   nsamples: DataContainer mapping baseline keys to integer Nsamples waterfalls
   695                                               '''
   696                                               # build up DataContainers
   697      2383      14136.0      5.9      0.0      data, flags, nsamples = odict(), odict(), odict()
   698      2383   19193325.0   8054.3      0.6      meta = self.get_metadata_dict()
   699   2474287    5506177.0      2.2      0.2      for bl in meta['bls']:
   700   2471904  884136579.0    357.7     28.2          data[bl] = self._get_slice(self.data_array, bl)
   701   2471904  865655205.0    350.2     27.7          flags[bl] = self._get_slice(self.flag_array, bl)
   702   2471904  864163095.0    349.6     27.6          nsamples[bl] = self._get_slice(self.nsample_array, bl)
   703      2383   25802682.0  10827.8      0.8      data = DataContainer(data)
   704      2383   24716964.0  10372.2      0.8      flags = DataContainer(flags)
   705      2383   27281524.0  11448.4      0.9      nsamples = DataContainer(nsamples)
   706
   707                                               # store useful metadata inside the DataContainers
   708      9532      30303.0      3.2      0.0      for dc in [data, flags, nsamples]:
   709     71490     378699.0      5.3      0.0          for attr in ['ants', 'data_ants', 'antpos', 'data_antpos', 'freqs', 'times', 'lsts', 'times_by_bl', 'lsts_by_bl']:
   710     64341  412908902.0   6417.5     13.2              setattr(dc, attr, copy.deepcopy(meta[attr]))
   711
   712      2383       4949.0      2.1      0.0      return data, flags, nsamples
```
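To illustrate the copy-vs-view trade-off being proposed, here is a minimal sketch of what an opt-out could look like. The standalone `get_slice` function and its `copy` keyword are hypothetical, not the actual `HERAData._get_slice` signature; the point is just that NumPy basic (slice) indexing already returns a view for free, and the expensive `.copy()` can be made optional:

```python
import numpy as np

def get_slice(data_array, indices, copy=True):
    """Return the waterfall for one baseline (hypothetical sketch).

    With copy=True (the safe default) the caller gets an independent array;
    with copy=False the caller gets a cheap view into data_array, which
    will reflect (and allow) any later modification of the parent array.
    """
    view = data_array[indices]  # basic indexing returns a view, no data copied
    return view.copy() if copy else view

# Example: a (Nblts, Nfreqs) array where one baseline owns rows 0:2
data_array = np.arange(12.0).reshape(4, 3)
w_copy = get_slice(data_array, slice(0, 2), copy=True)
w_view = get_slice(data_array, slice(0, 2), copy=False)

# Mutating the parent array demonstrates the difference:
data_array[0, 0] = -1.0
# w_copy[0, 0] is still 0.0; w_view[0, 0] is now -1.0
```

The usual caveat applies: a caller holding a non-copying slice must promise not to mutate it (and must tolerate the parent array changing underneath it), which is why copying should stay the default.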