In the same profile for which #858 was reported, I also found that ~3k seconds (about 15% of total time, or 20% of total 'read' time) was spent in the `build_datacontainers` method, where it seems to be the `_get_slice` method that takes the time. Looking into this a bit further, I think most of the time goes to copying the `data_array` contents. This is definitely a good default, since you don't want a view of your array hanging around waiting to be inadvertently modified, but in some cases it doesn't matter whether we get a copy or a view, so it might be useful to have the option of making this non-copying.
```
Total time: 3129.79 s
File: /lustre/aoc/projects/hera/heramgr/anaconda3/envs/h6c/lib/python3.10/site-packages/hera_cal/io.py
Function: build_datacontainers at line 685

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   685                                           def build_datacontainers(self):
   686                                               '''Turns the data currently loaded into the HERAData object into DataContainers.
   687                                               Returned DataContainers include useful metadata specific to the data actually
   688                                               in the DataContainers (which may be a subset of the total data). This includes
   689                                               antenna positions, frequencies, all times, all lsts, and times and lsts by baseline.
   690
   691                                               Returns:
   692                                                   data: DataContainer mapping baseline keys to complex visibility waterfalls
   693                                                   flags: DataContainer mapping baseline keys to boolean flag waterfalls
   694                                                   nsamples: DataContainer mapping baseline keys to integer Nsamples waterfalls
   695                                               '''
   696                                               # build up DataContainers
   697      2383      14136.0      5.9      0.0      data, flags, nsamples = odict(), odict(), odict()
   698      2383   19193325.0   8054.3      0.6      meta = self.get_metadata_dict()
   699   2474287    5506177.0      2.2      0.2      for bl in meta['bls']:
   700   2471904  884136579.0    357.7     28.2          data[bl] = self._get_slice(self.data_array, bl)
   701   2471904  865655205.0    350.2     27.7          flags[bl] = self._get_slice(self.flag_array, bl)
   702   2471904  864163095.0    349.6     27.6          nsamples[bl] = self._get_slice(self.nsample_array, bl)
   703      2383   25802682.0  10827.8      0.8      data = DataContainer(data)
   704      2383   24716964.0  10372.2      0.8      flags = DataContainer(flags)
   705      2383   27281524.0  11448.4      0.9      nsamples = DataContainer(nsamples)
   706
   707                                               # store useful metadata inside the DataContainers
   708      9532      30303.0      3.2      0.0      for dc in [data, flags, nsamples]:
   709     71490     378699.0      5.3      0.0          for attr in ['ants', 'data_ants', 'antpos', 'data_antpos', 'freqs', 'times', 'lsts', 'times_by_bl', 'lsts_by_bl']:
   710     64341  412908902.0   6417.5     13.2              setattr(dc, attr, copy.deepcopy(meta[attr]))
   711
   712      2383       4949.0      2.1      0.0      return data, flags, nsamples
```
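To illustrate the copy-vs-view trade-off being proposed, here is a minimal sketch of what an opt-out could look like. The standalone `get_slice` function and its `copy` keyword are hypothetical, not the actual `HERAData._get_slice` signature; the point is just that NumPy basic (slice) indexing already returns a view for free, and the expensive `.copy()` can be made optional:

```python
import numpy as np

def get_slice(data_array, indices, copy=True):
    """Return the waterfall for one baseline (hypothetical sketch).

    With copy=True (the safe default) the caller gets an independent array;
    with copy=False the caller gets a cheap view into data_array, which
    will reflect (and allow) any later modification of the parent array.
    """
    view = data_array[indices]  # basic indexing returns a view, no data copied
    return view.copy() if copy else view

# Example: a (Nblts, Nfreqs) array where one baseline owns rows 0:2
data_array = np.arange(12.0).reshape(4, 3)
w_copy = get_slice(data_array, slice(0, 2), copy=True)
w_view = get_slice(data_array, slice(0, 2), copy=False)

# Mutating the parent array demonstrates the difference:
data_array[0, 0] = -1.0
# w_copy[0, 0] is still 0.0; w_view[0, 0] is now -1.0
```

The usual caveat applies: a caller holding a non-copying slice must promise not to mutate it (and must tolerate the parent array changing underneath it), which is why copying should stay the default.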