Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Master regeneration bug associated with a changing pypeit file #1022

Open
jhennawi opened this issue Sep 7, 2020 · 5 comments
Open

Master regeneration bug associated with a changing pypeit file #1022

jhennawi opened this issue Sep 7, 2020 · 5 comments
Assignees

Comments

@jhennawi
Copy link
Collaborator

jhennawi commented Sep 7, 2020

Hi All, one thing I’ve noticed while observing at the telescope. On an observing run, we typically continuously add files to the pypeit files as the data is collected. I’m seeing that the MasterEdges and MasterFlat files are being recreated sometimes even when I run with run_pypeit -m. These different master files have different bit identifiers, i.e.

Masters/MasterEdges_A_1048575_01.fits.gz
Masters/MasterEdges_A_16383_01.fits.gz
Masters/MasterEdges_A_16777215_01.fits.gz
Masters/MasterEdges_A_262143_01.fits.gz

My understanding of what is going on here is that as I add data to PypeIt file, the bit arithmetic changes for the science frames that this master file corresponds to (i.e. specified by the calib field). As a result the name changes, and hence the Master is regenerated.

I think the bit arithmetic is great for determining which master to use for which science files, however it is an essentially useless naming convention. Not only can one not easily figure out which science frames this master applies to, but it also results in a time consuming regeneration of the master file if I simply add a new target to my pypeit file.

My proposal is that:
-- The Master file should have an identifier that is determined by the files that got used to generate it, and it should only be regenerated if those files changed. For example,

MasterEdges_s200907_0057-s200907_0062.fits

where s200907_0057.fits/s200907_0062.fits is the first/last raw file used to create the master.

We may need to think about whether there is a situation where this naming convention would not be unique.

-- The Master file names should be revisited so that users can easily associate them with actual frames. My suggestion is to use
the first file in the list of files used to generate the master. Alternatively, we could use the first and last file. It needs to be a file though, and not some arbitrary complicated number. If a user has a problem with a reduction caused by a problem in a master
file they need to be able to easily figure out which master frames to inspect based on QA plots or our utility scripts. This is essentially impossible with the bit arithmetic, without digging into the data container to figure out the matchup between bits and frames.

@profxj
Copy link
Collaborator

profxj commented Sep 7, 2020

I can't tell if the Issue describe is a bug (i.e. the code is not performing as
planned) or user confusion/dissatisfaction.

If the latter, I suggest a simple script that prints to the screen the files
used to generate each calibration file (we should confirm this is also
written to the header).

@profxj profxj removed their assignment Sep 7, 2020
@jhennawi
Copy link
Collaborator Author

jhennawi commented Sep 7, 2020

It is both a bug, and dissatisfaction with the naming convention that is the source of the bug. If you want to reproduce the bug do the following in the dev suite:

Go to this PypeIt file:

gemini_gnirs_32_sb_sxd.pypeit

Pretend you are at the telescope reducing data as it is collected. Run the standard and the first ABBA block (i.e. calib = 0,1, comment out the second ABBA block). Then try to add the second ABBA block (calib=2,3) as if it just finished reading out. You will see that when you try to re-run the code with the new ABBA block, because of the new calib ids, the bits assigned to the master frames now change, and hence so does the master filename. Since the new masterfilename does not exist, it remakes masters. So it constantly remakes masters and saves new masters, wasting time and disk space, although this is exactly the same calibration file every time.

It is a major bug that some script does not fix.

@kbwestfall
Copy link
Collaborator

Just a follow-up thought on this: The easiest solution that I can come up with is just to change

return '{0}_{1}_{2}'.format(self['setup'][row], self['calibbit'][row], str(det).zfill(2))

in PypeItMetaData.master_key to

return '{0}_{1}_{2}'.format(self['setup'][row], np.sort(self.calib_bitmask.flagged_bits(self['calibbit'][row]))[0], str(det).zfill(2))

This would change the name of the master file to use the smallest calibration group number in the name of the file. That would get around this issue as long as you only ever increase the calibration group numbers as you add new data.

But I don't know how disciplined we've been about using PypeItMetaData.master_key to get the master keys, and there may be other propagating effects that I'm not remembering.

@jhennawi
Copy link
Collaborator Author

Hi @kbwestfall,

I agree that this fixes the problem. In my opinion a solution that assigns a unique MasterKey based on the files used to create the master would be superior, since this identifier should only depend on the files used. Then given that identifier, we should have a script that will report what files were used for the creation of that master, and also list the calibration ids for which that master is used. Whether this script should ingest the pipet file, or simply sweep the Master directory is not so clear to me, but I sort of prefer the latter.

@profxj
Copy link
Collaborator

profxj commented Oct 8, 2021

What is the unique and terse MasterKey to be generated?
Maybe something using MJD?

Without that, this Issue can't proceed.

@profxj profxj removed their assignment Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants