ZarrMerge behaves weirdly #102

Open
hi-ilkin opened this issue Aug 16, 2023 · 0 comments

hi-ilkin commented Aug 16, 2023

Describe the bug

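When the overwrite parameter is set to True for scarf.ZarrMerge and the call differs from the one that created the existing Zarr file (an assay was added or an entry in names was changed, with the same modality), the merge fails instead of overwriting.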

Tracebacks

Two different exceptions were observed.

  1. This one was observed when an entry in names was changed:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/miniconda3/envs/fargate/lib/python3.10/site-packages/scarf/writers.py:1303, in ZarrMerge._use_existing_zarr(self, zarr_loc, merge_assay_name, overwrite)
   1300     if not all(
   1301         z[cell_slot]["ids"][:] == np.array(self.mergedCells["ids"].values)  # type: ignore
   1302     ):
-> 1303         raise ValueError(
   1304             f"ERROR: order of cells does not match the one in existing file"
   1305         )
   1306 except KeyError:

ValueError: ERROR: order of cells does not match the one in existing file

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[31], line 1
----> 1 scarf.ZarrMerge(
      2     zarr_path = 'scarf_datasets/kang_merged_pbmc_rnaseq.zarr',  # destination path
      3     assays=[ds_ctrl.RNA, ds_stim.RNA], # assays to be merged
      4 
      5     # these names will be preprended to the cell ids with '__' delimiter
      6     names=['ctrl', 'stim2'],
      7     # Name of the merged assay. `overwrite` will remove an existing Zarr file.
      8     merge_assay_name='RNA',
      9     overwrite=True
     10 ).dump()

File ~/miniconda3/envs/fargate/lib/python3.10/site-packages/scarf/writers.py:1168, in ZarrMerge.__init__(self, zarr_path, assays, names, merge_assay_name, in_workspaces, out_workspace, chunk_size, dtype, overwrite, prepend_text, reset_cell_filter)
   1166 self.nFeats = self.mergedFeats.shape[0]
   1167 self.featOrder = self._ref_order_feat_idx()
-> 1168 self.z = self._use_existing_zarr(zarr_path, merge_assay_name, overwrite)
   1169 self._ini_cell_data(overwrite)
   1170 if dtype is None:

File ~/miniconda3/envs/fargate/lib/python3.10/site-packages/scarf/writers.py:1315, in ZarrMerge._use_existing_zarr(self, zarr_loc, merge_assay_name, overwrite)
   1312 except ValueError:
   1313     # So no zarr file with same name exists. Check if a non zarr folder with the same name exists
   1314     if isinstance(zarr_loc, str) and os.path.exists(zarr_loc):
-> 1315         raise ValueError(
   1316             f"ERROR: Directory/file with name `{zarr_loc}`exists. "
   1317             f"Either delete it or use another name"
   1318         )
   1319     # creating a new zarr file
   1320     return load_zarr(zarr_loc, mode="w")

ValueError: ERROR: Directory/file with name `scarf_datasets/kang_merged_pbmc_rnaseq.zarr`exists. Either delete it or use another name

  2. The following one happens when a new assay is added:
/Users/ilkin/miniconda3/envs/fargate/lib/python3.10/site-packages/scarf/writers.py:1301: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  z[cell_slot]["ids"][:] == np.array(self.mergedCells["ids"].values)  # type: ignore
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[32], line 1
----> 1 scarf.ZarrMerge(
      2     zarr_path = 'scarf_datasets/kang_merged_pbmc_rnaseq.zarr',  # destination path
      3     assays=[ds_ctrl.RNA, ds_stim.RNA, ds_stim.RNA], # assays to be merged
      4 
      5     # these names will be preprended to the cell ids with '__' delimiter
      6     names=['ctrl', 'stim2', 'stim3'],
      7     # Name of the merged assay. `overwrite` will remove an existing Zarr file.
      8     merge_assay_name='RNA',
      9     overwrite=True
     10 ).dump()

File ~/miniconda3/envs/fargate/lib/python3.10/site-packages/scarf/writers.py:1168, in ZarrMerge.__init__(self, zarr_path, assays, names, merge_assay_name, in_workspaces, out_workspace, chunk_size, dtype, overwrite, prepend_text, reset_cell_filter)
   1166 self.nFeats = self.mergedFeats.shape[0]
   1167 self.featOrder = self._ref_order_feat_idx()
-> 1168 self.z = self._use_existing_zarr(zarr_path, merge_assay_name, overwrite)
   1169 self._ini_cell_data(overwrite)
   1170 if dtype is None:

File ~/miniconda3/envs/fargate/lib/python3.10/site-packages/scarf/writers.py:1300, in ZarrMerge._use_existing_zarr(self, zarr_loc, merge_assay_name, overwrite)
   1295         raise ValueError(
   1296             f"ERROR: Zarr file already contains {merge_assay_name} assay. Choose "
   1297             "a different zarr path or a different assay name. Otherwise set overwrite to True"
   1298         )
   1299 try:
-> 1300     if not all(
   1301         z[cell_slot]["ids"][:] == np.array(self.mergedCells["ids"].values)  # type: ignore
   1302     ):
   1303         raise ValueError(
   1304             f"ERROR: order of cells does not match the one in existing file"
   1305         )
   1306 except KeyError:

TypeError: 'bool' object is not iterable
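
A note on the first traceback: the cell-order ValueError raised at line 1303 appears to be caught by the `except ValueError` at line 1312, which is meant for the case where no Zarr file exists yet. That would explain why the reported message is about the directory rather than the cell ids. Below is a minimal toy sketch of that control flow. `use_existing` and its arguments are hypothetical stand-ins for illustration, not scarf's code.

```python
import os
import tempfile


def use_existing(path, stored_ids, new_ids):
    """Toy stand-in for the control flow seen in the traceback (not scarf's code)."""
    try:
        if not os.path.isdir(path):
            # The case the handler below is written for: nothing to reuse.
            raise ValueError("no store at this path")
        if stored_ids != new_ids:
            # Raised inside the same try block, so the handler below catches it too.
            raise ValueError("order of cells does not match the one in existing file")
        return "reuse"
    except ValueError:
        if os.path.exists(path):
            # Replaces the more informative cell-order error with this one.
            raise ValueError(f"Directory/file with name `{path}` exists")
        return "create"


with tempfile.TemporaryDirectory() as existing_dir:
    try:
        use_existing(existing_dir, ["ctrl__c1"], ["ctrl2__c1"])
    except ValueError as err:
        reported = str(err)               # what the user sees last
        original = str(err.__context__)   # the error that was swallowed
```

Here `reported` holds the "exists" message while `original` holds the cell-order message, which mirrors the "During handling of the above exception, another exception occurred" chain shown above.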

To Reproduce

For the first exception, run this code twice, changing one of the names before the second run.

import scarf

# ds_ctrl and ds_stim are assumed to be already-loaded scarf DataStore objects
scarf.ZarrMerge(
    zarr_path='scarf_datasets/kang_merged_pbmc_rnaseq.zarr',
    assays=[ds_ctrl.RNA, ds_stim.RNA],
    names=['ctrl', 'stim'],  # change one of the names when running the second time
    merge_assay_name='RNA',
    overwrite=True
).dump()

For the second exception, run the same code but add a new assay to the assays list (e.g. ds_stim.RNA) along with a matching extra entry in names.
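
The second exception is presumably triggered because the extra assay changes the number of cell ids, so the `==` comparison no longer returns an array and `all()` receives a single bool. A minimal sketch of a shape-safe comparison using `np.array_equal` follows. The helper name `same_cell_order` is made up for illustration, and this is only one possible way to do the check, not what scarf does.

```python
import numpy as np


def same_cell_order(existing_ids, new_ids) -> bool:
    """Return True only if both id arrays have the same shape and identical values."""
    # np.array_equal returns a plain bool and handles mismatched shapes
    # by returning False rather than warning or raising.
    return np.array_equal(np.asarray(existing_ids), np.asarray(new_ids))


# Built-in all() needs an iterable; a scalar bool gives the TypeError from the traceback.
try:
    all(False)
except TypeError as err:
    type_error_message = str(err)
```

With a check like this, a changed name or an added assay would simply be reported as "ids differ", and the caller could then decide what to do when overwrite is True.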

Expected behavior

Probably the easiest fix is for overwrite=True to remove the old folder and run a fresh merge.

Scarf and Python version

Python 3.10.12
scarf 0.28.5
jupyter-lab 4.0.5
Mac OS 13.2.1 (M2 chip)

Note: The current workaround is to manually delete the file/folder and run the command again.
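
The workaround can be scripted with the standard library. This is a rough sketch, and `remove_existing_store` is a hypothetical helper name, not part of scarf.

```python
import shutil
from pathlib import Path


def remove_existing_store(zarr_path: str) -> bool:
    """Delete a leftover Zarr directory (or file) if present; return True if something was removed."""
    path = Path(zarr_path)
    if path.is_dir():
        shutil.rmtree(path)  # Zarr directory stores are ordinary folders on disk
        return True
    if path.exists():
        path.unlink()        # a plain file with the same name
        return True
    return False


# Usage (commented out because it deletes data):
# remove_existing_store('scarf_datasets/kang_merged_pbmc_rnaseq.zarr')
# ...then re-run the scarf.ZarrMerge(...).dump() call from "To Reproduce".
```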
