Modifying a subset of AnnData using the .iloc/.loc method does not make a new copy, and the original object is modified #1840

crazyxiaoj · 2025-01-27T11:41:06Z

Please make sure these conditions are met

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of anndata.
(optional) I have confirmed this bug exists on the master branch of anndata.

Report

When using the .iloc or .loc methods to modify a subset of an AnnData object, it seems that no new copy is created; instead, the original AnnData object is directly modified.

Code:

from anndata import AnnData
import numpy as np

a = AnnData(X=np.arange(16).reshape(4,4), var=list('ABCD'), obs=list('abcd'))
b = a[:2,:2]
b.obs.iloc[:,:] = 0  # the same results using the .loc method.
b
# View of AnnData object with n_obs × n_vars = 2 × 2
#     obs: 0
#     var: 0
a.obs
#    0
# 0  0
# 1  0
# 2  c
# 3  d

As a beginner, I'm not sure if this behavior is a bug or by design. Could someone clarify whether this is intentional, and if so, could you please explain why it functions this way? Thanks for your assistance!

Versions

| Package | Version |
| ------- | ------- |
| pandas  | 2.2.3   |
| anndata | 0.11.3  |
| numpy   | 2.1.3   |

| Dependency         | Version     |
| ------------------ | ----------- |
| Pygments           | 2.18.0      |
| matplotlib         | 3.9.3       |
| defusedxml         | 0.7.1       |
| traitlets          | 5.14.3      |
| stack_data         | 0.6.3       |
| decorator          | 5.1.1       |
| jaraco.text        | 3.12.1      |
| six                | 1.17.0      |
| charset-normalizer | 3.4.0       |
| scipy              | 1.14.1      |
| pillow             | 11.0.0      |
| pyparsing          | 3.2.0       |
| session-info2      | 0.1.2       |
| platformdirs       | 4.3.6       |
| packaging          | 24.2        |
| h5py               | 3.12.1      |
| jaraco.collections | 5.1.0       |
| jaraco.context     | 5.3.0       |
| setuptools         | 75.6.0      |
| natsort            | 8.4.0       |
| cycler             | 0.12.1      |
| asttokens          | 3.0.0       |
| parso              | 0.8.4       |
| python-dateutil    | 2.9.0.post0 |
| kiwisolver         | 1.4.7       |
| jedi               | 0.19.2      |
| prompt_toolkit     | 3.0.48      |
| ipython            | 8.30.0      |
| pytz               | 2024.1      |
| pure_eval          | 0.2.3       |
| more-itertools     | 10.3.0      |
| pickleshare        | 0.7.5       |
| jaraco.functools   | 4.0.1       |
| wcwidth            | 0.2.13      |
| executing          | 2.1.0       |

| Component | Info                                                                          |
| --------- | ----------------------------------------------------------------------------- |
| Python    | 3.13.1 \| packaged by conda-forge \| (main, Dec  5 2024, 21:23:54) [GCC 13.3.0] |
| OS        | Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31                 |
| Updated   | 2025-01-27 11:47                                                              |

AlessiaLeclercq · 2025-01-28T16:08:05Z

Hello,
I am also trying to subset an AnnData object using some obs values.
The dataset is the "Peaks_RNA.loom" found here.
Specifically, I have an AnnData object called `data` (see the code below) and I want to subset it according to the obs columns "Method" and "Tissue".
Here is the code:

import os
import scanpy as sc
path = ... #path to loom file 
data = sc.read_loom(path) 
print(data.shape) #returns  526094 × 59480
subset_data = data[data.obs["Method"]=="rnaXatac"]
# Note: this filters `data` again, not `subset_data`, so the "Method" filter above is discarded
subset_data = data[data.obs["Tissue"].isin(["Cerebellum", "Brain"])]
print(subset_data) # View of AnnData object with n_obs × n_vars = 44333 × 59480 ...

However I wish it to be a proper AnnData object as to save it into h5ad file.
How can I do it? I am using python 3.9.6.
Here follows the description of the environment:

anndata==0.10.8
annoy==1.17.3
array_api_compat==1.9.1
bbknn==1.6.0
cellrank==2.0.6
click==8.1.7
contourpy==1.3.0
cycler==0.12.1
Cython==3.0.11
dnspython==2.7.0
docrep==0.3.2
et_xmlfile==2.0.0
exceptiongroup==1.2.2
fcsparser==0.2.8
filelock==3.16.1
fonttools==4.54.1
fsspec==2024.12.0
future==1.0.0
get-annotations==0.1.2
h5py==3.12.1
harmonypy==0.0.10
hyperopt==0.1.2
igraph==0.11.8
importlib_metadata==8.5.0
importlib_resources==6.4.5
jax==0.4.30
jaxlib==0.4.30
jaxopt==0.8.3
Jinja2==3.0.3
joblib==1.4.2
kiwisolver==1.4.7
legacy-api-wrap==1.4
leidenalg==0.10.2
llvmlite==0.43.0
loompy==3.0.7
louvain==0.8.2
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.9.2
mdurl==0.1.2
mellon==1.5.0
ml_dtypes==0.5.1
mofapy2==0.7.2
mpmath==1.3.0
mudata==0.2.4
muon==0.1.6
natsort==8.4.0
networkx==3.2.1
numba==0.60.0
numpy==1.26.4
numpy-groupies==0.11.2
openpyxl==3.1.5
opt_einsum==3.4.0
packaging==24.1
palantir==1.3.6
pandas==2.2.3
patsy==0.5.6
petsc==3.22.0
petsc4py==3.22.0
pillow==11.0.0
progressbar2==4.5.0
protobuf==5.29.0
pygam==0.9.1
Pygments==2.19.1
pygpcca==1.0.4
pymongo==4.10.1
pynndescent==0.5.13
pyparsing==3.2.0
pysam==0.22.1
python-dateutil==2.9.0.post0
python-utils==3.9.0
pytz==2024.2
rich==13.9.4
scanpy==1.10.3
scikit-learn==1.5.2
scikit-misc==0.3.1
scipy==1.11.4
scvelo @ git+https://github.com/theislab/scvelo@22b6e7e6cdb3c321c5a1be4ab2f29486ba01ab4f
scvi==0.6.8
scvi-colab==0.12.0
seaborn==0.13.2
session-info==1.0.0
six==1.16.0
slepc==3.22.1
slepc4py==3.22.1
statsmodels==0.14.4
stdlib-list==0.11.0
sympy==1.13.1
texttable==1.7.0
threadpoolctl==3.5.0
torch==2.5.1
tqdm==4.66.6
typing_extensions==4.12.2
tzdata==2024.2
umap-learn==0.5.7
wrapt==1.16.0
xlrd==2.0.1
zipp==3.20.2

ilan-gold · 2025-01-30T16:02:50Z

> When using the .iloc or .loc methods to modify a subset of an AnnData object, it seems that no new copy is created; instead, the original AnnData object is directly modified.

@crazyxiaoj as far as I can tell, this behavior is totally expected. A view is just that, a view. So if you edit the view, you'll edit the actual object. It might be worth disallowing this completely, but there are probably cases where the behavior is desirable.

> However I wish it to be a proper AnnData object as to save it into h5ad file.

@AlessiaLeclercq Writing the view directly may already work, as in the snippet below. If it does not, you can always create a real AnnData object with adata.copy(): https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.copy.html

import anndata as ad
import numpy as np

adata = ad.AnnData(X=np.array([[1, 2], [3, 4]]))
adata[:1,:].write_h5ad("foo.h5ad") # works, but also `.copy` is fine

crazyxiaoj · 2025-01-30T16:25:05Z

> > When using the .iloc or .loc methods to modify a subset of an AnnData object, it seems that no new copy is created; instead, the original AnnData object is directly modified.
>
> @crazyxiaoj as far as I can tell, this behavior is totally expected. A view is just that, a view. So if you edit the view, you'll edit the actual object. It might be worth disallowing this completely, but there are probably cases where the behavior is desirable.

Your explanation is a bit unclear to me. I referred to the content on the following webpage: https://anndata.readthedocs.io/en/stable/generated/anndata.AnnData.html.

Here’s the relevant excerpt:

> Copying a view causes an equivalent “real” AnnData object to be generated. Attempting to modify a view (at any attribute except X) is handled in a copy-on-modify manner, meaning the object is initialized in place.

Based on the paragraph above, it appears that modifying properties like obs results in the creation of a new AnnData object. Additionally, I noticed that performing an assignment directly using [], rather than the iloc method, also triggers the creation of a new object.

ilan-gold · 2025-01-30T16:37:23Z

> Based on the paragraph above, it appears that modifying properties like obs results in the creation of a new AnnData object. Additionally, I noticed that performing an assignment directly using [], rather than the iloc method, also triggers the creation of a new object.

Thanks for sharing this. The issue here would be wrapping every single dataframe method. I'm not sure why this wasn't done initially, since only drop was wrapped. I was aware of the "copy-on-write" paradigm, but I thought the promise was shallower than this, i.e., only affecting things like columns or keys. We should compile a list of the methods affected, I suppose:

set_index (although this one is very bad for other reasons)
loc
iloc
insert
pop
drop_duplicates
rename_axis

and many more. The sheer number might be why this wasn't done, so it's possible we should carve out an exception for pandas.

crazyxiaoj · 2025-01-30T16:49:51Z

Thank you for your clarification. I think I'm beginning to understand.

Do you still believe it's necessary to open this issue? If you feel it is no longer needed, we can consider closing this issue.

ilan-gold · 2025-01-30T16:57:54Z

> Do you still believe it's necessary to open this issue? If you feel it is no longer needed, we can consider closing this issue.

Well, it is certainly an inconsistency, so it seems we should either edit the docs or add the feature set. I've asked @ivirshup to weigh in.

AlessiaLeclercq · 2025-01-31T13:48:35Z

Thank you!

crazyxiaoj added Bug 🐛 Triage 🩺 labels Jan 27, 2025

ilan-gold removed the Triage 🩺 label Jan 30, 2025
