TimeoutError with cellxgene_census.get_anndata() #1253

Open
LysSanzMoreta opened this issue Jul 30, 2024 · 7 comments

@LysSanzMoreta

Hi!

Recently I have been having trouble simply downloading the data. I would like to use the main "organ", which I have seen labelled on the website as "tissue_general", as my query filter.

Before I saw the "tissue_general" option, I was using the "tissue" kwarg to download tissue samples such as "alveolus of lung" or "bronchopulmonary lymph node" separately. However, that seemed inefficient, so I decided to try to make this other option work.

I could be using it wrong, but I was able to download the "lung" dataset. Is perhaps only the lung dataset ready and not the others? Or am I using the wrong census version?

# Option 1: Using cellxgene_census

import cellxgene_census

census = cellxgene_census.open_soma(census_version="2023-07-25")
tissue = "kidney"  # I have also tried smaller datasets, e.g. uterus
with census:
    adata = cellxgene_census.get_anndata(
        census=census,
        organism="Homo sapiens",
        obs_value_filter=f"tissue_general == '{tissue}'",
        obs_column_names=["assay", "cell_type", "tissue", "tissue_general", "suspension_type", "disease"],
        var_column_names=[],
    )

# Option 2: Using gget, which calls the cellxgene census in the background

import gget

adata = gget.cellxgene(
    species="homo_sapiens",
    gene=gene_set,  # if given [], it will download all genes
    tissue_general=["{}".format(tissue_general)],
    census_version="2023-07-25",
    column_names=[
        "dataset_id",
        "assay",
        "suspension_type",
        "sex",
        "tissue_general",
        "tissue",
        "cell_type",
        "disease",
    ],
)

So far I get a TimeoutError with both options for every tissue except lung.
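The "tissue" and "tissue_general" filters described above are plain strings passed as `obs_value_filter`. The following is a minimal sketch of building such a string for one or several values, so that several tissues could be requested in a single query instead of one download per tissue. `obs_filter` and `download` are hypothetical helper names, not part of cellxgene_census, and the sketch assumes the value-filter syntax accepts `in [...]` lists.

```python
def obs_filter(column: str, values: list[str]) -> str:
    """Build a value-filter string such as "tissue_general == 'kidney'"."""
    if not values:
        raise ValueError("at least one value is required")
    for value in values:
        if "'" in value:
            # Keep the sketch simple: no escaping of embedded quotes.
            raise ValueError(f"value contains a single quote: {value!r}")
    if len(values) == 1:
        return f"{column} == '{values[0]}'"
    quoted = ", ".join(f"'{value}'" for value in values)
    return f"{column} in [{quoted}]"


def download(tissues: list[str], census_version: str = "2023-07-25"):
    """Hypothetical wrapper around the call from Option 1."""
    # Imported lazily so obs_filter() can be used without the package installed.
    import cellxgene_census

    with cellxgene_census.open_soma(census_version=census_version) as census:
        return cellxgene_census.get_anndata(
            census=census,
            organism="Homo sapiens",
            obs_value_filter=obs_filter("tissue_general", tissues),
            obs_column_names=["assay", "cell_type", "tissue", "tissue_general"],
        )
```

For example, `obs_filter("tissue", ["alveolus of lung", "bronchopulmonary lymph node"])` yields one filter covering both tissues. Whether a combined query is faster than separate ones is not established in this thread.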

Cellxgene versions (Python 3.10):

cellxgene                 0.16.7
cellxgene-census          1.15.0

Thank you in advance for the help! :)

@ivirshup
Collaborator

Thanks for reporting the issue.

Can you show the full traceback along with the error? I'd like to see where the timeout is occurring.

@LysSanzMoreta
Author

@ivirshup Thanks for the reply! I will paste the trace when I am back from holidays, which will not be long.

@LysSanzMoreta
Author

@ivirshup I am back, here is the trace:

  File "/home/.../datasets.py", line 367, in download_cellxgene
    adata = cellxgene_census.get_anndata(
  File "/home/lys/.local/lib/python3.10/site-packages/cellxgene_census/_get_anndata.py", line 137, in get_anndata
    adata = query.to_anndata(
  File "/home/lys/.local/lib/python3.10/site-packages/somacore/query/query.py", line 299, in to_anndata
    return self._read(
  File "/home/lys/.local/lib/python3.10/site-packages/somacore/query/query.py", line 406, in _read
    x_matrices = {
  File "/home/lys/.local/lib/python3.10/site-packages/somacore/query/query.py", line 407, in <dictcomp>
    _xname: _fast_csr.read_csr(
  File "/home/lys/.local/lib/python3.10/site-packages/somacore/query/_fast_csr.py", line 34, in read_csr
    for tbl in _eager_iter.EagerIterator(
  File "/home/lys/.local/lib/python3.10/site-packages/somacore/query/_eager_iter.py", line 31, in __next__
    return self._preload_future.result()
  File "/home/lys/micromamba/envs/cellariumenv/lib/python3.10/concurrent/futures/_base.py", line 453, in result
    self._condition.wait(timeout)
  File "/home/lys/micromamba/envs/cellariumenv/lib/python3.10/threading.py", line 320, in wait
    waiter.acquire()
  File "/home/lys/Dropbox/PostDoc_Glycomics/Laedingr/laedingr/src/laedingr/datasets.py", line 94, in handle_timeout
    raise TimeoutError
TimeoutError
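The last frame of the trace is `handle_timeout` in the reporter's own `datasets.py`, not a function in `cellxgene_census` or `somacore`. One pattern that produces a trace of this shape is a `signal`-based timer wrapped around the download: when the timer fires, the handler raises inside whatever the main thread is doing at that moment, here `waiter.acquire()`. How `handle_timeout` is actually registered in `datasets.py` is not shown in the thread, so the following is only a sketch under that assumption. `run_with_timeout` is a hypothetical name, and the approach works only on Unix in the main thread.

```python
import signal
import time


def handle_timeout(signum, frame):
    # Called in the main thread when the timer fires; the exception
    # surfaces at whatever line the main thread was executing.
    raise TimeoutError


def run_with_timeout(func, seconds):
    """Call func(); raise TimeoutError if it runs longer than `seconds`."""
    previous = signal.signal(signal.SIGALRM, handle_timeout)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


if __name__ == "__main__":
    try:
        # Stand-in for a slow download that exceeds the limit.
        run_with_timeout(lambda: time.sleep(2), 0.5)
    except TimeoutError:
        print("timed out inside the blocking call")
```

If something like this wraps the download, the error would reflect the time limit chosen in the calling code, and a longer limit might let a slow but otherwise healthy query finish.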

@LysSanzMoreta
Author

@ivirshup Hi! Any comments on this? Thanks!

@ivirshup
Collaborator

Does the issue persist if you use a different network? E.g. trying this from home instead of an institute?

I've seen this error before when I had a misconfigured proxy, and another user saw something similar with the proxy on their compute cluster.

@LysSanzMoreta
Author

@ivirshup Argh, I think I know what it might be: the university VPN. I will test with it turned off (I forgot I had it on).

@LysSanzMoreta
Author

@ivirshup Never mind, the university VPN is off and I still get the same error. I might try a different network later on.
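For the proxy suggestion earlier in the thread, a quick first check is whether any proxy-related environment variables are set in the session that runs the download. The sketch below only lists the conventional variable names. `proxy_settings` is a hypothetical helper, and whether the storage client used by cellxgene_census reads these variables is not established in this thread.

```python
import os

# Conventional proxy variable names; both lower and upper case are checked.
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")


def proxy_settings(environ=None):
    """Return the proxy-related environment variables that are set."""
    environ = os.environ if environ is None else environ
    found = {}
    for name in PROXY_VARS:
        for key in (name, name.upper()):
            value = environ.get(key)
            if value:
                found[key] = value
    return found


if __name__ == "__main__":
    settings = proxy_settings()
    if settings:
        for key in settings:
            print(f"{key} is set")
    else:
        print("no proxy variables set")
```

An empty result does not rule out a proxy or firewall configured at the network level, so testing from a different network remains a useful comparison.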
