-
Notifications
You must be signed in to change notification settings - Fork 22
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Deploying to gh-pages from @ f99964c 🚀
- Loading branch information
Showing
50 changed files
with
2,460 additions
and
967 deletions.
There are no files selected for viewing
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
310 changes: 310 additions & 0 deletions
310
_sources/notebooks/api_demo/census_citation_generation.ipynb.txt
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,310 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "88812eae-6b46-48b4-a1e4-c468657d8480", | ||
"metadata": {}, | ||
"source": [ | ||
"# Generating citations for Census slices\n", | ||
"\n", | ||
"This notebook demonstrates how to generate a citation string for all datasets contained in a Census slice.\n", | ||
"\n", | ||
"**Contents**\n", | ||
"\n", | ||
"1. Requirements\n", | ||
"1. Generating citation strings\n", | ||
" 1. Via cell metadata query\n", | ||
" 1. Via an AnnData query \n", | ||
"\n", | ||
"⚠️ Note that the Census RNA data includes duplicate cells present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable `is_primary_data` which is described in the [Census schema](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md#repeated-data).\n", | ||
"\n", | ||
"## Requirements\n", | ||
"\n", | ||
"This notebook requires:\n", | ||
"\n", | ||
"- `cellxgene_census` Python package.\n", | ||
"- Census data release with [schema version](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md) 1.3.0 or greater.\n", | ||
"\n", | ||
"## Generating citation strings\n", | ||
"\n", | ||
"First we open a handle to the Census data. To ensure we open a data release with schema version 1.3.0 or greater, we use `census_version=\"latest\"`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "9a5a5a92-3d78-4542-95a5-e6889f245491", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/html": [ | ||
"<div>\n", | ||
"<style scoped>\n", | ||
" .dataframe tbody tr th:only-of-type {\n", | ||
" vertical-align: middle;\n", | ||
" }\n", | ||
"\n", | ||
" .dataframe tbody tr th {\n", | ||
" vertical-align: top;\n", | ||
" }\n", | ||
"\n", | ||
" .dataframe thead th {\n", | ||
" text-align: right;\n", | ||
" }\n", | ||
"</style>\n", | ||
"<table border=\"1\" class=\"dataframe\">\n", | ||
" <thead>\n", | ||
" <tr style=\"text-align: right;\">\n", | ||
" <th></th>\n", | ||
" <th>soma_joinid</th>\n", | ||
" <th>label</th>\n", | ||
" <th>value</th>\n", | ||
" </tr>\n", | ||
" </thead>\n", | ||
" <tbody>\n", | ||
" <tr>\n", | ||
" <th>0</th>\n", | ||
" <td>0</td>\n", | ||
" <td>census_schema_version</td>\n", | ||
" <td>1.3.0</td>\n", | ||
" </tr>\n", | ||
" <tr>\n", | ||
" <th>1</th>\n", | ||
" <td>1</td>\n", | ||
" <td>census_build_date</td>\n", | ||
" <td>2024-01-01</td>\n", | ||
" </tr>\n", | ||
" <tr>\n", | ||
" <th>2</th>\n", | ||
" <td>2</td>\n", | ||
" <td>dataset_schema_version</td>\n", | ||
" <td>4.0.0</td>\n", | ||
" </tr>\n", | ||
" <tr>\n", | ||
" <th>3</th>\n", | ||
" <td>3</td>\n", | ||
" <td>total_cell_count</td>\n", | ||
" <td>75694072</td>\n", | ||
" </tr>\n", | ||
" <tr>\n", | ||
" <th>4</th>\n", | ||
" <td>4</td>\n", | ||
" <td>unique_cell_count</td>\n", | ||
" <td>45846761</td>\n", | ||
" </tr>\n", | ||
" <tr>\n", | ||
" <th>5</th>\n", | ||
" <td>5</td>\n", | ||
" <td>number_donors_homo_sapiens</td>\n", | ||
" <td>16292</td>\n", | ||
" </tr>\n", | ||
" <tr>\n", | ||
" <th>6</th>\n", | ||
" <td>6</td>\n", | ||
" <td>number_donors_mus_musculus</td>\n", | ||
" <td>2153</td>\n", | ||
" </tr>\n", | ||
" </tbody>\n", | ||
"</table>\n", | ||
"</div>" | ||
], | ||
"text/plain": [ | ||
" soma_joinid label value\n", | ||
"0 0 census_schema_version 1.3.0\n", | ||
"1 1 census_build_date 2024-01-01\n", | ||
"2 2 dataset_schema_version 4.0.0\n", | ||
"3 3 total_cell_count 75694072\n", | ||
"4 4 unique_cell_count 45846761\n", | ||
"5 5 number_donors_homo_sapiens 16292\n", | ||
"6 6 number_donors_mus_musculus 2153" | ||
] | ||
}, | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"import cellxgene_census\n", | ||
"\n", | ||
"census = cellxgene_census.open_soma(census_version=\"latest\")\n", | ||
"census[\"census_info\"][\"summary\"].read().concat().to_pandas()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "23174644-7804-4723-a4ab-c5cf75bdd954", | ||
"metadata": {}, | ||
"source": [ | ||
"Then we load the dataset table which contains a column `\"citation\"` for each dataset included in Census. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "d47b636a-d653-4e3b-b139-14b6ca697ce8", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"0 Dataset Version: https://datasets.cellxgene.cz...\n", | ||
"1 Dataset Version: https://datasets.cellxgene.cz...\n", | ||
"2 Dataset Version: https://datasets.cellxgene.cz...\n", | ||
"3 Dataset Version: https://datasets.cellxgene.cz...\n", | ||
"4 Publication: https://doi.org/10.1002/ctm2.1356...\n", | ||
" ... \n", | ||
"695 Publication: https://doi.org/10.1038/s41586-02...\n", | ||
"696 Publication: https://doi.org/10.1038/s41586-02...\n", | ||
"697 Publication: https://doi.org/10.1016/j.isci.20...\n", | ||
"698 Publication: https://doi.org/10.1371/journal.p...\n", | ||
"699 Publication: https://doi.org/10.1016/j.isci.20...\n", | ||
"Name: citation, Length: 700, dtype: object" | ||
] | ||
}, | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"datasets = census[\"census_info\"][\"datasets\"].read().concat().to_pandas()\n", | ||
"datasets[\"citation\"]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "06adfa4a-3656-4f26-9adf-ba28eb2f691e", | ||
"metadata": {}, | ||
"source": [ | ||
"And now we can use the column `\"dataset_id\"` present in both the dataset table and the Census cell metadata to create citation strings for any Census slice.\n", | ||
"\n", | ||
"### Via cell metadata query" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"id": "f7edf4a7-8394-4df2-9dde-b24efcd6dbe0", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/4866a804-37eb-436f-8c87-9cd585260061.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", | ||
"\n", | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/bfd80f12-725c-4482-ad7f-1ed2b4909b0d.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", | ||
"\n", | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/e6df8a57-f54f-413a-9d4d-dee03294d778.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", | ||
"\n", | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8d599205-5c51-4b50-9d48-3dec31238587.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", | ||
"\n", | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/f6065c51-bd26-4aa5-a05d-2805aeea48d9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", | ||
"\n", | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8cdbf790-4d29-4f46-9aef-21adfb2e21da.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# Query cell metadata\n", | ||
"cell_metadata = census[\"census_data\"][\"homo_sapiens\"].obs.read(\n", | ||
" value_filter=\"tissue == 'cardiac atrium'\", column_names=[\"dataset_id\", \"cell_type\"]\n", | ||
")\n", | ||
"cell_metadata = cell_metadata.concat().to_pandas()\n", | ||
"\n", | ||
"# Get a citation string for the slice\n", | ||
"slice_datasets = datasets[datasets[\"dataset_id\"].isin(cell_metadata[\"dataset_id\"])]\n", | ||
"print(*slice_datasets[\"citation\"], sep=\"\\n\\n\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "941c6f2d-6bc2-46c3-963e-e74335fe93f6", | ||
"metadata": {}, | ||
"source": [ | ||
"### Via AnnData query" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"id": "2d9b2a11-2f48-43a5-8955-759019ce6bed", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/4866a804-37eb-436f-8c87-9cd585260061.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", | ||
"\n", | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/bfd80f12-725c-4482-ad7f-1ed2b4909b0d.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", | ||
"\n", | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/e6df8a57-f54f-413a-9d4d-dee03294d778.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", | ||
"\n", | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8d599205-5c51-4b50-9d48-3dec31238587.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", | ||
"\n", | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/f6065c51-bd26-4aa5-a05d-2805aeea48d9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n", | ||
"\n", | ||
"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8cdbf790-4d29-4f46-9aef-21adfb2e21da.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# Fetch an AnnData object\n", | ||
"adata = cellxgene_census.get_anndata(\n", | ||
" census=census,\n", | ||
" organism=\"homo_sapiens\",\n", | ||
" measurement_name=\"RNA\",\n", | ||
" obs_value_filter=\"tissue == 'cardiac atrium'\",\n", | ||
" var_value_filter=\"feature_name == 'MYBPC3'\",\n", | ||
" column_names={\"obs\": [\"dataset_id\", \"cell_type\"]},\n", | ||
")\n", | ||
"\n", | ||
"# Get a citation string for the slice\n", | ||
"slice_datasets = datasets[datasets[\"dataset_id\"].isin(adata.obs[\"dataset_id\"])]\n", | ||
"print(*slice_datasets[\"citation\"], sep=\"\\n\\n\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f6988186-5294-43f9-bfe5-2ac255aa0b26", | ||
"metadata": {}, | ||
"source": [ | ||
"And don't forget to close the Census handle" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"id": "f96b1c3b-4a2a-469a-9ded-1c5ff98b84aa", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"census.close()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.4" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.