Commit
Merge pull request #29 from EnesSefaAyar/master
khan2023() correction and new datasets
Showing 9 changed files with 449 additions and 21 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
##' Hu et al, 2023 (The Journal of Physical Chemistry B): Correlated protein modules | ||
##' | ||
##' @description | ||
##' | ||
##' The authors demonstrate correlations between the levels of pairs of | ||
##' proteins in single-cell proteomics (SCP) at steady state. By measuring | ||
##' pairwise correlations among 1000 proteins in populations of K562 cells and | ||
##' oocytes, they observed many correlated protein modules (CPMs) involved in | ||
##' specific biological functions. Some CPMs are specific to a particular cell | ||
##' type, while others are common to different cell types. Additionally, | ||
##' compared with single-cell transcriptomics and bulk proteomics, protein | ||
##' correlations in SCP are functionally and experimentally more significant | ||
##' than those of the corresponding mRNAs. | ||
##' | ||
##' @format Two [SingleCellExperiment] objects: | ||
##' | ||
##' - `proteins_K562`: protein data containing quantitative data for 1249 | ||
##' proteins and 69 single cells with zero imputation. | ||
##' - `proteins_oocyte`: protein data containing quantitative data for 3422 | ||
##' proteins and 137 single cells with zero imputation. | ||
##' | ||
##' The `colData(hu2023_oocyte())` contains cell type annotation. | ||
##' The `colData(hu2023_K562())` contains cell type annotation. | ||
##' | ||
##' @section Acquisition protocol: | ||
##' | ||
##' The data were acquired using the following setup. More information | ||
##' can be found in the source article (see `References`). | ||
##' | ||
##' - **Cell isolation**: K562 cells were re-suspended and washed in cold PBS. | ||
##' Single cells/10 cells were sorted into 96-well plates using a FACSAria | ||
##' instrument. Oocyte-cumulus complexes from C57/6J mice were collected | ||
##' after PMSG and HCG injections, with hyaluronidase used to remove cumulus | ||
##' cells. All samples were stored at -80 degrees Celsius. | ||
##' - **Sample preparation**: Cells were digested with trypsin at 37 degrees | ||
##' Celsius for 3 hours. For label-free proteomics, digestion was terminated | ||
##' by adding 0.43% TFA and 1% ACN in water, followed by drying in a | ||
##' concentrator. Peptides were resuspended in 0.1% TFA and 1% ACN, and | ||
##' then transferred to sample tubes for LC-MS/MS analysis. | ||
##' - **Separation**: 4 microliters of peptide digests were injected into a | ||
##' high-performance chromatography column (IonOpticks) and separated at a | ||
##' flow rate of 100 nL/min using a nanoflow liquid chromatography system. | ||
##' The effective gradient was 70 mins, allowing 16 cells per day. | ||
##' - **Ionization**: Peptides were analyzed using an Orbitrap Eclipse mass | ||
##' spectrometer with a FAIMS Pro interface. FAIMS compensation voltages of | ||
##' −55 and −70 V were applied, with a 1-second cycle time for both voltages. | ||
##' - **Mass spectrometry**: MS spectra were acquired with the Orbitrap | ||
##' analyzer, while MS/MS spectra were acquired with a linear ion trap | ||
##' analyzer. The maximum ion injection time for MS/MS was 200 ms. | ||
##' - **Data analysis**: MS raw files were searched against the UniProt | ||
##' human protein database and an in-house contamination database | ||
##' using Proteome Discoverer (2.4). Label-free quantification was based on | ||
##' peak intensity with the match-between-runs (MBR) feature enabled. | ||
##' | ||
##' @section Data collection: | ||
##' | ||
##' The oocyte protein data were shared by the author and are accessible from the | ||
##' [Shared File](https://biopic-my.sharepoint.cn/:x:/g/personal/humo_biopic_pku_edu_cn/EfX4CHedVopLuSx2OJNj6LABdESGNdKz4Eh8Zawvd-fNNQ?e=E5m09k&xsdata=MDV8MDJ8ZW5lcy5heWFyQHVjbG91dmFpbi5iZXxjYjY2M2MwYzNjMDY0YjZhNjc1NTA4ZGM4YzMzNjc1YXw3YWIwOTBkNGZhMmU0ZWNmYmM3YzQxMjdiNGQ1ODJlY3wxfDB8NjM4NTM5Mzk5NjI1Mzg1NDQ3fFVua25vd258VFdGcGJHWnNiM2Q4ZXlKV0lqb2lNQzR3TGpBd01EQWlMQ0pRSWpvaVYybHVNeklpTENKQlRpSTZJazFoYVd3aUxDSlhWQ0k2TW4wPXwwfHx8&sdata=Zmt4YnZFZFViTitJRkdTc0FTK2thMjdTT0EzV2JJeS83WlZmV3R6SzdvRT0%3d) | ||
##' The K562 protein data are accessible from | ||
##' [GitHub](https://github.com/dionezhang/CPM/blob/master/ProteinAbundance.Rdata). | ||
##' | ||
##' - `DataMatrix-oocyte-20240614.csv`: normalized imputed protein matrix | ||
##' - `ProteinAbundance.Rdata`: protein matrices (normalized, log transformed) | ||
##' | ||
##' Each dataset was formatted as a standalone [SingleCellExperiment] | ||
##' object. | ||
##' | ||
##' The oocyte protein data were exported from the shared link as | ||
##' `DataMatrix-oocyte-20240614.csv`. The data were formatted as a | ||
##' [SingleCellExperiment] object, and the SampleType information was added | ||
##' as the only metadata and stored in the `colData`. | ||
##' | ||
##' The K562 protein data were downloaded from the GitHub link and loaded | ||
##' into memory. The `Norm` object was formatted as a [SingleCellExperiment] | ||
##' object, and the SampleType information was added as the only metadata | ||
##' and stored in the `colData`. | ||
##' | ||
##' @source | ||
##' The oocyte data were downloaded from the | ||
##' [Shared File](https://biopic-my.sharepoint.cn/:x:/g/personal/humo_biopic_pku_edu_cn/EfX4CHedVopLuSx2OJNj6LABdESGNdKz4Eh8Zawvd-fNNQ?e=E5m09k&xsdata=MDV8MDJ8ZW5lcy5heWFyQHVjbG91dmFpbi5iZXxjYjY2M2MwYzNjMDY0YjZhNjc1NTA4ZGM4YzMzNjc1YXw3YWIwOTBkNGZhMmU0ZWNmYmM3YzQxMjdiNGQ1ODJlY3wxfDB8NjM4NTM5Mzk5NjI1Mzg1NDQ3fFVua25vd258VFdGcGJHWnNiM2Q4ZXlKV0lqb2lNQzR3TGpBd01EQWlMQ0pRSWpvaVYybHVNeklpTENKQlRpSTZJazFoYVd3aUxDSlhWQ0k2TW4wPXwwfHx8&sdata=Zmt4YnZFZFViTitJRkdTc0FTK2thMjdTT0EzV2JJeS83WlZmV3R6SzdvRT0%3d) | ||
##' The K562 protein data were downloaded from | ||
##' [GitHub](https://github.com/dionezhang/CPM/blob/master/ProteinAbundance.Rdata). | ||
##' The raw data and the quantification data can also be found in the | ||
##' MassIVE repository `MSV000089625`: | ||
##' ftp://massive.ucsd.edu/MSV000089625/. | ||
##' | ||
##' @references | ||
##' Hu, M., Zhang, Y., Yuan, Y., Ma, W., Zheng, Y., Gu, Q., & Xie, X. S. 2023. | ||
##' “Correlated protein modules revealing functional coordination of interacting | ||
##' proteins are detected by single-cell proteomics.” The Journal of Physical | ||
##' Chemistry B, | ||
##' ([link to article](https://doi.org/10.1021/acs.jpcb.3c00014)). | ||
##' | ||
##' @aliases hu2023_K562 | ||
##' @aliases hu2023_oocyte | ||
##' | ||
##' @examples | ||
##' \donttest{ | ||
##' hu2023_oocyte() | ||
##' hu2023_K562() | ||
##' } | ||
##' | ||
##' @keywords datasets | ||
##' | ||
"hu2023" |
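The pairwise protein correlations described for the Hu et al. dataset can be illustrated with a minimal base R sketch. It runs on a small simulated matrix rather than on the actual `hu2023_K562()` data; the protein names, the shared-factor simulation, and the 0.5 cutoff are assumptions chosen for illustration and are not the authors' method.

```r
## Simulated protein-by-cell matrix (NOT the real data): 4 proteins, 50 cells.
set.seed(1)
n_cells <- 50
shared <- rnorm(n_cells)  # hypothetical factor driving a co-regulated module
x <- rbind(
    protA = shared + rnorm(n_cells, sd = 0.3),
    protB = shared + rnorm(n_cells, sd = 0.3),
    protC = rnorm(n_cells),
    protD = rnorm(n_cells)
)

## cor() correlates columns, so transpose to get protein-by-protein values.
cors <- cor(t(x), method = "pearson")

## List each protein pair once (upper triangle) above an arbitrary cutoff.
idx <- which(abs(cors) > 0.5 & upper.tri(cors), arr.ind = TRUE)
module_pairs <- data.frame(
    protein1 = rownames(cors)[idx[, "row"]],
    protein2 = colnames(cors)[idx[, "col"]],
    r = cors[idx]
)
module_pairs
```

On real data the same `cor(t(x))` call would presumably be applied to the assay matrix of the object, with missing or zero-imputed values handled first.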
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
##' Krull et al, 2024 (Nature Communications): IFN-γ response | ||
##' | ||
##' The authors develop a new strategy for data-independent acquisition (DIA) | ||
##' that leverages the co-analysis of low-input samples alongside a | ||
##' matching enhancer (ME) of higher input. Using DIA-ME, they investigate | ||
##' the proteomic response of U-2 OS cells to interferon gamma (IFN-γ) at | ||
##' the single-cell level. | ||
##' | ||
##' @format A [QFeatures] object with 159 assays, each assay being a | ||
##' [SingleCellExperiment] object. | ||
##' | ||
##' - Assay 1-158: DIA-NN main output report table split for each | ||
##' acquisition run. The first 15 runs each acquired 10 cells (MEs), and | ||
##' the remaining 143 runs each acquired 1 single cell. These assays contain | ||
##' the results of the spectrum identification and quantification. | ||
##' - `proteins`: DIA-NN protein group matrix, containing normalised | ||
##' quantities for 1553 protein groups in 143 single cells. Proteins | ||
##' are filtered at (Q.Value <= 0.01), (Lib.Q.Value <= 0.01), and | ||
##' (Lib.PG.Q.Value <= 0.01). | ||
##' | ||
##' The `colData(krull2024())` contains cell type annotations. The description | ||
##' of the `rowData` fields for the different assays can be found in the | ||
##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme). | ||
##' | ||
##' @section Acquisition protocol: | ||
##' | ||
##' The data were acquired using the following setup. More information | ||
##' can be found in the source article (see `References`). | ||
##' | ||
##' - **Cell isolation**: Cells were detached with trypsin digestion, followed | ||
##' by dilution in 1.5 mL PBS, and isolated using a BD FACSAria III instrument. | ||
##' - **Sample preparation**: Sorted single cells were collected in lysis | ||
##' buffer (50 mM TEAB, pH 8.5, and 0.025% DDM) and denatured at 70 degrees | ||
##' Celsius for 30 minutes. Samples were acidified with 0.5% FA and | ||
##' transferred to autosampler plates for mass spectrometry analysis. | ||
##' - **Separation**: Peptides were injected in a 2 microliter volume onto | ||
##' a 25 cm x 75 micrometer ID column connected to a nano-ESI source and | ||
##' separated at a flow rate of 300 nL/min using a gradient of ACN in water | ||
##' with 0.1% FA over 15 minutes. | ||
##' - **Ionization**: Ionization was performed using a 1,500 V capillary | ||
##' voltage with 3.0 L/min dry gas and a dry temperature of 180 degrees | ||
##' Celsius. MS data acquisition was conducted in diaPASEF mode using a | ||
##' timsTOF Pro mass spectrometer. | ||
##' - **Mass spectrometry**: MS1 scans covered a range of 200-1,700 m/z, | ||
##' while DIA window isolation targeted 475-1,000 m/z with eight DIA scans | ||
##' per cycle. Fragmentation was triggered by collision energy ranging from | ||
##' 45 eV to 27 eV depending on the ion mobility. | ||
##' - **Data analysis**: Data were processed using DIA-NN (v1.8.0) and | ||
##' Spectronaut 18 in a library-free approach, using deep learning | ||
##' for spectrum prediction, retention times, and ion mobility. | ||
##' | ||
##' @section Data collection: | ||
##' | ||
##' The data were collected from the PRIDE | ||
##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD053464) | ||
##' in the `03_SingleCell_Searches.zip` file. | ||
##' | ||
##' We loaded the DIA-NN main report table and generated a sample | ||
##' annotation table based on the MS file names. We next combined the | ||
##' sample annotation and the DIA-NN tables into a [QFeatures] object | ||
##' following the `scp` data structure. We loaded the protein group | ||
##' matrix as a [SingleCellExperiment] object, added the protein data | ||
##' as a new assay, and linked the precursors to proteins using the | ||
##' `Protein.Group` variable from the `rowData`. | ||
##' | ||
##' @source | ||
##' The data were downloaded from PRIDE | ||
##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD053464) | ||
##' with accession ID `PXD053464`. | ||
##' | ||
##' @references | ||
##' Krull, K. K., Ali, S. A., & Krijgsveld, J. 2024. "Enhanced feature matching | ||
##' in single-cell proteomics characterizes IFN-γ response and co-existence of | ||
##' cell states." Nature Communications, 15(1). | ||
##' [Link to article](https://doi.org/10.1038/s41467-024-52605-x) | ||
##' | ||
##' @examples | ||
##' \donttest{ | ||
##' krull2024() | ||
##' } | ||
##' | ||
##' @keywords datasets | ||
##' | ||
"krull2024" |
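The Krull et al. documentation describes a DIA-NN report that is split per acquisition run and filtered at 1% on three q-value columns. A minimal base R sketch of how such thresholds and the per-run split could be applied is given here. The toy table and all of its values are made up; only the column names (`Run`, `Protein.Group`, `Precursor.Id`, `Q.Value`, `Lib.Q.Value`, `Lib.PG.Q.Value`) follow the DIA-NN main report, and this is not the script used to build `krull2024()`.

```r
## Toy stand-in for a DIA-NN main report (values are made up).
report <- data.frame(
    Run = c("run01", "run01", "run02", "run02", "run02"),
    Protein.Group = c("P1", "P2", "P1", "P2", "P3"),
    Precursor.Id = c("PEPA2", "PEPB2", "PEPA2", "PEPB2", "PEPC3"),
    Q.Value = c(0.001, 0.02, 0.005, 0.004, 0.009),
    Lib.Q.Value = c(0.001, 0.001, 0.002, 0.03, 0.001),
    Lib.PG.Q.Value = c(0.001, 0.001, 0.001, 0.001, 0.001),
    stringsAsFactors = FALSE
)

## Keep rows passing all three 1% thresholds.
keep <- report$Q.Value <= 0.01 &
    report$Lib.Q.Value <= 0.01 &
    report$Lib.PG.Q.Value <= 0.01
filtered <- report[keep, ]

## One table per acquisition run, analogous to one assay per run.
by_run <- split(filtered, filtered$Run)
vapply(by_run, nrow, integer(1))
```

Each element of `by_run` would then correspond to one run-level assay, with `Protein.Group` available to link precursors to proteins.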
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,3 +25,6 @@ | |
"guise2024","Single-cell proteomics data of 108 postmortem CTL or ALS spinal moto neurons","3.19",NA,"TXT","ftp://massive.ucsd.edu/v05/MSV000092119/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Christophe Vanderaa <[email protected]>","QFeatures","Rda","scpdata/guise2024.rda",2024-01-05,47,"Proteome Discoverer","LFQ",TRUE,TRUE,TRUE,TRUE,NA | ||
"petrosius2023_mES","Mouse embryonic stem cells across ground-state (m2i) and differentiation-permissive (m15) culture conditions.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT",NA,"Homo sapiens",9606,TRUE,"Dataverse","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/petrosius2023_mES.Rda",2024-04-09,605,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA | ||
"petrosius2023_AstralAML","Single-cell proteomics data of 4 cell types from the OCI-AML8227 model.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/4DSPJM",NA,"Homo sapiens",9606,TRUE,"Dataverse","Samuel Gregoire <[email protected]>","QFeatures","Rda","scpdata/petrosius2023_AstralAML.Rda",2023-06-08,217,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA | ||
"krull2024","Single-cell proteomics data IFN-γ response of U-2 OS cells","3.19",NA,"TXT","https://www.ebi.ac.uk/pride/archive/projects/PXD053464",NA,"Homo sapiens",9606,TRUE,"PRIDE","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/krull2024.Rda",2024-10-24,159,"DIA-NN","LFQ",TRUE,FALSE,TRUE,TRUE,NA | ||
"hu2023_K562","Single-cell proteomics data of K562 cells","3.19",NA,"TXT","ftp://massive.ucsd.edu/MSV000089625/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Enes Sefa Ayar <[email protected]>","SingleCellExperiment","Rda","scpdata/hu2023_K562.Rda",2024-10-24,1,"Proteome Discoverer","LFQ",FALSE,FALSE,TRUE,TRUE,NA | ||
"hu2023_oocyte","Single-cell proteomics data of oocytes","3.19",NA,"TXT","ftp://massive.ucsd.edu/MSV000089625/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Enes Sefa Ayar <[email protected]>","SingleCellExperiment","Rda","scpdata/hu2023_oocyte.Rda",2024-10-24,1,"Proteome Discoverer","LFQ",FALSE,FALSE,TRUE,TRUE,NA |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
|
||
####---- Hu et al, 2023 ---#### | ||
|
||
|
||
## Hu, M., Zhang, Y., Yuan, Y., Ma, W., Zheng, Y., Gu, Q., & Xie, X. S. 2023. | ||
## “Correlated protein modules revealing functional coordination of interacting | ||
## proteins are detected by single-cell proteomics.” The Journal of Physical | ||
## Chemistry B, https://doi.org/10.1021/acs.jpcb.3c00014 | ||
|
||
library(SingleCellExperiment) | ||
library(scp) | ||
library(tidyverse) | ||
|
||
root <- "~/localdata/SCP/hu2023/" | ||
|
||
####---- Add the protein data ----#### | ||
|
||
## Data accessible at GitHub repository | ||
## https://github.com/dionezhang/CPM/blob/master/ProteinAbundance.Rdata | ||
|
||
#### Load data #### | ||
load(paste0(root, "ProteinAbundance.Rdata")) | ||
|
||
Norm %>% | ||
    mutate(X = rownames(Norm)) %>% | ||
    readSingleCellExperiment(ecol = 1:69, fnames = "X") -> | ||
    K562 | ||
|
||
## Protein data for K562 cells | ||
hu2023_K562 <- SingleCellExperiment(K562) | ||
|
||
prots <- rownames(hu2023_K562) | ||
rowData(hu2023_K562) <- Description[prots, ,drop = FALSE] | ||
rowData(hu2023_K562)$protein <- prots | ||
|
||
colData(hu2023_K562) <- DataFrame(row.names = colnames(Norm), | ||
                                  SampleType = rep("K562", length(colnames(Norm)))) | ||
|
||
## Save data | ||
save(hu2023_K562, | ||
     file = paste0(root, "hu2023_K562.Rda"), | ||
     compress = "xz", | ||
     compression_level = 9) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
|
||
####---- Hu et al, 2023 ---#### | ||
|
||
|
||
## Hu, M., Zhang, Y., Yuan, Y., Ma, W., Zheng, Y., Gu, Q., & Xie, X. S. 2023. | ||
## “Correlated protein modules revealing functional coordination of interacting | ||
## proteins are detected by single-cell proteomics.” The Journal of Physical | ||
## Chemistry B, https://doi.org/10.1021/acs.jpcb.3c00014 | ||
|
||
library(SingleCellExperiment) | ||
library(scp) | ||
library(tidyverse) | ||
|
||
root <- "~/localdata/SCP/hu2023/" | ||
|
||
####---- Add the protein data ----#### | ||
|
||
## Data shared by the author, and accessible at | ||
## https://biopic-my.sharepoint.cn/:x:/g/personal/humo_biopic_pku_edu_cn/EfX4CHedVopLuSx2OJNj6LABdESGNdKz4Eh8Zawvd-fNNQ?rtime=7Xzb4B303Eg | ||
|
||
#### Load Data #### | ||
oocyte <- read.csv(paste0(root, "DataMatrix-oocyte-20240614.csv")) | ||
oocyte %>% | ||
    rename(protein = X) %>% | ||
    readSingleCellExperiment(ecol = 2:138, fnames = "protein") -> | ||
    oocyte | ||
|
||
## Protein data for oocytes | ||
hu2023_oocyte <- SingleCellExperiment(oocyte) | ||
|
||
colData(hu2023_oocyte) <- DataFrame(row.names = colnames(hu2023_oocyte), | ||
                                    SampleType = rep("oocyte", length(colnames(oocyte)))) | ||
|
||
## Save data | ||
save(hu2023_oocyte, | ||
     file = paste0(root, "hu2023_oocyte.Rda"), | ||
     compress = "xz", | ||
     compression_level = 9) | ||
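The oocyte script renames a column called `X` before building the object. A likely reason, shown in the base R sketch here, is that `read.csv()` assigns the name `X` to a first column whose header is empty, as happens when a matrix is exported with row names. The toy matrix, its values, and the temporary file are hypothetical stand-ins for `DataMatrix-oocyte-20240614.csv`, and the sketch uses a plain matrix and data frame instead of a SingleCellExperiment.

```r
## Write a tiny stand-in CSV; write.csv() leaves the row-name header empty.
toy <- matrix(c(1.2, 0, 3.4, 2.2, 5.1, 0), nrow = 3,
              dimnames = list(c("P1", "P2", "P3"), c("cell1", "cell2")))
path <- tempfile(fileext = ".csv")
write.csv(toy, path)

## read.csv() names the unnamed first column "X".
tab <- read.csv(path)
names(tab)

## Separate protein identifiers from the quantitative columns.
quant <- as.matrix(tab[, -1])
rownames(quant) <- tab$X

## One row per cell, mirroring the SampleType column added in the script.
annot <- data.frame(SampleType = rep("oocyte", ncol(quant)),
                    row.names = colnames(quant))
annot
```

In the script itself, the quantitative columns are selected by position (`ecol = 2:138`), which corresponds to dropping the first column in this sketch.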
|