Merge pull request #29 from EnesSefaAyar/master
khan2023() correction and new datasets
lgatto authored Oct 28, 2024
2 parents a2c8b03 + 35ccb02 commit 75e7436
Showing 9 changed files with 449 additions and 21 deletions.
106 changes: 106 additions & 0 deletions R/hu2023.R
@@ -0,0 +1,106 @@
##' Hu et al, 2023 (The Journal of Physical Chemistry B): Correlated protein modules
##'
##' @description
##'
##' The authors demonstrate correlations between the levels of pairs of
##' proteins in single-cell proteomics (SCP) at steady state. By measuring
##' pairwise correlations among 1000 proteins in a population of K562 cells and
##' oocytes, they observed many correlated protein modules (CPMs) that are
##' involved in certain biological functions. Some CPMs are specific to a
##' particular cell type, while others are common to different cell types.
##' Additionally, compared to single-cell transcriptomics and bulk proteomics,
##' protein correlations in SCP are functionally and experimentally more
##' significant than those of the corresponding mRNAs.
##'
##' @format Two [SingleCellExperiment] objects:
##'
##' - `proteins_K562`: protein data containing quantitative data for 1249
##' proteins and 69 single-cells with zero imputation.
##' - `proteins_oocyte`: protein data containing quantitative data for 3422
##' proteins and 137 single-cells with zero imputation.
##'
##' The `colData(hu2023_oocyte())` contains cell type annotation.
##' The `colData(hu2023_K562())` contains cell type annotation.
##'
##' @section Acquisition protocol:
##'
##' The data were acquired using the following setup. More information
##' can be found in the source article (see `References`).
##'
##' - **Cell isolation**: K562 cells were re-suspended and washed in cold PBS.
##' Single cells/10 cells were sorted into 96-well plates using a FACSAria
##' instrument. Oocyte-cumulus complexes from C57/6J mice were collected
##' after PMSG and HCG injections, with hyaluronidase used to remove cumulus
##' cells. All samples were stored at -80 degrees Celsius.
##' - **Sample preparation**: Cells were digested with trypsin at 37 degrees
##' Celsius for 3 hours. For label-free proteomics, digestion was terminated
##' by adding 0.43% TFA and 1% ACN in water, followed by drying in a
##' concentrator. Peptides were resuspended in 0.1% TFA and 1% ACN, and
##' then transferred to sample tubes for LC-MS/MS analysis.
##' - **Separation**: 4 microliters of peptide digests were injected into a
##' high-performance chromatography column (IonOpticks) and separated at a
##' flow rate of 100 nL/min using a nanoflow liquid chromatography system.
##' The effective gradient was 70 mins, allowing 16 cells per day.
##' - **Ionization**: Peptides were analyzed using an Orbitrap Eclipse mass
##' spectrometer with a FAIMS Pro interface. FAIMS compensation voltages of
##' −55 and −70 V were applied, with a 1-second cycle time for both voltages.
##' - **Mass spectrometry**: MS spectra were acquired with the Orbitrap
##' analyzer, while MS/MS spectra were acquired with a linear ion trap
##' analyzer. The maximum ion injection time for MS/MS was 200 ms.
##' - **Data analysis**: MS raw files were searched against the UniProt
##' human protein database and an in-house contamination database
##' using Proteome Discoverer (2.4). Label-free quantification was based on
##' peak intensity with the match-between-runs (MBR) feature enabled.
##'
##' @section Data collection:
##'
##' The oocyte protein data were shared by the author and are accessible from the
##' [Shared File](https://biopic-my.sharepoint.cn/:x:/g/personal/humo_biopic_pku_edu_cn/EfX4CHedVopLuSx2OJNj6LABdESGNdKz4Eh8Zawvd-fNNQ?e=E5m09k&xsdata=MDV8MDJ8ZW5lcy5heWFyQHVjbG91dmFpbi5iZXxjYjY2M2MwYzNjMDY0YjZhNjc1NTA4ZGM4YzMzNjc1YXw3YWIwOTBkNGZhMmU0ZWNmYmM3YzQxMjdiNGQ1ODJlY3wxfDB8NjM4NTM5Mzk5NjI1Mzg1NDQ3fFVua25vd258VFdGcGJHWnNiM2Q4ZXlKV0lqb2lNQzR3TGpBd01EQWlMQ0pRSWpvaVYybHVNeklpTENKQlRpSTZJazFoYVd3aUxDSlhWQ0k2TW4wPXwwfHx8&sdata=Zmt4YnZFZFViTitJRkdTc0FTK2thMjdTT0EzV2JJeS83WlZmV3R6SzdvRT0%3d)
##' The K562 protein data are accessible from the
##' [GitHub repository](https://github.com/dionezhang/CPM/blob/master/ProteinAbundance.Rdata).
##'
##' - `DataMatrix-oocyte-20240614.csv`: normalized imputed protein matrix
##' - `ProteinAbundance.Rdata`: protein matrices (normalized, log transformed)
##'
##' Each dataset is stored as a standalone [SingleCellExperiment] object.
##'
##' The oocyte protein data were exported from the shared link as
##' `DataMatrix-oocyte-20240614.csv`. The data were formatted to a
##' [SingleCellExperiment] object, and the `SampleType` annotation was added
##' as the only sample metadata and stored in the `colData`.
##'
##' The K562 protein data were downloaded from the GitHub link and loaded
##' into memory. The `Norm` object was formatted to a [SingleCellExperiment]
##' object, and the protein annotations from the `Description` object were
##' stored in the `rowData`. The `SampleType` annotation was added as the only
##' sample metadata and stored in the `colData`.
##'
##' @source
##' The oocyte data were downloaded from the
##' [Shared File](https://biopic-my.sharepoint.cn/:x:/g/personal/humo_biopic_pku_edu_cn/EfX4CHedVopLuSx2OJNj6LABdESGNdKz4Eh8Zawvd-fNNQ?e=E5m09k&xsdata=MDV8MDJ8ZW5lcy5heWFyQHVjbG91dmFpbi5iZXxjYjY2M2MwYzNjMDY0YjZhNjc1NTA4ZGM4YzMzNjc1YXw3YWIwOTBkNGZhMmU0ZWNmYmM3YzQxMjdiNGQ1ODJlY3wxfDB8NjM4NTM5Mzk5NjI1Mzg1NDQ3fFVua25vd258VFdGcGJHWnNiM2Q4ZXlKV0lqb2lNQzR3TGpBd01EQWlMQ0pRSWpvaVYybHVNeklpTENKQlRpSTZJazFoYVd3aUxDSlhWQ0k2TW4wPXwwfHx8&sdata=Zmt4YnZFZFViTitJRkdTc0FTK2thMjdTT0EzV2JJeS83WlZmV3R6SzdvRT0%3d)
##' The K562 protein data were downloaded from the
##' [GitHub repository](https://github.com/dionezhang/CPM/blob/master/ProteinAbundance.Rdata).
##' The raw data and the quantification data can also be found in the
##' MassIVE repository `MSV000089625`:
##' ftp://massive.ucsd.edu/MSV000089625/.
##'
##' @references
##' Hu, M., Zhang, Y., Yuan, Y., Ma, W., Zheng, Y., Gu, Q., & Xie, X. S. 2023.
##' “Correlated protein modules revealing functional coordination of interacting
##' proteins are detected by single-cell proteomics.” The Journal of Physical
##' Chemistry B,
##' ([link to article](https://doi.org/10.1021/acs.jpcb.3c00014)).
##'
##' @aliases hu2023_K562
##' @aliases hu2023_oocyte
##'
##' @examples
##' \donttest{
##' hu2023_oocyte()
##' hu2023_K562()
##' }
##'
##' @keywords datasets
##'
"hu2023"
16 changes: 8 additions & 8 deletions R/khan2023.R
@@ -20,10 +20,10 @@
##' empty (negative control) channels and unused channels.
##' - `peptides`: peptide data containing quantitative data for 10055
##' peptides and 421 single-cells.
-##' - `proteins_imputed`: protein data containing quantitative data for 4096
-##' proteins and 421 single-cells with k-nearest neighbors (KNN) imputation.
-##' - `proteins_unimputed`: protein data containing quantitative data for 4096
-##' proteins and 421 single-cells without imputation.
+##' - `proteins_imputed`: protein data containing quantitative data for 4571
+##' proteins and 420 single-cells with k-nearest neighbors (KNN) imputation.
+##' - `proteins_unimputed`: protein data containing quantitative data for 4571
+##' proteins and 420 single-cells without imputation.
##'
##' The `colData(khan2023())` contains cell type and batch annotations that
##' are common to all assays. The description of the `rowData` fields for the
@@ -73,7 +73,7 @@
##' based on the peptide sequence information through an `AssayLink` object.
##'
##' The imputed protein data were taken from the same google drive folder
-##' (`EpiToMesen.TGFB.nPoP_trial1_ProtByCellMatrix_NSThreshDART_medIntCrNorm_imputedNotBC.csv`).
+##' (`EpiToMesen.TGFB.nPoP_trial1_1PercDartFDRTMTBulkDIA.WallE_imputed.txt`).
##' The data were formatted to a [SingleCellExperiment] object and the sample
##' metadata were matched to the column names (mapping is retrieved
##' after running the SCoPE2 R script, `EMTTGFB_singleCellProcessing.R`) and
@@ -82,7 +82,7 @@
##' based on the protein sequence information through an `AssayLink` object.
##'
##' The unimputed protein data were taken from the same google drive folder
-##' (`EpiToMesen.TGFB.nPoP_trial1_ProtByCellMatrix_NSThreshDART_medIntCrNorm_unimputed.csv`).
+##' (`EpiToMesen.TGFB.nPoP_trial1_1PercDartFDRTMTBulkDIA.WallE_unimputed.txt`).
##' The data were formatted and added exactly as imputed data.
##'
##' @source
@@ -97,8 +97,8 @@
##' @references
##' Saad Khan, Rachel Conover, Anand R. Asthagiri, Nikolai Slavov. 2023.
##' "Dynamics of single-cell protein covariation during epithelial–mesenchymal
-##' transition." bioRxiv.
-##' ([link to article](https://doi.org/10.1101/2023.12.21.572913)).
+##' transition." Journal of Proteome Research.
+##' ([link to article](https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00277)).
##'
##' @examples
##' \donttest{
84 changes: 84 additions & 0 deletions R/krull2024.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
##' Krull et al, 2024 (Nature Communications): IFN-γ response
##'
##' They develop a new strategy for data-independent acquisition (DIA) that
##' leverages the co-analysis of low-input samples alongside a corresponding
##' matching enhancer (ME) of higher input. Using DIA-ME, they investigate the
##' proteomic response of U-2 OS cells to interferon gamma (IFN-γ) at
##' the single-cell level.
##'
##' @format A [QFeatures] object with 159 assays, each assay being a
##' [SingleCellExperiment] object.
##'
##' - Assay 1-158: DIA-NN main output report table split for each
##' acquisition run. The first 15 runs each acquired 10 single cells (MEs)
##' and the remaining 143 runs each acquired 1 single cell. Each assay
##' contains the results of the spectrum identification and quantification.
##' - `proteins`: DIA-NN protein group matrix, containing normalised
##' quantities for 1553 protein groups in 143 single cells. Proteins
##' are filtered at (Q.Value <= 0.01), (Lib.Q.Value <= 0.01), and
##' (Lib.PG.Q.Value <= 0.01).
##'
##' The `colData(krull2024())` contains cell type annotations. The description
##' of the `rowData` fields for the different assays can be found in the
##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
##'
##' @section Acquisition protocol:
##'
##' The data were acquired using the following setup. More information
##' can be found in the source article (see `References`).
##'
##' - **Cell isolation**: Cells were detached with trypsin digestion, followed
##' by dilution in 1.5 mL PBS, and isolated using a BD FACSAria III instrument.
##' - **Sample preparation**: Sorted single cells were collected in lysis
##' buffer (50 mM TEAB, pH 8.5, and 0.025% DDM), denatured at 70 degrees
##' Celsius for 30 minutes. Samples were acidified with 0.5% FA and
##' transferred to auto sampler plates for mass spectrometry analysis.
##' - **Separation**: Peptides were injected in a 2 microliter volume onto
##' a 25 cm x 75 micrometer ID column connected to a nano-ESI source and
##' separated at a flow rate of 300 nL/min using a gradient of ACN in water
##' with 0.1% FA over 15 minutes.
##' - **Ionization**: Ionization was performed using a 1,500 V capillary
##' voltage with 3.0 L/min dry gas and a dry temperature of 180 degrees
##' Celsius. MS data acquisition was conducted in diaPASEF mode using a
##' timsTOF Pro mass spectrometer.
##' - **Mass spectrometry**: MS1 scans covered a range of 200-1,700 m/z,
##' while DIA window isolation targeted 475-1,000 m/z with eight DIA scans
##' per cycle. Fragmentation was triggered by collision energy ranging from
##' 45 eV to 27 eV depending on the ion mobility.
##' - **Data analysis**: Data were processed using DIA-NN (v1.8.0) and
##' Spectronaut 18 in a library-free approach, using deep learning
##' to predict spectra, retention times, and ion mobility.
##'
##' @section Data collection:
##'
##' The data were collected from the PRIDE
##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD053464)
##' in the `03_SingleCell_Searches.zip` file.
##'
##' We loaded the DIA-NN main report table and generated a sample
##' annotation table based on the MS file names. We next combined the
##' sample annotation and the DIA-NN tables into a [QFeatures] object
##' following the `scp` data structure. We loaded the protein group
##' matrix as a [SingleCellExperiment] object, added the protein data
##' as a new assay, and linked the precursors to proteins using the
##' `Protein.Group` variable from the `rowData`.
##'
##' @source
##' The data were downloaded from the PRIDE
##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD053464)
##' with accession ID `PXD053464`.
##'
##' @references
##' Krull, K. K., Ali, S. A., & Krijgsveld, J. 2024. "Enhanced feature matching
##' in single-cell proteomics characterizes IFN-γ response and co-existence of
##' cell states." Nature Communications, 15(1).
##' ([link to article](https://doi.org/10.1038/s41467-024-52605-x)).
##'
##' @examples
##' \donttest{
##' krull2024()
##' }
##'
##' @keywords datasets
##'
"krull2024"
3 changes: 3 additions & 0 deletions inst/extdata/metadata.csv
@@ -25,3 +25,6 @@
"guise2024","Single-cell proteomics data of 108 postmortem CTL or ALS spinal moto neurons","3.19",NA,"TXT","ftp://massive.ucsd.edu/v05/MSV000092119/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Christophe Vanderaa <[email protected]>","QFeatures","Rda","scpdata/guise2024.rda",2024-01-05,47,"Proteome Discoverer","LFQ",TRUE,TRUE,TRUE,TRUE,NA
"petrosius2023_mES","Mouse embryonic stem cells across ground-state (m2i) and differentiation-permissive (m15) culture conditions.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT",NA,"Homo sapiens",9606,TRUE,"Dataverse","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/petrosius2023_mES.Rda",2024-04-09,605,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
"petrosius2023_AstralAML","Single-cell proteomics data of 4 cell types from the OCI-AML8227 model.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/4DSPJM",NA,"Homo sapiens",9606,TRUE,"Dataverse","Samuel Gregoire <[email protected]>","QFeatures","Rda","scpdata/petrosius2023_AstralAML.Rda",2023-06-08,217,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
"krull2024","Single-cell proteomics data IFN-γ response of U-2 OS cells","3.19",NA,"TXT","https://www.ebi.ac.uk/pride/archive/projects/PXD053464",NA,"Homo sapiens",9606,TRUE,"PRIDE","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/krull2024.Rda",2024-10-24,159,"DIA-NN","LFQ",TRUE,FALSE,TRUE,TRUE,NA
"hu2023_K562","Single-cell proteomics data of K562 cells","3.19",NA,"TXT","ftp://massive.ucsd.edu/MSV000089625/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Enes Sefa Ayar <[email protected]>","SingleCellExperiment","Rda","scpdata/hu2023_K562.Rda",2024-10-24,1,"Proteome Discoverer","LFQ",FALSE,FALSE,TRUE,TRUE,NA
"hu2023_oocyte","Single-cell proteomics data of oocytes","3.19",NA,"TXT","ftp://massive.ucsd.edu/MSV000089625/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Enes Sefa Ayar <[email protected]>","SingleCellExperiment","Rda","scpdata/hu2023_oocyte.Rda",2024-10-24,1,"Proteome Discoverer","LFQ",FALSE,FALSE,TRUE,TRUE,NA
43 changes: 43 additions & 0 deletions inst/scripts/make-data_hu2023_K562.R
@@ -0,0 +1,43 @@

####---- Hu et al, 2023 ---####


## Hu, M., Zhang, Y., Yuan, Y., Ma, W., Zheng, Y., Gu, Q., & Xie, X. S. 2023.
## “Correlated protein modules revealing functional coordination of interacting
## proteins are detected by single-cell proteomics.” The Journal of Physical
## Chemistry B, https://doi.org/10.1021/acs.jpcb.3c00014

library(SingleCellExperiment)
library(scp)
library(tidyverse)

root <- "~/localdata/SCP/hu2023/"

####---- Add the protein data ----####

## Data accessible at GitHub repository
## https://github.com/dionezhang/CPM/blob/master/ProteinAbundance.Rdata

#### Load data ####
load(paste0(root, "ProteinAbundance.Rdata"))

Norm %>%
    mutate(X = rownames(Norm)) %>%
    readSingleCellExperiment(ecol = 1:69, fnames = "X") ->
    K562

## Protein data for K562 cells
hu2023_K562 <- SingleCellExperiment(K562)

prots <- rownames(hu2023_K562)
rowData(hu2023_K562) <- Description[prots, ,drop = FALSE]
rowData(hu2023_K562)$protein <- prots

colData(hu2023_K562) <- DataFrame(row.names = colnames(Norm),
                                  SampleType = rep("K562", length(colnames(Norm))))

## Save data
save(hu2023_K562,
     file = file.path(paste0(root, "hu2023_K562.Rda")),
     compress = "xz",
     compression_level = 9)
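
The `hu2023` documentation in this commit centres on pairwise correlations between protein levels across single cells. The sketch below illustrates that computation in base R on a small simulated proteins-by-cells matrix. The matrix `x`, the protein names, and the 0.8 cutoff are illustrative assumptions; none of them comes from the article or from the package.

```r
## Toy proteins x cells matrix (simulated; NOT the hu2023 data).
set.seed(1)
n_cells <- 50
shared <- rnorm(n_cells)
x <- rbind(
    protA = shared + rnorm(n_cells, sd = 0.1), # protA and protB share a signal
    protB = shared + rnorm(n_cells, sd = 0.1),
    protC = rnorm(n_cells)                     # independent protein
)
colnames(x) <- paste0("cell", seq_len(n_cells))

## Pairwise Pearson correlations between proteins (rows) across cells.
## cor() correlates columns, hence the transpose.
cors <- cor(t(x))

## List each protein pair once (upper triangle) above an arbitrary cutoff.
cutoff <- 0.8
idx <- which(cors > cutoff & upper.tri(cors), arr.ind = TRUE)
pairs <- data.frame(
    protein1 = rownames(cors)[idx[, "row"]],
    protein2 = colnames(cors)[idx[, "col"]],
    r = cors[idx]
)
print(pairs)
```

With the package installed, the same steps could presumably be applied to the matrix returned by `assay(hu2023_K562())`. The data are described as zero-imputed, so zeros may need handling before correlations are interpreted.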
39 changes: 39 additions & 0 deletions inst/scripts/make-data_hu2023_oocyte.R
@@ -0,0 +1,39 @@

####---- Hu et al, 2023 ---####


## Hu, M., Zhang, Y., Yuan, Y., Ma, W., Zheng, Y., Gu, Q., & Xie, X. S. 2023.
## “Correlated protein modules revealing functional coordination of interacting
## proteins are detected by single-cell proteomics.” The Journal of Physical
## Chemistry B, https://doi.org/10.1021/acs.jpcb.3c00014

library(SingleCellExperiment)
library(scp)
library(tidyverse)

root <- "~/localdata/SCP/hu2023/"

####---- Add the protein data ----####

## Data shared by the author, and accessible at
## https://biopic-my.sharepoint.cn/:x:/g/personal/humo_biopic_pku_edu_cn/EfX4CHedVopLuSx2OJNj6LABdESGNdKz4Eh8Zawvd-fNNQ?rtime=7Xzb4B303Eg

#### Load Data ####
oocyte <- read.csv(paste0(root, "DataMatrix-oocyte-20240614.csv"))
oocyte %>%
    rename(protein = X) %>%
    readSingleCellExperiment(ecol = 2:138, fnames = "protein") ->
    oocyte

## Protein data for oocytes
hu2023_oocyte <- SingleCellExperiment(oocyte)

colData(hu2023_oocyte) <- DataFrame(row.names = colnames(hu2023_oocyte),
                                    SampleType = rep("oocyte", length(colnames(oocyte))))

## Save data
save(hu2023_oocyte,
     file = file.path(paste0(root, "hu2023_oocyte.Rda")),
     compress = "xz",
     compression_level = 9)
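
The `krull2024` documentation in this commit describes a DIA-NN main report split into one assay per acquisition run, and a `proteins` assay filtered at 1% q-value thresholds; the corresponding make-data script is not shown on this page. The base-R sketch below illustrates those two operations on a made-up report table. The column names follow the DIA-NN report, but the values, the run names, and the choice to filter before splitting are assumptions for illustration, not the package's actual processing.

```r
## Toy DIA-NN-like report (made-up values; NOT the krull2024 data).
report <- data.frame(
    Run = c("run01", "run01", "run02", "run02", "run02"),
    Protein.Group = c("P1", "P2", "P1", "P3", "P2"),
    Precursor.Id = c("PEPA2", "PEPB2", "PEPA2", "PEPC3", "PEPB2"),
    Q.Value = c(0.001, 0.020, 0.005, 0.009, 0.001),
    Lib.Q.Value = c(0.002, 0.001, 0.004, 0.010, 0.030),
    Lib.PG.Q.Value = c(0.001, 0.001, 0.002, 0.008, 0.001)
)

## Keep rows passing all three 1% thresholds quoted in the documentation.
keep <- report$Q.Value <= 0.01 &
    report$Lib.Q.Value <= 0.01 &
    report$Lib.PG.Q.Value <= 0.01
filtered <- report[keep, ]

## One table per acquisition run, mirroring the one-assay-per-run layout.
by_run <- split(filtered, filtered$Run)
print(vapply(by_run, nrow, integer(1)))
```

In the real dataset each per-run table would become one assay of the [QFeatures] object, with precursors linked to proteins through `Protein.Group`, as the documentation describes.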
