HASCAD is a cell-composition deconvolution model that predicts the abundance of 15 immune cell types from RNA-seq data. It is an ensemble deep learning model trained on three PBMC scRNA-seq datasets. We use Harmony and Symphony for pre-processing and to remove batch effects between the scRNA-seq datasets when building the reference data.
When you prepare your gene expression matrix, check that the genes are in the same order as the reference genes. You can use the example file as a template for your query.
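The gene-order check can be sketched in pandas as follows. This is a minimal illustration, not part of the HASCAD code: the helper `align_to_reference`, the toy gene names, and the layout (genes as the row index, samples as columns) are all assumptions, so adapt them to the layout of the example file.

```python
import pandas as pd


def align_to_reference(query, reference_genes):
    """Reorder the rows of `query` to follow `reference_genes` (hypothetical helper).

    Assumes genes are the row index and samples are the columns.
    Genes in the query that are not in the reference are dropped.
    """
    missing = [g for g in reference_genes if g not in query.index]
    if missing:
        raise ValueError(
            f"{len(missing)} reference genes are missing from the query, e.g. {missing[:5]}"
        )
    return query.loc[reference_genes]


# Toy data (made up for illustration): the query lists genes in a different order.
reference_genes = ["CD3E", "CD19", "CD14"]
query = pd.DataFrame(
    {"sample_1": [5.0, 1.0, 3.0], "sample_2": [2.0, 0.0, 4.0]},
    index=["CD14", "CD3E", "CD19"],
)

aligned = align_to_reference(query, reference_genes)
```

Raising an error on missing genes, rather than filling them with zeros, keeps a mismatched query from silently producing misleading predictions.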
You can run main.ipynb to get a result.
To use your own data, modify this script to replace the file "Example.csv":
sample = pd.read_csv("../Source/Example.csv",header=None)
Run Harmony-Symphony/HS_main.R
Then modify the script to replace the file "Example.csv" with the Harmony-Symphony output:
sample = pd.read_csv("Harmony-Symphony/hs_exmple_output.csv",header=None)
The gene expression without/with Symphony-Harmony.
You can then run the script and obtain a plot like this.
This section has two steps. First, prepare your reference data and query data. Second, HASCAD, trained on the reference data, predicts the cell composition of the query data.
Make preparations
R version 4.1.0
irlba 2.3.5
See requirement_python.txt for the Python requirements.
Under review at a BMC journal.