Brainstorming ideas for CITE-seq dataset

Processing & Data Distribution

Copying / Quantifying dropout
- per gene bias
- relationship between transcript and marker levels (observed association)
- concordance of expected transcript frequency vs. observed
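
One possible starting point for the dropout bullets above, as a rough sketch only: compute the observed fraction of zero counts per gene and compare it with the zero fraction expected from the gene's mean. The Poisson expectation, the function name `dropout_summary`, and the toy matrix are all assumptions made for illustration, not part of these notes or of any particular pipeline.

```python
import numpy as np

def dropout_summary(counts):
    """Per-gene observed vs. expected zero fraction.

    counts: 2-D array-like, cells x genes (raw counts).
    The expected zero fraction assumes a Poisson model, exp(-mean),
    which is an illustrative assumption and not a fitted noise model.
    """
    counts = np.asarray(counts, dtype=float)
    observed_zero = (counts == 0).mean(axis=0)   # fraction of cells with zero counts
    expected_zero = np.exp(-counts.mean(axis=0)) # Poisson P(X = 0) at the gene mean
    excess_zero = observed_zero - expected_zero  # positive values hint at dropout
    return observed_zero, expected_zero, excess_zero

# Made-up toy data: 4 cells x 3 genes
toy_counts = np.array([
    [0, 3, 1],
    [0, 0, 2],
    [5, 1, 0],
    [0, 2, 4],
])
observed_zero, expected_zero, excess_zero = dropout_summary(toy_counts)
```

Genes with a large positive `excess_zero` would be candidates for a closer look at per-gene bias; a negative binomial expectation may be a better fit for real data.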
QC
- min # counts / sample
- normalization
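
A minimal sketch of the two QC bullets above, assuming a cells x genes count matrix: drop samples below a minimum total count, then apply library-size normalization followed by log1p. The threshold `min_counts`, the scale factor, and the function name `filter_and_normalize` are hypothetical choices for illustration; ADT counts are often normalized differently (for example with a centered log-ratio), which is not shown here.

```python
import numpy as np

def filter_and_normalize(counts, min_counts=500, scale=1e4):
    """Filter low-count samples, then log-normalize the rest.

    counts: 2-D array-like, samples (cells) x genes, raw counts.
    min_counts, scale: illustrative defaults, not recommended values.
    Returns the normalized matrix and the boolean mask of kept rows.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)
    # Keep rows meeting the threshold; also exclude empty rows to avoid 0/0.
    keep = (totals >= min_counts) & (totals > 0)
    kept = counts[keep]
    # Scale each row to a common library size, then take log(1 + x).
    normalized = np.log1p(kept / totals[keep][:, None] * scale)
    return normalized, keep

# Made-up toy data: 3 samples x 3 genes
toy = np.array([
    [10, 0, 0],
    [300, 200, 100],
    [0, 0, 0],
])
normalized, keep = filter_and_normalize(toy, min_counts=100)
```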
Sets of transcripts in all cells

Clustering

Unsupervised

subclustering
- ADT, then RNA, or vice versa
dimensionality reduction
- PCA / MDS / t-SNE / SIMLR
- NNMF
- Autoencoder
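
As a rough illustration of the simplest option in the list above, PCA can be sketched with a singular value decomposition of the centered matrix. The function name `pca` and the random input are placeholders for illustration only; in practice the input would be a normalized expression (or ADT) matrix, and the other methods listed would need dedicated libraries.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the column-centered matrix.

    X: 2-D array-like, cells x features (already normalized).
    Returns per-cell scores and the fraction of variance explained
    by each of the retained components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / (S ** 2).sum()
    return scores, explained[:n_components]

# Random placeholder data: 50 cells x 10 features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
scores, explained = pca(X, n_components=2)
```

The resulting `scores` could feed directly into clustering or serve as input to a nonlinear embedding such as t-SNE.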

Supervised

random forest / ensemble methods
others

Visualization

Interactive display (HTML/browser-based, for audience engagement)
traditional plots

Interpretation

biological significance
etc?