Beneficial to pool .bam files from seperate 10x single-cell lanes instead of demultilplexing seperately? #89

koefoeden · 2023-11-14T16:25:52Z

Hi - thanks for a nice tool! I have a quick question regarding the best approach for using cellsnp/vireo in our experimental setup:
We have generated data from 8 separate 10x Genomics single-cell reactions, which are all made from pools of the same 8 donors.

My idea is that by pooling the .bam files (and making sure that the barcodes are differentiated appropriately to avoid clashing) before demultiplexing, the demultiplexing will have access to 8x sequencing depth per individual. However, the sequencing depth per cell will of course stay the same, as we have 8x reads and 8x cells. My question is thus whether there is some kind of information sharing between the cells, which enables a better demultiplexing from merely having more cells from the same number of possible donors.

I hope my question makes sense!

huangyh09 · 2023-11-15T03:35:18Z

Hi,

Good idea - it may enhance the estimation of the donors' genotype and link them across samples. On the other hand, as you mentioned the coverage for each cell won't change; therefore, it is usually unnecessary to pool multiple samples if the n_cells in each pool is reasonable (e.g., >3K).

Yuanhua

koefoeden · 2023-11-15T09:12:57Z

Yes, so that was my thought - I have prepared the pooled .bam-files, so will go ahead and try it and let you know the results. Do you have a feeling which mode of cellsnp-lite would yield the most improvement when doing this, i.e. SNP-based or chromosome-pileup? And how the --forcelearnGT parameter in Vireo might play in here?

mdmanurung · 2024-01-14T18:50:53Z

@koefoeden Did it go well? Do you have an example script to merge the bam files and rewriting the barcode IDs? Thanks in advance!

koefoeden · 2024-01-15T09:58:29Z

@mdmanurung

It did not seem to change the results a lot, so we never went further with it. But in case you want to try it out, here are the two scripts I wrote.

The first uses awk, sed and samtools to modify the barcodes.tsv.gz and atac_possorted_bam.bam files (and save them as new files) in the cellranger-arc output directory, i.e. argument 1, to instead be suffixed by the given $number parameter, i.e. argument 2, to avoid barcode clashing:

#!/bin/bash
subdir=$1
subdir_base_name=$(basename $subdir)
number=$2

# **** 1) Edit the barcodes.tsv.gz file ****
echo "Processing barcodes.tsv.gz file for $subdir"
gzip -dc "$subdir/outs/filtered_feature_bc_matrix/barcodes.tsv.gz" | \
awk -v num="$number" '{split($0, a, "-"); print a[1]"-"num}' | \
gzip > "$subdir/outs/filtered_feature_bc_matrix/barcodes_modified.tsv.gz"

# **** 2) Edit the atac_possorted_bam.bam file ****
echo "Processing atac_possorted_bam.bam file for $subdir"
samtools view -h "$subdir/outs/atac_possorted_bam.bam" | \
sed -E "s/(CB:Z:[A-Z]{16}-)1/\1${number}/g" | \
samtools view -b -o "$subdir/outs/atac_possorted_bam_modified.bam" -
echo "Everything completed for $subdir"

The second script basically merges the modified barcode and bam files, assuming that each cellranger-output directoy is labelled 1, 2, 3, 4, ,5, 6, 7, 8. in $cellranger_outs_dir, i.e. the first argument.

cellranger_outs_dir=$1
combined_out_dir=$2

mkdir -p ${combined_out_dir}

# Concatenate barcodes.tsv.gz files
 zcat "$cellranger_outs_dir"/{1..8}/outs/filtered_feature_bc_matrix/barcodes_modified.tsv.gz > ${combined_out_dir}/barcodes.tsv
gzip ${combined_out_dir}/barcodes.tsv

# Concatenate atac_possorted_bam.bam files
samtools merge ${cellranger_outs_dir}/{1..8}/outs/atac_possorted_bam_modified.bam \
-o ${combined_out_dir}/atac_possorted_bam.bam \
--write-index

And then finally it should be possible to use Vireo/cellSNP as normal on these files.

ghuls · 2024-01-26T21:22:04Z

If you want something faster, you can use bam_update_cell_barcode from https://github.com/aertslab/single_cell_toolkit/:

git clone https://github.com/aertslab/single_cell_toolkit/
cd single_cell_toolkit/rust

cargo build --bin bam_update_cell_barcode

./bam_update_cell_barcode 
Usage: bam_update_cell_barcode input.bam output.bam barcode_mapping.tsv old_bc_tag new_bc_tag


Where the barcode_mapping.tsv looks like this (old CB "\t" new CB):
AAACCTGAGAAACCAT-1      AAACCTGAGAAACCAT-2
AAACCTGAGAAACCGC-1      AAACCTGAGAAACCGC-2
AAACCTGAGAAACCTA-1      AAACCTGAGAAACCTA-2
AAACCTGAGAAACGAG-1      AAACCTGAGAAACGAG-2
AAACCTGAGAAACGCC-1      AAACCTGAGAAACGCC-2
AAACCTGAGAAAGTGG-1      AAACCTGAGAAAGTGG-2
AAACCTGAGAACAACT-1      AAACCTGAGAACAACT-2
AAACCTGAGAACAATC-1      AAACCTGAGAACAATC-2
AAACCTGAGAACTCGG-1      AAACCTGAGAACTCGG-2
AAACCTGAGAACTGTA-1      AAACCTGAGAACTGTA-2

And old_bc_tag: "CB" (or whatever tag contains the current barcode.
And new_bc_tag the tag to which you want to write the new barcode: e.g.: "CB" to overwrite, or "de" to write the new barcode to a new tag.

Changing the previous script a bit:

#!/bin/bash
subdir=$1
subdir_base_name=$(basename $subdir)
number=$2

# **** 1) Edit the barcodes.tsv.gz file ****
echo "Processing barcodes.tsv.gz file for $subdir"
gzip -dc "$subdir/outs/filtered_feature_bc_matrix/barcodes.tsv.gz" | \
awk -v num="$number" '{split($0, a, "-"); print a[1]"-"num}' | \
gzip > "$subdir/outs/filtered_feature_bc_matrix/barcodes_modified.tsv.gz"

# **** 1) Create the barcode_mapping.tsv file ****
paste \
    <(gzip -dc "$subdir/outs/filtered_feature_bc_matrix/barcodes.tsv.gz") \
    <(gzip -dc "$subdir/outs/filtered_feature_bc_matrix/barcodes_modified.tsv.gz") \
 > "$subdir/outs/filtered_feature_bc_matrix/barcodes_mapping.tsv"

# **** 3) Edit the atac_possorted_bam.bam file with bam_update_cell_barcode ****
echo "Processing atac_possorted_bam.bam file for $subdir"

bam_update_cell_barcode \
    "$subdir/outs/atac_possorted_bam.bam" \
    "$subdir/outs/atac_possorted_bam_modified.bam" \
    "$subdir/outs/filtered_feature_bc_matrix/barcodes_mapping.tsv" \
    CB \
    CB

echo "Everything completed for $subdir"

Modifying cell barcodes in a 6.3G BAM file took 164 seconds

koefoeden · 2024-01-26T22:29:36Z

@ghuls

ah, great - thanks for the tip! :)

mdmanurung · 2024-02-01T20:15:57Z

Thanks all for the tip!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Beneficial to pool .bam files from seperate 10x single-cell lanes instead of demultilplexing seperately? #89

Beneficial to pool .bam files from seperate 10x single-cell lanes instead of demultilplexing seperately? #89

koefoeden commented Nov 14, 2023

huangyh09 commented Nov 15, 2023

koefoeden commented Nov 15, 2023

mdmanurung commented Jan 14, 2024

koefoeden commented Jan 15, 2024 •

edited

Loading

ghuls commented Jan 26, 2024

koefoeden commented Jan 26, 2024

mdmanurung commented Feb 1, 2024

Beneficial to pool .bam files from seperate 10x single-cell lanes instead of demultilplexing seperately? #89

Beneficial to pool .bam files from seperate 10x single-cell lanes instead of demultilplexing seperately? #89

Comments

koefoeden commented Nov 14, 2023

huangyh09 commented Nov 15, 2023

koefoeden commented Nov 15, 2023

mdmanurung commented Jan 14, 2024

koefoeden commented Jan 15, 2024 • edited Loading

ghuls commented Jan 26, 2024

koefoeden commented Jan 26, 2024

mdmanurung commented Feb 1, 2024

koefoeden commented Jan 15, 2024 •

edited

Loading