ipdSummary takes a long time but has low CPU usage #96

Open
ck-theory opened this issue Jun 21, 2023 · 0 comments


Hello,

I am using the following commands to run ipdSummary on our compute server, and some samples have been running for 6+ days without completing. The server has 2 TB of RAM and 130 CPUs, so I do not believe resources are the limitation. I set the command to run with 20 worker processes (--numWorkers ${THREADS}, with THREADS=20), and the log files show that for many of the running jobs about 15 of the 20 workers have already exited, e.g. "Process KineticWorkerProcess-9 (PID=1919193) done; exiting." The few remaining workers keep going but use only a handful of CPUs. For example, I have 6 jobs running on 120 allocated CPUs, but only 47 are in use. During assembly (Flye) all 120 CPUs were in use, so this is not an issue with our queuing system.
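A minimal sketch of how the exited-worker count above can be tallied, assuming ipdSummary's log output was captured to a file. The function name summarize_workers and any log path you pass to it are hypothetical; the only thing taken from the actual run is the "done; exiting." line format quoted above.

```shell
#!/usr/bin/env bash
# Summarize ipdSummary worker status from a captured log file.
# Assumption: the log was redirected to a file by you; summarize_workers is
# a hypothetical helper, not part of ipdSummary.
summarize_workers() {
  local log_file="$1"   # captured ipdSummary log
  local threads="$2"    # value passed to --numWorkers
  local exited
  # Each finished worker logs a line like:
  #   Process KineticWorkerProcess-9 (PID=1919193) done; exiting.
  # grep -c prints the number of matching lines (0 if none).
  exited=$(grep -c 'KineticWorkerProcess-[0-9]* (PID=[0-9]*) done; exiting' "$log_file")
  echo "exited=${exited} still_running=$((threads - exited))"
}
```

For example, summarize_workers ipdSummary.log 20 on a log where 15 workers have finished would report 15 exited and 5 still running.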

Could you look at my ipdSummary command and give me some pointers on how to speed up this process? Any help is appreciated. Thanks!

```shell
# Convert HiFi reads with kinetics into per-strand records for ipdSummary
conda activate pbbam-2.1.0

ccs-kinetics-bystrandify "${WD}/${SAMPLE}/pacbio/${SAMPLE}_pacbio.bam" "${WD}/${SAMPLE}/kinetics/${SAMPLE}_pacbio_kinetics.bam"

# Build a ReferenceSet (with indices) from the Bakta-processed assembly
conda activate smrtlink_11.0.0.146107
cp "${WD}/${SAMPLE}/bakta/06.fixstart.fna" "${WD}/${SAMPLE}/kinetics/06.fixstart.fasta"
dataset create --generateIndices "${WD}/${SAMPLE}/kinetics/${SAMPLE}_referenceset.xml" "${WD}/${SAMPLE}/kinetics/06.fixstart.fasta"

# Align the per-strand kinetics BAM to the assembly and sort it
pbmm2 align --sort "${WD}/${SAMPLE}/kinetics/${SAMPLE}_pacbio_kinetics.bam" \
  "${WD}/${SAMPLE}/kinetics/${SAMPLE}_referenceset.xml" \
  "${WD}/${SAMPLE}/kinetics/${SAMPLE}_ref_alignment.bam"

# Write the PacBio .pbi index for the aligned BAM
pbindex "${WD}/${SAMPLE}/kinetics/${SAMPLE}_ref_alignment.bam"

# Detect base modifications (m6A, m4C, m5C_TET) from the kinetics
ipdSummary --numWorkers "${THREADS}" \
  --reference "${WD}/${SAMPLE}/kinetics/06.fixstart.fasta" \
  --gff "${WD}/${SAMPLE}/kinetics/${SAMPLE}_all_base_modifications.gff3" \
  --bigwig "${WD}/${SAMPLE}/kinetics/${SAMPLE}_all_base_modifications.bigwig" \
  --csv "${WD}/${SAMPLE}/kinetics/${SAMPLE}_all_base_modifications.csv" \
  --identify m6A,m4C,m5C_TET \
  "${WD}/${SAMPLE}/kinetics/${SAMPLE}_ref_alignment.bam"
```
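One way to experiment with spreading a single long run across several jobs is to split the reference into windows and run one ipdSummary per window. The sketch below only prints commands (a dry run). It assumes a samtools-style .fai index sits next to the FASTA (expected from the dataset create --generateIndices step) and that the installed ipdSummary accepts --referenceWindows in refName:start-end form; both should be confirmed locally with ipdSummary --help, including whether start is 0- or 1-based. The function name print_window_commands and the window size are illustrative only, --bigwig is omitted for brevity, and merging the per-window GFF/CSV files back together is not shown.

```shell
#!/usr/bin/env bash
# Dry run: print one ipdSummary command per reference window.
# Assumptions (verify locally): "${fasta}.fai" exists, and ipdSummary accepts
# --referenceWindows refName:start-end. print_window_commands is hypothetical.
print_window_commands() {
  local fasta="$1"    # reference FASTA passed to --reference
  local bam="$2"      # aligned, .pbi-indexed BAM
  local outdir="$3"   # directory for per-window outputs
  local workers="$4"  # --numWorkers for each window job
  local window="$5"   # window size in bp
  # .fai column 1 = contig name, column 2 = contig length; emit
  # "name start end" for consecutive windows, clipping the last one.
  awk -v w="$window" '{
    for (s = 0; s < $2; s += w) {
      e = s + w
      if (e > $2) e = $2
      print $1, s, e
    }
  }' "${fasta}.fai" |
  while read -r contig start end; do
    printf 'ipdSummary --numWorkers %s --reference %s --referenceWindows %s:%s-%s --gff %s/%s_%s_%s.gff3 --csv %s/%s_%s_%s.csv --identify m6A,m4C,m5C_TET %s\n' \
      "$workers" "$fasta" "$contig" "$start" "$end" \
      "$outdir" "$contig" "$start" "$end" \
      "$outdir" "$contig" "$start" "$end" \
      "$bam"
  done
}
```

For example, print_window_commands "${WD}/${SAMPLE}/kinetics/06.fixstart.fasta" "${WD}/${SAMPLE}/kinetics/${SAMPLE}_ref_alignment.bam" "${WD}/${SAMPLE}/kinetics" 4 500000 writes one line per 500 kb window, and each line could then be submitted as its own job.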
