
The train step needs large memory #51

Open
jiangzy26 opened this issue Mar 29, 2024 · 13 comments

Comments

@jiangzy26

Hi,
When I run ReLERNN_TRAIN with the default settings, the job is killed because it uses too much memory. How can I deal with this? Could you help? Thank you very much.

@jiangzy26
Author

Alternatively, could I use the same command line to simulate each chromosome separately, then train on each chromosome to get the predicted results?

@andrewkern
Member

how much memory are you using?

@willright28

Hi, @andrewkern and all users,

I have the same problem. The training step was fine when no demographic history was set, but when a demographic history was set it used 3 TB of memory or more (I didn't measure the exact peak; the job was killed).

My command is:
# A.phased.vcf: only filtered for missing SNPs with vcftools --max-missing 0.9
# A_two-epoch_unfold.final.summary: output of stairwayplot2
ReLERNN/ReLERNN_SIMULATE \
  -v A.phased.vcf \
  --phased \
  -g fasta.fai \
  -d ./A \
  -n A_two-epoch_unfold.final.summary \
  -l 1 \
  -m reference.missing.bed \
  -t 80

Any clue on how to solve this?

Thanks in advance!

Best,
willright28

@andrewkern
Member

hi @willright28 -- what does your demographic history look like? I'm guessing a contracting population size moving to the present?

@willright28

Hi @andrewkern
Thanks for your reply. The demographic history looks like this, a strong bottleneck then a rebound:
[Figure: pop-his, plot of the inferred demographic history]

@andrewkern
Member

Just so I'm oriented here: is the y-axis correct? You have Ne going down to 0.1? Are these in relative units?

@willright28
Copy link

Sorry for the confusion: the y-axis is log-transformed. The lowest Ne is ~1300.

@andrewkern
Member

Okay, thanks for the clarification. I'm looking at your code, and it looks like your -g flag is pointing to a .fai file. Is that an indexed fasta or a bed file?

@willright28

The .fai file is a BED file formatted as: chr1 0 100000 (chromosome name, 0, chromosome length).
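For reference, ReLERNN's -g option expects a BED-formatted (zero-based) genome file, which is what this file already is despite its .fai extension. A genuine samtools faidx index has five tab-separated columns (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH) rather than that layout. A minimal sketch that converts a real .fai into the three-column BED shown above; the script name fai_to_bed.py is hypothetical:

```python
import sys


def fai_to_bed(fai_lines):
    """Convert samtools .fai index lines into 3-column BED lines.

    A .fai line has five tab-separated columns: NAME, LENGTH, OFFSET,
    LINEBASES, LINEWIDTH. Only NAME and LENGTH are needed here.
    """
    bed = []
    for line in fai_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])
        # BED intervals are 0-based and half-open: [0, length) is the whole chromosome
        bed.append(f"{name}\t0\t{length}")
    return bed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fai_to_bed.py reference.fa.fai > genome.bed
    with open(sys.argv[1]) as fh:
        for row in fai_to_bed(fh):
            print(row)
```

For example, python fai_to_bed.py reference.fa.fai > genome.bed. The start column is always 0 because a half-open interval starting at 0 and ending at LENGTH covers every base of the chromosome.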

@willright28

Maybe I can set --maxsites to a reasonable number to avoid this problem?

@andrewkern
Member

This shouldn't be too big. Can you provide your input files so I can poke around?

@willright28

I tested running a single chromosome, chr1, which is ~10% of the whole genome. The program worked fine and only used ~0.1 TB of memory.
Is it OK to run each chromosome separately, using the same demographic history in the -n parameter?

@andrewkern
Member

Yes, it should be fine to run each chromosome separately.
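One way to set up the per-chromosome runs is to split the genome BED into one file per chromosome and generate one ReLERNN_SIMULATE call per chromosome, reusing the flags from the command above and the same stairwayplot2 summary for -n. This is a minimal sketch under stated assumptions: the per-chromosome VCF names (A.chr1.phased.vcf, ...), the per_chrom_bed directory, the ./A_chr1 project directories and the script name split_by_chrom.py are all hypothetical, and it reuses the whole-genome mask on the assumption that intervals on other chromosomes are ignored (split the mask the same way if they are not):

```python
import shlex
import sys
from collections import defaultdict
from pathlib import Path


def group_by_chrom(bed_lines):
    """Group BED intervals by chromosome name (first column)."""
    groups = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue  # skip blank and header lines
        groups[fields[0]].append("\t".join(fields))
    return dict(groups)


def write_per_chrom(groups, out_dir):
    """Write one BED file per chromosome and return {chrom: Path}."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    for chrom, rows in groups.items():
        path = out / f"{chrom}.bed"
        path.write_text("\n".join(rows) + "\n")
        paths[chrom] = path
    return paths


def simulate_command(chrom, genome_bed, demog, mask, threads=80):
    """Build a ReLERNN_SIMULATE call using only flags from the original command.

    The VCF name A.<chrom>.phased.vcf and project dir ./A_<chrom> are
    hypothetical naming choices for per-chromosome inputs and outputs.
    """
    return [
        "ReLERNN_SIMULATE",
        "-v", f"A.{chrom}.phased.vcf",
        "--phased",
        "-g", str(genome_bed),
        "-d", f"./A_{chrom}",
        "-n", demog,  # same demographic history for every chromosome
        "-l", "1",
        "-m", mask,
        "-t", str(threads),
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python split_by_chrom.py genome.bed
    with open(sys.argv[1]) as fh:
        paths = write_per_chrom(group_by_chrom(fh), "per_chrom_bed")
    for chrom, path in paths.items():
        print(shlex.join(simulate_command(
            chrom, path, "A_two-epoch_unfold.final.summary", "reference.missing.bed")))
```

Each printed command gets its own project directory, so the later steps for a chromosome can be pointed at that directory and run one at a time, keeping peak memory closer to the single-chromosome figure reported above.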
