Cleanup the management commands #128

timeu · 2017-09-14T08:15:08Z

Currently there are a lot of management commands:

compute_n_hits
generate_complete_csv
import_phenotypes
import_publication_links
import_sample_number
index_study
setup_es
submit_to_datacite

Some of them were workarounds to get the data in. We should remove those.
So far I think submit_to_datacite, setup_es, index_study and import_phenotypes are definitely required. I'm not sure about the others.

The import_phenotypes command should have an option to update the phenotype information if it already exists.
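
A minimal sketch of what such an option could look like, not the actual AraGWAS code. The flag name `--update-existing`, the function `sync_phenotypes` and the dict-based storage are all assumptions made for illustration; in a Django management command the argument would be registered in `add_arguments` and the storage would be the phenotype model.

```python
import argparse


def build_parser():
    # Hypothetical CLI for import_phenotypes; the flag name is an assumption.
    parser = argparse.ArgumentParser(prog="import_phenotypes")
    parser.add_argument(
        "--update-existing",
        action="store_true",
        help="also overwrite phenotypes that are already stored",
    )
    return parser


def sync_phenotypes(existing, fetched, update_existing=False):
    """Merge fetched phenotype records into `existing` (a dict keyed by id).

    New ids are always inserted. Records that are already stored are only
    overwritten when `update_existing` is True and the content differs.
    Returns the lists of created and updated ids.
    """
    created, updated = [], []
    for record in fetched:
        pid = record["id"]
        if pid not in existing:
            existing[pid] = dict(record)
            created.append(pid)
        elif update_existing and existing[pid] != record:
            existing[pid] = dict(record)
            updated.append(pid)
    return created, updated
```

With the default (no flag) the command only inserts phenotypes that are missing, so existing records stay untouched.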

mtog · 2017-09-14T08:32:36Z

The problem is that we don't have a single command to add a new study. How do we usually add them? I will merge the logic of compute_n_hits, import_publication_links and import_sample_number into one command and remove generate_complete_csv.

timeu · 2017-09-14T08:40:06Z

So I see it as follows:
We should have an import_phenotypes command that we can run by hand or as a cronjob. It will go to AraPheno, fetch the data and insert new phenotypes. I wouldn't want to update the existing ones, because otherwise we would need to re-index all the associations. Also, the data on AraPheno usually doesn't get updated once it is published. This will make sure that we always have the published AraPheno phenotypes in AraGWAS as well.
Eventually we should also have a cronjob that runs the GWAS pipeline for the new phenotypes (or, if a new genotype is released, for all the existing ones). But right now we will probably do this by hand.
So, as you pointed out, we probably need an endpoint that takes an HDF5 file, creates a GWAS study that is connected to the phenotype, and indexes the associations.
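
As a rough illustration of the cronjob idea, the sketch below works out which phenotype/genotype combinations still lack a GWAS study. The function name `pending_runs` and the record keys `phenotype_id` and `genotype_id` are hypothetical, and method and transformation are deliberately ignored to keep the example short.

```python
from itertools import product


def pending_runs(phenotype_ids, genotype_ids, existing_studies):
    """Return (phenotype_id, genotype_id) pairs that have no study yet.

    A new phenotype yields one pending run per genotype, and a new
    genotype release yields one pending run per existing phenotype.
    """
    done = {(s["phenotype_id"], s["genotype_id"]) for s in existing_studies}
    all_pairs = product(sorted(phenotype_ids), sorted(genotype_ids))
    return [pair for pair in all_pairs if pair not in done]
```

A cronjob could call this after each phenotype import and hand the resulting pairs to the GWAS pipeline.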

mtog · 2017-09-14T08:49:19Z

OK, I will delete the other commands and create a new one for new studies (as proposed in #31). However, the whole current pipeline relies on the fact that studies, phenotypes and HDF5 files always carry the same id. Can we keep this assumption for the future (i.e. will the file be named 289.hdf5)?

timeu · 2017-09-14T08:54:34Z

No, we can't. This is purely a coincidence because we currently have a 1-1 mapping between phenotypes and GWAS studies (1 transformation, 1 method and 1 genotype). As soon as we introduce either a new method or a new genotype version, this no longer holds.
I would design the command so that it takes the phenotype id, genotype id, method, transformation and an HDF5 file and creates a new GWAS study (the id should be assigned automatically).
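
The proposed interface could be sketched roughly as follows. The command name `add_study`, the option names, the `Study` and `StudyRegistry` types and the in-memory id counter are all assumptions for illustration; in the real project the id would come from the database primary key.

```python
import argparse
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Study:
    id: int
    phenotype_id: int
    genotype_id: int
    method: str
    transformation: str
    hdf5_path: str


class StudyRegistry:
    """In-memory stand-in for the study table; ids are assigned on creation."""

    def __init__(self, start=1):
        self._ids = count(start)
        self.studies = []

    def create(self, phenotype_id, genotype_id, method, transformation, hdf5_path):
        # The study id is independent of the phenotype id and the file name.
        study = Study(next(self._ids), phenotype_id, genotype_id,
                      method, transformation, hdf5_path)
        self.studies.append(study)
        return study


def build_parser():
    # Hypothetical CLI; every option name here is an assumption.
    parser = argparse.ArgumentParser(prog="add_study")
    parser.add_argument("--phenotype-id", type=int, required=True)
    parser.add_argument("--genotype-id", type=int, required=True)
    parser.add_argument("--method", required=True)
    parser.add_argument("--transformation", required=True)
    parser.add_argument("--hdf5", required=True, help="path to the HDF5 file")
    return parser
```

Because the id is generated, two studies for the same phenotype (for example with different genotype versions) get distinct ids, and the HDF5 file no longer has to be named after the phenotype.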

timeu added this to the Paper revision milestone Sep 14, 2017

timeu assigned mtog Sep 14, 2017

mtog pushed a commit that referenced this issue Sep 14, 2017

Cleaned up mgmt commands

01db3af

Left compute_n_hits in place for now so that the permutation thresholds can be updated once the computations are done. Will delete it afterwards. #128
