Cleanup the management commands #128

Open
timeu opened this issue Sep 14, 2017 · 4 comments

@timeu
Contributor

timeu commented Sep 14, 2017

Currently there are a lot of management commands:

  • compute_n_hits
  • generate_complete_csv
  • import_phenotypes
  • import_publication_links
  • import_sample_number
  • index_study
  • setup_es
  • submit_to_datacite

Some of them were workarounds to get the data in. We should remove those.
So far I think submit_to_datacite, setup_es, index_study and import_phenotypes are definitely required. Not sure about the others.

The import_phenotypes command should have an option to update the phenotype information if it already exists.
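A minimal sketch of the requested option, written as plain Python with argparse instead of a Django management command so that it runs on its own. The `--update` flag, the dict-based phenotype records and the in-memory `store` are hypothetical; the real command would fetch from AraPheno and write to the database models. By default existing phenotypes are left untouched, and only `--update` overwrites them.

```python
import argparse
from typing import Dict, Iterable, Optional, Sequence, Tuple

# Hypothetical phenotype record, e.g. {"id": 1, "name": "FT10"}.
Phenotype = Dict[str, object]


def import_phenotypes(
    store: Dict[int, Phenotype],
    fetched: Iterable[Phenotype],
    update: bool = False,
) -> Tuple[int, int]:
    """Insert phenotypes missing from `store`; overwrite existing ones only if `update`.

    Returns (number inserted, number updated). Sketch only, not the project's API.
    """
    inserted = updated = 0
    for pheno in fetched:
        pid = int(pheno["id"])
        if pid not in store:
            store[pid] = dict(pheno)
            inserted += 1
        elif update and store[pid] != pheno:
            # Only rewrite records whose content actually changed.
            store[pid] = dict(pheno)
            updated += 1
    return inserted, updated


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Import phenotypes (sketch).")
    # Hypothetical flag: off by default so existing phenotypes are never modified.
    parser.add_argument("--update", action="store_true",
                        help="also overwrite phenotypes that already exist")
    return parser.parse_args(argv)
```

Keeping the flag off by default means a plain run (e.g. from a cron job) can only ever add phenotypes.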

timeu added this to the Paper revision milestone Sep 14, 2017
@mtog
Contributor

mtog commented Sep 14, 2017

The problem is that we don't have a single command to add a new study; how do we usually add them? I will merge the steps in compute_n_hits, import_publication_links and import_sample_number into one command and remove generate_complete_csv.

@timeu
Contributor Author

timeu commented Sep 14, 2017

So I see it as follows:
We should have an import_phenotypes command that we can run by hand or as a cron job; it goes to AraPheno, fetches the data and inserts new phenotypes. I wouldn't want to update the existing ones, because otherwise we would need to re-index all the associations. Also, data on AraPheno usually doesn't get updated once it is published. This will make sure that we always have the published AraPheno phenotypes in AraGWAS as well.
Eventually we should also have a cron job that runs the GWAS pipeline for the new phenotypes (or for all the existing ones if a new genotype is released). But right now we will probably do this by hand.
So, as you pointed out, we probably need an endpoint that takes an HDF5 file, creates a GWAS study connected to the phenotype, and indexes the associations.
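The insert-only policy described here, together with the follow-up question of which phenotypes still need a GWAS run, can be sketched as a pure planning function that a cron job might call. Fetching ids from AraPheno and querying AraGWAS are left out; `plan_sync`, `SyncPlan` and the set-of-ids data model are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class SyncPlan:
    to_import: List[int]    # AraPheno ids missing from AraGWAS: inserted, never updated
    to_run_gwas: List[int]  # phenotype ids that need a (new) GWAS run


def plan_sync(
    arapheno_ids: Iterable[int],
    aragwas_ids: Set[int],
    studied_ids: Set[int],
    new_genotype: bool = False,
) -> SyncPlan:
    """Decide what a hypothetical nightly job should import and analyse."""
    # Existing phenotypes are skipped entirely, so no associations need re-indexing.
    to_import = sorted(set(arapheno_ids) - aragwas_ids)
    all_ids = aragwas_ids | set(to_import)
    if new_genotype:
        # A new genotype release means every phenotype has to be re-analysed.
        to_run = sorted(all_ids)
    else:
        # Otherwise only phenotypes without any study yet are candidates.
        to_run = sorted(all_ids - studied_ids)
    return SyncPlan(to_import, to_run)
```

Keeping the planning step free of I/O makes it easy to run by hand first (printing the plan) before wiring it into a cron job.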

@mtog
Contributor

mtog commented Sep 14, 2017

Ok, I will delete the other commands and create a new one for new studies (as proposed in #31). However, the whole current pipeline is based on the assumption that studies, phenotypes and HDF5 files always carry the same id; can we keep this assumption in the future? (i.e. will the file be named 289.hdf5?)

@timeu
Contributor Author

timeu commented Sep 14, 2017

No, we can't. This is purely a coincidence, because we currently have a 1-1 mapping between phenotypes and GWAS studies (1 transformation, 1 method and 1 genotype). As soon as we introduce either a new method or a new genotype version, this no longer holds.
I would design the command so that it takes the phenotype id, genotype id, method, transformation and an HDF5 file and creates a new GWAS study (the id should be assigned automatically).
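A minimal sketch of that command, assuming an in-memory registry in place of the Django models. `StudyRegistry`, the positional argument order, the placeholder method/transformation strings and the rule that each (phenotype, genotype, method, transformation) combination is registered only once are all assumptions. It illustrates that the study id comes from its own counter, so two studies on the same phenotype get different ids and the HDF5 file name is independent of either id.

```python
import argparse
import itertools
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Study:
    id: int
    phenotype_id: int
    genotype_id: int
    method: str
    transformation: str
    hdf5_file: str


class StudyRegistry:
    """Hypothetical stand-in for the study table; ids come from a counter, not the phenotype."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._by_key: Dict[Tuple[int, int, str, str], Study] = {}

    def create(self, phenotype_id: int, genotype_id: int, method: str,
               transformation: str, hdf5_file: str) -> Study:
        key = (phenotype_id, genotype_id, method, transformation)
        if key in self._by_key:
            # Assumed rule: one study per phenotype/genotype/method/transformation.
            raise ValueError(f"study already exists for {key}")
        study = Study(next(self._next_id), phenotype_id, genotype_id,
                      method, transformation, hdf5_file)
        self._by_key[key] = study
        return study


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Create a GWAS study (sketch).")
    parser.add_argument("phenotype_id", type=int)
    parser.add_argument("genotype_id", type=int)
    parser.add_argument("method")
    parser.add_argument("transformation")
    parser.add_argument("hdf5_file")
    return parser


if __name__ == "__main__":
    registry = StudyRegistry()
    args = build_parser().parse_args(["289", "1", "method_a", "raw", "first_run.hdf5"])
    first = registry.create(**vars(args))
    # Same phenotype, new genotype version: a second study with its own id.
    second = registry.create(289, 2, "method_a", "raw", "new_genotype.hdf5")
    print(first.id, second.id)
```

Indexing the associations would then key on `study.id`, never on the phenotype id or the file name.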

mtog pushed a commit that referenced this issue Sep 14, 2017
Left compute_n_hits for now so as to update the permutation thresholds once the computations are done. Will delete it afterwards. #128

2 participants