Error with python prepare_C2M2_submission.py: phenotype HP:0025464 doesn't have genes associated with it #357
-
I am getting an error when running python prepare_C2M2_submission.py. It looks like the problem is with phenotype HP:0025464, which doesn't have any genes associated with it, even though the term is in the HPO .obo file. Should I remove HP:0025464? I'm not sure whether this error might also arise with other HPO terms. Human Phenotype Ontology... [Tue Apr 12 11:07:54 PDT 2022]
-
Indeed, HP:0025464 doesn't have any genes associated with it, so it is not present in hp.phenotype_to_genes.txt or HPO_Entrez_gene_IDs_to_EnsEMBL_IDs.tsv. But who knows, maybe in the future it will be. Instead of deleting this entry or others like it, could we add NA values to the gene and Ensembl ID fields of those files to avoid the error?
-
I would prefer not to have to delete it from the table, and instead have the issue handled by prepare_C2M2_submission.py in conjunction with the C2M2 JSON schema.
-
This is a script error. Fixing it now; stand by for an update on OSF.
-
@mano-at-sdsc This was caused by an incorrectly indented Python block. I've published a fixed version to https://osf.io/bq6k9/; please get the updated script and try again. The error should no longer occur.
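The script's source isn't quoted in this thread, so the following is only a hypothetical sketch of how a single mis-indented line can fail on a term with no genes, such as HP:0025464. The function names, the Entrez-to-Ensembl dictionary, and the example IDs are invented for illustration; this is not the code in prepare_C2M2_submission.py.

```python
from typing import Dict, List, Tuple

# Hypothetical illustration only: NOT the code from prepare_C2M2_submission.py.
PhenoGenes = Dict[str, List[str]]   # HPO term -> list of Entrez gene IDs
EntrezToEnsembl = Dict[str, str]    # Entrez gene ID -> Ensembl gene ID


def rows_misindented(pheno_to_genes: PhenoGenes,
                     entrez_to_ensembl: EntrezToEnsembl) -> List[Tuple[str, str]]:
    rows = []
    for hp_id, genes in pheno_to_genes.items():
        for gene in genes:
            ensembl_id = entrez_to_ensembl[gene]
        # BUG: this append sits one level too far left. For a term with no
        # genes, `ensembl_id` is either unbound (UnboundLocalError) or left
        # over from the previous term, and only the last gene of each term is kept.
        rows.append((hp_id, ensembl_id))
    return rows


def rows_fixed(pheno_to_genes: PhenoGenes,
               entrez_to_ensembl: EntrezToEnsembl) -> List[Tuple[str, str]]:
    rows = []
    for hp_id, genes in pheno_to_genes.items():
        for gene in genes:
            ensembl_id = entrez_to_ensembl[gene]
            # Correct indentation: one row per (term, gene) pair, and a term
            # with an empty gene list simply contributes no rows.
            rows.append((hp_id, ensembl_id))
    return rows


if __name__ == "__main__":
    # Placeholder IDs; only HP:0025464 (no associated genes) comes from the thread.
    pheno_to_genes = {"HP:0025464": [], "HP:EXAMPLE": ["entrez-1", "entrez-2"]}
    entrez_to_ensembl = {"entrez-1": "ENSG-A", "entrez-2": "ENSG-B"}
    print(rows_fixed(pheno_to_genes, entrez_to_ensembl))
    # [('HP:EXAMPLE', 'ENSG-A'), ('HP:EXAMPLE', 'ENSG-B')]
```

In this sketch, once the indentation is corrected a term without genes just produces no rows, so neither deleting it nor adding NA placeholders is needed.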
-
Thanks a lot. It worked. After adding several header-only files, frictionless could validate all but one file. For phenotype_gene.tsv it reports: invalid: phenotype_gene.tsv. I checked line 921: HP:0000501 is listed in phenotype.tsv [auto-generated], but gene ENSG00000288784 is not in gene.tsv [auto-generated]. I downloaded the external reference zip file a few hours ago. Please let me know if I might have left something out. Thanks.
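This looks like a broken cross-table reference: a phenotype_gene.tsv row points at a gene ID that gene.tsv doesn't contain. A minimal stand-alone sketch for listing every such row, assuming phenotype_gene.tsv has `phenotype` and `gene` columns and gene.tsv has an `id` column (check those names against the C2M2 table schema you are using); the function name is hypothetical:

```python
import csv
from typing import List, TextIO, Tuple


def find_dangling_genes(phenotype_gene: TextIO, gene: TextIO) -> List[Tuple[int, str, str]]:
    """Return (line number, phenotype, gene) for every phenotype_gene row
    whose gene ID is missing from gene.tsv.

    Assumed column names: `phenotype` and `gene` in phenotype_gene.tsv,
    `id` in gene.tsv. Line numbers count the header as line 1 and assume
    no field contains an embedded newline.
    """
    # Collect every gene ID that gene.tsv actually defines.
    known_genes = {row["id"] for row in csv.DictReader(gene, delimiter="\t")}
    dangling = []
    reader = csv.DictReader(phenotype_gene, delimiter="\t")
    for line_no, row in enumerate(reader, start=2):
        if row["gene"] not in known_genes:
            dangling.append((line_no, row["phenotype"], row["gene"]))
    return dangling
```

Open both files with `newline=""` and pass the handles in; if ENSG00000288784 is really absent from gene.tsv, the HP:0000501 row described above should appear in the result.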
-
Progress! Investigating the new issue; stand by.
-
Dear Arthur: any progress on this? Is there an updated .py script, JSON, or external reference file that I should download before retrying? Thanks.
-
Sorry, I got swamped by another request. This is nearly resolved; stay tuned.
-
Found it. The link between the phenotype and the given gene loads correctly, but the target gene isn't in my main Ensembl reference, which is a year old. Let me update and reprocess; it shouldn't take too long. Watch this space for updates.
-
@mano-at-sdsc Please re-download only
-
Thanks. I can only imagine how busy you must be answering questions from all the DCCs. Frictionless reports: invalid: phenotype_gene.tsv (report columns: row, field, code, message). If needed, you can look at the file phenotype_gene.tsv here: phenotype_gene.tsv. I also extracted the relevant lines from phenotype_gene.tsv (those listing ENSG IDs not found in ensembl_genes.tsv):
-
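@mano-at-sdsc :: Okay, try this: re-download only HPO_Entrez_gene_IDs_to_EnsEMBL_IDs.tsv from the external_cv_reference_files subfolder here and re-run the script.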
-
@abradyIGS Thanks, awesome! It validated. I will proceed with the submission.
-
Incidentally, @mano-at-sdsc, I just updated the main gene reference with the big alias file you sent a few weeks ago (sorry: it dropped off my radar until early this morning). I recommend getting the new
-
@abradyIGS The file ensembl_genes.tsv is just 1.1 KB and seems to contain binary characters. Please check and let me know.
-
That was a macOS alias/upload error, sorry. I've replaced the file just now; please try again. The correct size should be 9,377,130 bytes.
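A quick sanity check after downloading a reference file can catch both symptoms reported here: a file far smaller than expected, and one containing binary data instead of text. A minimal sketch using only the standard library; the function name is hypothetical, and treating the file as UTF-8 text is an assumption:

```python
import os
from typing import List, Optional


def check_tsv_download(path: str, expected_size: Optional[int] = None) -> List[str]:
    """Return a list of problems found in a downloaded TSV (empty if none).

    Flags a size mismatch (when an expected size is known), NUL bytes, and
    content that does not decode as UTF-8 (an assumed encoding).
    """
    problems = []
    size = os.path.getsize(path)
    if expected_size is not None and size != expected_size:
        problems.append(f"size is {size} bytes, expected {expected_size}")
    # Reading the whole file is fine at this scale (~9 MB) and avoids
    # false UTF-8 errors from cutting a multibyte character in half.
    with open(path, "rb") as fh:
        data = fh.read()
    if b"\x00" in data:
        problems.append("contains NUL bytes, so it is probably not a text TSV")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 (first bad byte at offset {exc.start})")
    return problems
```

For example, `check_tsv_download("ensembl_genes.tsv", expected_size=9_377_130)` should return an empty list for the replacement copy described above; that expected size only holds for that copy and may change if the reference is updated again.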
-
Tried it. I'm getting this error when validating gene.tsv with frictionless: invalid: gene.tsv
-
Thanks, I see the problem. Fixing it; stand by. Sorry, lots of hotfixes this week.
-
I have pushed an updated copy (now v5). Please retry and let me know ASAP if you encounter any other errors.
-
Thanks. It validated and I have submitted. I will review it on the portal once it is digested. I assume this did not affect any other sections, such as files, biosample, etc., which I had reviewed for yesterday's submission.
-
Your assumption is correct. Glad to hear it worked at last.
-
@abradyIGS Thanks a lot for fixing these things so quickly. The submission seems to have gone through fine; I spot-checked several things. For gene search on the portal (content not yet approved), it doesn't show other ID types, e.g., Entrez ID or RefSeq, but I suppose that is just the current display and will change in the future, since all the information is captured in gene.tsv thanks to all your hard work.
-
Actually, searching by synonym works (e.g., ABCC7): https://app.nih-cfde.org/chaise/record/#121/CFDE:gene/nid=5, so it is all there.
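Since gene aliases are captured in gene.tsv, a similar synonym lookup can be done locally. A rough sketch, assuming gene.tsv has `id`, `name`, and `synonyms` columns with `synonyms` stored as a JSON array of strings (verify against your C2M2 schema version); the function name is hypothetical, and case-insensitive matching is an arbitrary choice here, not necessarily how the portal searches:

```python
import csv
import json
from typing import Dict, List, TextIO


def build_synonym_index(gene: TextIO) -> Dict[str, List[str]]:
    """Map each upper-cased gene name or synonym to the gene IDs that carry it.

    Assumes gene.tsv columns `id`, `name`, and `synonyms`, where `synonyms`
    is a JSON array of strings; an empty cell means no synonyms.
    """
    index: Dict[str, List[str]] = {}
    for row in csv.DictReader(gene, delimiter="\t"):
        labels = [row["name"]]
        if row.get("synonyms"):
            labels.extend(json.loads(row["synonyms"]))
        for label in labels:
            if not label:
                continue  # skip empty names
            ids = index.setdefault(label.upper(), [])
            # A synonym can be shared by several genes, so keep a list.
            if row["id"] not in ids:
                ids.append(row["id"])
    return index
```

Built from the real gene.tsv, `index.get("ABCC7")` would return the ID(s) of any gene listing ABCC7 as a name or synonym.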