file not found in offline environment #66

Closed
yidingdd opened this issue Dec 6, 2022 · 4 comments · Fixed by #105
Labels: documentation, user-query

yidingdd commented Dec 6, 2022

Hi team,

When testing v1.3.0 offline, the pipeline throws an error for both test data and my own data:

No such file: https://gitlab.ebi.ac.uk/nebfield/test-datasets/-/raw/master/pgsc_calc/reference_data/pgsc_calc_ref.sqlar

The pipeline works fine with internet access.
Could you help me solve this?

Best,
Yi

smlmbrt added the documentation label Dec 6, 2022
nebfield (Member) commented Dec 6, 2022

Hello,

Thanks for reporting the issue!

A workaround is to download a local copy of the reference database on a machine with internet access:

$ wget https://gitlab.ebi.ac.uk/nebfield/test-datasets/-/raw/master/pgsc_calc/reference_data/pgsc_calc_ref.sqlar

Then copy it to a machine without internet access, and manually set the --ref parameter when you run pgsc_calc:

$ nextflow run main.nf -profile test,docker --ref <path/to/pgsc_calc_ref.sqlar>
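Before copying the file to the offline machine, it can be worth checking that the download really is an SQLite file and not, say, an HTML error page saved under the same name. A minimal sketch, assuming a shell with `head -c` available; the helper name `is_sqlite_file` is hypothetical and not part of pgsc_calc. It relies only on the fact that SQLite database files begin with the string "SQLite format 3":

```shell
# Hypothetical helper (not part of pgsc_calc): succeed only if the file
# exists, is non-empty, and starts with the SQLite magic string.
is_sqlite_file() {
    path=$1
    [ -s "$path" ] || return 1   # missing or empty file
    # The first 15 bytes of any SQLite database are "SQLite format 3".
    [ "$(head -c 15 "$path" 2>/dev/null)" = "SQLite format 3" ]
}

# Usage: is_sqlite_file pgsc_calc_ref.sqlar && echo "looks like an SQLite file"
```

If the check fails, downloading the file again on the connected machine is probably the first thing to try.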

I hope that fixes your problem. Please let me know if you are still having trouble after trying this.

In future releases we plan to distribute the reference data differently and will make the documentation clearer.

Thanks,
Ben

yidingdd (Author) commented Dec 7, 2022

Hi Ben,

Thank you for the help!
For my own data, providing the reference file works! Just curious, what is this file used for?
For the test data, I also had to download the sample sheet, PLINK files, and score file to make it work.

Best,
Yi

nebfield (Member) commented Dec 8, 2022

Hi Yi,

Sorry, I forgot the test profile also grabs data from the internet. I'll update that to use local repository files 🤔

Currently the reference data is only used when a custom scoring file (one not in the PGS Catalog) is supplied. If the custom scoring file doesn't match the genome build of the input target genomes, then we use chain files to lift over the scoring file positions.
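To illustrate the build-mismatch rule, here is a rough sketch of how a chain file could be chosen from a pair of builds. It is not the pipeline's actual code: the helper name `chain_for` and the build labels GRCh37/GRCh38 are assumptions made for the example, and the two chain file names are the ones stored in the reference archive (hg19 corresponds to GRCh37, hg38 to GRCh38).

```shell
# Hypothetical helper: print the chain file needed to lift positions from
# build $1 (scoring file) to build $2 (target genomes).
# Prints nothing when the builds already match.
chain_for() {
    case "$1:$2" in
        GRCh37:GRCh38) echo "hg19ToHg38.over.chain.gz" ;;
        GRCh38:GRCh37) echo "hg38ToHg19.over.chain.gz" ;;
        GRCh37:GRCh37|GRCh38:GRCh38) ;;   # same build: no liftover needed
        *) echo "unsupported build pair: $1 -> $2" >&2; return 1 ;;
    esac
}

# Usage: chain_for GRCh37 GRCh38
```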

The reference file is a standard SQLite database (an SQLite Archive, hence the .sqlar extension):

$ sqlite3 pgsc_calc_ref.sqlar
sqlite> .schema
CREATE TABLE sqlar(
  name TEXT PRIMARY KEY,  -- name of the file
  mode INT,               -- access permissions
  mtime INT,              -- last modification time
  sz INT,                 -- original file size
  data BLOB               -- compressed content
);
sqlite> select name from sqlar;
hg19ToHg38.over.chain.gz
hg38ToHg19.over.chain.gz
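The same listing can be produced non-interactively, which is handy for checking a copied file on the offline machine. A minimal sketch, assuming the `sqlite3` command-line tool is installed; the helper name `list_ref_files` is hypothetical, and it queries only the `sqlar` table shown in the schema:

```shell
# Hypothetical helper: print "name size" for every file stored in the
# archive, one per line, sorted by name. Reads only the sqlar table.
list_ref_files() {
    sqlite3 "$1" "SELECT name || ' ' || sz FROM sqlar ORDER BY name;"
}

# Usage: list_ref_files pgsc_calc_ref.sqlar
```

If your `sqlite3` build includes archive support, `sqlite3 pgsc_calc_ref.sqlar -Ax` should extract the stored files into the current directory; whether that works depends on how your local binary was compiled.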

In the next release the reference data will get more complex, because we need to add reference populations (e.g. 1000 Genomes) to do ancestry inference.

Cheers,
Ben

yidingdd (Author) commented

Thank you very much for the clarification!

Best,
Yi
