file not found in offline environment #66

yidingdd · 2022-12-06T00:07:03Z

Hi team,

When testing v1.3.0 offline, the pipeline throws the following error with both the test data and my own data:

No such file: https://gitlab.ebi.ac.uk/nebfield/test-datasets/-/raw/master/pgsc_calc/reference_data/pgsc_calc_ref.sqlar

The pipeline works fine with internet access.
Could you help me solve this?

Best,
Yi


nebfield · 2022-12-06T15:42:38Z

Hello,

Thanks for reporting the issue!

A workaround is to download a local copy of the reference database on a machine with internet access:

$ wget https://gitlab.ebi.ac.uk/nebfield/test-datasets/-/raw/master/pgsc_calc/reference_data/pgsc_calc_ref.sqlar

Then copy it to a machine without internet access, and manually set the --ref parameter when you run pgsc_calc:

$ nextflow run main.nf -profile test,docker --ref <path/to/pgsc_calc_ref.sqlar>
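Before copying the file across, it may be worth checking that the download is a complete SQLite archive and not, say, an HTML error page. Below is a minimal Python sketch. `check_reference_db` is a hypothetical helper, not part of pgsc_calc, and it assumes the archive should contain hg19ToHg38.over.chain.gz and hg38ToHg19.over.chain.gz, which were the contents of the v1.3.0 reference database.

```python
import sqlite3
from pathlib import Path

# Assumption: the members present in the v1.3.0 pgsc_calc_ref.sqlar.
EXPECTED_MEMBERS = {"hg19ToHg38.over.chain.gz", "hg38ToHg19.over.chain.gz"}


def check_reference_db(path):
    """Return the set of expected members missing from an SQLite Archive.

    Hypothetical helper for illustration only (not part of pgsc_calc).
    Raises FileNotFoundError if the file does not exist, and
    sqlite3.DatabaseError if it is not an SQLite database at all.
    """
    db = Path(path)
    if not db.is_file():
        raise FileNotFoundError(f"no such file: {db}")
    # Open read-only via a file: URI so the check cannot modify the archive.
    con = sqlite3.connect(f"{db.resolve().as_uri()}?mode=ro", uri=True)
    try:
        names = {row[0] for row in con.execute("SELECT name FROM sqlar")}
    finally:
        con.close()
    return EXPECTED_MEMBERS - names
```

For example, `check_reference_db("pgsc_calc_ref.sqlar")` should return an empty set for a complete download; any names it returns are missing from the archive.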

I hope that fixes your problem. Please let me know if you are still having trouble after trying this.

In future releases we plan to distribute the reference data differently and will make the documentation clearer.

Thanks,
Ben

yidingdd · 2022-12-07T01:16:14Z

Hi Ben,

Thank you for the help!
For my own data, yes, providing the reference file works! Just curious, what's this file used for?
For the test data, I also had to download the sample sheet, PLINK files and scoring file to make it work.

Best,
Yi

nebfield · 2022-12-08T08:48:14Z

Hi Yi,

Sorry, I forgot the test profile also grabs data from the internet. I'll update that to use local repository files 🤔

Currently the reference data is only used when a custom scoring file (one not in the PGS Catalog) is provided. If the custom scoring file doesn't match the genome build of the input target genomes, then we use chain files to lift over the scoring file positions.
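To make the liftover step more concrete, here is a minimal Python sketch of mapping a single position through a UCSC chain file such as hg19ToHg38.over.chain.gz. It is an illustration, not pgsc_calc's implementation: the function names are hypothetical, it returns the first chain that covers the position instead of ranking chains by score, and it only moves positions (no allele strand flipping). It takes and returns 1-based positions, converting to the chain format's 0-based, half-open coordinates internally.

```python
import gzip
from dataclasses import dataclass, field


@dataclass
class Chain:
    t_name: str   # chromosome in the source build (e.g. hg19)
    t_start: int  # 0-based start of the chain on t_name
    t_end: int    # 0-based, exclusive end on t_name
    q_name: str   # chromosome in the destination build (e.g. hg38)
    q_size: int
    q_strand: str
    q_start: int
    blocks: list = field(default_factory=list)  # (size, dt, dq) per aligned block


def read_chains(lines):
    """Parse UCSC chain-format lines into Chain objects."""
    chains = []
    current = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank separator or comment line
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            current = Chain(
                t_name=fields[2], t_start=int(fields[5]), t_end=int(fields[6]),
                q_name=fields[7], q_size=int(fields[8]), q_strand=fields[9],
                q_start=int(fields[10]),
            )
            chains.append(current)
        else:
            # "size dt dq" for inner blocks; the last block is just "size".
            size = int(fields[0])
            dt, dq = (int(fields[1]), int(fields[2])) if len(fields) == 3 else (0, 0)
            current.blocks.append((size, dt, dq))
    return chains


def lift_position(chains, chrom, pos):
    """Map a 1-based position to the other build, or None if it is unaligned."""
    p = pos - 1  # chain coordinates are 0-based
    for ch in chains:
        if ch.t_name != chrom or not (ch.t_start <= p < ch.t_end):
            continue
        t, q = ch.t_start, ch.q_start
        for size, dt, dq in ch.blocks:
            if t <= p < t + size:
                q_pos = q + (p - t)
                if ch.q_strand == "-":
                    # '-' strand query coordinates count from the chromosome end.
                    q_pos = ch.q_size - 1 - q_pos
                return ch.q_name, q_pos + 1
            # Skip over this block plus the gap that follows it on each side.
            t += size + dt
            q += size + dq
    return None


def load_chain_file(path):
    """Read a gzipped chain file, e.g. hg19ToHg38.over.chain.gz."""
    with gzip.open(path, "rt") as handle:
        return read_chains(handle)
```

For example, `lift_position(load_chain_file("hg19ToHg38.over.chain.gz"), "chr1", pos)` would return a `(chromosome, position)` pair in hg38 coordinates, assuming UCSC-style chromosome names like `chr1`, or `None` when the position falls in a gap between aligned blocks.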

It's a standard sqlite database:

$ sqlite3 pgsc_calc_ref.sqlar
sqlite> .schema
CREATE TABLE sqlar(
  name TEXT PRIMARY KEY,  -- name of the file
  mode INT,               -- access permissions
  mtime INT,              -- last modification time
  sz INT,                 -- original file size
  data BLOB               -- compressed content
);
sqlite> select name from sqlar;
hg19ToHg38.over.chain.gz
hg38ToHg19.over.chain.gz
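If you want to get at the chain files directly, an SQLite Archive can be unpacked with nothing more than Python's standard library. A minimal sketch, assuming the usual sqlar conventions: `mode` holds the Unix file mode, and each blob is zlib-compressed unless compression would not make it smaller, in which case it is stored as-is and `sz` equals the blob length. `extract_sqlar` is a hypothetical helper name.

```python
import sqlite3
import stat
import zlib
from pathlib import Path


def extract_sqlar(archive, dest):
    """Extract the regular files in an SQLite Archive into dest.

    Hypothetical helper for illustration; returns the member names written.
    """
    root = Path(dest).resolve()
    written = []
    con = sqlite3.connect(archive)
    try:
        for name, mode, sz, data in con.execute("SELECT name, mode, sz, data FROM sqlar"):
            # Only regular files are extracted; directories and symlinks are skipped.
            if stat.S_ISDIR(mode) or stat.S_ISLNK(mode):
                continue
            data = data or b""
            # A blob is stored uncompressed when deflate did not shrink it,
            # in which case its length equals the original size `sz`.
            content = data if len(data) == sz else zlib.decompress(data)
            out = (root / name).resolve()
            if root not in out.parents:
                raise ValueError(f"refusing to write outside {root}: {name}")
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(content)
            written.append(name)
    finally:
        con.close()
    return written
```

Calling `extract_sqlar("pgsc_calc_ref.sqlar", "ref")` would write the chain files into a `ref/` directory. The sqlite3 shell's own `.archive` command can do the same job if your sqlite3 build includes it.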

In the next release the reference data will get more complex, because we need to add reference populations (e.g. 1000 genomes) to do ancestry inference.

Cheers,
Ben

yidingdd · 2022-12-12T21:22:24Z

Thank you very much for the clarification!

Best,
Yi

smlmbrt assigned nebfield Dec 6, 2022

smlmbrt added the documentation Improvements or additions to documentation label Dec 6, 2022

smlmbrt added the user-query User queries & requests label Dec 14, 2022

nebfield mentioned this issue Feb 10, 2023

Running a new or uncached instance of the pipeline will fail using docker, singularity, and test profiles #84

Closed

nebfield mentioned this issue Jul 13, 2023

Is it possible to run this pipeline in a cluster without internet? #120

Closed

nebfield added this to the v2.0.0 milestone Jul 14, 2023

nebfield linked a pull request Jul 17, 2023 that will close this issue

Update docs for ancestry release #105

Merged

nebfield closed this as completed Aug 7, 2023
