dealing with repeated arms sequencing runs #23

kmexter · 2024-09-27T14:24:09Z

The ARMS samples processed by Genoscope will be ingested into GH (as soon as the logsheets have been created, filled in, and harvested).
For some ARMS samples, the sequencing was done twice on the same sample (as far as I can tell it was actually the same sample, not a replicate sample).
You can see this in the ARMS overview Google Sheet https://docs.google.com/spreadsheets/d/1j3yuY5lmoPMo91w6e3kkJ6pmp1X6FVGUtLealuKJ3wE/edit?gid=855411053#gid=855411053: rows 547 and 587, or 539 and 584, are the same sample (they have the same sample ID and the same SampReplicate in col P) but come from a first and a second sequencing run (see column N). The fact that in col 1 these are marked as "genoscope batch xx" means that they were processed as part of EMO BON, so these events and samples should be added to the EMO BON ARMS logsheets and get into the EMO BON GH.

So I have 2 questions:

Were these really sequences on the same sample replicate?
If yes, then we need a way to indicate that one replicate was used to create 2 sequences, i.e. we have to associate one sample with 2 sequences. My suggestion is NOT to make any changes to the mat_samp_id (which is what we did for the arms-mbon project and was a shitty decision, but it is a fait accompli now) but to deal with it in the ttl files by allowing a one-to-many relationship.
If no, can someone tell me what sample replicate number these should be then?

Adding @cymon and @cpavloud as this affects you/you are decision makers on this topic

cpavloud · 2024-09-27T14:54:11Z

FYI, sequencing has been done twice for other samples too, not only for ARMS.

Allowing one-to-many relationships probably is the way to go. This is also what ENA does (one sample accession number can be linked to multiple run accession numbers).

By the way, line 547 has MaterialSample-ID "ARMS_Svalbard_S1C_20200806_20220803_MF500_DMSO" while line 587 has a different one "ARMS_Svalbard_S1C_20200806_20220803_MF500"

kmexter · 2024-10-01T09:43:21Z

OK, for the ARMS-only data (i.e. the ARMS-MBON GH space) we deal with this by adding _r1 etc. to the ARMS material sample ID for repeated sequencing runs, and _s1 etc. for sample replicates ... yes I know, shitty convention, but it is what it is now :-| .

But for the EMO BON ARMS (and other) data we need to deal with this. We can have a one-to-many relationship, that is fine. Questions:

How should we indicate this in the governance files for the omics data? These would be the files in e.g. https://github.com/emo-bon/sequencing-data/tree/main/shipment/batch-001. The file https://github.com/emo-bon/sequencing-data/blob/main/shipment/batch-001/ena-accession-numbers-batch-001.csv currently contains only the project, sample, biosample, and umbrella ENA accession numbers, and the Genoscope ref_code, but the idea was to add the run accession numbers to it as well? But then how do we deal with there being 2 run accessions for some samples, and with saying which is the first and which is the second?
So perhaps we need a separate file for run accession numbers? In there we can have e.g. source_material_id, ref_code, run_accession_number, and comment; where there is more than one run_accession_number, we repeat the row but with a different run_accession_number and comment?
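
To make the separate-file idea concrete, here is a minimal Python sketch of how such a file could be read back into a one-to-many mapping. It is only an illustration under assumptions: the column names are the ones suggested above, but the ref_code and run accession values are placeholders (not real accessions), `SAMPLE_SEQUENCED_ONCE` is a made-up ID, and whether both runs of a sample share one ref_code has not been settled in this thread.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of a separate run-accession file.
# ref_code and run_accession_number values are placeholders, not real accessions.
RUN_ACCESSIONS_CSV = """\
source_material_id,ref_code,run_accession_number,comment
ARMS_Svalbard_S1C_20200806_20220803_MF500,REF_PLACEHOLDER_1,RUN_PLACEHOLDER_1,first sequencing run
ARMS_Svalbard_S1C_20200806_20220803_MF500,REF_PLACEHOLDER_1,RUN_PLACEHOLDER_2,second sequencing run
SAMPLE_SEQUENCED_ONCE,REF_PLACEHOLDER_2,RUN_PLACEHOLDER_3,
"""


def runs_by_sample(csv_text):
    """Group (run_accession_number, comment) pairs by source_material_id.

    Rows keep their file order, so a repeated sample's runs stay in the
    order in which they were listed.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["source_material_id"]].append(
            (row["run_accession_number"], row["comment"])
        )
    return dict(grouped)


runs = runs_by_sample(RUN_ACCESSIONS_CSV)

# Samples with more than one run are the repeated sequencing cases.
repeated = {sample_id: r for sample_id, r in runs.items() if len(r) > 1}

for sample_id, sample_runs in repeated.items():
    print(sample_id, sample_runs)
```

With this layout the existing accession file stays one row per sample, and the "first" versus "second" run is carried by the comment column rather than by changing the sample ID.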

kmexter · 2024-10-01T09:44:20Z

> FYI, sequencing has been done twice for other samples too, not only for ARMS.
>
> Allowing one-to-many relationships probably is the way to go. This is also what ENA does (one sample accession number can be linked to multiple run accession numbers).
>
> By the way, line 547 has MaterialSample-ID "ARMS_Svalbard_S1C_20200806_20220803_MF500_DMSO" while line 587 has a different one "ARMS_Svalbard_S1C_20200806_20220803_MF500"

That should not be! We agreed to use DMSO or ETOH for all samples in year 1 only, so DMSO has to go.
I will go over this with @JustinePa; we will decide whether we need to remove this "DMSO" and where it has to be removed from.
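
As a possible aid for that check, here is a small Python sketch that flags MaterialSample-IDs differing only by a preservative suffix. It assumes the preservative appears only as a trailing `_DMSO` or `_ETOH` token; the function names are made up for this example, and the two input rows are the ones quoted above.

```python
import re
from collections import defaultdict

# Assumption: the preservative is only ever a trailing _DMSO or _ETOH token.
PRESERVATIVE_SUFFIX = re.compile(r"_(DMSO|ETOH)$")


def strip_preservative(material_sample_id):
    """Return the ID without a trailing preservative token, if there is one."""
    return PRESERVATIVE_SUFFIX.sub("", material_sample_id)


def find_inconsistent_ids(rows):
    """Find sheet rows whose IDs differ only by the preservative suffix.

    rows is an iterable of (row_number, material_sample_id) pairs.
    Returns {base_id: [(row_number, material_sample_id), ...]} for every
    base ID that appears under more than one spelling.
    """
    groups = defaultdict(list)
    for row_number, sample_id in rows:
        groups[strip_preservative(sample_id)].append((row_number, sample_id))
    return {
        base: members
        for base, members in groups.items()
        if len({sample_id for _, sample_id in members}) > 1
    }


rows = [
    (547, "ARMS_Svalbard_S1C_20200806_20220803_MF500_DMSO"),
    (587, "ARMS_Svalbard_S1C_20200806_20220803_MF500"),
]
mismatches = find_inconsistent_ids(rows)

for base, members in mismatches.items():
    print(base, members)
```

This only reports candidates; whether the suffix should actually be removed, and from which files, remains the decision described above.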

kmexter · 2024-10-22T08:44:00Z

and @laurianvm and I will go over this when we add the ARMS to the data model

kmexter assigned cymon, cpavloud and kmexter Sep 27, 2024

kmexter assigned laurianvm and JustinePa Oct 22, 2024
