Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make variable naming consistent with ISA standard #13

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sellth
Copy link
Collaborator

@sellth sellth commented Jun 7, 2023

Copy link

@Nicolai-vKuegelgen Nicolai-vKuegelgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree with harmonising this overal, I'm not sure that a blanket change to all variable names is the right approach here:

  1. for one thing, the stem cell core templates actually ask for sample names, since we use the cellline names as source
  2. From what I've seen so far many templates use the same name for sample & source. While this is of course not ideal/desribale, it is often not quite avoidable without a specific experimental design (that the cookiecutter templates can't really build upon since the have to be very general). Therefore the cookiecutter json may conceptually only ask for a single 'name' for a sample and then use that for both source and sample name. For most experimental people (or just people not accustomed to ISA) sample name is much more intuitive description than source name in these cases.

@Nicolai-vKuegelgen
Copy link

Another thing (I almost forgot):

This only seems to the address the variable names, however many templates also use Sample Name as the first column in the s-file instead of Source Name, so maybe this should/needs to be changed as well? Otherwise the variable renaming makes little sense.
If we do change this I'm not sure what the effects on pipelines that might depend in these templates will be.

@sellth
Copy link
Collaborator Author

sellth commented Jun 15, 2023

Thanks Nicolai for looking into this.

  1. for one thing, the stem cell core templates actually ask for sample names, since we use the cellline names as source

That was indeed an oversight in the stem_cell_core_sc template which is fixed now. I left _bulk unchanged because of this.

  1. From what I've seen so far many templates use the same name for sample & source. […] For most experimental people (or just people not accustomed to ISA) sample name is much more intuitive description than source name in these cases.

Most templates derive their Source Names from the Sample Names, but I would agree with Mikko that this is a bit confusing in the context of ISA-tabs and also experimentally. I would expect Sample Names to be derived from the Source Names plus a suffix (optionally). That is how I defined it in for the MC template, there is source_names and sample_suffix in the cookiecutter.json.

This only seems to the address the variable names, however many templates also use Sample Name as the first column in the s-file instead of Source Name, so maybe this should/needs to be changed as well? Otherwise the variable renaming makes little sense.

Not sure what you mean by this. s_ files need to start with a Source Name column to be standard compliant and all do so right now.

@Nicolai-vKuegelgen
Copy link

Most templates derive their Source Names from the Sample Names, but I would agree with Mikko that this is a bit confusing in the context of ISA-tabs and also experimentally. I would expect Sample Names to be derived from the Source Names plus a suffix (optionally). That is how I defined it in for the MC template, there is source_names and sample_suffix in the cookiecutter.json.

The templates might do indeed do this, but I would argue that most users generally do not, since they only come up with source names when they start entering things into sodar (they will always have some sort of sample name ready).
Maybe the more important questions to answer for is: who will use these templates or rather who do we want to use them?
For larger projects (with inevitably closer cubi collaboration), someone will probably figure out a good way to organise and derive sample and source names. But smaller projects that - maybe one day? - some can just create & fill the samplehseet from within sodar this is not the case, and these people likely will come with a list of samples and names, but not source names.

Not sure what you mean by this. s_ files need to start with a Source Name column to be standard compliant and all do so right now.

Ah you're right I must have confused some (older?) things here or maybe I just remebered the start of some a-files ...

Copy link

@Nicolai-vKuegelgen Nicolai-vKuegelgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code wise the changes all look good to me.

I have some concerns about the idea/purpose of this change (see my other comment), but that's not easily addressed either way.

@sellth
Copy link
Collaborator Author

sellth commented Jun 20, 2023

As this is not really urgent, let's not do anything hastily and talk once I'm back in Berlin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mislabeled "sample names" in ISA-Tab templates
2 participants