-
Examples for types of results files that would need to be described with a generalized "data dictionary"
@muehlhaus @Brilator adding examples for TDF and MAF files
-
The first idea that came up in DataPLANT is to extend the ISA “data file” object with a kind of data dictionary including the following fields:
We envision an additional file (table) called “ISA dataset” that accompanies one or more result files (e.g. data matrices). Your thoughts are highly appreciated.
-
@muehlhaus, is this what you had in mind? Please see the second example, where two files are referenced (the MIAPPE TDF and the MetaboLights MAF). This brings up the issue of a complex header structure that has to take two dimensions into account:
-
Hello. My (computer scientist) pov. The following is more or less how I (and a few others) have envisioned the "data dictionary" (not necessarily complying with ISA, actually kind of orthogonal to it).
The following (imagined as an additional sheet in "isa.dataset.xlsx") is less of an extension but more a reuse of ISA to document which physical samples contributed (via some software) to which data files. (The column names are rather ad-hoc, excuse me, I am not much of an ISA expert yet.)
Lastly, the following pic demonstrates (roughly …) my intended use case: being able to grasp individual "variables" inside data files.
-
Oh, to clarify that empty column. There is the concept of "derived" variables, that – in my mock-up – go in different sheets depending on their "level of dependence". E.g.
and L2-Derivations
-
Nice work. Let me look it up.
On Mon, Jan 9, 2023, 17:56 Christopher Kappe wrote:
Oh, to clarify that empty column. There is the concept of "derived" variables, that – in my mock-up – go in different sheets depending on their "level of dependence". E.g.
L1-Derivations
| Data File Path | Source Name | Software | Data File Path 2 | Sample Name |
| --- | --- | --- | --- | --- |
| runs/kallisto_sleuth/sleuth_dge.csv | mean_obs | sleuth | runs/kallisto_sleuth/sleuth_dge.csv | smooth_sigma_sq |
| runs/kallisto_sleuth/sleuth_dge.csv | sigma_sq | sleuth | runs/kallisto_sleuth/sleuth_dge.csv | smooth_sigma_sq |
and L2-Derivations
| Data File Path | Source Name | Software | Data File Path 2 | Sample Name |
| --- | --- | --- | --- | --- |
| runs/kallisto_sleuth/sleuth_dge.csv | sigma_sq | sleuth | runs/kallisto_sleuth/sleuth_dge.csv | rss |
| runs/kallisto_sleuth/sleuth_dge.csv | smooth_sigma_sq | sleuth | runs/kallisto_sleuth/sleuth_dge.csv | rss |
| runs/kallisto_sleuth/sleuth_dge.csv | sigma_sq | sleuth | runs/kallisto_sleuth/sleuth_dge.csv | final_sigma_sq |
| runs/kallisto_sleuth/sleuth_dge.csv | smooth_sigma_sq | sleuth | runs/kallisto_sleuth/sleuth_dge.csv | final_sigma_sq |
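One way to read the L1/L2 sheets quoted above is as edges from a source variable to a derived variable, with the sheet number being the depth of the derivation chain. The sketch below is only an illustration under that assumption: the function name `derivation_levels` and the rule "level = 1 + highest level among the sources" are not taken from the thread, but they do reproduce the split into L1 and L2 shown in the mock-up.

```python
from collections import defaultdict

# (source variable, derived variable) pairs transcribed from the
# L1-Derivations and L2-Derivations sheets quoted above.
EDGES = [
    ("mean_obs", "smooth_sigma_sq"),
    ("sigma_sq", "smooth_sigma_sq"),
    ("sigma_sq", "rss"),
    ("smooth_sigma_sq", "rss"),
    ("sigma_sq", "final_sigma_sq"),
    ("smooth_sigma_sq", "final_sigma_sq"),
]


def derivation_levels(edges):
    """Return {variable: level}; level 0 means the variable has no sources.

    Assumed rule (not stated in the thread): a derived variable sits one
    level above the highest-level variable it was derived from.
    """
    sources = defaultdict(set)
    variables = set()
    for src, dst in edges:
        sources[dst].add(src)
        variables.update((src, dst))

    levels = {}

    def level(var, seen=()):
        if var in seen:
            raise ValueError(f"cyclic derivation involving {var!r}")
        if var not in levels:
            parents = sources.get(var, ())
            if parents:
                levels[var] = 1 + max(level(p, seen + (var,)) for p in parents)
            else:
                levels[var] = 0
        return levels[var]

    for var in variables:
        level(var)
    return levels


levels = derivation_levels(EDGES)
for name in sorted(levels, key=lambda v: (levels[v], v)):
    print(f"L{levels[name]}  {name}")
```

With this reading, a tool would not need separate sheets per level: a single edge list would be enough, and the level could be computed on demand.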
-
Hey, I also wanted to give my input from the perspective of a tool developer, consuming both
Process graph pointers
Firstly, I want to touch on the pointers in the process graph, which can then be used to associate
Naming of the headers / Integration into ISA-Tab
As was stated above, these
So these header pairs (
Having
Integration into ISA-Json
This is relatively straightforward, I think. The data object currently contains the field
{
"filePath": "runs/kallisto_sleuth/sleuth_dge.csv",
"identifier": "mean_obs",
"type": "Raw Data File"
}
and
{
"filePath": "runs/kallisto_sleuth/sleuth_dge.csv",
"identifier": "smooth_sigma_sq",
"type": "Derived Data File"
}
The name
"type": {
"type": "string",
"enum": [
"Raw Data File",
"Derived Data File",
"Image File",
"Raw Data",
"Derived Data",
"Image"
]
}
or just
"type": {
"type": "string",
"enum": [
"Raw Data",
"Derived Data",
"Image"
]
}
Pointer descriptors / Dataset
All of you touched on the reasons for and the modelling details of these pointer descriptors nicely. I just want to add a few thoughts about the implementation.
Integration into ISA-Tab
I'm not quite sure what your final thoughts about the STUDY
Unfortunately this breaks the DATASET
Input on this would be much appreciated.
Integration into ISA-Json
Again, the integration into
{
"filePath": "runs/kallisto_sleuth/sleuth_dge.csv",
"pointer": "mean_obs",
"type": "Raw Data File",
"wasGeneratedBy" : "workflows/kallisto_sleuth.R",
"attribute" : {
"annotationValue" : "Arithmetic Mean",
"termSource" : "NCIT",
"termAccession" : "http://purl.obolibrary.org/obo/NCIT_C53319"
},
"objectType" : "Decimal",
"label" : "Mean"
}
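To make the consumer side of such a data object concrete, here is a minimal sketch of how a tool might follow `filePath` and `pointer` to the actual column values. Everything beyond the JSON fields shown above is an assumption made for illustration: the helper `resolve_pointer`, the `PARSERS` mapping from `objectType` to a Python type, the `"String"` entry, and the in-memory stand-in for the CSV file with invented values.

```python
import csv
import io
from decimal import Decimal

# Data object with the fields proposed in the comment above (subset).
data_object = {
    "filePath": "runs/kallisto_sleuth/sleuth_dge.csv",
    "pointer": "mean_obs",
    "type": "Raw Data File",
    "objectType": "Decimal",
    "label": "Mean",
}

# Stand-in for the real file; column layout and values are invented.
FAKE_FILES = {
    "runs/kallisto_sleuth/sleuth_dge.csv": (
        "target_id,mean_obs,sigma_sq\n"
        "t1,5.25,0.5\n"
        "t2,3.5,0.25\n"
    ),
}

# Hypothetical mapping from "objectType" to a Python parser.
PARSERS = {"Decimal": Decimal, "String": str}


def resolve_pointer(obj, open_file):
    """Return the values of the column named by obj["pointer"]."""
    parse = PARSERS[obj["objectType"]]
    column = obj["pointer"]
    with open_file(obj["filePath"]) as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"{column!r} is not a column of {obj['filePath']}")
        return [parse(row[column]) for row in reader]


values = resolve_pointer(
    data_object, lambda path: io.StringIO(FAKE_FILES[path])
)
print(values)
```

Passing the file opener as an argument keeps the sketch independent of where the files actually live; a real tool would pass something that opens paths relative to the dataset root.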
-
@HLWeil @kappe-c, thank you for the input and explanation. @HLWeil: one clarification please. Should your
I am still unclear about the following points:
@terazus what are your thoughts?
-
Hi guys, I just wanted to follow up on this.
-
The community requires a way to describe result file content as part of the ISA model. Here, we want to discuss possible solutions. (Discussion related to issue #475)
Currently, the ISA model is very strong in describing the path from biological source to a measurement result file. From this point on, the model relies on the specification of the result file format for machine tractability, which is perfect if such a file format is established. However, we often face the situation that such a file format is not established, or that we want to point into such a format to describe a specific processing path.
• Possibility to point into a file (e.g. data frame, XML, JSON, or image coordinate space)
• Using the classical ISA process description (e.g. ISA Tab) to not change the user experience
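As a rough illustration of what "pointing into a file" could mean for two of the formats listed above, the sketch below resolves a pointer into a JSON document and into an XML document. The thread does not prescribe any pointer syntax; JSON Pointer (RFC 6901) and the path subset supported by Python's `xml.etree.ElementTree` are used here purely as stand-ins, and the sample documents and function names are invented.

```python
import json
import xml.etree.ElementTree as ET


def resolve_json_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer such as '/results/0/mean_obs'."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # RFC 6901 escaping: '~1' stands for '/', '~0' stands for '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def resolve_xml_path(xml_text, path):
    """Return the text of every element matching an ElementTree path."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


# Invented sample documents standing in for result files.
json_doc = json.loads('{"results": [{"target_id": "t1", "mean_obs": 5.25}]}')
xml_doc = "<results><row id='t1'><mean_obs>5.25</mean_obs></row></results>"

json_value = resolve_json_pointer(json_doc, "/results/0/mean_obs")
xml_values = resolve_xml_path(xml_doc, "./row/mean_obs")
print(json_value, xml_values)
```

Whatever syntax is eventually chosen, the point of the sketch is that one (file path, pointer) pair is enough for a tool to reach a specific value or column, which is what the ISA process description would then reference.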