Scribe #84

Open · wants to merge 2 commits into main
3 changes: 1 addition & 2 deletions docs/onboarding/annotate.md
@@ -1,10 +1,9 @@
---
-sidebar_position: 6
+sidebar_position: 5
---

# Applying Metadata to Data


After uploading data, contributors can find those files in our [Data Curator App](https://dca.app.sagebionetworks.org/), which is used to help annotate the data. Select **Gray Foundation**, your project name, and the folder of files you want to annotate. Then select a metadata template.
Click **Download template** and follow the steps to populate it. Export the template as a .csv, re-upload it to the Data Curator App, and validate it. Once it passes validation, you may submit it, and your annotations will appear on Synapse with your data.

2 changes: 1 addition & 1 deletion docs/onboarding/data-analysis.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 7
+sidebar_position: 6
---

# Data Analysis
58 changes: 0 additions & 58 deletions docs/onboarding/data-processing.md

This file was deleted.

66 changes: 0 additions & 66 deletions docs/onboarding/upload-data.md

This file was deleted.

190 changes: 190 additions & 0 deletions docs/onboarding/upload-data.mdx
@@ -0,0 +1,190 @@
---
sidebar_position: 4
---

# Clinical metadata and file upload

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Creating a Clinical Metadata Template for BRCA Pre-Cancer Research

<div className="unique-tabs">
<Tabs>
<TabItem value="Uploading Clinical Metadata">
This guide provides step-by-step instructions for importing clinical metadata into the Gray Foundation portal: navigate to the Data Curator App, select your project, complete the patient cohort data template, validate the metadata, and submit it.

1\. Navigate to the [Data Curator App](https://dca.app.sagebionetworks.org/)


2\. Select "Gray Foundation"

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/16025e0b-bb41-4c23-bb28-ac137ec114f3/ascreenshot.jpeg?tl_px=0,210&br_px=1290,931&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=437,277)


3\. Select your project

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/ca8b7410-c180-4a81-a001-d2c87ccea513/ascreenshot.jpeg?tl_px=0,135&br_px=1290,856&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=442,277)


4\. Click the "Patient Cohort Data" folder

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/6e332fcc-fad3-443f-89fa-e96d120a4689/user_cropped_screenshot.jpeg?tl_px=0,0&br_px=1290,721&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=455,200)


5\. Select the "Patient Cohort Data Template" and click "Download template"

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/f84e0f3b-b9dc-4ecb-8be6-67ea9fd3d43a/ascreenshot.jpeg?tl_px=0,114&br_px=1290,835&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=437,277)


6\. Click the link to the Google Sheet

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/ec76d303-b8cd-45be-9c9f-301dcf8451eb/user_cropped_screenshot.jpeg?tl_px=107,0&br_px=1397,721&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=524,239)


7\. Complete the Google Sheet with your patient cohort data

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/fc8c6aa3-cd8e-4692-8f9c-3f7609026dfe/user_cropped_screenshot.jpeg?tl_px=0,0&br_px=1290,721&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=61,214)


8\. Ensure all required columns are complete

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/09acda45-dce6-4061-a977-56828a57257e/user_cropped_screenshot.jpeg?tl_px=47,58&br_px=1337,780&force_format=png&width=1120.0)


9\. Download as a .csv file

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/e0f95cb5-71da-420e-991d-e4c14a8bb663/ascreenshot.jpeg?tl_px=0,251&br_px=1290,972&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=488,277)


10\. Click "Validate & Submit Metadata"

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/e07ec809-24ee-44ea-b10b-c2518cbfd914/user_cropped_screenshot.jpeg?tl_px=0,161&br_px=1219,883&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=145,354)


11\. Click here

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/db97bc63-a50c-4968-9b7d-30863b53466a/user_cropped_screenshot.jpeg?tl_px=0,0&br_px=1290,721&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=375,259)


12\. Click "Validate Metadata"

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/ba061693-c8a0-40fc-a450-a61cd3a01123/user_cropped_screenshot.jpeg?tl_px=0,145&br_px=1290,866&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=470,489)
#### [Made with Scribe](https://scribehow.com/shared/Importing_clinical_metadata_for_Gray_Foundation_portal___lDabdd0SsqJ8LAZYhYGNQ)
</TabItem>
<TabItem value="Uploading Data Files">

Before you begin, identify the destination for your data. Most data are organized in pre-assigned folders based on assay and data levels. You can create subfolders for additional organization, especially for batch-specific data.

#### Data Organization

Data is organized into folders by assay type and, within each assay, by processing level.
Each top-level folder and all of its subfolders must contain data of the same type (see the example below).

The Data Coordination Center (DCC) will create empty top-level folders, along with subfolders for the expected levels of data.
The layout depends on whether raw data, processed data, or both are expected.
If only one level of data is expected, everything is "collapsed" into a single folder and there are no subfolders.
Subfolders must contain the same data type and level as the root folder in which they are contained.

```plaintext
└── single_cell_RNA_seq
    ├── single_cell_RNA_seq_level1
    │   ├── fileA.fastq
    │   ├── fileB.fastq
    │   ├── fileC.fastq
    │   └── fileD.fastq
    ├── single_cell_RNA_seq_level2
    │   ├── fileA.bam
    │   ├── fileB.bam
    │   ├── fileC.bam
    │   └── fileD.bam
    ├── single_cell_RNA_seq_level3
    │   ├── raw_counts.txt
    │   └── normalized_counts.txt
    └── single_cell_RNA_seq_level4
        └── t-SNE.txt
```
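
As noted above, you can add subfolders for extra organization, such as batch-specific folders. If you prefer to create them programmatically, here is a minimal sketch using the Synapse Python client introduced later on this page; the folder name and the parent ID `syn12345678` are placeholders for your own values:

```python
import synapseclient
from synapseclient import Folder

# Log in; credentials are read from ~/.synapseConfig (see the upload steps below).
syn = synapseclient.login()

# Create a batch-specific subfolder under an existing level folder.
# Keep the data type and level consistent with the parent folder.
batch_folder = syn.store(Folder("level1_batch01", parent="syn12345678"))
print(f"Created folder {batch_folder.name} ({batch_folder.id})")
```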

By understanding the data generation process, the DCC can effectively collaborate with each team to address the following questions:

- What are the different types of data that will be generated, and how can the data artifacts from this workflow be optimally handled and managed?
- Are there any recommendations that can be provided to ensure a smooth workflow and avoid potential issues in the downstream analysis?
- What additional resources can the DCC offer, if available?


Depending on your funded aims, project teams may have specialized data workflows, which can include:

- Generating sequencing data and deriving data using multiple variant calling pipelines.
- Producing high-resolution images and extracting summary features from images.
- Combining different types of data.

During the onboarding process, it is essential to discuss the anticipated workflow, especially if it is complex or deviates from the standard. Project teams should provide information or documentation regarding their workflow.

In addition to the project team's data processing, the DCC also performs data processing on the uploaded data in Synapse. This processing includes:

- Quality control assessments.
- File format conversions.
- Other necessary data transformations to facilitate data loading and sharing in cBioPortal or other analysis applications.

## Synapse User Interface (UI)

The UI is suitable for smaller files (less than 100 MB). In the designated folder, open the **Folder Tools** menu for upload options. Refer to the general Synapse UI documentation for details.

## Programmatic clients

For larger or more numerous files, use the programmatic clients for efficient uploading. Options include the command-line tool, the Python client, and the R client; detailed documentation is available for each. Reach out to the DCC for assistance.
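
For example, the Python client can be used directly in a short script. The sketch below is a minimal illustration, not the only way to do it; `my_image.tiff` and `syn12345678` are placeholders for your file and destination folder ID:

```python
import synapseclient
from synapseclient import File

# Log in; with no arguments, credentials are read from ~/.synapseConfig
# (you can also pass a personal access token: synapseclient.login(authToken="...")).
syn = synapseclient.login()

# Upload a single file into the destination folder.
uploaded = syn.store(File("my_image.tiff", parent="syn12345678"))
print(f"Uploaded {uploaded.name} as {uploaded.id}")
```

The command-line workflow below achieves the same result without writing any Python.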

#### Typical Workflow with Python Command-Line Client

1. **Install Python Package**: Install the Synapse Python package from [PyPI](https://pypi.org/project/synapseclient/). This also installs the command-line utility. Verify that it is working by running `synapse help`, and feel free to review the [docs](https://python-docs.synapse.org/build/html/index.html) for the Python/CLI client.

2. **Create Access Token**: For large uploads, it is best to create an access token. Go to your Account profile > Account Settings > Personal Access Tokens > Create new token.

3. **Create Configuration File**: For convenience, copy and paste the token into a `.synapseConfig` text file (the clients look for it at `~/.synapseConfig` by default):

```plaintext
[authentication]
authtoken = sometokenstringxxxxxxxxxxxxxxxxxx
```

4. **Create Manifest File**: Create a list of files to transfer (called a manifest). The `--parent-id` is the Synapse ID of the folder you are uploading files to:

```bash
synapse manifest --parent-id syn12345678 --manifest-file manifest.txt PATH_TO_DIR_WITH_FILES
```

5. **Certified User Check**: If you are not a Certified User, the tool will output a message. Review and complete the Certified User portion of Account Setup before proceeding.

6. **Execute Sync Command**: Step 4 should have produced the manifest file `manifest.txt`. Ensure that `.synapseConfig` is in place so the client can authenticate, then run the sync (a Python-script alternative is sketched after these steps):

```bash
synapse sync manifest.txt
```

If the connection is unreliable, you can add retries:

```bash
synapse sync --retries 3 manifest.txt
```
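
If you would rather drive the same manifest-based upload from a Python script instead of the command line, the following minimal sketch uses `synapseutils`; it assumes the `manifest.txt` produced in step 4 and credentials stored in `~/.synapseConfig`:

```python
import synapseclient
import synapseutils

# Authenticate using the token from ~/.synapseConfig
syn = synapseclient.login()

# Upload every file listed in the manifest (a tab-separated file with at least
# 'path' and 'parent' columns) to its target Synapse folder.
synapseutils.syncToSynapse(syn, manifestFile="manifest.txt")
```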

#### One-off Uploads

For just a few files, a more convenient command might be:

```bash
synapse store my_image.tiff --parentId syn12345678
```

#### Alternative Methods

Under rare circumstances, the DCC can explore the following options:

- Receiving data via physical hard drive
- Using Globus transfers (if truly needed)
- Transferring from a custom S3 bucket

</TabItem>
</Tabs>
</div>