Scribe #84

Open · wants to merge 2 commits into main
3 changes: 1 addition & 2 deletions docs/onboarding/annotate.md
@@ -1,10 +1,9 @@
---
-sidebar_position: 6
+sidebar_position: 5
---

# Applying Metadata to Data


After uploading data, contributors can find those files in our [Data Curator App](https://dca.app.sagebionetworks.org/), which is used to help annotate the data. Select **Gray Foundation**, your project name, and the folder of files you want to annotate. Then select a metadata template.
Click **Download template** and follow the steps to populate it. Export the template as a .csv, re-upload it to the Data Curator App, and validate it. Once it passes validation, you may submit it, and your annotations will appear on Synapse with your data.

2 changes: 1 addition & 1 deletion docs/onboarding/data-analysis.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 7
+sidebar_position: 6
---

# Data Analysis
58 changes: 0 additions & 58 deletions docs/onboarding/data-processing.md

This file was deleted.

66 changes: 0 additions & 66 deletions docs/onboarding/upload-data.md

This file was deleted.

190 changes: 190 additions & 0 deletions docs/onboarding/upload-data.mdx
@@ -0,0 +1,190 @@
---
sidebar_position: 4
---

# Clinical metadata and file upload

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Creating a Clinical Metadata Template for BRCA Pre-Cancer Research

<div className="unique-tabs">
<Tabs>
<TabItem value="Uploading Clinical Metadata">
This guide provides step-by-step instructions for importing clinical metadata into the Gray Foundation portal: navigate to the Data Curator App, select your project, complete the patient cohort data template, validate the metadata, and submit it.

1\. Navigate to the [Data Curator App](https://dca.app.sagebionetworks.org/)


2\. Select "Gray Foundation"

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/16025e0b-bb41-4c23-bb28-ac137ec114f3/ascreenshot.jpeg?tl_px=0,210&br_px=1290,931&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=437,277)


3\. Select your project

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/ca8b7410-c180-4a81-a001-d2c87ccea513/ascreenshot.jpeg?tl_px=0,135&br_px=1290,856&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=442,277)


4\. Click the "Patient Cohort Data" folder

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/6e332fcc-fad3-443f-89fa-e96d120a4689/user_cropped_screenshot.jpeg?tl_px=0,0&br_px=1290,721&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=455,200)


5\. Select the "Patient Cohort Data Template" and click "Download template"

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/f84e0f3b-b9dc-4ecb-8be6-67ea9fd3d43a/ascreenshot.jpeg?tl_px=0,114&br_px=1290,835&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=437,277)


6\. Click the link to the Google Sheet

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/ec76d303-b8cd-45be-9c9f-301dcf8451eb/user_cropped_screenshot.jpeg?tl_px=107,0&br_px=1397,721&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=524,239)


7\. Complete the Google Sheet with your patient cohort data

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/fc8c6aa3-cd8e-4692-8f9c-3f7609026dfe/user_cropped_screenshot.jpeg?tl_px=0,0&br_px=1290,721&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=61,214)


8\. Ensure all required columns are complete

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/09acda45-dce6-4061-a977-56828a57257e/user_cropped_screenshot.jpeg?tl_px=47,58&br_px=1337,780&force_format=png&width=1120.0)


9\. Download as a .csv file

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/e0f95cb5-71da-420e-991d-e4c14a8bb663/ascreenshot.jpeg?tl_px=0,251&br_px=1290,972&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=488,277)


10\. Click "Validate & Submit Metadata"

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/e07ec809-24ee-44ea-b10b-c2518cbfd914/user_cropped_screenshot.jpeg?tl_px=0,161&br_px=1219,883&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=145,354)


11\. Click here

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/db97bc63-a50c-4968-9b7d-30863b53466a/user_cropped_screenshot.jpeg?tl_px=0,0&br_px=1290,721&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=375,259)


12\. Click "Validate Metadata"

![](https://ajeuwbhvhr.cloudimg.io/colony-recorder.s3.amazonaws.com/files/2024-04-04/ba061693-c8a0-40fc-a450-a61cd3a01123/user_cropped_screenshot.jpeg?tl_px=0,145&br_px=1290,866&force_format=png&width=1120.0&wat=1&wat_opacity=0.7&wat_gravity=northwest&wat_url=https://colony-recorder.s3.us-west-1.amazonaws.com/images/watermarks/FB923C_standard.png&wat_pad=470,489)
#### [Made with Scribe](https://scribehow.com/shared/Importing_clinical_metadata_for_Gray_Foundation_portal___lDabdd0SsqJ8LAZYhYGNQ)
</TabItem>
<TabItem value="Uploading Data Files">

Before you begin, identify the destination for your data. Most data are organized in pre-assigned folders based on assay and data levels. You can create subfolders for additional organization, especially for batch-specific data.

#### Data Organization

Data is organized into folders by assay type and, within each assay, by processing level.
Each top-level folder and all of its subfolders must contain data of the same type (see the example below).

The Data Coordination Center (DCC) will create empty top-level folders, along with subfolders for the expected levels of data.
The layout depends on whether raw data, processed data, or both are expected.
If only one level of data is expected, everything is "collapsed" into a single folder and there are no subfolders.
Subfolders must contain the same data type and level as the root folder in which they are contained.

```plaintext
└── single_cell_RNA_seq
    ├── single_cell_RNA_seq_level1
    │   ├── fileA.fastq
    │   ├── fileB.fastq
    │   ├── fileC.fastq
    │   └── fileD.fastq
    ├── single_cell_RNA_seq_level2
    │   ├── fileA.bam
    │   ├── fileB.bam
    │   ├── fileC.bam
    │   └── fileD.bam
    ├── single_cell_RNA_seq_level3
    │   ├── raw_counts.txt
    │   └── normalized_counts.txt
    └── single_cell_RNA_seq_level4
        └── t-SNE.txt
```
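
As noted above, you can add subfolders for extra organization, such as batch-specific folders. If you prefer to create them programmatically, here is a minimal sketch using the Synapse Python client introduced later on this page; the folder name and the parent ID `syn12345678` are placeholders for your own values:

```python
import synapseclient
from synapseclient import Folder

# Log in; credentials are read from ~/.synapseConfig (see the upload steps below).
syn = synapseclient.login()

# Create a batch-specific subfolder under an existing level folder.
# Keep the data type and level consistent with the parent folder.
batch_folder = syn.store(Folder("level1_batch01", parent="syn12345678"))
print(f"Created folder {batch_folder.name} ({batch_folder.id})")
```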

By understanding the data generation process, the DCC can effectively collaborate with each team to address the following questions:

- What are the different types of data that will be generated, and how can the data artifacts from this workflow be optimally handled and managed?
- Are there any recommendations that can be provided to ensure a smooth workflow and avoid potential issues in the downstream analysis?
- What additional resources can the DCC offer, if available?


Depending on your funded aims, project teams may have specialized data workflows, which can include:

- Generating sequencing data and deriving data using multiple variant calling pipelines.
- Producing high-resolution images and extracting summary features from images.
- Combining different types of data.

During the onboarding process, it is essential to discuss the anticipated workflow, especially if it is complex or deviates from the standard. Project teams should provide information or documentation regarding their workflow.

In addition to the project team's data processing, the DCC also performs data processing on the uploaded data in Synapse. This processing includes:

- Quality control assessments.
- File format conversions.
- Other necessary data transformations to facilitate data loading and sharing in cBioPortal or other analysis applications.

## Synapse User Interface (UI)

The UI is suitable for smaller files (less than 100 MB). In the designated folder, open the **Folder Tools** menu for upload options. Refer to the general Synapse UI documentation for details.

## Programmatic clients

For larger or more numerous files, use the programmatic clients for efficient uploading. Options include the command-line tool, the Python client, and the R client; detailed documentation is available for each. Reach out to the DCC for assistance.
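
For example, the Python client can be used directly in a short script. The sketch below is a minimal illustration, not the only way to do it; `my_image.tiff` and `syn12345678` are placeholders for your file and destination folder ID:

```python
import synapseclient
from synapseclient import File

# Log in; with no arguments, credentials are read from ~/.synapseConfig
# (you can also pass a personal access token: synapseclient.login(authToken="...")).
syn = synapseclient.login()

# Upload a single file into the destination folder.
uploaded = syn.store(File("my_image.tiff", parent="syn12345678"))
print(f"Uploaded {uploaded.name} as {uploaded.id}")
```

The command-line workflow below achieves the same result without writing any Python.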

#### Typical Workflow with Python Command-Line Client

1. **Install Python Package**: Install the Synapse Python package from [PyPI](https://pypi.org/project/synapseclient/). This also installs the command-line utility. Verify that it is working by running `synapse help`, and feel free to review the [docs](https://python-docs.synapse.org/build/html/index.html) for the Python/CLI client.

2. **Create Access Token**: For large uploads, it is best to create an access token. Go to your Account profile > Account Settings > Personal Access Tokens > Create new token.

3. **Create Configuration File**: For convenience, copy and paste the token into a `.synapseConfig` text file (the clients look for it at `~/.synapseConfig` by default):

```plaintext
[authentication]
authtoken = sometokenstringxxxxxxxxxxxxxxxxxx
```

4. **Create Manifest File**: Create a list of files to transfer (called a manifest). The `--parent-id` is the Synapse ID of the folder you are uploading files to:

```bash
synapse manifest --parent-id syn12345678 --manifest-file manifest.txt PATH_TO_DIR_WITH_FILES
```

5. **Certified User Check**: If you are not a Certified User, the tool will output a message. Review and complete the Certified User portion of Account Setup before proceeding.

6. **Execute Sync Command**: Step 4 should have produced the manifest file `manifest.txt`. Ensure that `.synapseConfig` is in place so the client can authenticate, then run the sync (a Python-script alternative is sketched after these steps):

```bash
synapse sync manifest.txt
```

If the connection is unreliable, you can add retries:

```bash
synapse sync --retries 3 manifest.txt
```
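
If you would rather drive the same manifest-based upload from a Python script instead of the command line, the following minimal sketch uses `synapseutils`; it assumes the `manifest.txt` produced in step 4 and credentials stored in `~/.synapseConfig`:

```python
import synapseclient
import synapseutils

# Authenticate using the token from ~/.synapseConfig
syn = synapseclient.login()

# Upload every file listed in the manifest (a tab-separated file with at least
# 'path' and 'parent' columns) to its target Synapse folder.
synapseutils.syncToSynapse(syn, manifestFile="manifest.txt")
```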

#### One-off Uploads

For just a few files, a more convenient command might be:

```bash
synapse store my_image.tiff --parentId syn12345678
```

#### Alternative Methods

Under rare circumstances, the DCC can explore the following options:

- Receiving data via physical hard drive
- Using Globus transfers (if truly needed)
- Transferring from a custom S3 bucket

</TabItem>
</Tabs>
</div>