For the BFX Workshop, we will not be using AWS Cloud. Instead, we will use a Docker image created from the AWS AMI used in rnabio.org.
A Docker image is available through the DockerHub repository: https://hub.docker.com/layers/griffithlab/rnabio/0.0.1/images/sha256-b13f5e9048941c8be3e83555295c0f4ed21645d5fd9bae4226e6bc4f30f54b52?context=explore
- Ensure that Docker Desktop is running.
- This command will pull the image rnabio with the tag 0.0.1 from the griffithlab DockerHub repository to your local Docker client:
docker pull griffithlab/rnabio:0.0.1
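To confirm that the pull succeeded, one option is to ask Docker whether the image is stored locally. The sketch below is not part of the course material; have_image is a hypothetical helper name, and it relies only on docker image inspect returning a non-zero exit status when the image is absent.

```shell
# Check whether the course image is already present in the local Docker client.
# IMAGE is taken from the pull command above; have_image is a hypothetical helper.
IMAGE="griffithlab/rnabio:0.0.1"

have_image() {
    # `docker image inspect` exits non-zero if the image is not stored locally
    # (or if the docker command itself cannot be run).
    docker image inspect "$1" >/dev/null 2>&1
}

if have_image "$IMAGE"; then
    echo "found $IMAGE locally"
else
    echo "$IMAGE not found; is Docker Desktop running, and did the pull finish?"
fi
```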
- Set up a local workspace directory for the RNAseq course. If you change the path or command used in this step, please update the path to the workspace directory accordingly in Step 4. Also, make a file test_my_docker_mount.txt that we will look for later.
mkdir -p bfx-workshop/rnabio-workspace
echo 'this file helps me test my docker mount' >> bfx-workshop/rnabio-workspace/test_my_docker_mount.txt
- Enter the rnabio-workspace folder you created above, and initialize a Docker container using the image we pulled above. The -v flag tells Docker to mount our workspace directory within the Docker container as /workspace with read-write privileges. You'll see in the RNAseq course that /workspace is the base directory for nearly all commands and steps.
cd bfx-workshop/rnabio-workspace
docker run -v $PWD/:/workspace:rw -it griffithlab/rnabio:0.0.1 /bin/bash
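The value given to -v has three colon-separated parts: the host directory, the path inside the container, and the access mode. As a minimal illustration (mount_spec is a hypothetical helper, not a Docker feature), the sketch below assembles that value and prints the resulting command rather than running it.

```shell
# Build the value passed to `docker run -v` from its three colon-separated parts.
# mount_spec is a hypothetical helper name used only for illustration.
mount_spec() {
    host_dir=$1        # directory on your computer
    container_dir=$2   # where that directory appears inside the container
    mode=$3            # rw = read-write, ro = read-only
    printf '%s:%s:%s\n' "$host_dir" "$container_dir" "$mode"
}

# Print the full command instead of running it, so it can be inspected first.
echo "docker run -v $(mount_spec "$PWD" /workspace rw) -it griffithlab/rnabio:0.0.1 /bin/bash"
```

If you created the workspace somewhere else, only the first part (the host directory) changes; /workspace should stay the same so that the course commands keep working.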
- Use ls to see what's in your current directory, then enter the workspace folder and use ls again to see what is in the workspace folder. If the mount worked, the test_my_docker_mount.txt file created earlier should be listed there.
ls
cd workspace
ls
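The test file created earlier gives a quick way to confirm the mount from inside the container. The following is a small sketch; check_mount is a hypothetical helper name, and /workspace is the mount point used in the docker run command above.

```shell
# Confirm that the host workspace is visible inside the container.
# check_mount is a hypothetical helper; it looks for the test file created on the host.
check_mount() {
    dir=${1:-/workspace}
    if [ -f "$dir/test_my_docker_mount.txt" ]; then
        echo "mount OK: found $dir/test_my_docker_mount.txt"
    else
        echo "mount problem: $dir/test_my_docker_mount.txt is missing" >&2
        return 1
    fi
}

check_mount /workspace || echo "re-check the -v argument and the directory you ran docker from"
```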
By default, Docker logs you in to the running container as the "root" user. We need to run as the ubuntu user to match the RNAseq course tutorials.
- Switch user (su) to the ubuntu user:
su ubuntu
- Source the pre-installed .bashrc file to configure your environment to match the RNAseq course:
source ~/.bashrc
NOTE: Using Docker and the persistent "workspace" volume we attached will allow you to start/stop as you wish. EVERY TIME YOU LOG IN TO THE DOCKER CONTAINER, YOU MUST SWITCH TO THE ubuntu USER AND RUN source ~/.bashrc.
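Because it is easy to forget the user switch after restarting, a quick check of the current user can help. This is only a sketch; check_user is a hypothetical helper name, and it uses the standard id -un command to read the current user name.

```shell
# Warn if the current shell is not running as the expected user.
# check_user is a hypothetical helper; "ubuntu" is the user the course tutorials assume.
check_user() {
    expected=${1:-ubuntu}
    current=$(id -un)
    if [ "$current" = "$expected" ]; then
        echo "running as $current"
    else
        echo "running as $current, not $expected: run 'su $expected' and then 'source ~/.bashrc'" >&2
        return 1
    fi
}

check_user ubuntu || true
```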
Create a working directory and set the ‘RNA_HOME’ environment variable:
mkdir -p ~/workspace/rnaseq/
export RNA_HOME=~/workspace/rnaseq
Make sure that RNA_HOME is set and points to a valid working directory:
echo $RNA_HOME
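Printing the variable shows that it is set, but not that the directory exists. One possible check for both conditions is sketched below; check_rna_home is a hypothetical helper name and is not part of the course.

```shell
# Report whether RNA_HOME is set and points to an existing directory.
# check_rna_home is a hypothetical helper name.
check_rna_home() {
    if [ -z "${RNA_HOME:-}" ]; then
        echo "RNA_HOME is not set" >&2
        return 1
    elif [ ! -d "$RNA_HOME" ]; then
        echo "RNA_HOME=$RNA_HOME is not a directory" >&2
        return 1
    fi
    echo "RNA_HOME=$RNA_HOME looks valid"
}

check_rna_home || echo "re-run the mkdir and export commands above"
```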
Since all the environment variables we set up for the RNA-seq workshop start with ‘RNA’ we can easily view them all by combined use of the env and grep commands as shown below. The env command shows all environment variables currently defined and the grep command identifies string matches.
env | grep RNA
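Note that grep RNA matches any line containing "RNA", including variables whose value (rather than name) contains it. A slightly stricter variant, sketched here with a hypothetical helper name list_rna_vars, anchors the pattern to the start of each line so that only variable names beginning with RNA are shown.

```shell
# List only variables whose NAME starts with RNA.
# The ^ anchors the match to the start of each NAME=value line printed by env.
list_rna_vars() {
    env | grep '^RNA'
}

list_rna_vars || echo "no RNA* variables are set; did you source ~/.bashrc?"
```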
To view the contents of the .bashrc file, you can type:
less ~/.bashrc
To exit less, type q.
- When running the check strandedness tool in the Module 1, RNAseq Data section, the docker run command cannot be run from within your griffithlab/rnabio:0.0.1 Docker session. To run it, we suggest that you open a new terminal window, cd into the rnaseq directory you created at the beginning of this assignment, and use the following command instead:
docker run -v $PWD/:/docker_workspace mgibio/checkstrandedness:latest check_strandedness --gtf /docker_workspace/refs/chr22_with_ERCC92_tidy.gtf --transcripts /docker_workspace/refs/chr22_ERCC92_transcripts.clean.fa --reads_1 /docker_workspace/data/HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq.gz --reads_2 /docker_workspace/data/HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq.gz
This is the same command as the one on the course webpage, except that instead of mounting (-v flag) /home/ubuntu/workspace/rnaseq into the container, which is where the data was stored for students running through the course on an AMI, you mount whatever your current directory is. This also differs from an interactive session, where we enter the container and run commands within it; here we execute the command directly in that one line.
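Since the command mounts your current directory, it fails if you run it from the wrong place. As a pre-flight sketch (check_inputs is a hypothetical helper, and the file names are copied from the command above), you could verify that the four input files exist relative to the current directory before starting the container.

```shell
# Check that the files referenced by the check_strandedness command exist
# relative to the directory that will be mounted. check_inputs is a hypothetical helper.
check_inputs() {
    base=${1:-$PWD}
    missing=0
    for f in \
        refs/chr22_with_ERCC92_tidy.gtf \
        refs/chr22_ERCC92_transcripts.clean.fa \
        data/HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq.gz \
        data/HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq.gz
    do
        if [ ! -f "$base/$f" ]; then
            echo "missing: $base/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_inputs "$PWD"; then
    echo "all inputs found; ready to run check_strandedness"
else
    echo "cd into your rnaseq directory and try again"
fi
```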
- In various parts of RNAbio, in order to view HTML files, plots, etc., the tutorial suggests going to a public IPv4 address link in your browser window. That is only needed for the AMI. Since you'll be running everything locally, you can either find the files in your Finder window or File Explorer and open them directly or, even better, use open [your_file.html] on Mac and explorer.exe [your_file.html] on Windows/WSL2 to open the file in your default browser!
- In Pre-alignment QC, an optional QC analysis runs fastp. This software is not available in your Docker image, so please skip it (the fastqc and multiqc analyses should still work and can be used for analysis). Similarly, you can skip the adapter trim step, as the data provided here does not actually need to be adapter trimmed (however, the code is available if you need to do it for your own data).
- geneBody_coverage.py in the optional RSeQC section is not correctly in the PATH. Use the full path to the Python script: /home/ubuntu/.local/bin/geneBody_coverage.py
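For the tip above about opening HTML files from the terminal, the right command depends on your platform. The sketch below chooses one from the output of uname; opener_for is a hypothetical helper, and the xdg-open fallback for plain Linux and the explorer.exe fallback for other systems are assumptions that do not come from the course notes. Run it on your own computer, not inside the container.

```shell
# Pick a command for opening a file in the default browser, based on the platform.
# opener_for is a hypothetical helper; the xdg-open and catch-all branches are assumptions.
opener_for() {
    kernel=$1    # output of `uname -s`
    release=$2   # output of `uname -r`
    case "$kernel" in
        Darwin) echo "open" ;;
        Linux)
            case "$release" in
                *[Mm]icrosoft*) echo "explorer.exe" ;;  # WSL kernel releases mention Microsoft
                *) echo "xdg-open" ;;
            esac ;;
        *) echo "explorer.exe" ;;
    esac
}

echo "to view a report, run: $(opener_for "$(uname -s)" "$(uname -r)") your_file.html"
```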
For-credit students: please count the number of lines in the merged UHR.bam and HBR.bam files and send the counts to Jenny along with an IGV screenshot showing the UHR and HBR merged BAM files at the following location on chromosome 22: chr22:40,363,200-40,367,500.