Quick VM Tour
The simplest way to get a quick sense of what the GMS is all about might be to try loading a virtual machine where the GMS has already been installed and configured. When the GMS virtual machine loads, you will be logged in as the user genome (with a password that is also genome). All installation and configuration steps will be complete, and demonstration data will be in place. The system can be tested immediately by running the genotype-microarray, reference-alignment, somatic-variation, rna-seq, differential-expression, and clin-seq pipelines on this demonstration data.
The virtual machine is a self-contained sandbox. The idea is for you to take a short tour of the GMS, execute some simple GMS commands, view some features in the GMS web-viewer, and so on. When you are done, you can remove the virtual machine, and your system will be unaffected by the test.
Please note that the pre-configured GMS is meant for simple demonstration purposes only. If you wish to use the system in earnest for large-scale analysis you will want to identify appropriate hardware and adopt one of the installation methods described in the Install Manual.
Finally, to keep this tutorial simple, many details are left out. These details can be found in the GMS manuscript and elsewhere in the GMS wiki: for example, the Installation Guide, the Location and Description of the HCC1395 Data, the FAQ page, the Guide to Importing your own Data, the Reference Manual for useful Genome Commands, and the Beginners Guide to Demonstration analysis, among other pages.
The virtual machine was created with VirtualBox version 4.3.8. VirtualBox is open-source and freely available for the Mac, Linux, and Windows platforms. Download and install VirtualBox for your system here:
https://www.virtualbox.org/wiki/Downloads
The pre-configured virtual machine image contains the GMS installation, a fully functional Ubuntu 12.04 Precise operating system, annotation databases, reference genome sequences, example data and much more. The pre-configured virtual machines are available here:
https://xfer.genome.wustl.edu/gxfer1/project/gms/vms/
The image files are large (~48 GB) and will take some time to download, so you should use a download agent that can resume the download if it is interrupted. For example, at a terminal you could use wget:
wget -c https://xfer.genome.wustl.edu/gxfer1/project/gms/vms/GMS_VM_V1.zip # -c continues a partial download if you rerun the command
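If your connection drops repeatedly, a small retry loop saves you from restarting wget by hand. This is a minimal sketch, not part of the GMS: download_with_resume is a hypothetical helper name, and the default of 10 attempts and the 30-second pause are arbitrary choices. It relies only on wget's -c (continue) option.

```shell
# Hypothetical helper (not part of the GMS): rerun "wget -c" until the
# download finishes or the attempt limit is reached.
download_with_resume() {
  local url="$1"
  local max_tries="${2:-10}"   # assumption: 10 attempts is an arbitrary default
  local attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    # -c makes wget continue a partially downloaded file instead of starting over
    if wget -c "$url"; then
      return 0
    fi
    echo "Attempt $attempt of $max_tries failed; retrying in 30 seconds" >&2
    attempt=$((attempt + 1))
    sleep 30
  done
  return 1
}
```

For example, paste the function into a bash terminal and run download_with_resume https://xfer.genome.wustl.edu/gxfer1/project/gms/vms/GMS_VM_V1.zip.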
Use your favorite decompression software to unpack the virtual machine. For example, in a Mac or Linux terminal you could run unzip GMS_VM_V1.zip. On Mac or Windows you can usually just double-click the archive file.
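Because such a large download can be truncated without an obvious error, it may be worth testing the archive before unpacking it. The sketch below assumes the Info-ZIP unzip command found on most Linux systems and on Mac OS X; unpack_vm_image is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the GMS): test the archive, then unpack it.
# Assumption: "unzip" is the Info-ZIP implementation, which supports -t and -q.
unpack_vm_image() {
  local archive="$1"
  # -t checks the CRC of every member without writing files; -q keeps it quiet
  if ! unzip -tq "$archive"; then
    echo "$archive failed the integrity test; resume the download and retry" >&2
    return 1
  fi
  unzip "$archive"
}
```

For example: unpack_vm_image GMS_VM_V1.zip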
Open VirtualBox and add the GMS virtual machine by selecting the GMS .vbox file as follows.
Within VirtualBox, use the Machine -> Add option.
Find the GMS .vbox file and open it.
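If you prefer the command line, VirtualBox's VBoxManage tool can register the same .vbox file. This is a minimal sketch, assuming VBoxManage is on your PATH; register_gms_vm is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the GMS): command-line equivalent of
# Machine -> Add. Assumption: VBoxManage is on your PATH.
register_gms_vm() {
  local vbox_file="$1" abs_path
  # Resolve to an absolute path so the result does not depend on the
  # directory VirtualBox happens to be running in
  abs_path="$(cd "$(dirname "$vbox_file")" && pwd)/$(basename "$vbox_file")"
  VBoxManage registervm "$abs_path"
}
```

Pass the path of the .vbox file that unzip produced, for example register_gms_vm GMS_VM_V1/GMS_VM_V1.vbox (that directory layout is an assumption; use whatever path you actually have).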
Depending on the resources available on your system, you may want to adjust resource usage. For example, you might adjust the base memory, video memory, CPUs, and network connection type. To adjust these and other settings, select the machine GMS_VM_V1 and press the Settings button at the top left of the VirtualBox interface.
The settings worth reviewing are General settings, Number of processors, Base memory, Video memory, and Network. The network adapter is set to NAT by default, but a Bridged Adapter may be faster.
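The same adjustments can be scripted with VBoxManage modifyvm, which only works while the machine is powered off. In this sketch tune_gms_vm is a hypothetical helper, memory and video memory are given in MB, and the optional fifth argument names the host network interface to bridge to. The values you pass are your own choice, not GMS recommendations.

```shell
# Hypothetical helper (not part of the GMS): change VM resources from a terminal.
# Assumption: the VM is powered off and VBoxManage is on your PATH.
tune_gms_vm() {
  local vm="$1" memory_mb="$2" cpus="$3" vram_mb="$4" host_nic="${5:-}"
  VBoxManage modifyvm "$vm" --memory "$memory_mb" --cpus "$cpus" --vram "$vram_mb"
  if [ -n "$host_nic" ]; then
    # Switch adapter 1 from NAT to a bridged adapter on the given host interface
    VBoxManage modifyvm "$vm" --nic1 bridged --bridgeadapter1 "$host_nic"
  fi
}
```

For example, tune_gms_vm GMS_VM_V1 8192 4 128 eth0 would give the machine 8 GB of RAM, 4 CPUs, 128 MB of video memory, and a bridged adapter on eth0 (on a Mac the interface is more likely en0). Leave off the last argument to keep NAT.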
Select the machine GMS_VM_V1 and press the Start button at the top left of the VirtualBox interface. The machine will boot and you will be automatically logged in as the user genome. If you are ever prompted for a password, remember that both the username and password for the system are genome. When the machine boots, you may be prompted with some messages about keyboard and mouse settings. You can safely dismiss these.
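The machine can also be started from a terminal. This sketch wraps VBoxManage startvm; start_gms_vm is a hypothetical helper name. The gui type opens the usual window, which this tour needs for Firefox and the Terminal; headless runs the machine with no window at all.

```shell
# Hypothetical helper (not part of the GMS): start the VM without the GUI button.
# Assumption: the VM was registered under the name GMS_VM_V1.
start_gms_vm() {
  local vm="${1:-GMS_VM_V1}" mode="${2:-gui}"   # mode: gui or headless
  VBoxManage startvm "$vm" --type "$mode"
}
```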
Logging into the GMS
Step 7. Open the GMS web-viewer and explore demonstration models, processing-profiles, instrument-data, etc.
Open the Firefox browser by clicking the orange and blue icon on the left.
Open a Terminal window by clicking the black icon on the left. Then execute the following commands to test various basic components of the system:
lsid # You should see the openlava cluster identification
lsload # You should see a report of available resources
bjobs # You should not have any unfinished jobs yet
bsub 'sleep 60' # You should be able to submit a job to openlava (run bjobs again to see it)
bhosts # You should see one host
bqueues # You should see four queues
genome disk group list # You should see four disk groups
genome disk volume list # You should see at least one volume for your local drive
genome sys gateway list # You should see two gateways, one for your new home system and one for the test data "GMS1"
# list the metadata that is already present in the database:
genome taxon list
genome individual list
genome sample list
genome library list
genome instrument-data list solexa
# list the pre-defined models (no results yet ... you will launch these and generate results):
genome model list
# view the processing profiles (pipeline descriptions) associated with those models:
genome processing-profile view --processing-profile='Default Reference Alignment'
genome processing-profile view --processing-profile='Default Somatic Variation Exome'
genome processing-profile view --processing-profile='Default Somatic Variation WGS'
genome processing-profile view --processing-profile='Default Ovation V2 RNA-seq'
genome processing-profile view --processing-profile='cuffcompare/cuffdiff 2.0.2 protein_coding only'
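To rerun the basic system checks later (for example after a reboot), you could wrap them in one function that reports which commands fail. This is a sketch: check_gms_basics is a hypothetical helper name, bsub is left out so that no job is submitted, and only exit status is checked, so you should still compare the output against the expectations in the comments above (one host, four queues, and so on).

```shell
# Hypothetical helper (not part of the GMS): run the basic checks and count
# the commands that are missing or exit with an error.
check_gms_basics() {
  local failures=0 cmd
  for cmd in lsid lsload bjobs bhosts bqueues \
             "genome disk group list" "genome disk volume list" "genome sys gateway list"; do
    echo "== $cmd"
    # $cmd is deliberately unquoted so multi-word commands split into arguments
    if ! $cmd; then
      echo "FAILED: $cmd" >&2
      failures=$((failures + 1))
    fi
  done
  # Returns success only if every command exited with status 0
  [ "$failures" -eq 0 ]
}
```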
Open a Terminal window by clicking the black icon on the left. Then execute the following command to view models that have already been defined in the system for demonstration purposes:
genome model list
Start the genotype-microarray builds for tumor and normal as follows:
genome model build start 'hcc1395-normal-snparray'
genome model build start 'hcc1395-tumor-snparray'
You can monitor the progress of ongoing analysis runs in several ways. For example, you can load the GMS web-viewer and go to the builds tab, or you can view the status of all builds in a Terminal using the command genome model build list. For a much more detailed status of a running build, use the following command, replacing '$build_id' with the ID of the build of interest (which you can look up with genome model build list):
genome model build view '$build_id'
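Since the reference-alignment builds should only be launched once the genotype-microarray builds are done, a polling loop can save you from checking by hand. This sketch makes two assumptions that this page does not confirm: that builds expose a status property usable with --show, and that builds still in progress report Scheduled or Running. wait_for_model_builds is a hypothetical helper name, and the 60-second interval is arbitrary.

```shell
# Hypothetical helper (not part of the GMS): poll until no build of the named
# model is still in progress, then print the final status list.
# Assumptions: "--show status" works for builds, and in-progress builds
# report "Scheduled" or "Running".
wait_for_model_builds() {
  local model="$1" interval="${2:-60}" statuses
  while :; do
    statuses="$(genome model build list --filter model.name="$model" --show status)"
    case "$statuses" in
      *Scheduled*|*Running*) sleep "$interval" ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$statuses"
}
```

For example, wait_for_model_builds 'hcc1395-normal-snparray' returns once that model has no scheduled or running builds; check the printed status before moving on.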
You can find the genotype-microarray result files as follows:
genome model build list --filter model.name='hcc1395-normal-snparray' --show data_directory
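To print the result directories for both the normal and tumor genotype-microarray models in one go, the same list command can be run in a loop. A minimal sketch: list_result_dirs is a hypothetical helper name, and the --show columns are the ones this page uses for the reference-alignment models.

```shell
# Hypothetical helper (not part of the GMS): show the data directory of every
# build for each model name given as an argument.
list_result_dirs() {
  local model
  for model in "$@"; do
    genome model build list --filter model.name="$model" --show model.name,data_directory
  done
}
```

For example: list_result_dirs hcc1395-normal-snparray hcc1395-tumor-snparray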
Once the genotype-microarray builds are done, launch the reference-alignment builds for the exome data as follows (you may want to run one at a time if you are on a small machine such as a laptop):
genome model build start 'hcc1395-normal-refalign-exome'
genome model build start 'hcc1395-tumor-refalign-exome'
Once again, you can view the progress of each build as follows:
genome model build view '$build_id'
As above, you can find the result files for the reference-alignment pipeline, including BAM files and germline variants in VCF format, as follows:
genome model build list --filter model.name='hcc1395-normal-refalign-exome' --show model.name,data_directory
genome model build list --filter model.name='hcc1395-tumor-refalign-exome' --show model.name,data_directory
To get the final, merged, sorted, duplicate-marked BAM from the tumor exome alignment, you can use the following method:
genome model list --filter name='hcc1395-tumor-refalign-exome' --show id,name,last_complete_build.merged_alignment_result.bam_path
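If you want the BAM path in a shell variable, for example to inspect the alignment, you can extract it from the command's output. This sketch assumes that the path is the first field on the last non-empty output line (so it must not contain spaces) and that samtools is installed on the VM; tumor_bam_path and summarize_tumor_bam are hypothetical helper names.

```shell
# Hypothetical helpers (not part of the GMS).
# Assumption: the path is the first field of the last non-empty output line.
tumor_bam_path() {
  genome model list --filter name='hcc1395-tumor-refalign-exome' \
    --show last_complete_build.merged_alignment_result.bam_path \
    | awk 'NF { path = $1 } END { print path }'
}

# Assumption: samtools is on your PATH. flagstat prints read and mapping counts.
summarize_tumor_bam() {
  local bam
  bam="$(tumor_bam_path)"
  if [ ! -f "$bam" ]; then
    echo "No BAM file found at '$bam'; has the tumor build completed?" >&2
    return 1
  fi
  samtools flagstat "$bam"
}
```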
For many more examples, refer to the Reference Manual for useful Genome Commands.