This is an open book/computer exam. Feel free to use whatever resources you need, but work on your own. There are 11 questions for a total of 100 points. You have until 10:20am to complete this exam. Good luck!
- [10 points] `X` and `Y` are two data.tables in `R`. How would you do an inner join of `X` and `Y`? Provide two possible solutions and explain the difference between them.
- [10 points] Let `dt` be a data.table in `R` with four columns: `subject_id`, `month`, `day`, and `weight`. `weight` is a positive number. Write a one-line command to compute the average weight for each subject broken down by `month`, and sort the result by weight from lowest to highest.
- [10 points] Let `E` be an `ExpressionSet` in `R`. How would you retrieve the expression matrix, the probe information, and the sample information? [State the names of the methods (i.e. R functions) you would use.] What if you wanted to replace these values?
- [5 points] What is the main idea behind quantile normalization?
- [5 points] What is the main idea behind TMM normalization?
- [10 points] You've performed a gene expression experiment using microarrays. Unfortunately, some of your samples were processed in June, while others were processed in October. You suspect a batch effect and would like to correct for it in your limma analysis. How would you do that? You may assume that your contrast of interest is not confounded with batch order.
- [5 points] What is the false discovery rate? Why is it preferable to control this versus the family-wise error rate?
- [15 points] You have a gene expression experiment looking at expression changes in a cohort of 40 subjects before and after vaccination. You follow the subjects over time and have samples at day 0 (day of vaccination), day 7, and day 14. How would you set up your design matrix in limma (write the actual R command), and what contrasts would you test?
- [10 points] What is the main idea behind `limma`? Why is it preferable over a traditional linear model?
- [5 points] `limma` was derived for gene microarrays. It's been shown recently that it could also be applied to next-generation sequencing data after proper data transformation using `voom`. What is `voom` actually doing?
- [15 points] In gene expression quantification using RNA-seq data, how can you handle reads that map to multiple locations (multi-reads)? Describe at least two methods and discuss the pros and cons of both.