Biostat578 Midterm Exam

Name:

Student Number:

This is an open book/computer exam. Feel free to use whatever resources you need, but work on your own. There are 11 questions for a total of 100 points. You have until 10:20am to complete this exam. Good luck!

[10 points] X and Y are two data.tables in R. How would you do an inner join of X and Y? Provide two possible solutions and explain the difference between them.

[10 points] Let dt be a data.table in R with four columns: subject_id, month, day, and weight. weight is a positive number. Write a one-line command to compute the average weight for each subject broken down by month, and sort the result by average weight from lowest to highest.

[10 points] Let E be an ExpressionSet in R. How would you retrieve the expression matrix, the probe information, and the sample information? State the names of the methods (i.e., R functions) you would use. What if you wanted to replace these values?

[5 points] What is the main idea behind quantile normalization?

[5 points] What is the main idea behind TMM normalization?

[10 points] You've performed a gene expression experiment using microarrays. Unfortunately, some of your samples were processed in June, while others were processed in October. You suspect a batch effect and would like to correct for it in your limma analysis. How would you do that? You may assume that your contrast of interest is not confounded with batch order.

[5 points] What is the false discovery rate? Why is it preferable to control this versus the family-wise error rate?

[15 points] You have a gene expression experiment looking at expression changes in a cohort of 40 subjects before and after vaccination. You follow the subjects over time and have samples at day 0 (day of vaccination), day 7, and day 14. How would you set up your design matrix in limma (write the actual R command), and what contrasts would you test?

[10 points] What is the main idea behind limma? Why is it preferable over a traditional linear model?

[5 points] limma was derived for gene expression microarrays. It has been shown recently that it can also be applied to next-generation sequencing data after proper data transformation using voom. What is voom actually doing?

[15 points] In gene expression quantification using RNA-seq data, how can you handle reads that map to multiple locations (multi-reads)? Describe at least two methods and discuss the pros and cons of each.
