data moved and example improved

hadexversum · May 22, 2024 · 9403237
1 parent 208ca05
commit 9403237
Showing 3 changed files with 43 additions and 34 deletions.
diff --git a/data-raw/DATASET.R b/data-raw/DATASET.R
@@ -3,5 +3,8 @@
 gamma_dat <- HaDeX::read_hdx("C:/Users/User/Downloads/GAMMAalpha_cut.csv.csv")
 alpha_dat <- HaDeX::read_hdx("C:/Users/User/Desktop/article_hradex/ALPHA G i BG.csv")
 
+alpha_dat <- HRaDeX::move_dataset(alpha_dat, -25)
+
 usethis::use_data(alpha_dat, overwrite = TRUE)
 usethis::use_data(gamma_dat, overwrite = TRUE)
+
diff --git a/data/alpha_dat.rda b/data/alpha_dat.rda
diff --git a/vignettes/example.Rmd b/vignettes/example.Rmd
@@ -27,11 +27,11 @@ library(dplyr)
 
 # Introduction
 
-This document is an adapted version of Supplement to the HRaDeX manuscript.
+This document is an adapted version of the Supplement to the HRaDeX manuscript.
 
-Here we describe the detailed step-by-step analysis of experimental data using the hadexversum family tools: [HaDeX](https://hadex2.mslab-ibb.pl/), [HRaDeX](https://hradex.mslab-ibb.pl/) and [compaHRaDeX](https://compahradex.mslab-ibb.pl/). 
+Here we describe the exemplary step-by-step analysis of experimental data using the hadexversum family tools: [HaDeX](https://hadex2.mslab-ibb.pl/), [HRaDeX](https://hradex.mslab-ibb.pl/) and [compaHRaDeX](https://compahradex.mslab-ibb.pl/). 
 
-The analysed protein is eEF1Bα subunit of the human guanine-nucleotide exchange factor (GEF) complex (eEF1B), measured in [Mass Spectrometry Lab](https://mslab-ibb.pl/) in [Institute of Biochemistry and Biophysics Polish Academy of Sciences](https://ibb.edu.pl/en/) and published by [Bondarchuk et al](https://doi.org/10.1093/nar/gkac685). In the one-state classification we will focus on pure alpha state, and in comparative analysis with regard to gamma state. 
+The analyzed protein is the eEF1Bα subunit of the human guanine-nucleotide exchange factor (GEF) complex (eEF1B), measured in [Mass Spectrometry Lab](https://mslab-ibb.pl/) in [Institute of Biochemistry and Biophysics Polish Academy of Sciences](https://ibb.edu.pl/en/) and published by [Bondarchuk et al](https://doi.org/10.1093/nar/gkac685). In the one-state classification, we will focus on pure alpha state. The comparative analysis is conducted between pure alpha state and alpha in presence of gamma component.  
 
 We present the visualization methods of hadexversum, without making strict interpretations. For that purpose, we suggest contacting the research group that published original research on this topic.
 
@@ -40,14 +40,14 @@ We present the visualization methods of hadexversum, without making strict inter
 
 ## General information 
 
-HaDeX is a general-use tool for widely understood analysis on the peptide level. Moreover, it provides many features for investigating directly the mass measurements and checking the experiment quality. The summary of the results is wrapped in a short and comprehensive report. HaDeX provdes many methods of quality control of the experiment with in-depth analysisc of measurements, uncertainty and statistical significance. Not only commonly used forms of vizualization are available, but also new methods are proposed. In this document, we focus on forms corrensponding with high resolution data.
+HaDeX is a general-use tool for widely understood data analysis on the peptide level. It provides many features for investigating directly the mass measurements and checking the experiment quality. HaDeX provides many methods of quality control of the experiment with in-depth analysis of measurements, uncertainty, and statistical significance. Not only commonly used forms of visualization are available, but also new methods are proposed. The summary of the results is wrapped in a comprehensive, downloadable report.
 
+In this document, we focus on visualization forms corresponding with high-resolution data.
 
 
 ## Peptide-level uptake analysis
 
-To see both the uptake level (with uncertainty of measurement) and the position of each peptide on the protein sequence, we use the comparison plot. For readability purposes, on this type of plot, we can present the data only for a single time point, but multiple biological states. However, a quick glimpse of the plot enables a general view of the exchanged regions. Let’s suppose we aim for the comparative analysis of two biological states. In that case, we use the so-called Woods plot, with differences in uptake for each peptide and information on which differences are statistically significant for the desired level. As for the comparison plot, we only present the data for a single time point. 
-
+To see both the uptake level (with uncertainty of measurement) and the position of each peptide on the protein sequence, we use the comparison plot. For readability purposes, on this type of plot, we can present the data only for a single time point. However, a quick glimpse of the plot enables a general view of the exchanged regions. 
 ```{r include=F}
 uptake_dat <- create_uptake_dataset(alpha_dat,
                                     states = c("Alpha_KSCN", "ALPHA_Gamma"),
@@ -63,6 +63,9 @@ HaDeX::plot_state_comparison(uptake_dat,
 ```
 
 
+Let’s suppose we aim for the comparative analysis of two biological states. In that case, we use the so-called Woods plot, with differences in uptake for each peptide and information on which differences are statistically significant for the desired level. As for the comparison plot, we only present the data for a single time point. 
+
+
 ```{r fig.width=7}
 diff_p_uptake_dat <- create_p_diff_uptake_dataset(alpha_dat,
                                                   state_1 = "Alpha_KSCN",
@@ -76,13 +79,14 @@ HaDeX::plot_differential(diff_p_uptake_dat = diff_p_uptake_dat,
 ```
 
 
-This plot presents the results for the measurement done after 150 min of exchange. It shows one significant exchange region - between positions 25 and 80 and two regions with values barely above the significance level.
+This plot presents the results for the measurement done after 150 min of exchange. It shows one significant exchange region - between positions 5 and 50 and two regions with values barely above the significance level.
 
 
 # HRaDeX
 
 ```{r include = FALSE}
 
+protein_length <- 224 + 1 ## r3dmol counts differently
 
 kin_dat <- HRaDeX::prepare_kin_dat(alpha_dat, 
                                    state = "Alpha_KSCN",
@@ -97,22 +101,23 @@ fit_values <- create_fit_dataset(kin_dat,
 
 hires_params <- calculate_hires(fit_values,
                                 fractional = T, 
-                                method = "weighted")
+                                method = "weighted",
+                                protein_length = protein_length)
 
 
 ```
 
-## General infromation 
+## General information 
 
-HRaDeX provides classification results for one biological state at a time. To get data for comparative purposes, the classification process should be conducted twice, on selected states, with the same classification parameters. Adjusting the parameters can be challenging, especially for longer proteins due to the calculation time. In this document we discuss the results, and the detailed description of the workflow is available in dedicated article.
+HRaDeX provides classification results for one biological state at a time. To get data for comparative purposes, the classification process should be conducted twice, on selected states, with the same classification parameters. Adjusting the parameters can be challenging, especially for longer proteins due to the calculation time. In this document, we discuss the results. The detailed description of the workflow is available in the dedicated article.
 
 ## High-resolution dynamics analysis
 
-First, we upload the experimental data. The parameter options are adjusted to the content of the file.
+Firstly, we upload the experimental data. The parameter options are adjusted to the content of the file.
 
-Then, we need to decide if the default parameters are sufficient. Of course, they can be adjusted in an interactive mode. Anyway, additional knowledge about the specificity of analyzed protein is helpful. Some of the peptides have a strong “medium” exchange phase shifted towards default “slow” exchange, with “slow” exchange being very slow, close to the bottom limit of class exchange. In such cases, the broadening of the medium class is desired. 
+Then, we need to decide if the default parameters are sufficient or should be adjusted. Any additional knowledge about the specificity of analyzed protein is helpful. Some of the peptides have a strong “medium” exchange phase shifted towards default “slow” exchange, with “slow” exchange being very slow, close to the bottom limit of class exchange. In such cases, the broadening of the medium class is desired. 
 
-In the case of our example, we use the default limits, as they are sufficient and the fit results are very good, with small rss. Default parameters are as follows:
+In the case of our example, we use the default limits, as they are sufficient and the fit results are very good, with small rss. The default parameters are as follows:
 
 ```{r}
 get_example_fit_k_params()
@@ -123,14 +128,15 @@ All parameters must be confirmed by clicking the button, to avoid unnecessary ca
 
 After a while, we have the results. 
 
-Let’s start with discussing the fitting results for example peptide - peptide IAAQYSGAQ from the example gamma data.
-Below, there is a plot with two parts on the left, there is normalised uptake curve with fitted model and logarythmic x scale. On the right theres is the same uptake curve, but without normalization and with normal x scale, for better uderstanding of the uptace pattern.
+Let’s start with discussing the fitting results for example peptide - peptide DVAAF from the alpha protein.
+
+Below, there is a plot with two parts - a normalized uptake curve with a fitted model and the uptake curve only with measurements, for a better understanding of the uptake pattern.
 
-Left plot: Measurement points are marked by circles, with the uncertainty of the measurement shown by the error bars. Mass spectrometry is a very accurate method, and the error bars are hardly visible, although present. The black line indicates the final fitted curve, with color lines indicating the three components of the final model. As described before, the red line presents the fast component, the green line is the medium exchange component, and the blue line is the slow component. Although all populations sum up to one, each population has its intensity that impacts the final classification. 
+Let's look closely on the left plot. Measurement points are marked by circles, with the uncertainty of the measurement shown by the error bars. Mass spectrometry is a very accurate method, and the error bars are hardly visible, although present. The black line indicates the final fitted curve, with color lines indicating the three components of the final model. As described before, the red line presents the fast component, the green line is the medium exchange component, and the blue line is the slow component. Although all populations sum up to one, each population has its intensity that impacts the final classification. 
 
 ```{r  include = FALSE}
-example_fit_dat <- filter(fit_values, id == 112)
-example_kin_dat <- filter(kin_dat, ID == 112)
+example_fit_dat <- filter(fit_values, id == 106)
+example_kin_dat <- filter(kin_dat, ID == 106)
 
 ```
 
@@ -139,13 +145,12 @@ plot_double_uc(example_kin_dat, example_fit_dat)
 
 ```
 
-
-The model parameters are shown below, and the resulting classficiation color is below the table.
+The model parameters are shown below, and the resulting classification color is below the table.
 
 ```{r}
 example_fit_dat
 ```
-As we can see, the population of the fast exchanging group is the biggest, thus the final color is close to red. However, the other groups are present and interfere with the purity of the color. The noticable slow exchaning group is pushing the classification color towards blue, resulting in violet-ish shade.  The small addition of gree leads to the subdued color. Below you can find a legend, to have an understanding where in the color scale is located this classification result.
+ As we can see, the population of the fast-exchanging group is the biggest, thus the final color is close to red. However, the other groups are present and interfere with the purity of the color. The noticeable slow exchanging group is pushing the classification color towards blue, resulting in a violet-ish shade.  The small addition of green leads to the subdued color. Below you can find a legend, to have an understanding of where in the color scale this classification result is located.
 
 
 ```{r, fig.width=2, fig.height=2, echo=F}
@@ -172,19 +177,19 @@ knitr::include_graphics("figures/rgb_plaster.png")
 
 
 
-After each peptide is assigned color code, we aggregate the data to obtain the simplified high-resolution result. For each residue in the protein structure, we aggregate the values using selected method. The methods are described in the article discussing the workflow. Here, we use the "weighted" approach.
-
-The classification of the whole sequence is presented below.
-
+After each peptide is assigned a color code, we aggregate the data to obtain the simplified high-resolution result. This mid-step towards high-resolution is also used as method verification. In this case, we see that the peptides in regions are classified similarly, and the data aggregation is justified. 
 
 ```{r}
-HRaDeX::plot_hires(hires_params)
+HRaDeX::plot_cov_class(fit_values)
 ```
 
-If we want to investigate on deep-level the quality of fittness, we can check the intermediate results. Here, we present the fitting results on the coverage plot. The classficiation results are consistent in regions, therefore the aggregation is justified.
+For each residue in the protein structure, we aggregate the values using the selected method. The methods are described in the article discussing the workflow. Here, we use the "weighted" approach.
+
+The classification of the whole sequence is presented below.
+
 
 ```{r}
-HRaDeX::plot_cov_class(fit_values)
+HRaDeX::plot_hires(hires_params)
 ```
 
 However, presenting the classification results in a linear way is not quite satisfying. Adding the spatial information, obtained from different sources, provides additional depth to our analysis. 
@@ -217,12 +222,13 @@ fit_values_2 <- create_fit_dataset(kin_dat_2,
 
 hires_params_2 <- calculate_hires(fit_values_2,
                                 fractional = T, 
-                                method = "weighted")
+                                method = "weighted",
+                                protein_length = protein_length)
 ```
 
 ## High-resolution comparative analysis
 
-The ultimate goal of the experiment is usually the comparative analysis between two biological states that provides information on how the exchange is changed by specific factors. In this case, we prepared a classification analysis for two biological states of alpha: the first state (discussed above is gamma without complex) and the second state (gamma in the presence of alpha).
+The ultimate goal of the experiment is usually the comparative analysis between two biological states that provides information on how the exchange is changed by specific factors. In this case, we prepared a classification analysis for two biological states of alpha: the pure state (discussed above is alpha without complex) and the second state (alpha in the presence of gamma).
 
 Below we present the classification results for both states, the first one on the bottom and the second one on the top. We can see with bore eye the regions of difference.
 
@@ -238,7 +244,8 @@ two_states <- HRaDeX::create_two_state_dataset(hires_params, hires_params_2)
 HRaDeX::plot_color_distance(two_states)
 ```
 
-In this case we see great difference in region 30-80 of the sequence, second region 175-180 and third 225-235, roughly estimating. Choosing 0.2 as the threshold of distance value, we can present the regions of difference on the 3D structure, as presented below.
+In this case, we see a great difference in region 10-50 of the sequence, second region 140-155, and third 200-210, roughly estimating. Choosing 0.2 as the threshold of distance value, we can present the regions of difference on the 3D structure, as presented below.
+
 ```{r}
 color_positions <- HRaDeX::prepare_diff_data(two_states,
                                              "dist",
@@ -250,15 +257,14 @@ HRaDeX::plot_3d_structure_blank(pdb_file_path = "../data/Model_eEF1Balpha.pdb")
 ```
 
 
-As the distance between populations plot shows us the regions of interest, doesn't show the direction of change - if the region is protected from exchange or the contrary. To account for that, we propose the rough estimate of exchange rate based on the parameters of the model, as defined in the workflow description article.
+As the distance between populations plot shows us the regions of interest, doesn't show the direction of change - if the region is protected from exchange or the contrary. To account for that, we propose a rough estimate of the exchange rate based on the parameters of the model, as defined in the workflow description article.
 
 ```{r}
 HRaDeX::plot_k_distance(two_states)
 ```
 
 
-
-We can see the obvious difference in the first part of the protein, in the same region as shown in the Woods plot above. We also see the small difference from Woods plot in the second part of the protein. Although the results are somehow analogical, the high-resolution approach accounts for the whole time course. 
+We can see the obvious difference in the first part of the protein, in the same region as shown in the Woods plot above. We also see a small difference from Woods's plot in the second part of the protein. Although the results are somehow analogical, the high-resolution approach accounts for the whole time course. 
 
 # Availability