-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdata-analytics-for-soil-spectroscopy.Rmd
2814 lines (2379 loc) · 140 KB
/
data-analytics-for-soil-spectroscopy.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "A Course on Applied Data Analytics for Soil Analysis with Infrared Spectroscopy"
subtitle: Food and Agriculture Organization of the United Nations
date: "`r Sys.Date()`"
title-block-banner: true
date-format: long
output:
html_document:
highlight: pygments
self-contained: true
theme: cerulean
mathjax: default
toc: yes
toc_depth: 2
toc_float:
collapsed: false
fig_caption: true
number_sections: true
bookdown::pdf_document2:
highlight: haddock
keep_tex: true
bookdown::word_document2:
number_sections: true
header-includes:
- \usepackage{fancyhdr}
- \addtolength{\headheight}{0.5cm}
- \pagestyle{fancy}
- \renewcommand{\headrulewidth}{1pt}
- \usepackage{xcolor}
- \usepackage{framed}
- \usepackage{float}
- \usepackage{cancel}
- \floatplacement{figure}{H}
- \colorlet{shadecolor}{gray!08}
- \renewenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}}
- |
\usepackage{authblk}
\author[1]{Alexandre M.J.-C. Wadoux}
\author[2]{Leonardo Ramirez-Lopez}
\author[3]{Yufeng Ge}
\author[4]{Issam Barra}
\author[4]{Yi Peng}
\affil[1]{LISAH, Univ Montpellier, AgroParisTech, INRAE, IRD, L'Institut Agro, Montpellier, France}
\affil[2]{Data Science Department, BUCHI Labortechnik AG, Meierseggstrasse 40, 9230 Flawil, Switzerland}
\affil[2]{Imperial College London, Imperial Business School, South Kensington, Exhibition Rd, London SW7 2AZ, United Kingdom}
\affil[3]{Department of Biological Systems Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA}
\affil[4]{Global Soil Partnership, FAO, Roma, Italy}
\date{\today}
bibliography: references.bib
link-citations: true
description: "FAO tutorial document on soil spectroscopy"
editor_options:
markdown:
wrap: 72
---
```{r setup, include=FALSE}
library(knitr)
if (knitr::is_html_output() || knitr::is_latex_output()) {
if (!is.null(knitr::opts_chunk$get("fig.align"))) {
knitr::opts_chunk$set(fig.align = 'center')
}
}
```
> Required citation: FAO. 2025. A course on applied data analytics for soil analysis with infrared spectroscopy. Rome.
<blockquote style="background-color:#f0f0f0; padding:15px; border-left:5px solid #ccc;">
<p><strong>Licensing Information</strong></p>
<p>The designations employed and the presentation of material in this information product do not imply the expression of any opinion whatsoever on the part of the Food and Agriculture Organization of the United Nations (FAO) concerning the legal or development status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The mention of specific companies or products of manufacturers, whether or not these have been patented, does not imply that these have been endorsed or recommended by FAO in preference to others of a similar nature that are not mentioned.</p>
<p>The views expressed in this information product are those of the author(s) and do not necessarily reflect the views or policies of FAO.</p>
<p><strong>ISBN xxx</strong></p>
<p>© FAO, 2025</p>
<p>Some rights reserved. This work is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 IGO licence (CC BY-NC-SA 3.0 IGO;
<a href="https://creativecommons.org/licenses/by-nc-sa/3.0/igo/legalcode" target="_blank">https://creativecommons.org/licenses/by-nc-sa/3.0/igo/legalcode</a>).</p>
<p>Under the terms of this licence, this work may be copied, redistributed and adapted for non-commercial purposes, provided that the work is appropriately cited. In any use of this work, there should be no suggestion that FAO endorses any specific organization, products, or services. The use of the FAO logo is not permitted. If the work is adapted, then it must be licensed under the same or equivalent Creative Commons licence. If a translation of this work is created, it must include the following disclaimer along with the required citation:</p>
<p><em>“This translation was not created by the Food and Agriculture Organization of the United Nations (FAO). FAO is not responsible for the content or accuracy of this translation. The original [Language] edition shall be the authoritative edition.”</em></p>
<p>Disputes arising under the licence that cannot be settled amicably will be resolved by mediation and arbitration as described in Article 8 of the licence except as otherwise provided herein. The applicable mediation rules will be the mediation rules of the World Intellectual Property Organization (<a href="http://www.wipo.int/amc/en/mediation/rules" target="_blank">WIPO Mediation Rules</a>), and any arbitration will be conducted in accordance with the Arbitration Rules of the United Nations Commission on International Trade Law (UNCITRAL).</p>
<p><strong>Third-party materials.</strong> Users wishing to reuse material from this work that is attributed to a third party, such as tables, figures, or images, are responsible for determining whether permission is needed for that reuse and for obtaining permission from the copyright holder. The risk of claims resulting from infringement of any third-party-owned component in the work rests solely with the user.</p>
<p><strong>Sales, rights and licensing.</strong> FAO information products are available on the FAO website (<a href="http://www.fao.org/publications" target="_blank">www.fao.org/publications</a>) and can be purchased through <a href="mailto:[email protected]">[email protected]</a>. Requests for commercial use should be submitted via: <a href="http://www.fao.org/contact-us/licence-request" target="_blank">www.fao.org/contact-us/licence-request</a>. Queries regarding rights and licensing should be submitted to: <a href="mailto:[email protected]">[email protected]</a>.</p>
</blockquote>
<blockquote style="background-color:#f0f0f0; padding:15px; border-left:5px solid #ccc;">
<p><strong>Editing and publication</strong></p>
<p>Matteo Sala (FAO-GSP)</p>
<p>Andy Murray (FAO-GSP)</p>
</blockquote>
```{r, echo=FALSE, results='asis'}
if (!knitr::is_latex_output()) {
cat("---\n")
cat("author:\n")
cat(" - name: \"Alexandre M.J.-C. Wadoux\"\n")
cat(" affiliation: \"LISAH, Univ Montpellier, AgroParisTech, INRAE, IRD, L'Institut Agro, Montpellier, France\"\n")
cat(" email: \"[email protected]\"\n")
cat(" - name: \"Leonardo Ramirez-Lopez\"\n")
cat(" affiliation: \"Data Science Department, BUCHI Labortechnik AG, Meierseggstrasse 40, 9230 Flawil Switzerland \n")
cat(" \n\n Imperial College London, Imperial Business School, South Kensington, Exhibition Rd, London SW7 2AZ, United Kingdom\"\n")
cat(" email: \"[email protected]\"\n")
cat(" - name: \"Yufeng Ge\"\n")
cat(" affiliation: \"Department of Biological Systems Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA\"\n")
cat(" email: \"[email protected]\"\n")
cat(" - name: \"Issam Barra\"\n")
cat(" affiliation: \"Global Soil Partnership, FAO, Roma, Italy\"\n")
cat(" email: \"[email protected]\"\n")
cat(" - name: \"Yi Peng\"\n")
cat(" affiliation: \"Global Soil Partnership, FAO, Roma, Italy\"\n")
cat(" email: \"[email protected]\"\n")
cat("---\n")
}
```
<style type="text/css">
h1.title {
font-size: 40px;
text-align: center;
font-weight: bold;
}
h3.subtitle {
font-size: 28px;
text-align: center;
}
</style>
# Executive summary
This document provides a comprehensive guide to the process of soil spectroscopic modelling, from data acquisition to interpretation, with a focus on using R for analytical modelling. It covers the entire workflow involved in generating reliable predictions of soil properties from spectral data, including data preprocessing, exploratory analysis, calibration, model training, and validation. Practical examples in R are provided throughout, illustrating key steps such as spectral transformation, principal component (PC) analysis, and multivariate regression techniques for predictive modelling. These steps help convert raw spectral data into actionable insights, allowing users to estimate soil attributes like organic carbon (C), clay content, and mineral composition. The document introduces well-known mathematical transfer functions, which correlate spectral wavelengths to soil properties, and how these models can be calibrated using conventional laboratory analysed reference data. Additionally, it explores the challenges of interpreting complex infrared spectra, especially the overlapping absorption bands in the vis-NIR region, and provides strategies to overcome these difficulties. It emphasizes the use of chemometric tools, such as partial least squares (PLS) regression, to extract meaningful relationships between spectral features and soil attributes. This document is the second in a series of three training materials designed to guide users through basic, intermediate, and advanced topics in soil vis-NIR and MIR spectroscopy. The content is aimed at equipping researchers, practitioners and soil laboratory technicians with the knowledge and tools needed to apply spectroscopic techniques for soil analysis, with a focus on practical implementation using R.
# Introduction
**What is soil spectroscopy?**
Soil spectroscopy is concerned with the qualitative and quantitative estimation of soil compounds with spectroscopic data. The visible and near-infrared (vis-NIR) and mid-infrared (MIR, also widely known simply as infrared or IR) regions of the electromagnetic spectrum are gaining interest in soil research. This is because many soil physical, chemical, and biological properties have distinct spectral signatures in these regions. These signatures are characterized by specific peaks, where the intensity is related to a soil compound at a specific band or a complex combination of spectral bands. Chemometric techniques are employed to model the relationships between these complex signatures and soil properties, and yield predictive models for the properties of interest. The modelling results and estimates of soil properties can be used in a series of applications.
Soil spectroscopy has several advantages over conventional laboratory
methods for the estimation of soil attributes. These benefits have been
documented extensively in the past years. Spectral measurements of soil
are relatively rapid and inexpensive, and enable the estimation of several
important soil attributes from a single spectral reading. For these
reasons, spectroscopic analysis is usually more cost-effective than conventional
laboratory methods. Further, preparing of soil samples for spectroscopy
requires minimal efforts (i.e., usually 2 mm
sieving and grinding for homogeneous particle size) and avoids the extensive use of chemical reagents
needed for soil analysis, thus mitigating the overall environmental
impact of a soil survey. Methods for soil spectral analysis are now being applied in the field through the use of proximal soil sensors, enabling *in-situ*
soil scanning.
Figure \@ref(fig:spectrumwavelength) shows the spectrum for all possible
frequencies of electromagnetic radiation and the corresponding
wavelength values. The electromagnetic spectrum ranges from high
frequency $\gamma$ rays ($10^{34}$ Hz or wavelengths shorter than $10^{-13}$
m) to radiowaves, which have long wavelength ($10^8$ m) and low
frequency (10 Hz). Commonly, the ultraviolet, visible, and infrared
ranges are expressed in wavelength with unit micrometer
($\mathrm{\mu}$m) or nanometer (nm), whereas it is expressed in energy
units, usually wavenumber in centimeter (cm$^{-1}$), for the mid- and
far-infrared range. In the infrared ($0.75-1,000$ $\mathrm{\mu}$m;
$750-10^{6}$ nm; $12,500-10$ cm$^{-1}$) range, the radiation interacts
with molecules contained in the soil material, causing the bonds to
vibrate in different modes (such as stretching, bending, or rotating), which provide information on the soil molecular
composition. For example, methyl and methylene groups
in soil have a distinctive absorption area at $2,920$ cm$^{-1}$, which
corresponds to the infrared radiation absorbed by the aliphatic
asymmetric C-H stretch. Directed on a soil sample, the absorbed spectra
at different wavelengths produce a characteristic shape which relates to
the soil sample compounds and can be used for analytical purposes. A
spectrum of a soil sample is therefore generated by recording the amount
of radiation reflected by the sample material for all wavelengths in the
region of study (e.g., the NIR). At those wavelengths where the signal
is absorbed, the reflectance ($R$) radiation decreases. It is
usual to transform the reflectance to apparent absorbance ($A$) through
the relation $A = \log_{10}(1/R)$. For more information, the reader is
redirected to the Soil Spectroscopy Training Manual 1 [@ge2022primer].
```{r spectrumwavelength, out.width = '85%', echo=FALSE, fig.cap= "Electromagnetic spectrum, from @wadoux2021soil with permission.."}
knitr::include_graphics('inst/spectrum.png')
```
The near-infrared (NIR) range is dominated by overtones and combinations
(Figure 1.2) of fundamental vibrations that occur in the mid-infrared
(MIR) range. These fundamental vibrations are conventionally defined in the MIR as symmetric stretching ($\nu_1$), symmetric bending ($\nu_2$),
asymmetric stretching ($\nu_3$), and asymmetric bending ($\nu_4$). In the NIR, these vibrations are termed $\nu$ for stretching and $\delta$
for bending. Occasionally, features in the vis-NIR range arise due to electronic transitions (typically detecting
cations of metals such as Fe and Ti). Molecular vibrations in the MIR have a
range of progressively weaker vibrations and overtones in the NIR
region. This means that NIR spectra have fewer, broader, and overlapping
absorption bands than the MIR region. NIR spectra are thus more
difficult to interpret.
Typically, both the vis-NIR and MIR ranges contain information on
various organic and inorganic soil compounds. Soil carbon content (total
C, inorganic C, organic C) has absorption in the infrared resulting from
overtones of aliphatic C-H stretching, amide C=O stretching, C=H, and
aromatic C=C stretching, among others. However, identification of
several absorption features of organic functional groups is challenging
because they overlap with absorbance of mineral components, particularly
in the range 1400-800 cm$^{-1}$. Absorption of clay minerals occurs due
to hydroxyl O-H stretching and bending, while silicate Si-O stretching
overlaps with bands of organic functional groups. Overtones in the NIR
can result from fundamental vibrations of sulfate (SO4), carbonates
(CO3), and metal-OH bonds. Overall, infrared spectra are strongly
influenced by water, with absorption in the NIR due to a vibration of
H--O--H and in the MIR where absorbance of other soil compounds occurs
(e.g., at 3000 cm$^{-1}$). More details on absorption features are
provided in the Discussion.
```{r spectra, out.width = '90%', echo=FALSE, fig.cap= "Soil mid-infrared spectra (top) with the X-H stretch, triple bond (TB), double-bond (DB) and fingerprint regions and visible and near-infrared spectra (bottom) showing where the first, second, third overtones (OT) as well as combinations occur."}
knitr::include_graphics('inst/Fig2_spectra.jpeg')
```
The practical implications of the difference between vis-NIR and MIR
discussed in the Introduction is that scanning soil in the vis-NIR
range requires less sample preparation. As a result, vis-NIR spectrometry has
been effectively deployed *in-situ* in the field with miniaturized spectrometers. Although these field instruments can be influenced by environmental conditions, such as soil
moisture, they are not necessarily worse than laboratory spectrometers. Field
spectrometers have become routine and provide valuable assistance in soil monitoring
surveys. especially when a high density of soil observations is required. However, vis-NIR spectrometers are less sensitive to quartz (vis-NIR
spectrometers are often composed of quartz lenses), and MIR
spectrometers are to date almost exclusively restricted to use in the
laboratory.
Soil attributes have often complex absorption patterns in the infrared
range of the electromagnetic spectrum. The main frequencies of spectral
absorptions in the infrared are often non-specific (i.e., they are not
linearly related to a single soil property) and absorption bands overlap
between properties. This is particularly significant in the vis-NIR
range of the spectrum [@soriano2014performance] which is more
difficult to interpret than the MIR range. In addition, certain
properties such as quartz are insensitive in the vis-NIR range, and the
whole infrared range is compounded by scatter effect due to the surface
soil sample surface properties. Several soil attributes (oxides,
carbonates, organic matter, clay minerals) can be detected directly from
the spectra, due to the presence of specific absorption features, while
some other frequently reported properties (e.g., pH) are detected only
because of their association to a combination of spectrally active
components. To extract these complex absorption patterns and obtain
quantitative estimates of a soil property, **soil scientists have used
mathematical transfer functions to correlate spectral wavelengths to
soil attributes. The transfer function is calibrated using the spectral
wavelengths as independent variables and the laboratory measured values
of the soil attributes as the dependent variable. Once calibrated on the
spectra, the soil property can be predicted using the spectral
information only**.
```{r pipeline, echo=FALSE, fig.cap= "The pipeline to obtain quantitative information from soil infrared spectroscopic data."}
knitr::include_graphics('inst/workflow2.png')
```
The steps required for building a reliable spectroscopic model are shown
in Figure \@ref(fig:pipeline). Once the soil material is scanned, the
spectral and physical-chemical data are loaded into the analytics
software in the right and more convenient formats. Spectra need to be
pre-processed using mathematical transformations that aim to improve
their quality and remove undesirable noise. The pre-processed spectra
can then be visualized, and techniques such as principal component
analysis can be used for exploratory analysis. A subset of the soil
samples will be sent to the laboratory for further analysis, which can
be used for building the model. The modeling process includes data
partitioning, model training, and validation. Finally, the model results
are interpreted. All these steps are detailed in the following sections
of this manual and the code with data are available on the [FAO Github
repository](https://github.com/FAO-SID)
# Data loading and visualization
This manual includes `R` code to demonstrate various spectral data
analysis techniques and methodologies. However, it does not provide an
introduction to the `R` programming language itself. For those new to `R`,
we recommend consulting the open book "Introduction to R"
[@Douglas2024IntroR], which offers a comprehensive guide to getting
started with `R`. Additionally, for users interested in utilizing the
`tidyverse` collection of packages [@wickham2019welcome], which are widely used for data
manipulation and visualization, we suggest referring to the book "R for
Data Science" [@wickham2023r]. These resources will provide the basic
knowledge necessary to effectively understand and apply the `R` code
presented in this manual.
## Loading the data
Preparing and loading the data into the analytics software is an
important first step in our analysis. The way we prepare our data partially
determines how smooth the subsequent processes will be. Here is an
example of how we suggest to do that in `R`, using the
`dat2.csv` file which already contains soil physico-chemical data of several samples along with their respective spectra:
```{r myloadnir, echo = FALSE}
dat <- read.table('data/dat2.csv', sep = ",", header = TRUE)
```
```{r myloadnirnoread, eval = FALSE}
# Since the dat2.csv is separated by commas we use the option sep = ","
# if your data is separated by tab for example, you can use sep = "\t"
# NOTE: the function read.table will look into the current working directory.
# Make sure to have the working directory set to the correct path using the
# function setwd('your/path/to/file/')
dat <- read.table('dat2.csv', sep = ",", header = TRUE)
```
```{r myload, eval = FALSE}
# lets rename the first column to sample as it indicates the sample index
colnames(dat)[1] <- "sample"
# show how is the current table organized: first 10 rows and first 10 columns
head(dat, c(10, 7))
```
The table contains the sample name in the first column, followed by the
measured soil attributes `Organic_Carbon`, `Clay`, `Silt` and
`Sand`. The spectra follows, starting with the sixth column called
`X350`, which corresponds to the wavelength at 350 nm.
Once we upload the data, we can organize it in a way
that allows us to quickly access the soil attributes and spectra
separately. We recommend to create a `data.frame` where each
non-spectral variable can be accessed directly by its name and all the
spectral variables can be called at once by using one single name. Let's
see how this is done in `R`:
```{r spc-cols}
# This line searches for column names in the 'dat' dataset that match a specific pattern.
# The pattern "X[0-9]{2, 10}$" is a regular expression that:
# - "X": Matches the character 'X' at the start of each spectral column name.
# - "[0-9]{2, 10}": Matches between 2 and 10 digits (0-9) following the 'X'.
# - "$": Ensures that the match occurs at the end of the column name.
# The 'grep' function returns the indices of the column names that match this
# pattern.
# If in your data, the spectral variables start with another prefix, you should
# replace "X" with that prefix.
spectral_columns <- grep("X[0-9]{2,10}$", colnames(dat))
```
Now that we know what the columns corresponding to the spectral
variables are, we can just isolate the spectra into a matrix:
```{r props-and-spc}
my_spectra <- as.matrix(dat[, spectral_columns])
# rename the columns to remove the X prefix
colnames(my_spectra) <- gsub("X", "", colnames(my_spectra))
```
We can then readjust our original data as follows:
```{r props-and-spc2}
# Temporarily remove the spectral columns from the 'dat' dataset
dat <- dat[, -spectral_columns]
# Reassign the spectra to 'dat' as a single variable named 'spc_raw'
dat$spc_raw <- my_spectra
# finally remove the object "my_spectra"
rm(my_spectra)
```
With the above code, we have organized our data into a clean structure
where the non-spectral variables are directly accessible by their names
in the `dat` data frame, and the spectral data is consolidated into a
single variable (`spc_raw`). This structure facilitates further analysis by
allowing easy access to both types of data while keeping the dataset
well-organized.
An additional recommendation related to the code style, is that all
`R` object names and variables be assigned in lowercase letters. This
practice helps prevent typos related to case sensitivity, which can
sometimes lead to hard-to-detect code issues or bugs.
For the sake of demonstration, we also load MIR spectra from the same
dataset. Note here that the MIR spectra are loaded for visualization
purposes and not used hereafter. We use the same procedure as for the
vis-NIR data previously loaded.
```{r readmyloadmir, eval = FALSE}
dat_ir <- read.table('dat2MIR.csv', sep = ",", header = TRUE)
```
```{r readmyloadmirnoread, echo = FALSE}
dat_ir <- read.table('data/dat2MIR.csv', sep = ",", header = TRUE)
```
```{r myloadMIR}
colnames(dat_ir)[1] <- "sample"
spectral_columns_ir <- grep("X", colnames(dat_ir))
my_spectra_ir <- as.matrix(dat_ir[, spectral_columns_ir])
colnames(my_spectra_ir) <- gsub("X", "", colnames(my_spectra_ir))
dat_ir <- dat_ir[, -spectral_columns_ir]
dat$spc_raw_ir <- my_spectra_ir
rm(my_spectra_ir)
```
## Data visualization
We can visualize the spectra in `R` using various packages, the simplest
of which are the default `base` packages with the function `matplot`. An
alternative is to use the `ggplot2` package [@ggplot2], which provides more
functionalities to visualize data. We want to create a `ggplot` in `R`
where each row of the dataset is a spectrum and is represented as a
line, and the lines are colored based on the values of the column. One
can follow the steps below. This code uses the `ggplot2` package in `R`.
Note that the first step is to read the dataset and prepare it for
plotting. For illustration purposes we plot the vis-NIR and the MIR
spectra from the `dat2.csv` and `dat2Mir.csv` datasets loaded and
pre-processed in the previous section.
```{r specplot, fig.cap = "Spectra coloured by the content of organic carbon of the samples.", fig.width = 9, fig.height = 8, warning=FALSE, message=FALSE}
# Install patchwork and ggplot2 in case these libraries are not yet installed
# install.packages("ggplot2")
# install.packages("patchwork")
library(ggplot2)
library(patchwork)
# get the wavelengths of the spectra from the column names and convert them
# into numeric values
my_wavelengths <- as.numeric(colnames(dat$spc_raw))
# get the wavenumbers of the MIR spectra from the column names and convert them
# into numeric values
my_wavenumbers <- as.numeric(colnames(dat$spc_raw_ir))
# For an advanced-looking plot, we will use ggplot
# First we create the long format data frame:
# - 'sample' is repeated for each wavelength to maintain the sample-spectra
# relationship.
# - 'oc' (Organic Carbon) is repeated for each wavelength to be used as a color
# mapping in the plot.
# - 'wavelength' is repeated for each sample, representing the measurement
# points.
# - 'absorbance' is the spectral data values converted to a vector and
# transposed to align with 'sample' and 'wavelength'
# NOTE: here we convert the NIR spectra from reflectance to (apparent) absorbance
# as this is the conventional way the spectral information is presented
vnir_long <- data.frame(
sample = rep(1:nrow(dat), each = ncol(dat$spc_raw)),
oc = rep(dat$Organic_Carbon, each = ncol(dat$spc_raw)),
wavelength = rep(my_wavelengths, nrow(dat)),
absorbance = as.vector(t(log(1 / dat$spc_raw, 10)))
)
ir_long <- data.frame(
sample = rep(1:nrow(dat), each = ncol(dat$spc_raw_ir)),
oc = rep(dat$Organic_Carbon, each = ncol(dat$spc_raw_ir)),
wavenumber = rep(my_wavenumbers, nrow(dat)),
absorbance = as.vector(t(dat$spc_raw_ir))
)
# We an now use ggplot to create our spectral plot
# Create the first plot
plot_vnir <- ggplot(
vnir_long,
aes(x = wavelength, y = absorbance, group = sample, color = oc)) +
geom_line(alpha = 0.5) + # Set alpha to 0.5 for transparency
scale_color_gradient(low = "blue", high = "red") +
theme_minimal() +
labs(
x = "Wavelength (nm)",
y = "Calculated absorbace",
color = "Organic Carbon (%)"
) +
theme(legend.position = "top")
# Create the second plot
plot_ir <- ggplot(
ir_long,
aes(x = wavenumber, y = absorbance, group = sample, color = oc)) +
geom_line(alpha = 0.5) + # Set alpha to 0.5 for transparency
scale_color_gradient(low = "blue", high = "red") +
theme_minimal() +
labs(
x = expression(Wavenumber~(cm^{-1})),
y = "Calculated absorbace",
color = "Organic Carbon (%)"
) +
scale_x_reverse() +
theme(legend.position = "none")
# Combine the two plots vertically
combined_plot <- plot_vnir / plot_ir
combined_plot
```
The above code performs the following steps:
- It loads the dataset from `dat.csv`.
- It saves the `Organic_Carbon` column for later use in coloring the
lines and then removes this column from the dataset.
- It converts the dataset from a wide format to a long format, which
is necessary for plotting multiple lines with `ggplot2`.
- It uses the organic carbon values for color-coding the spectra.
- It creates a `ggplot` where each row (i.e. sample in this case) of
the dataset is represented as a line, with the lines colored based
on the organic carbon values. The colour gradient goes from blue
(low values) to red (high values).
# Data pre-processing
## Spectra pre-processing
Applying mathematical pre-processing functions on the raw spectra is the first step, and possibly the most important, in spectral data analysis with chemometrics. When scanning the soil, the soil sample volume and preparation, the measurement method, and measuring parameters such as the choice of scanning time and scanning resolution add noise to the recorded spectral data. Further, when scanning a material with non-uniform particle sizes such as soils, the reflectance spectrum is often accompanied by scattering noise. In fact, the recorded spectrum is not only the result of surface reflection (Fresnel reflection); soil samples absorb radiation that penetrates beneath the surface. As a result, the recorded spectrum depends on the relative amount of radiation absorption and surface reflectance, and lost radiation that is not reflected back to the sensor. In short, the radiation that hits the soil sample is partly reflected, partly absorbed, and partly transmitted. The implication of this phenomenon is that the recorded reflectance spectra do not directly relate to the absorption. A set of simple transformations can be applied to obtain the absorbance spectra. The most basic is the transformation of the reflectance spectral
matrix $R$ by $\log_{10}(1/R)$ or $-\log_{10}(R)$. These transformations
are embedded in all spectral data manipulation packages and software
implementations.
For more detailed information on common preprocessing techniques,
readers can refer to the documentation of the `prospectr` package
[@stevens2014introduction], which provides implementations for various
smoothing, scattering correction, and derivative methods for spectral
processing.
### Splice correction
Before applying standard preprocessing techniques, it is essential to
check if the spectrometer used to record the spectra produces splice
artifacts. These discontinuities often occur in spectrometers equipped
with multiple spectral detectors, such as those with separate detectors
for the visible and NIR regions. For instance, some sensors may produce
splices at 1000 nm and 1830 nm, leading to noticeable "steps" in the
data where measurements from different detectors are combined. These
steps can become problematic for modelling if not addressed, particularly
when derivatives are employed.
Splice correction is a critical process that addresses these artifacts
by linearly interpolating the values at the edges of splice regions,
ensuring a smooth transition between the outputs of different detectors. This
correction minimizes the impact of detector transitions on subsequent
analysis, resulting in continuous and accurate spectral data. The
`spliceCorrection` function in the `prospectr` package [@stevens2014introduction] allows users to
perform this correction by specifying the splice positions and the
number of interpolation bands needed for the desired adjustment.
Let's take another look at Figure \@ref(fig:specplot). If we closely
examine the region around the 1000 nm wavelength, we can observe a
nearly imperceptible artifact that appears as a subtle vertical line.
There is another, less visible artifact around the 1830 nm wavelength.
We anticipated these issues because we know that our data was collected
using a device with three detectors: the first records spectra up to
1000 nm, the second from 1001 nm to 1830 nm, and the third from 1831 nm
onwards.
```{r specplotsplice, fig.cap = "Splice-artifact-free spectra colored by the organic carbon content of the samples.", fig.width = 9, fig.height = 4, warning=FALSE, message=FALSE}
library(prospectr)
dat$spc <- spliceCorrection(dat$spc_raw, my_wavelengths, splice = c(1000, 1830))
spc_long <- data.frame(
sample = rep(1:nrow(dat), each = ncol(dat$spc)),
oc = rep(dat$Organic_Carbon, each = ncol(dat$spc)),
wavelength = rep(my_wavelengths, nrow(dat)),
reflectance = as.vector(t(dat$spc))
)
# We an now use ggplot to create our spectral plot
ggplot(
spc_long,
aes(x = wavelength, y = reflectance, group = sample, color = oc)) +
geom_line(alpha = 0.5) + # Set alpha to 0.5 for transparency
scale_color_gradient(low = "blue", high = "red") +
theme_minimal() +
labs(
x = "Wavelength (nm)",
y = "Reflectance",
color = "Organic Carbon (%)"
) +
theme(legend.position = "top")
```
Figure \@ref(fig:specplotsplice), shows that the artifact initially
observed in Figure \@ref(fig:specplot) has been removed.
### Smoothing/noise removal
Smoothing of the spectra is typically used to reduce noise.
Examples of methods to remove noise in the spectra include **moving window average** and **Savitzky-Golay filtering**. These techniques help
in smoothing out fluctuations and improving the clarity of the spectral
data. It should be noted that while smoothing is essential, it may need
to be carefully balanced to avoid losing important spectral features.
Additionally, **derivatives** can be applied to the spectra to
accentuate absorbance features and remove both additive and
multiplicative effects. However, derivatives tend to amplify noise, so
smoothing is often recommended before applying derivatives.
In certain cases, specific wavelength portions of the spectra may
exhibit a high signal-to-noise ratio, while the rest of the spectra show
a lower signal-to-noise ratio. In these scenarios, it may be beneficial
to simply remove those portions from our analysis. For example, the
edges of the spectra (typically between 400 nm to 500 nm and from 2450
nm to 2500 nm in the NIR range), tend to be noisy. Trimming these
regions can help improve subsequent spectra analyses.
**Moving window average**
The moving window average, also known as the rolling or running average,
is a simple smoothing technique used in signal processing. It involves
taking the average of a subset of data points within a specified window
that moves across the data set. For each position of the window, the
average of the data points within it is calculated, and this process is
repeated as the window moves through the entire dataset. This technique
helps in reducing noise and making the underlying trends in the data
more apparent.
**Savitzky-Golay Filtering**
The Savitzky-Golay filter is a signal filtering technique that applies
polynomial smoothing to data. Unlike the moving window average, which
replaces each data point with the average of its neighboring points, the
Savitzky-Golay filter fits successive subsets of adjacent data points
with a low-degree polynomial using linear least squares.
When the polynomial degree is zero, the Savitzky-Golay filter behaves
like the moving window average. The primary advantage of this method is
its ability to preserve important features of the data distribution,
such as relative maxima, minima, and peak width, which are often
"flattened" by other averaging techniques. The equation for the
Savitzky-Golay filter is:
$$
\hat{y}_{i} = \frac{1}{n}\sum_{j=-h}^{h}c_{j} \ y_{i+j},
$$
where $n$ is the number of points in the window, $\hat{y}_{i}$ is the
smoothed value of the spectrum at wavelength $i$, $y_{i+j}$ are the
spectrum values within the window centered at $i$, $h$ is half the width
of the window (i.e., the window size is $2h + 1$), and $c_{j}$ is the
coefficient of the polynomial.
Note that Savitzky-Golay filtering is also used for computing signal
derivatives. This is explained in the upcoming section on derivatives,
where the filter's ability to smooth data while preserving derivative
information is particularly valuable.
**Similarities between moving window average and Savitzky-Golay**
- **Noise reduction**: Both techniques are used to smooth data,
helping to reduce noise and reveal underlying trends.
- **Window-based**: They operate over a window that moves across the
data, applying a calculation to the data points within that window.
- **Preservation of data structure**: While the Savitzky-Golay filter
is particularly noted for its ability to preserve the shape of the
features of the dataset, both methods aim to maintain the integrity
of the original data as much as possible, albeit in slightly
different ways.
**Implementation in R**
The `prospectr` package in `R` is designed for the processing and
analysis of spectral data, including soil infrared spectroscopic data.
It includes functions for both moving window average and Savitzky-Golay
filtering.
To illustrate the effect of smoothing, let's use one single spectrum.
Let's take the first spectrum in our dataset and add some (normally
distributed) random noise:
```{r noisyspec, eval = TRUE, warning=FALSE, message=FALSE}
# Set a seed for reproducibility of random noise
set.seed(801124)
# Add random noise to the first sample's spectrum
my_noisy_spc <- dat$spc[1, ] + rnorm(ncol(dat$spc), sd = 0.0025)
```
Now let's apply the smoothing pre-treatmets to our noisy spectra:
```{r loadprospectr, eval = TRUE, warning=FALSE, message=FALSE}
library(prospectr)
# Replace 'my_noisy_spc' with your actual data frame or matrix and 'w' with the desired
# window size
mwa_result <- movav(my_noisy_spc, w = 11)
```
Now Savitzky-Golay filtering with `prospectr`:
```{r svprepro, eval = TRUE, warning=FALSE, message=FALSE}
# Replace 'my_noisy_spc' with your actual data frame or matrix, 'm' with the window size,
# and 'p' with the polynomial order
sg_result <- savitzkyGolay(
my_noisy_spc, # the noisy spectrum
m = 0, # this m is for the derivative order, 0 is for no derivative
p = 3, # the polynomial order
w = 11 # the window size
)
```
In the above examples, `my_noisy_spc` represents your input spectra,
i.e., the spectra that you want to smooth, `w` represents the window
size used for smoothing, and `p` represents the polynomial order used in
the Savitzky-Golay filter. Let us plot the different spectra to observe
the effects of smoothing:
```{r preproexample, fig.cap = "Example of smoothing a noisy spectrum.", eval = TRUE, warning=FALSE, message=FALSE, fig.width = 8, fig.height = 4}
# Create a long-format data frame for the MVA spectrum
long_mwa_spc <- data.frame(
wavelength = as.numeric(names(mwa_result)), # Extract the wavelengths
spectral_value = mwa_result, # Store the spectral values from the MVA
method = "mva" # Label the data
)
# Create a long-format data frame for the Savitzky-Golay (SG) spectrum
long_sg_spc <- data.frame(
wavelength = as.numeric(names(sg_result)), # Extract the wavelengths
spectral_value = sg_result, # Store the spectral values from the SG
method = "sg" # Label the data
)
# Combine the long-format MVA and SG data frames into one
denoised_long <- rbind(long_mwa_spc, long_sg_spc)
# Create a data frame for the original noisy spectrum
original_noisy <- data.frame(
wavelength = my_wavelengths, # Use the original wavelengths
spectral_value = my_noisy_spc # Use the original noisy spectrum values
)
my_labels <- c(
"mva" = "Moving average smoothing",
"sg" = "Savitzky-Golay filtering"
)
# Plot the spectra using ggplot2
ggplot() +
# Plot the original noisy spectrum as a red line in the background
geom_line(
data = original_noisy,
aes(x = wavelength, y = spectral_value),
color = "black"
) +
# Plot the MVA and SG processed spectra on top of the original noisy spectrum
geom_line(
data = denoised_long,
aes(x = wavelength, y = spectral_value, color = method, group = method)
) +
# Add titles and labels for the axes
labs(
x = "Wavelength (nm)",
y = "Reflectance"
) +
# Manually set the colors for the MVA and SG lines
scale_color_manual(
values = c("mva" = "#00AFBB", "sg" = "#E7B800"),
labels = my_labels
) +
# Facet the plot by the method, creating separate plots for MVA and SG
facet_grid(. ~ method, labeller = labeller(method = my_labels)) +
theme(legend.position = "top") + guides(color = guide_legend(title = NULL))
```
With Figure \@ref(fig:preproexample), which illustrates the effects of
different preprocessing methods, we can clearly see that with the
previous code, the noise pretreated spectra becomes less noisy than the
original spectrum. Figure \@ref(fig:preproexample) shows three spectra:
the original spectrum (black line), the MVA-smoothed spectrum (cyan
line), and the SG-smoothed spectrum (yellow line). The x-axis represents
the wavelength (0 to 2500 nm), and the y-axis shows the value (0 to
0.4).
When applying smoothing techniques like moving average (MVA) and
Savitzky-Golay (SG) filtering, selecting the appropriate window size and
polynomial order is of critical importance. Large window sizes can lead
to over-smoothing, flattening the signal and potentially removing
valuable soil information. Conversely, smaller window sizes preserve
more details but might not adequately reduce noise. Higher polynomial
orders in SG filtering can capture more complex trends but may also fit
the noise, making the filtering less effective. On the other hand, low
polynomial orders combined with large window sizes might overly smooth
the data, suppressing important spectral information.
To balance noise reduction and signal preservation, typical window sizes
range from 5 to 15 nm, and polynomial orders from 2 to 6, with 3 being
common for soil spectroscopy. Experimentation and cross-validation are
often needed to determine the optimal parameters for your dataset.
### Scatter corrections
After noise removal, another important pre-processing step is scatter
correction, which aims at enhancing the signal and correct baseline
shifts that may occur due to physical characteristics of the soil, such
as particle size distribution. Scatter correction techniques are
designed to reduce variability between samples caused by scatter and
adjust for baseline shifts. The most commonly used methods for scatter
correction in NIR spectroscopy include **standard normal variate (SNV)** and **multiplicative scatter correction (MSC)**.
Standard Normal Variate (SNV) is a simple and widely
used method that works by centering each individual spectrum to zero and
then dividing each spectral band value by the standard deviation of the
entire spectrum. The SNV method processes each observation
independently. The disadvantage of SNV is that it can be sensitive to
noise.
Multiplicative Scatter Correction (MSC) is also a widely used
pre-processing technique which involves two main steps: estimating
correction coefficients (additive and multiplicative contributions) and
then applying these corrections to the spectrum or spectra to be
corrected [@rinnan2009review]. The purpose of MSC is to remove
undesirable scatter effects from the data matrix before further
modeling. The typical approach uses the average spectrum of the
calibration set as the reference spectrum, although a generic reference
spectrum can also be employed. MSC can be extended to include
higher-order polynomial fitting or incorporate a priori knowledge, but
in most practical applications, the basic form of MSC is sufficient
[@rinnan2009review]. This method effectively preserves spectral features
while reducing background offsets and slopes.
In many cases, the choice between MSC and SNV will depend on the
specific characteristics of the data and the intended use of the
processed spectra. Both techniques are highly effective at reducing
scatter-induced variability, and their application is often preceded by
smoothing methods like Savitzky-Golay filtering to ensure that the
underlying trends in the data are preserved. In addition, because MSC uses a mean or some reference spectrum, it might not be so suitable if one is performing operations across disparate spectral libraries, in which case SNV would be preferred.
**Implementation in R**
For SNV:
```{r snvexample, eval = TRUE, warning=FALSE, message=FALSE}
# Assume 'dat$spc' is your spectral matrix, where each row is a spectrum
# Apply SNV to each spectrum
snv_spectra <- standardNormalVariate(dat$spc)
```
For MSC:
```{r mscexample, eval = TRUE, warning=FALSE, message=FALSE}
# Assume 'dat$spc' is your spectral matrix, where each row is a spectrum
# Apply MSC to the spectra
msc_spectra <- prospectr::msc(dat$spc, ref_spectrum = colMeans(dat$spc))
```
### Spectral derivatives
Spectral derivatives are considered a standard tool in spectroscopic
analysis. They are used to remove both additive and multiplicative effects
in the spectra. They help enhance small spectral absorptions, resolve
overlapping absorption features, and compensate for baseline shifts and
instrument drift. However, derivatives can also lead to noise
amplification and make the spectral interpretation more complex, so a
careful application is necessary. Here we describe the basic method used
for computing the derivatives: the Savitzky-Golay method and the
Norris-Gap derivatives, referred hereafter to as Norris-Williams.
#### Savitzky-Golay derivatives
The Savitzky-Golay (SG) method is widely used for calculating spectral
derivatives. In this method, a polynomial is fitted to a subset of data
points within a moving window centered around each point in the
spectrum. The derivative of this (noise-free) polynomial is then
computed at the center of the window. This method allows for the
estimation of first or higher-order derivatives, preserving the
integrity of the spectral signal.
To apply the Savitzky-Golay method in `R` we can use the
`savitzkyGolay()` function from the `prospectr` package:
```{r sgdexample, fig.cap = "Example of smoothing a noisy spectrum.", eval = TRUE, warning=FALSE, message=FALSE}
# Apply Savitzky-Golay smoothing and compute the first derivative
sg_first_derivative <- savitzkyGolay(
dat$spc, # The spectral data
m = 1, # The derivative order (first derivative)
p = 2, # The polynomial order (2 i.e fitting a quadratic polynomial)
w = 11 # The window size
)
# Apply Savitzky-Golay smoothing and compute the second derivative
sg_second_derivative <- savitzkyGolay(
dat$spc, # The spectral data
m = 2, # The derivative order (second derivative)
p = 2, # The polynomial order (2 i.e fitting a quadratic polynomial)
w = 11 # The window size
)
```
In the above code, `p` represents the polynomial order, `m` is the
derivative order (with 1 for the first derivative and 2 for the second),
and `w` is the window size. The advantage of the Savitzky-Golay method
is that it smooths the data while computing derivatives.
#### Gap-Segment derivatives
This method is also known as Norris-Williams derivatives method. This is
another technique used to calculate spectral derivatives while managing
noise. This method involves two steps: smoothing the spectra over a
specified window and then calculating the derivative with a defined gap
between points. The gap size helps to reduce noise further, making the
method particularly useful for spectral data with high co-variation
[@rinnan2009review].
To apply the gap-Segment derivative method in `R` we can use the
`gapDer()` function from the `prospectr` package:
```{r gapsegdexample, fig.cap = "Example of smoothing a noisy spectrum.", eval = TRUE, warning=FALSE, message=FALSE}
# Apply gapDer smoothing and compute the first derivative
gap_first_derivative <- gapDer(
dat$spc, # The spectral data
m = 1, # The derivative order (first derivative)
w = 11, # The window size
s = 3 # The segment size
)
# Apply gapDer smoothing and compute the second derivative
gap_second_derivative <- gapDer(
dat$spc, # The spectral data
m = 2, # The derivative order (second derivative)
w = 11, # The window size
s = 3 # The segment size
)
```
Now let's plot the first and second derivatives of the first spectrum
from our dataset:
```{r derplots, fig.cap = "Examples of the derivatives for the first spectrum in the dataset.", eval = TRUE, warning=FALSE, message=FALSE, fig.width = 8, fig.height = 8}
library(ggplot2)
# Create the long-format data frame for the different derivatives
long_sg_1der_spc <- data.frame(
wavelength = as.numeric(colnames(sg_first_derivative)), # the wavelengths
spectral_value = sg_first_derivative[1, ], # Store the derivatives
method = "Savitzky-Golay derivative", # Label the data
derivative_order = "1st"
)
long_sg_2der_spc <- data.frame(
wavelength = as.numeric(colnames(sg_second_derivative)), # the wavelengths
spectral_value = sg_second_derivative[1, ], # Store the derivatives
method = "Savitzky-Golay derivative", # Label the data
derivative_order = "2nd"
)
long_gap_1der_spc <- data.frame(
wavelength = as.numeric(colnames(gap_first_derivative)), # the wavelengths
spectral_value = gap_first_derivative[1, ], # Store the derivatives
method = "Gap-Segment derivative", # Label the data
derivative_order = "1st"
)
long_gap_2der_spc <- data.frame(
wavelength = as.numeric(colnames(gap_second_derivative)), # the wavelengths
spectral_value = gap_second_derivative[1, ], # Store the derivatives
method = "Gap-Segment derivative", # Label the data
derivative_order = "2nd"
)
# Combine all derivatives into one data frame
derivatives_long <- rbind(
long_sg_1der_spc, long_sg_2der_spc, long_gap_1der_spc, long_gap_2der_spc
)
# Plot the derivatives using ggplot2 with facets for each method
ggplot(
derivatives_long, aes(x = wavelength, y = spectral_value, color = method)
) +
geom_line() +
labs(
x = "Wavelength (nm)",
y = "Derivative value"
) +
facet_wrap(derivative_order ~ method, scales = "free_y") +
theme(legend.position = "none") # Remove the legend as it's redundant with facets
```
Figure \@ref(fig:derplots) shows that while the Savitzky-Golay method
captures detailed spectral features, in this particular case it tends to
introduce more noise in the second derivative. The Gap-Segment method,
on the other hand, provides a cleaner result with less noise in both the
first and second derivatives, making it a better choice for applications
requiring noise reduction in higher-order derivatives.
## Resampling
Resampling is a technique used to adjust the spectral data to new
spectral band positions, either to increase or decrease the resolution
or to align the data with a different set of wavelengths. The `resample`
function in the `prospectr` package facilitates this process by using
either spline or linear interpolation methods. We can specify the
original band positions and the desired new band positions, with the
function returning a resampled matrix or vector that aligns with the new
spectral coordinates. This method is particularly useful when spectral
data needs to be standardized or harmonized for comparison across
different datasets or instruments.
For the rest of this tutorial we will resample the spectral data so they
have a resolution every 5 nm. This can be done with the `resample`
function:
```{r, eval = TRUE}
new_wavs <- seq(from = min(my_wavelengths), to = max(my_wavelengths), 5)
# resample and over-write the original spectral matrix
dat$spc <- resample(