TotalWorkflow.Rmd

---
title: "Total workflow"
author: "Jo-Hannes Nowé"
date: "05-04-2024"
output: 
  html_document:default
---

Before the workflow is started, we prepare the folder structure and download the necessary packages. 

```{r script0_1}
source('model_training/0_1Preparation.R')
```
# Defining the study area
The proposed study area concerns ospar regions II, III 
More information about these regions can be found on: https://www.ospar.org/convention/the-north-east-atlantic.
There are observations in ospar regions I and IV, however when working at a monthly
resolution, the spatial extent of the points is rather limited and these two ospar regions
lack points in multiple year_months. 

As said before we only retain region II and III in order to not interpolate
our predictions to the other areas based on the few occurrences. 

# Download Occurrence data from eurOBIS
Before filtering the columns of interest and doing some feature engineering we create the
metadatalist, giving information about the different datasets used for the presence data.

Observations already go through a couple of data quality controls before entering
the (eur)OBIS datasets. The biggest remaining issue are duplicates. These can be 
removed by using the **Coordinatecleaner** package. We consider distinct observations
with the same longitude, latitude, species and date as duplicates. Only the first
observation of each duplicate is kept in the dataset.

```{r script1_1}
region <- c('II','III')
aphia_id <-126415   #this is alosa fallax phocoena = 137117
word_filter <- c("stranding", "museum")
#dasid_filter <- c()
date_start <- "1999-01-01T00:00:00"
date_end <- "2019-01-01T00:00:00"
temporal_extent <- lubridate::interval(date_start,date_end)
source("1_1DownloadPresenceData.R")

#Outputs
##presence.rds
##spatial_extent
```

```{r script1_2}
#This script is not finished
#Output
##plots regarding the presence data
```

```{r script1_3}
species <- worrms::wm_id2name(aphia_id)
#name2id also works, can provide the option to choose
classification <- wm_classification(aphia_id) 
class <- which(classification[,2]=="Class")
target_group <- classification[[class,3]]

#load outputs script1_1
load(file.path(datadir,"alldataset_selection.RData"))
load(file.path(datadir,"mydata_eurobis.RData"))
load(file.path(datadir,"spatial_extent.RData"))
load(file.path(datadir,"ospar.RData"))

source("habitat_suitability_model/model_training/1_3AbsenceCreation.R")
#output
##absence.RData
```

```{r script2}
#Needs to be completed
source("habitat_suitability_model/model_training/2_PAExploration.R")
```

```{r script3_1}
source("habitat_suitability_model/model_training/3_1CopernicusMarine.R")
#Output
##Netcdf files of the right variables, temporal extent, spatial extent
```

```{r script3_2}
temporal_extent
spatial_extent
#Some parts need to be standardized, but in theory doesn't need specific things that are variable for each different run
source("habitat_suitability_model/model_training/3_2CleanExtractEnvLayer.R")
```

```{r script4_1}
source("habitat_suitability_model/model_training/4_1EnsembleApproach.R")
```
When evaluating the model we need to ask the following questions: how robust is the model to departures from the assumptions? How meaningful are the evaluation metrics used? How predictive is the model when tested against independent data? The performance of a model can be assessed from many different perspectives: realism, accuracy and generality.

Realism is the ability of a model to identify the critical predictors directly affecting the system and to characterize their effects and interactions appropriately.

Accuracy is the ability of the model to predict events correctly within the system being modeled (e.g species distributions in the same space and time as the input data).

Generality is the ability of the model to predict events outside of the modeled system via projection or transfer to a different resolution, geographic location, or time period. 


```{r script4_2}
#Needs to be finalized, no variables that change during runs
```

```{r script5}
model_object <- paste0("stacked_model_",aphia_id,".rds")
#Make it so that there can be choices in which monthly layers predictions are made on
#Bring back the option of assessing different time periods
source("habitat_suitability_model/model_training/5_MappingPredictions.R")
```