ridl is an R client for the UNHCR Raw Internal Data Library (RIDL) platform.
The UNHCR RIDL platform is UNHCR’s internal platform to easily store, find and analyze raw data.
To use the ridl package effectively, it’s important to understand some key concepts of this platform. See the RIDL documentation for more details.
A container is a placeholder where we can share data on RIDL. A container is represented in the ridl package as a RIDLContainer object and can hold zero or more datasets.
Most functions are prefixed by ridl_container or rc:
- Get metadata of a container: ridl_container_show or rc_show
- Get the list of the names of all containers in the server: ridl_container_list or rc_list
A dataset is a placeholder where we can share data files (resources). A dataset page carries metadata that gives you enough context and information to properly store the data files and use them. A data file, e.g. an Excel file, is called a resource, and many of them can be shared in a dataset page. In the ridl package, a RIDLDataset object is used to represent a dataset.
Most functions are prefixed by ridl_dataset or rd:
- Get metadata of a dataset and its resources: ridl_dataset_show or rd_show
- Get the list of the names of all datasets in the server: ridl_dataset_list or rd_list
- Check whether a dataset exists or not: ridl_dataset_exist or rd_exist
- Search for datasets matching given search criteria: ridl_dataset_search or rd_search
- Get the container object that holds the dataset: ridl_dataset_container_get or rd_container_get
A resource is a file shared in a dataset page; it includes microdata and supporting documents like reports or survey forms. The RIDLResource class implements all the logic needed to manipulate a RIDL resource.
Most functions are prefixed by ridl_resource or rr:
- Get metadata of a resource: ridl_resource_show or rr_show
This package is not yet on CRAN; to install it, you will need the remotes package. You can get ridl from GitLab or GitHub (mirror):
## install.packages("remotes")
remotes::install_gitlab("dickoa/ridl")
library("ridl")
The ridl package requires you to add your API token and store it for further use. This is the preferred option; you no longer need to use the API key. To get an API token, generate one at the following URL:
“ridl-server-url/user/@your-user-name/api-tokens”.
Give it a name and generate the token. Once generated, you can store it in your .Renviron file, which is automatically read by R on startup. You can either edit the .Renviron file directly or open it by calling usethis::edit_r_environ() (assuming you have the usethis package installed) and entering:
RIDL_API_TOKEN=xxxxxxxxxxxxxxxxxx
Once the environment variable is set you will need to restart your session.
library("ridl")
ridl_config_get()
## <RIDL Configuration>
## RIDL site url: https://ridl.unhcr.org
## RIDL API token: xxxxxxxxxxxxxxxxxx
If you plan to use the RIDL testing environment (https://ridl-uat.unhcr.org), you’ll also need to set up the RIDL_UAT_API_TOKEN variable.
RIDL_UAT_API_TOKEN=xxxxxxxxxxxxxxxxxx
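After restarting, it can help to confirm that R actually sees the tokens before calling any ridl function. The following is a minimal base R sketch, not part of ridl: has_token is a hypothetical helper, and the only assumption is that the variables use the names shown in the .Renviron entries above.

```r
# Hypothetical helper (not part of ridl): TRUE when the named
# environment variable is set to a non-empty value.
has_token <- function(var) {
  nzchar(Sys.getenv(var, unset = ""))
}

# Variable names taken from the .Renviron entries above.
vars <- c("RIDL_API_TOKEN", "RIDL_UAT_API_TOKEN")

# Named logical vector: one entry per variable.
status <- vapply(vars, has_token, logical(1))
print(status)
```

A FALSE entry usually means the .Renviron file was not saved, or the session was not restarted after editing it.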
You can also configure the ridl package directly using the ridl_config_setup function and check the configuration using ridl_config_get, but this configuration does not persist once you close your session.
ridl_config_setup(site = "test",
                  token = "xxxxxxxxxxxxxxxxxx")
ridl_config_get()
## <RIDL Configuration>
## RIDL site: https://ridl-uat.unhcr.org/
## RIDL API token: xxxxxxxxxxxxxxxxxx
Now that we are connected to RIDL, we can search for datasets using ridl_dataset_search.
ridl_config_setup(site = "prod")
# search internally public datasets in RIDL, limiting the results to two rows
ridl_dataset_search("mali", visibility = "public", rows = 2)
## [[1]]
## <RIDL Dataset> 6f37029d-0ec2-4322-88ed-6447b2eebf3a
## Title: Socio-economic assessment of Malian refugees in Burkina Faso 2016
## Name: unhcr-bfa-2016-sea-1-1
## Visibility: public
## Resources (up to 5): DDI XML, DDI RDF, UNHCR_BFA_2016_SEA_household_v1_1, UNHCR_BFA_2016_SEA_individual_v1_1, UNHCR_BFA_2016_final report
## [[2]]
## <RIDL Dataset> 59573073-aef6-42c1-a9db-efae3f95051c
## Title: Socio-economic assessment of refugees in Mauritania's Mberra camp 2017
## Name: unhcr-mrt-2017-sea-1-1
## Visibility: public
## Resources (up to 5): DDI XML, DDI RDF, UNHCR_MRT_2017_SEA_household_v1_1, UNHCR_MRT_2017_SEA_individual_v1_1, UNHCR_MRT_2017_SEA_questionnaire
## attr(,"class")
## [1] "ridl_datasets_list"
We can select a particular dataset from the list (a ridl_datasets_list is a list of datasets) using the R functions that access list elements (e.g. [[). In this example, we can use either purrr::pluck or dplyr::nth since they both play well with the pipe operator (|> or %>%). Once the dataset is selected, it’s possible to list all its resource objects using ridl_resource_list.
library(tidyverse)
ridl_dataset_search("mali", visibility = "public", rows = 2) |>
  nth(1) |>
  ridl_resource_list(format = "stata")
## [[1]]
## <RIDL Resource> 026f9547-d7b2-4ec3-bbaa-5096837b1f01
## Name: UNHCR_BFA_2016_SEA_household_v1_1
## Description: BFA SEA household level data
## Type: microdata
## Size: 1278720
## Format: Stata
## [[2]]
## <RIDL Resource> 30ab9f7a-9b84-4695-88ba-7504a4aed9e2
## Name: UNHCR_BFA_2016_SEA_individual_v1_1
## Description: BFA SEA individual data
## Type: microdata
## Size: 143744
## Format: Stata
## attr(,"class")
## [1] "ridl_resource_list"
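The list selection used here can be tried on an ordinary list without connecting to RIDL. The sketch below uses base R only; the datasets object is a mock stand-in for a search result (plain lists rather than RIDLDataset objects), with names copied from the search output shown earlier.

```r
# Mock search result: plain lists standing in for RIDLDataset
# objects (an assumption for illustration only).
datasets <- list(
  list(name = "unhcr-bfa-2016-sea-1-1"),
  list(name = "unhcr-mrt-2017-sea-1-1")
)

# `[[` extracts a single element from a list.
first <- datasets[[1]]

# `[[` can be chained to reach a nested field.
second_name <- datasets[[2]][["name"]]

# Collect one field from every element of the list.
all_names <- vapply(datasets, function(d) d$name, character(1))

print(first$name)
print(second_name)
print(all_names)
```

With purrr or dplyr loaded, purrr::pluck(datasets, 1) and dplyr::nth(datasets, 1) select the same element as datasets[[1]].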
A ridl_resource_list is a simple R list and can be manipulated using purrr::pluck or dplyr::nth to select the resource you want to read into your R session or download.
library(tidyverse)
ridl_dataset_search("mali", visibility = "public", rows = 2) |>
  nth(1) |>
  ridl_dataset_resource_get_all(format = "stata") |>
  nth(1) |>
  ridl_resource_read()
## + # A tibble: 1,690 x 459
## hhid q002a q006 q008 q102 q113 q200 q201
## <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl> <dbl> <dbl+l> <dbl+l>
## 1 10004 1 [Cam… 1 [Oui] 1 [Pré… 3 0 2 [Dou… 1 [For…
## 2 10008 1 [Cam… 1 [Oui] 1 [Pré… 3 1 2 [Dou… 1 [For…
## 3 10012 1 [Cam… 1 [Oui] 1 [Pré… 7 1 2 [Dou… 1 [For…
## 4 10016 1 [Cam… 1 [Oui] 1 [Pré… 2 1 2 [Dou… 1 [For…
## 5 10020 1 [Cam… 1 [Oui] 1 [Pré… 6 1 2 [Dou… 1 [For…
## 6 10024 1 [Cam… 1 [Oui] 1 [Pré… 3 1 2 [Dou… 1 [For…
## 7 10028 1 [Cam… 1 [Oui] 1 [Pré… 5 1 2 [Dou… 1 [For…
## 8 10032 1 [Cam… 1 [Oui] 1 [Pré… 7 1 2 [Dou… 1 [For…
## 9 10036 1 [Cam… 1 [Oui] 1 [Pré… 4 3 2 [Dou… 1 [For…
## 10 10040 1 [Cam… 1 [Oui] 1 [Pré… 2 1 2 [Dou… 1 [For…
## # … with 1,680 more rows, and 451 more variables:
## # q202 <dbl+lbl>, q203 <dbl>, q204 <dbl+lbl>, q205 <dbl+lbl>,
## # q206_1 <dbl+lbl>, q206_2 <dbl+lbl>, q206_3 <dbl+lbl>,
## # q206_4 <dbl+lbl>, q206_5 <dbl+lbl>, q206_6 <dbl+lbl>,
## # q207 <dbl+lbl>, q208 <dbl+lbl>, q209 <dbl+lbl>, q210 <dbl>,
## # q211 <dbl+lbl>, q21201 <dbl+lbl>, q21202 <dbl+lbl>,
## # q21203 <dbl+lbl>, q213 <dbl+lbl>, q214 <dbl>,
## # q215 <dbl+lbl>, q216 <dbl>, q217 <dbl+lbl>, q218 <dbl>,
## # q219 <dbl+lbl>, q220 <dbl+lbl>, q221 <dbl+lbl>,
## # q222 <dbl+lbl>, q223 <dbl+lbl>, q224 <dbl+lbl>, q225 <dbl>,
## # q226 <dbl+lbl>, q227 <dbl>, q22801 <dbl+lbl>,
## # q22802 <dbl+lbl>, q22803 <dbl+lbl>, q22804 <dbl+lbl>,
## # q22805 <dbl+lbl>, q22806 <dbl+lbl>, q22807 <dbl+lbl>,
## # q22808 <dbl+lbl>, q22809 <dbl+lbl>, q22810 <dbl+lbl>,
## # q22811 <dbl+lbl>, q229 <dbl+lbl>, q230 <dbl>,
## # q231 <dbl+lbl>, q232 <dbl>, q23301 <dbl+lbl>,
## # q23302 <dbl+lbl>, q23303 <dbl+lbl>, q23304 <dbl+lbl>,
## # q23305 <dbl+lbl>, q23306 <dbl+lbl>, q23307 <dbl+lbl>,
## # q23308 <dbl+lbl>, q23309 <dbl+lbl>, q23310 <dbl+lbl>,
## # q23311 <dbl+lbl>, q234 <dbl+lbl>, q23501 <dbl+lbl>,
## # q23502 <dbl+lbl>, q23503 <dbl+lbl>, q23504 <dbl+lbl>,
## # q23505 <dbl+lbl>, q23506 <dbl+lbl>, q23507 <dbl+lbl>,
## # q23508 <dbl+lbl>, q23509 <dbl+lbl>, q23510 <dbl+lbl>,
## # q23511 <dbl+lbl>, q23512 <dbl+lbl>, q23513 <dbl+lbl>,
## # q23514 <dbl+lbl>, q23515 <dbl+lbl>, q23516 <dbl+lbl>,
## # q23517 <dbl+lbl>, q23518 <dbl+lbl>, q23601 <dbl+lbl>,
## # q23602 <dbl+lbl>, q23603 <dbl+lbl>, q23604 <dbl+lbl>,
## # q23605 <dbl+lbl>, q23606 <dbl+lbl>, q23607 <dbl+lbl>,
## # q23608 <dbl+lbl>, q23609 <dbl+lbl>, q23610 <dbl+lbl>,
## # q23611 <dbl+lbl>, q23612 <dbl+lbl>, q23613 <dbl+lbl>,
## # q23614 <dbl+lbl>, q237 <dbl+lbl>, q238 <dbl+lbl>,
## # q23901 <dbl+lbl>, q23902 <dbl+lbl>, q23903 <dbl+lbl>,
## # q23904 <dbl+lbl>, q23909 <dbl+lbl>, q240 <dbl+lbl>, …
Reading with ridl_resource_read (or rr_read) will not work with all resources in RIDL; so far the following formats are supported: csv, xlsx, xls, dta (Stata).
I will consider adding more data types in the future; feel free to file an issue if it doesn’t work as expected or if you want support for a new format.
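Since only a few formats can be read, it may be convenient to check a file name before trying to read it. The following base R sketch is not part of ridl: is_supported_format is a hypothetical helper, and the extension list simply mirrors the formats named above.

```r
# Formats listed above as supported for reading.
supported_formats <- c("csv", "xlsx", "xls", "dta")

# Hypothetical helper (not part of ridl): TRUE when the file
# extension, compared case-insensitively, is in the supported list.
is_supported_format <- function(path) {
  tolower(tools::file_ext(path)) %in% supported_formats
}

# Example file names are made up for illustration.
checks <- is_supported_format(c("household.dta", "report.pdf", "Figures.XLSX"))
print(checks)
```

This only inspects the file name; the format recorded in the resource metadata on RIDL may differ from the extension.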
For Excel files (xlsx and xls), you can also use get_sheets to list available sheets and use the sheet parameter in read to specify the sheet you want to read (the default is to read the first sheet).
We can also use ridl_dataset_show to directly read and access a dataset object.
dataset_name <- "official-cross-border-figures-of-venezuelan-individuals"
rd_show(dataset_name) |>
  rd_resource_get_all() |>
  nth(1) |>
  rr_read()
## + Reading sheet: VEN_Official Borders Figures
## # A tibble: 1,314 x 5
## Country `Mov Type` `Border Point` Month_Year Total_individua…
## <chr> <chr> <chr> <chr> <dbl>
## 1 Ecuador Entry from… Aeropuerto In… January-20 0
## 2 Ecuador Entry from… Aeropuerto In… February-… 1
## 3 Ecuador Entry from… Aeropuerto In… March-20 0
## 4 Ecuador Entry from… Aeropuerto In… April-20 0
## 5 Ecuador Entry from… Aeropuerto In… May-20 0
## 6 Ecuador Entry from… Aeropuerto In… June-20 2
## 7 Ecuador Entry from… Aeropuerto In… July-20 2
## 8 Ecuador Entry from… Aeropuerto In… August-20 2
## 9 Ecuador Entry from… Aeropuerto In… September… NA
## 10 Ecuador Entry from… Aeropuerto In… January-20 0
## # … with 1,304 more rows
If you know the id of a RIDL resource, you can also use ridl_resource_show to access the desired resource directly.
rd_show(dataset_name) |>
  rd_resource_get_all() |>
  nth(1)
## + <RIDL Resource> 68e39d44-88ae-49f9-b492-3635341c92be
## Name: VEN_OfficialFiguresBorders
## Description: Compilation of official figures on Venezuelan population per month per entry-exit point.
## Type: microdata
## Size: 39998
## Format: XLSX
ridl_resource_show("68e39d44-88ae-49f9-b492-3635341c92be") |>
  ridl_resource_read()
## + Reading sheet: VEN_Official Borders Figures
## # A tibble: 1,314 x 5
## Country `Mov Type` `Border Point` Month_Year Total_individua…
## <chr> <chr> <chr> <chr> <dbl>
## 1 Ecuador Entry from… Aeropuerto In… January-20 0
## 2 Ecuador Entry from… Aeropuerto In… February-… 1
## 3 Ecuador Entry from… Aeropuerto In… March-20 0
## 4 Ecuador Entry from… Aeropuerto In… April-20 0
## 5 Ecuador Entry from… Aeropuerto In… May-20 0
## 6 Ecuador Entry from… Aeropuerto In… June-20 2
## 7 Ecuador Entry from… Aeropuerto In… July-20 2
## 8 Ecuador Entry from… Aeropuerto In… August-20 2
## 9 Ecuador Entry from… Aeropuerto In… September… NA
## 10 Ecuador Entry from… Aeropuerto In… January-20 0
## # … with 1,304 more rows
ct <- ridl_container_list(sort = "package_count")
head(ct)
## [1] "ethiopia-sens" "data-deposit" "kenya-sens"
## [4] "afghanistan" "bangladesh-sens" "south-sudan-sens"
grep("niger-", ct, ignore.case = TRUE, value = TRUE)
## [1] "niger-protection" "niger-sens"
ridl_container_show("niger-protection")
## <RIDL Container> d341942e-547e-404b-bcdf-c72b2cd85530
## Name: niger-protection
## Display name: Niger: Protection
## No. Datasets: 5
## No. Members: 3
ridl_container_show("niger-protection") |>
  ridl_dataset_list()
## [1] "enrolement-pdi-tillaberi-tillaberi-niger-2020"
## [2] "identify-asylum-seekers-in-migration-flow-agadez-niger-2018-2019-2020"
## [3] "monitoring-the-migration-flow-1-agadez-niger-2019-2020"
## [4] "enrolement-pdi-tahoua-aout-2020-tahoua-niger-2020"
## [5] "enrolement-pdi-maradi-maradi-niger-2020"
It’s possible to create a RIDLDataset object that we can manipulate and upload to the RIDL platform.
ridl_dataset(name = "test-dataset-pen",
             title = "Test Dataset PEN",
             notes = "Some description",
             owner_org = "africa",
             data_collector = "unhcr",
             keywords = list(3, 4),
             unit_of_measurement = "kg",
             data_collection_technique = "f2f",
             archived = FALSE,
             visibility = "restricted",
             external_access_level = "data_enclave")
## <RIDL Dataset>
## Title: Test Dataset PEN
## Name: test-dataset-pen
## Visibility: restricted
## Container: Africa
## Resources (up to 5):
ds <- ridl_dataset(name = "test-dataset",
                   title = "Test Dataset",
                   notes = "An example dataset",
                   owner_org = "west-africa",
                   data_collector = "ACF, UNHCR",
                   keywords = list(3, 4),
                   unit_of_measurement = "individual",
                   data_collection_technique = "f2f",
                   sampling_procedure = "nonprobability",
                   operational_purpose_of_data = "cartography",
                   archived = "False",
                   visibility = "restricted",
                   external_access_level = "open_access")
ds
## <RIDL Dataset>
## Title: Test Dataset
## Name: test-dataset
## Visibility: restricted
## Resources (up to 5):
ridl_resource can also be used to create a RIDLResource.
rs <- ridl_resource(name = "Test resource",
                    type = "data",
                    format = "CSV",
                    file_type = "microdata",
                    identifiability = "anonymized_public",
                    date_range_start = "2018-01-01",
                    date_range_end = "2019-01-01",
                    process_status = "anonymized",
                    visibility = "public",
                    version = 1L)
rs
## <RIDL Resource>
## Name: Test resource
## Description:
## Type: microdata
## Size:
## Format: CSV
We can add the resource to the dataset and upload it to the RIDL platform.
ds |>
  ridl_dataset_resource_add(rs)
ds
## <RIDL Dataset>
## Title: Test Dataset
## Name: test-dataset
## Visibility: restricted
## Resources (up to 5): Test resource
- Please report any issues or bugs.
- License: MIT
- Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.