---
output: html_document
---
# Research projects and scripts {#chapter12}
<br>
<center>
![](./images/script_logo.JPG){width=25%}
</center>
<br>
## Rationale
Documenting, cataloguing and disseminating **data** has the potential to increase the volume and diversity of data analysis. There is also much value in documenting, cataloguing and disseminating **data processing and analysis scripts**. Technological solutions such as GitHub and [Jupyter Notebook or JupyterLab](https://jupyter.org/) facilitate the preservation and sharing of code, and enable collaborative work around data analysis. Coding style guides like the [Google style guides](https://google.github.io/styleguide/) and the [Guide to Reproducible Code in Ecology and Evolution](https://www.britishecologicalsociety.org/wp-content/uploads/2017/12/guide-to-reproducible-code.pdf) by the British Ecological Society help foster the usability, adaptability, and reproducibility of code. But these tools and guidelines do not fully address the issue of cataloguing and discoverability of the data processing and analysis programs and scripts. We propose, as a complement to collaboration tools and style guides, a metadata schema to document data analysis projects and scripts. The production of structured metadata will contribute not only to discoverability, but also to the reproducibility, replicability, and auditability of data analytics.
There are multiple reasons to make reproducibility, replicability, and auditability of data analytics a component of a data dissemination system. This will:
- Improve the **quality of research and analysis**. Public scrutiny enables contestability and independent quality control of the output of research and analysis; these are strong incentives for additional rigor in data analysis.
- Allow the **re-purposing or expansion of analysis** by the research community, thereby increasing the relevance, utility and value of both the data and of the analytical work.
- Strengthen the **reputation and credibility** of the analysis.
- Provide students and peers with rich **training materials**.
- In some cases, satisfy a **requirement** imposed by peer-reviewed journals or financial sponsors of research activities. For example, the [Data and Code Availability Policy of the American Economic Association](https://www.aeaweb.org/journals/policies/data-code) (accessed on June 29, 2020) states that *It is the policy of the American Economic Association to publish papers only if the data and code used in the analysis are clearly and precisely documented, and access to the data and code is clearly and precisely documented and is non-exclusive to the authors. Authors of accepted papers that contain empirical work, simulations, or experimental work must provide, prior to acceptance, information about the data, programs, and other details of the computations sufficient to permit replication, as well as information about access to data and programs.*
- Contribute to **assuring the fairness of policy advice and interventions** resulting from data analysis. Data analysis may be used to identify or target the beneficiaries of policies and programs, or may contribute otherwise to the design and implementation of development policies and projects. By doing so, they also contribute to identifying populations to be excluded from these interventions. Errors and biases may be introduced in analysis by accidental or intentional human errors, by the algorithms themselves, or they can result from flaws in the data. The analysis that informs such projects and policies should therefore be made auditable and contestable, i.e. documented and published.
## Motivation for open analytics
[Stodden et al (2013)](http://stodden.net/icerm_report.pdf) make a useful distinction between five levels of research openness:
1. **Reviewable research**. The descriptions of the research methods can be independently assessed, and the results judged credible. This includes both traditional peer review and community review and does not imply reproducibility.
2. **Replicable research**. Tools are made available that would allow one to duplicate the results of the research, for example by running the authors' code to produce the plots shown in the publication. (Here tools might be limited in scope, e.g., only essential data or executables, and might only be made available to referees or only upon request.)
3. **Confirmable research**. The main conclusions of the research can be attained independently without the use of software provided by the author. (But using the complete description of algorithms and methodology provided in the publication and any supplementary materials.)
4. **Auditable research**. Sufficient records (including data and software) have been archived so that the research can be defended later if necessary or differences between independent confirmations resolved. The archive might be private.
5. **Open or Reproducible research**. This is auditable research made openly available. This comprises well-documented and fully open code and data that are publicly available and that would allow one to (a) fully audit the computational procedure, (b) replicate and also independently reproduce the results of the research, and (c) extend the results or apply the method to new problems.
## Goal: discoverable code
A catalog of scripts should let users search and filter by title, author, software, method, country, etc., and get links to the analytical output and data. Example: to find a project on poverty in Kenya that implemented multiple imputation in R, search for *poverty AND "multiple imputation"* and filter the results by software and country.
Note: the code will also be "attached" to the output page (paper) and to the dataset page of the catalog if they are available in the catalog.
<br>
<center>
![image](https://user-images.githubusercontent.com/35276300/229812919-8a457692-310a-4095-80c2-e3bf202ecf21.png)
</center>
<br>
Provide access to scripts with detailed information, including software and libraries used, distribution license, IT requirements, datasets used, list of outputs, and more.
<br>
<center>
![image](https://user-images.githubusercontent.com/35276300/229813050-6ab8d762-7e09-40ca-83b3-877f64caa9e4.png)
</center>
<br>
## Schema description
To make data processing and analysis scripts more discoverable and usable, we propose a metadata schema inspired by the schemas available to document datasets. The proposed schema contains two main blocks of metadata elements: the *document description* intended to document the metadata themselves (the term *document* refers to the file that will contain the metadata), and the *project description* used to document the research or analytical work and the related scripts. We also include in the schema the `tags`, `provenance`, and `additional` elements common to all schemas.
<br>
```json
{
"repositoryid": "string",
"published": 0,
"overwrite": "no",
"doc_desc": {},
"project_desc": {},
"provenance": [],
"tags": [],
"lda_topics": [],
"embeddings": [],
"additional": { }
}
```
<br>
### Document description
**`doc_desc`** *[Optional ; Not repeatable]* <br>
The document description is a description of the metadata file being generated; it provides metadata about the metadata. This block is optional: it documents the research project metadata, not the project itself, and is therefore not needed to document the project. It records information on the process of generating the project metadata, which is useful for archiving purposes. This information is typically useful to a catalog administrator; it is of little use to the public and does not need to be displayed in the publicly-available catalog interface. It is recommended to enter at least the identification of the metadata producer, her/his affiliation, and the date the metadata were created. One reason for this is that metadata can be shared and harvested across catalogs/organizations, so the metadata produced by one organization can be found in other data centers (complying with standards and schemas is precisely intended to facilitate inter-operability of catalogs and automated information sharing). Keeping track of who documented a resource is thus useful.
<br>
```json
"doc_desc": {
"title": "string",
"idno": "string",
"producers": [
{
"name": "string",
"abbr": "string",
"affiliation": "string",
"role": "string"
}
],
"prod_date": "string",
"version": "string"
}
```
<br>
- **`title`** *[Optional ; Not Repeatable ; String]* <br>
The title of the project. This will usually be the same as the element `title` in the project description section.
- **`idno`** *[Optional ; Not Repeatable ; String]* <br>
A unique identifier for the metadata document.
- **`producers`** *[Optional ; Repeatable]* <br>
A list of producers <u>of the metadata</u> (who may be but do not have to be the authors of the research project and scripts being documented). These can be persons or organizations. The following four elements are used to identify them and specify their specific role as and if relevant (this block of four elements is repeated for each contributor to the metadata):
- **`name`** *[Optional ; Not Repeatable ; String]* <br>
Name of the person or organization who documented the project.
- **`abbr`** *[Optional ; Not Repeatable ; String]* <br>
The abbreviation of the organization that is referenced under 'name' above.
- **`affiliation`** *[Optional ; Not Repeatable ; String]* <br>
Affiliation of the person(s) or organization(s) who documented the project.
- **`role`** *[Optional ; Not Repeatable ; String]* <br>
This attribute is used to distinguish different stages of involvement in the metadata production process. <br><br>
- **`prod_date`** *[Optional ; Not Repeatable ; String]* <br>
The date the metadata on this project were produced (not distributed or archived), preferably in ISO 8601 format (YYYY-MM-DD or YYYY-MM).
- **`version`** *[Optional ; Not Repeatable ; String]* <br>
Documenting a research project is not a trivial exercise. It may happen that, having identified errors or omissions in the metadata or having received suggestions for improvement, a new version of the metadata is produced. This element is used to identify and describe the current version of the metadata. It is good practice to provide a version number, and information on what distinguishes this version from the previous one(s) if relevant.
<br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
doc_desc = list(
idno = "META_RP_001",
producers = list(
list(name = "John Doe",
affiliation = "National Data Center of Popstan")
),
prod_date = "2020-12-27",
version = "Version 1.0 - Original version of the documentation provided by the author of the project"
),
# ...
)
```
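Because `prod_date` is stored as a free-text string, it can be convenient to check its format before publishing the metadata. The following is a minimal sketch in base R, not part of the schema or of any package: the helper name `is_iso8601_date` is hypothetical, and accepting a bare year (YYYY) in addition to YYYY-MM and YYYY-MM-DD is an assumption made for this illustration.

```r
# Hypothetical helper (illustration only): returns TRUE when x is written as
# YYYY, YYYY-MM or YYYY-MM-DD and, for full dates, is a real calendar day.
is_iso8601_date <- function(x) {
  if (!is.character(x) || length(x) != 1 || is.na(x)) return(FALSE)
  if (grepl("^[0-9]{4}$", x)) return(TRUE)
  if (grepl("^[0-9]{4}-(0[1-9]|1[0-2])$", x)) return(TRUE)
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) return(FALSE)
  # Parse the full date, then round-trip it to reject impossible days
  parsed <- as.Date(x, format = "%Y-%m-%d")
  !is.na(parsed) && format(parsed, "%Y-%m-%d") == x
}

is_iso8601_date("2020-12-27")  # date used in the example above
is_iso8601_date("27/12/2020")  # not in ISO 8601 format
```

The same check could be applied to the other date elements of the schema, since they are all entered as strings.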
### Project description
**`project_desc`** *[Required ; Not repeatable]* <br>
The project description contains the metadata related to the project itself. All efforts should be made to provide as much and as detailed information as possible.
<br>
```json
"project_desc": {
"title_statement": {},
"abstract": "string",
"review_board": "string",
"output": [],
"approval_process": [],
"project_website": [],
"language": [],
"production_date": "string",
"version_statement": {},
"errata": [],
"process": [],
"authoring_entity": [],
"contributors": [],
"sponsors": [],
"curators": [],
"reviews_comments": [],
"acknowledgments": [],
"acknowledgment_statement": "string",
"disclaimer": "string",
"confidentiality": "string",
"citation_requirement": "string",
"related_projects": [],
"geographic_units": [],
"keywords": [],
"themes": [],
"topics": [],
"disciplines": [],
"repository_uri": [],
"license": [],
"copyright": "string",
"technology_environment": "string",
"technology_requirements": "string",
"reproduction_instructions": "string",
"methods": [],
"software": [],
"scripts": [],
"data_statement": "string",
"datasets": [],
"contacts": []
}
```
<br>
- **`title_statement`** *[Required ; Non repeatable]* <br>
The *title_statement* is a group of six elements, two of them mandatory (`idno` and `title`).
<br>
```json
"title_statement": {
"idno": "string",
"identifiers": [
{
"type": "string",
"identifier": "string"
}
],
"title": "string",
"sub_title": "string",
"alternate_title": "string",
"translated_title": "string"
}
```
<br>
- **`idno`** *[Required ; Not Repeatable ; String]* <br>
A unique identifier for the project. This identifier is a vital reference: a research project can be the origin of a survey, scripts, tables, and knowledge products, all of which may refer to it. Define a consistent naming scheme and use a system that guarantees the uniqueness of the ID (a DOI or your own reference number). Do not include spaces in the `idno` element.
- **`identifiers`** *[Optional ; Repeatable]* <br>
This repeatable element is used to enter identifiers (IDs) other than the `idno` entered in the `title_statement`. It can for example be a Digital Object Identifier (DOI). Note that the identifier entered in `idno` can (and in some cases should) be repeated here. The element `idno` does not provide a `type` parameter; repeating it in this section makes it possible to add that information.
- **`type`** *[Optional ; Not repeatable ; String]* <br>
The type of unique ID, e.g. "DOI".
- **`identifier`** *[Required ; Not repeatable ; String]* <br>
The identifier itself. <br><br>
- **`title`** *[Required ; Not Repeatable ; String]* <br>
The title is the official name of the project as it may be stated in reports, papers or other documents. The title will in most cases be identical to the Document Title (see above). The title may correspond to the title of an academic paper, of a project impact evaluation, etc. Pay attention to capitalization in the title.
- **`sub_title`** *[Optional ; Not Repeatable ; String]* <br>
A short subtitle for the project, often used to qualify or rephrase the title. This element is rarely used.
- **`alternate_title`** *[Optional ; Not Repeatable ; String]* <br>
An alternate title of the project. This would be any alternate title that would help discover the research project. In countries with more than one official language, a translation of the title may be provided. Likewise, the translated title may simply be a translation into English from a country's own language.
- **`translated_title`** *[Optional ; Not Repeatable ; String]* <br>
A translated version of the title (this will be used for example when a catalog documents all entries in English, but wants to preserve the title of a project in its original language when the original language is not English).<br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
title_statement = list(
idno = "RR_WB_2020_001",
identifiers = list(
list(type = "DOI", identifier = "XXX-XXX-XXXX")
),
date = "2020",
title = "Predicting Food Crises - Econometric Model"
),
# ...
),
# ...
)
```
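The two rules stated above for the title statement (`idno` and `title` are required, and `idno` must not contain spaces) can be checked programmatically before the metadata are published. The sketch below uses base R only; the function name `check_title_statement` and the wording of the messages are hypothetical and chosen for illustration.

```r
# Hypothetical helper (illustration only): returns a character vector listing
# the problems found in a title_statement list; an empty vector means the
# two rules described above are satisfied.
check_title_statement <- function(ts) {
  problems <- character(0)
  for (field in c("idno", "title")) {
    value <- ts[[field]]
    if (is.null(value) || !nzchar(trimws(value))) {
      problems <- c(problems, paste0("missing required element: ", field))
    }
  }
  idno <- ts[["idno"]]
  if (!is.null(idno) && grepl("[[:space:]]", idno)) {
    problems <- c(problems, "idno must not contain spaces")
  }
  problems
}

check_title_statement(list(
  idno  = "RR_WB_2020_001",
  title = "Predicting Food Crises - Econometric Model"
))
```

Returning the list of problems, rather than stopping at the first one, makes it possible to report all issues to the metadata producer at once.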
<br>
- **`abstract`** *[Optional ; Non repeatable ; String]* <br>
The abstract should provide a clear summary of the purposes, objectives and content of the project. An abstract can make reference to the various outputs associated with the research project.
Example extracted from https://microdata.worldbank.org/index.php/catalog/4218:
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
abstract = "Food price inflation is an important metric to inform economic policy but traditional sources of consumer prices are often produced with delay during crises and only at an aggregate level. This may poorly reflect the actual price trends in rural or poverty-stricken areas, where large populations reside in fragile situations.
This data set includes food price estimates and is intended to help gain insight in price developments beyond what can be formally measured by traditional methods. The estimates are generated using a machine-learning approach that imputes ongoing subnational price surveys, often with accuracy similar to direct measurement of prices. The data set provides new opportunities to investigate local price dynamics in areas where populations are sensitive to localized price shocks and where traditional data are not available.",
# ...
),
# ...
)
```
<br>
- **`review_board`** *[Optional ; Non repeatable ; String]* <br>
Information on whether and when the project was submitted, reviewed, and approved by an institutional review board (or independent ethics committee, ethical review board (ERB), research ethics board, or equivalent).
<br>
- **`output`** *[Optional ; Repeatable]* <br>
This element will describe and reference all substantial/intended products of the research project, which may include publications, reports, websites, datasets, interactive applications, presentations, visualizations, and others. An output may also be referred to as a "deliverable".
<br>
```json
"output": [
{
"type": "string",
"title": "string",
"authors": "string",
"description": "string",
"abstract": "string",
"uri": "string",
"doi": "string"
}
]
```
<br>
The `output` is a repeatable block of seven elements, used to document all output of the research project:
- **`type`** *[Optional ; Non repeatable]* <br>
Type of output. The type of output relates to the medium used to convey or communicate the intended results, findings or conclusions of the research project. This field may be constrained by a controlled vocabulary. The type of output could for example be "Working paper", "Database", etc.
- **`title`** *[Required ; Non repeatable]* <br>
Formal title of the output. Depending upon the kind of output, the title will vary in formality.
- **`authors`** *[Optional ; Non repeatable]* <br>
Authors of the output; if there are multiple authors, they are listed in a single text field.
- **`description`** *[Optional ; Non repeatable]* <br>
Brief description of the output (NOT an abstract).
- **`abstract`** *[Optional ; Non repeatable]* <br>
If the output consists of a document, the abstract will be entered here.
- **`uri`** *[Optional ; Non repeatable]* <br>
A link where the output or information on the output can be found.
- **`doi`** *[Optional ; Non repeatable]* <br>
Digital Object Identifier (DOI) of the output, if available. <br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
output = list(
list(type = "working paper",
title = "Estimating Food Price Inflation from Partial Surveys",
authors = "Andrée, B. P. J.",
description = "World Bank Policy Research Working Paper",
abstract = "The traditional consumer price index is often produced at an aggregate level, using data from few, highly urbanized, areas. As such, it poorly describes price trends in rural or poverty-stricken areas, where large populations may reside in fragile situations. Traditional price data collection also follows a deliberate sampling and measurement process that is not well suited for monitoring during crisis situations, when price stability may deteriorate rapidly. To gain real-time insights beyond what can be formally measured by traditional methods, this paper develops a machine-learning approach for imputation of ongoing subnational price surveys. The aim is to monitor inflation at the market level, relying only on incomplete and intermittent survey data. The capabilities are highlighted using World Food Programme surveys in 25 fragile and conflict-affected countries where real-time monthly food price data are not publicly available from official sources. The results are made available as a data set that covers more than 1200 markets and 43 food types. The local statistics provide a new granular view on important inflation events, including the World Food Price Crisis of 2007–08 and the surge in global inflation following the 2020 pandemic. The paper finds that imputations often achieve accuracy similar to direct measurement of prices. The estimates may provide new opportunities to investigate local price dynamics in markets where prices are sensitive to localized shocks and traditional data are not available.",
uri = "http://hdl.handle.net/10986/36778"),
list(type = "dataset",
title = "Monthly food price estimates",
authors = "Andrée, B. P. J.",
description = "A dataset of derived data, published as open data",
abstract = "Food price inflation is an important metric to inform economic policy but traditional sources of consumer prices are often produced with delay during crises and only at an aggregate level. This may poorly reflect the actual price trends in rural or poverty-stricken areas, where large populations reside in fragile situations.
This data set includes food price estimates and is intended to help gain insight in price developments beyond what can be formally measured by traditional methods. The estimates are generated using a machine-learning approach that imputes ongoing subnational price surveys, often with accuracy similar to direct measurement of prices. The data set provides new opportunities to investigate local price dynamics in areas where populations are sensitive to localized price shocks and where traditional data are not available.",
uri = "https://microdata.worldbank.org/index.php/catalog/4218",
doi = "https://doi.org/10.48529/2ZH0-JF55")
),
# ...
)
# ...
)
```
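Once several outputs have been documented, it can help to review them side by side, for example to spot an output that lacks the required `title` or has no `uri` or `doi`. The following base R sketch flattens the `output` list into a data frame; the function name `outputs_to_table` is hypothetical, and the choice of columns is an assumption made for this illustration.

```r
# Hypothetical helper (illustration only): turns the repeatable `output`
# block into a data frame, with NA wherever an element was not provided.
outputs_to_table <- function(output) {
  get_field <- function(o, field) {
    value <- o[[field]]
    if (is.null(value)) NA_character_ else as.character(value)
  }
  fields <- c("type", "title", "uri", "doi")
  columns <- lapply(fields, function(f) {
    vapply(output, get_field, character(1), field = f)
  })
  names(columns) <- fields
  as.data.frame(columns, stringsAsFactors = FALSE)
}

outputs <- list(
  list(type  = "working paper",
       title = "Estimating Food Price Inflation from Partial Surveys",
       uri   = "http://hdl.handle.net/10986/36778"),
  list(type  = "dataset",
       title = "Monthly food price estimates",
       uri   = "https://microdata.worldbank.org/index.php/catalog/4218",
       doi   = "https://doi.org/10.48529/2ZH0-JF55")
)
tab <- outputs_to_table(outputs)
tab[is.na(tab$doi), c("type", "title")]  # outputs documented without a DOI
```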
<br>
- **`approval_process`** *[Optional ; Repeatable]* <br>
The *`approval_process`* is a group of six elements used to describe the formal approval process(es) (if any) that the project had to go through. This may for example include an approval by an Ethics Board to collect new data, followed by an internal review process to endorse the results.
<br>
```json
"approval_process": [
{
"approval_phase": "string",
"approval_authority": "string",
"submission_date": "string",
"reviewer": "string",
"review_status": "string",
"approval_date": "string"
}
]
```
<br>
- **`approval_phase`** *[Optional ; Non repeatable]* <br>
A label that describes the approval phase.
- **`approval_authority`** *[Optional ; Non repeatable]* <br>
Identification of the person(s) or organization(s) whose approval was required or sought.
- **`submission_date`** *[Optional ; Non repeatable]* <br>
The date, entered in ISO 8601 format (YYYY-MM-DD), when the project (or a component of it) was submitted for approval.
- **`reviewer`** *[Optional ; Non repeatable]* <br>
Identification of the reviewer(s).
- **`review_status`** *[Optional ; Non repeatable]* <br>
Status of approval.
- **`approval_date`** *[Optional ; Non repeatable]* <br>
Date the approval was formally received, preferably entered in ISO 8601 format (YYYY-MM-DD). <br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
approval_process = list(
list(approval_phase = "Authorization to conduct the survey",
approval_authority = "Internal Ethics Board, [Organization]",
submission_date = "2019-01-15",
review_status = "Approved (permission No ABC123)",
approval_date = "2020-04-30"),
list(approval_phase = "Review of research output and authorization to publish",
approval_authority = "Internal Ethics Board, [Organization]",
submission_date = "2021-07-15",
review_status = "Approved",
approval_date = "2021-10-30")
),
# ...
)
# ...
)
```
<br>
- **`project_website`** *[Optional ; Repeatable ; String]* <br>
URL of the project website.
<br>
```json
"project_website": [
"string"
]
```
<br>
- **`language`** *[Optional ; Repeatable]* <br>
A block of two elements describing the language(s) of the project. At least one of the two elements must be provided for each listed language. The use of [ISO 639-2](https://www.loc.gov/standards/iso639-2/php/code_list.php) (the alpha-3 code in Codes for the representation of names of languages) is recommended.
<br>
```json
"language": [
{
"name": "string",
"code": "string"
}
]
```
<br>
- **`name`** *[Optional ; Not repeatable ; String]* <br>
The name of the language.
- **`code`** *[Optional ; Not repeatable ; String]* <br>
The code of the language. Numeric codes must be entered as strings. <br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
language = list(
list(name = "English", code = "EN"),
list(name = "French", code = "FR")
),
# ...
)
# ...
)
```
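The rule that each listed language must provide at least one of `name` and `code` can be verified with a few lines of base R. This is only a sketch: the function name `language_entry_ok` is hypothetical, and the sample entries are invented for the illustration (the codes shown are ISO 639-2 alpha-3 codes).

```r
# Hypothetical helper (illustration only): TRUE when a language entry
# provides a non-empty `name` or a non-empty `code`.
language_entry_ok <- function(entry) {
  has_value <- function(v) {
    is.character(v) && length(v) == 1 && !is.na(v) && nzchar(trimws(v))
  }
  has_value(entry[["name"]]) || has_value(entry[["code"]])
}

languages <- list(
  list(name = "English", code = "eng"),
  list(code = "fra"),   # code only: acceptable
  list(name = "")       # neither element filled in: not acceptable
)
vapply(languages, language_entry_ok, logical(1))
# TRUE TRUE FALSE
```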
<br>
- **`production_date`** *[Optional ; Not repeatable ; String]* <br>
The date in ISO 8601 format (YYYY-MM-DD) the project was completed (this refers to the version that is being documented and released.)
<br>
- **`version_statement`** *[Optional ; Not repeatable]* <br>
This block of four elements is used to identify and describe the version of the project being documented.
<br>
```json
"version_statement": {
"version": "string",
"version_date": "string",
"version_resp": "string",
"version_notes": "string"
}
```
<br>
- **`version`** *[Optional ; Not repeatable ; String]* <br>
A label describing the version, for example "Version 1.2". The version must be entered as a string, even when composed only of numbers.
- **`version_date`** *[Optional ; Not repeatable ; String]* <br>
Date (in ISO 8601 format, YYYY-MM-DD) the version was released.
- **`version_resp`** *[Optional ; Not repeatable ; String]* <br>
Person(s) or organization(s) responsible for this version.
- **`version_notes`** *[Optional ; Not repeatable ; String]* <br>
Additional information on the version if any; it is good practice to describe what distinguishes this version from the previous one(s). <br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
version_statement = list(
version = "v1.0",
version_date = "2021-12-27",
version_resp = "University of Popstan, Department of Economics",
version_notes = "First version approved for open dissemination"
),
# ...
)
)
```
<br>
- **`errata`** *[Optional ; Repeatable]* <br>
This field is used to list and describe errata.
<br>
```json
"errata": [
{
"date": "string",
"description": "string"
}
]
```
<br>
- **`date`** *[Optional ; Not repeatable ; String]* <br>
Date (in ISO 8601 format, YYYY-MM-DD) the erratum was released.
- **`description`** *[Optional ; Not repeatable ; String]* <br>
Description of the error(s) and measures taken to address it/them. <br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
errata = list(
list(date = "2021-10-30",
description = "Outliers in the data for Afghanistan resulted in unrealistic model estimates of the food prices for January 2020. In the latest version of the 'model.R' script, outliers are detected and dropped from the input data file. The published dataset has been updated."
)
),
# ...
)
)
```
<br>
- **`process`** *[Optional ; Repeatable]* <br>
This element is used to document the life cycle of the research project, from its design and inception to its conclusion. This can include phases of fundraising, IRB review, concept note review, data acquisition, analysis, publishing of a working paper, peer review, publishing in a journal, presentation at conferences, evaluation, reporting to sponsors, etc. It is recommended to list these steps in chronological order.
<br>
```json
"process": [
{
"name": "string",
"date_start": "string",
"date_end": "string",
"description": "string"
}
]
```
<br>
- **`name`** *[Optional ; Not repeatable ; String]* <br>
This is a header for the phase of the process.
- **`date_start`** *[Optional ; Not repeatable ; String]* <br>
Date the phase started (preferably in ISO 8601 format, YYYY-MM-DD).
- **`date_end`** *[Optional ; Not repeatable ; String]* <br>
Date the phase ended (preferably in ISO 8601 format, YYYY-MM-DD).
- **`description`** *[Optional ; Not repeatable ; String]* <br>
A brief description of the phase. <br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
process = list(
list(name = "Presentation of the concept note at the Review Committee decision meeting",
date_start = "2018-02-23",
date_end = "2018-02-23",
description = "Presentation of the research objectives and method by the primary investigator to the Review Committee, which resulted in the approval of the concept note."
),
list(name = "Fundraising",
date_start = "2018-02-24",
date_end = "2018-02-28",
description = "Discussion with project sponsors, and conclusion of the funding agreement."
),
list(name = "Data acquisition and analytics",
date_start = "2018-03-15",
date_end = "2019-01-30",
description = "Implementation of web scraping, then data analysis"
),
list(name = "Working paper",
date_start = "2019-01-30",
date_end = "2019-02-25",
description = "Production (and copy editing) of the working paper"
),
list(name = "Curation and dissemination of data and code",
date_start = "2019-02-25",
date_end = "2019-03-18",
description = "Data and script documentation, and publishing in the National Microdata Library"
),
list(name = "Presentation to conferences",
date_start = "2019-04-12",
date_end = "2019-04-12",
description = "Presentation of the paper by the primary investigator at the ... conference, London"
)
),
# ...
)
)
```
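Because `date_start` and `date_end` are plain strings, it can be useful to check them before the metadata are published. The following is a minimal sketch in base R, not part of the schema or of any package: the helper names `parse_iso` and `check_phases` are hypothetical, and the two phases are illustrative values only. It flags dates that are not valid calendar dates in YYYY-MM-DD format and phases that end before they start.

```{r, indent=" ", eval=F, echo=T}
# Hypothetical helper: returns NA when the string is missing or not a valid date
parse_iso <- function(x) {
  if (is.null(x)) return(as.Date(NA))
  as.Date(x, format = "%Y-%m-%d")
}

# Hypothetical helper: one row per phase, with two logical checks
check_phases <- function(process) {
  do.call(rbind, lapply(process, function(p) {
    start <- parse_iso(p$date_start)
    end   <- parse_iso(p$date_end)
    valid <- !is.na(start) && !is.na(end)
    data.frame(
      name        = p$name,
      valid_dates = valid,                  # both dates parse as YYYY-MM-DD
      ordered     = valid && start <= end,  # the phase does not end before it starts
      stringsAsFactors = FALSE
    )
  }))
}

# Illustrative input; the second phase ends on a date that does not exist
process <- list(
  list(name = "Fundraising",   date_start = "2018-02-24", date_end = "2018-02-28"),
  list(name = "Working paper", date_start = "2019-01-30", date_end = "2019-02-31")
)

checks <- check_phases(process)
checks[!checks$valid_dates, "name"]  # names of the phases that need correcting
```

The same table can be used to confirm that the phases are listed in chronological order, for example by comparing the parsed start dates with their sorted version.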
<br>
- **`authoring_entity`** *[Optional ; Repeatable]* <br>
This section will identify the person(s) and/or organization(s) in charge of the intellectual content of the research project, and specify their respective role.
<br>
```json
"authoring_entity": [
{
"name": "string",
"role": "string",
"affiliation": "string",
"abbreviation": "string",
"email": "string",
"author_id": []
}
]
```
<br>
- **`name`** *[Optional ; Not repeatable ; String]* <br>
Name of the person or organization responsible for the research project.
- **`role`** *[Optional ; Not repeatable ; String]* <br>
Specific role of the person or organization mentioned in `name`.
- **`affiliation`** *[Optional ; Not repeatable ; String]* <br>
Agency or organization affiliation of the author/primary investigator mentioned in `name`.
- **`abbreviation`** *[Optional ; Not repeatable ; String]* <br>
Abbreviation used to identify the agency stated under `affiliation`.
- **`email`** *[Optional ; Not repeatable ; String]* <br>
Depending on the agency policies, a researcher may provide a personal email or an agency email to field inquiries related to the project.
- **`author_id`** *[Optional ; Repeatable]* <br>
A block of two elements used to provide unique identifiers of the authors, as provided by different registers of researchers. For example, this can be an ORCID number (ORCID is a non-profit organization supported by a global community of member organizations, including research institutions, publishers, sponsors, professional associations, service providers, and other stakeholders in the research ecosystem).
- **`type`** *[Optional ; Not repeatable ; String]* <br>
The type of ID; for example, "ORCID".
- **`id`** *[Required ; Not repeatable ; String]* <br>
A unique identification number/code for the authoring entity, entered as a string variable.<br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
authoring_entity = list(
list(name = "",
role = "",
affiliation = "",
email = "",
author_id = list(
list(type = "ORCID", id = "")
)
)
),
# ...
)
)
```
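As an illustration, a filled-in entry might look like the sketch below. The role, affiliation, abbreviation and email are hypothetical placeholders; the identifier is the sample iD that ORCID uses in its own documentation for the fictitious researcher Josiah Carberry. The pattern check is an assumption about the usual display format of an ORCID iD (four hyphen-separated blocks of four characters, the last character possibly being `X`), not a rule imposed by the schema.

```{r, indent=" ", eval=F, echo=T}
# Illustrative values only; everything except the sample ORCID iD is hypothetical
authoring_entity <- list(
  list(name = "Carberry, Josiah",
       role = "Primary investigator",
       affiliation = "Example University",
       abbreviation = "EU",
       email = "j.carberry@example.org",
       author_id = list(
         list(type = "ORCID", id = "0000-0002-1825-0097")
       )
  )
)

# Assumed display format of an ORCID iD: 0000-0000-0000-000X
orcid_pattern <- "^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$"

# Collect all identifiers declared with type "ORCID" and check their format
ids <- unlist(lapply(authoring_entity, function(a) {
  orcid <- Filter(function(x) identical(x$type, "ORCID"), a$author_id)
  vapply(orcid, function(x) x$id, character(1))
}))
all(grepl(orcid_pattern, ids))
```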
<br>
- **`contributors`** *[Optional ; Repeatable]* <br>
This section records other contributors to the research project and recognizes the roles they played.
<br>
```json
"contributors": [
{
"name": "string",
"role": "string",
"affiliation": "string",
"abbreviation": "string",
"email": "string",
"url": "string"
}
]
```
<br>
- **`name`** *[Optional ; Not repeatable ; String]* <br>
Name of the person, corporate body, or agency contributing to the intellectual content of the project (other than the PI). If a person, invert first and last name and use commas.
- **`role`** *[Optional ; Not repeatable ; String]* <br>
Title of the person (if any) responsible for the work's substantive and intellectual content.
- **`affiliation`** *[Optional ; Not repeatable ; String]* <br>
Agency or organization affiliation of the contributor.
- **`abbreviation`** *[Optional ; Not repeatable ; String]* <br>
Abbreviation used to identify the agency stated under `affiliation`.
- **`email`** *[Optional ; Not repeatable ; String]* <br>
Depending on the agency policies, a researcher may provide a personal email or an agency email to field inquiries related to the project.
- **`url`** *[Optional ; Not repeatable ; String]* <br>
The URL that provides information on the contributor or its affiliate. <br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
contributors = list(
list(name = "",
role = "",
affiliation = "",
email = ""
)
),
# ...
)
)
```
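The recommendation to invert first and last names can be applied programmatically when contributor names are stored as "First Last". The sketch below is one possible approach in base R; the helper `invert_name` is hypothetical and assumes that the last word of the name is the family name, which does not hold for all naming conventions. All values are illustrative.

```{r, indent=" ", eval=F, echo=T}
# Hypothetical helper: "Jane Doe" becomes "Doe, Jane"
# Assumption: the last word is the family name; single-word names are left unchanged
invert_name <- function(full_name) {
  parts <- strsplit(trimws(full_name), "\\s+")[[1]]
  if (length(parts) < 2) return(full_name)
  family <- parts[length(parts)]
  given  <- paste(parts[-length(parts)], collapse = " ")
  paste0(family, ", ", given)
}

# Illustrative entry for the contributors block
contributors <- list(
  list(name = invert_name("Jane Doe"),
       role = "Research assistant",
       affiliation = "Example University",
       abbreviation = "EU",
       email = "jane.doe@example.org",
       url = "https://www.example.org/jane-doe"
  )
)

contributors[[1]]$name
```

Names of organizations should be entered as they are, without passing them through such a helper.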
<br>
- **`sponsors`** *[Optional ; Repeatable]* <br> The source(s) of funds for production of the work. If different funding agencies sponsored different stages of the production process, use the 'role' attribute to distinguish them.
<br>
```json
"sponsors": [
{
"name": "string",
"abbreviation": "string",
"role": "string",
"grant_no": "string"
}
]
```
<br>
- **`name`** *[Optional ; Not repeatable ; String]* <br>
Name of the funding agency/sponsor.
- **`abbreviation`** *[Optional ; Not repeatable ; String]* <br>
Abbreviation of the funding/sponsoring agency.
- **`role`** *[Optional ; Not repeatable ; String]* <br>
Specific role of the funding/sponsoring agency.
- **`grant_no`** *[Optional ; Not repeatable ; String]* <br>
Grant or award number.
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
sponsors = list(
list(name = "ABC Foundation",
abbreviation = "ABCF",
role = "Purchase of the data",
grant_no = "ABC_001_XYZ"
),
list(name = "National Research Foundation",
abbreviation = "NRF",
role = "Funding of staff and research assistant costs, and variable costs for participation in conferences",
grant_no = "NRF_G01"
)
),
# ...
)
)
```
<br>
- **`curators`** *[Optional ; Repeatable]* <br>
A list of persons and/or organizations in charge of curating the resources associated with the project.
<br>
```json
"curators": [
{
"name": "string",
"role": "string",
"affiliation": "string",
"abbreviation": "string",
"email": "string",
"url": "string"
}
]
```
<br>
- **`name`** *[Optional ; Not repeatable ; String]* <br>
The name of the person or organization.
- **`role`** *[Optional ; Not repeatable ; String]* <br>
The specific role of the person or organization in the curation of the project resources.
- **`affiliation`** *[Optional ; Not repeatable ; String]* <br>
The affiliation of the person or organization.
- **`abbreviation`** *[Optional ; Not repeatable ; String]* <br>
An acronym of the organization, if an organization was entered in `name`.
- **`email`** *[Optional ; Not repeatable ; String]* <br>
The email address of the person or organization. The use of personal email addresses must be avoided.
- **`url`** *[Optional ; Not repeatable ; String]* <br>
A link to the website of the person or organization.
<br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
curators = list(
list(name = "National Data Archive of Popstan",
role = "Documentation, preservation and dissemination of the data and reproducible code",
email = "helpdesk@nda. ...",
url = "popstan_nda.org"
)
),
# ...
)
)
```
<br>
- **`reviews_comments`** *[Optional ; Repeatable]* <br>
Many research projects will be subject to a review process, which may happen at different stages of the project implementation (from design to review of the final output). This block is intended to document the comments received by reviewers during this process. It is a repeatable block of metadata elements, which can be used to document comments with a fine granularity.
<br>
```json
"reviews_comments": [
{
"comment_date": "string",
"comment_by": "string",
"comment_description": "string",
"comment_response": "string"
}
]
```
<br>
- **`comment_date`** *[Optional ; Not repeatable ; String]* <br>
The date the comment was provided, in ISO 8601 format (YYYY-MM-DD or YYYY-MM).
- **`comment_by`** *[Optional ; Not repeatable ; String]* <br>
The name of the person or organization that provided the comment.
- **`comment_description`** *[Optional ; Not repeatable ; String]* <br>
The comment itself, in its original formulation or in a summary version.
- **`comment_response`** *[Optional ; Not repeatable ; String]* <br>
The response provided by the research team/person to the comment, in its original formulation or in a summary version.
<br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
reviews_comments = list(
list(comment_date = "",
comment_by = "",
comment_description = "",
comment_response = ""
)
),
# ...
)
)
```
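To show how the block can record comments one by one, the sketch below fills in two entries. The reviewers, dates, comments and responses are entirely hypothetical and only illustrate the expected structure; the final lines list the comments that have not yet received a response.

```{r, indent=" ", eval=F, echo=T}
# Hypothetical review comments, one list per comment
reviews_comments <- list(
  list(comment_date = "2018-02-23",
       comment_by = "Review Committee",
       comment_description = "The concept note should describe how outliers in the scraped prices will be handled.",
       comment_response = "A section on outlier detection was added to the concept note."
  ),
  list(comment_date = "2019-02",
       comment_by = "Anonymous peer reviewer",
       comment_description = "The working paper should report the number of observations per country.",
       comment_response = ""
  )
)

# Identify comments that are still awaiting a response
pending <- Filter(function(x) !nzchar(x$comment_response), reviews_comments)
vapply(pending, function(x) x$comment_by, character(1))
```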
<br>
- **`acknowledgments`** *[Optional ; Repeatable]* <br>
This repeatable block of elements is used to provide an itemized list of persons and organizations whose contribution to the project must be acknowledged. Note that specific metadata elements are available for listing financial sponsors and main contributors to the study.<br>
An alternative to this field is the `acknowledgement_statement` field (see below), which can be used to provide the acknowledgment in the form of an unstructured text.
<br>
```json
"acknowledgments": [
{
"name": "string",
"affiliation": "string",
"role": "string"
}
]
```
<br>
- **`name`** *[Optional ; Not repeatable ; String]* <br>
The name of the person or agency being recognized for supporting the project.
- **`affiliation`** *[Optional ; Not repeatable ; String]* <br>
The affiliation of the person or agency being acknowledged.
- **`role`** *[Optional ; Not repeatable ; String]* <br>
A brief description of the role of the person or agency that is being recognized or acknowledged for supporting the project.<br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
acknowledgments = list(
list(name = "",
affiliation = "",
role = ""
),
list(name = "",
affiliation = "",
role = ""
)
),
# ...
)
)
```
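When both the itemized block and a free-text statement are wanted, the statement can be derived from the itemized entries so that the two stay consistent. The following base R sketch shows one way to do this; the names, affiliations and roles are hypothetical, and the wording of the generated sentence is only a suggestion.

```{r, indent=" ", eval=F, echo=T}
# Hypothetical itemized acknowledgments
acknowledgments <- list(
  list(name = "Jane Doe",
       affiliation = "Example University",
       role = "reviewing the web scraping code"),
  list(name = "Popstan Statistics Office",
       affiliation = "",
       role = "providing access to price data")
)

# Build one phrase per entry; the affiliation is added only when it is provided
items <- vapply(acknowledgments, function(a) {
  who <- if (nzchar(a$affiliation)) paste0(a$name, " (", a$affiliation, ")") else a$name
  paste0(who, " for ", a$role)
}, character(1))

# Combine the phrases into a single unstructured statement
statement <- paste0("The authors thank ", paste(items, collapse = "; "), ".")
statement
```

The resulting string can then be entered in the free-text acknowledgment field of `project_desc`.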
<br>
- **`acknowledgement_statement`** *[Optional ; Not repeatable ; String]* <br>
This field is used to provide acknowledgments in the form of an unstructured text. An alternative to this field is the `acknowledgments` field, which allows the acknowledgments to be itemized.
- **`disclaimer`** *[Optional ; Not repeatable ; String]* <br>
A disclaimer limits the responsibility or liability of the publishing organization or of the researchers associated with the research project. It is intended to limit the repercussions for the publishing organization of research it has released in the public domain, and to prevent liability for effects that result from acts or omissions in the research.
- **`confidentiality`** *[Optional ; Not repeatable ; String]* <br>
A confidentiality statement binds the publisher to ethical considerations regarding the subjects of the research. In most cases, the identity of individuals who are the subjects of research cannot be released, and special effort is required to ensure that their privacy is preserved.
- **`citation_requirement`** *[Optional ; Not repeatable ; String]* <br>
The citation requirement is specific to the output; it states the preferred form in which the publication or other published output should be cited.
- **`related_projects`** *[Optional ; Repeatable]* <br>
The objective of this block is to provide links (URLs) to other, related projects which can be documented and disseminated in the same catalog or any other location on the internet.
<br>
```json
"related_projects": [
{
"name": "string",
"uri": "string",
"note": "string"
}
]
```
<br>
- **`name`** *[Optional ; Not repeatable ; String]* <br>
The name (title) of the related project.
- **`uri`** *[Optional ; Not repeatable ; String]* <br>
A link (URL) to the related project web page.
- **`note`** *[Optional ; Not repeatable ; String]* <br>
A brief description or other relevant information on the related project. <br><br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(
# ... ,
related_projects = list(
list(name = "",
uri = "",
note = "")
),
# ...
)
)
```
<br>
- **`geographic_units`** *[Optional ; Repeatable]* <br>
The geographic areas covered by the project. When the project relates to one or more countries, or part of one or more countries, it is important to provide the country name. This means that for a project related to a specific province or town of a country, the country name will be entered in addition to the province or town (as separate entries in this repeatable block of elements). Note that the area does not have to be an administrative area; it can for example be an ocean.
<br>
```json
"geographic_units": [
{
"name": "string",
"code": "string",
"type": "string"
}
]
```
<br>
- **`name`** *[Optional ; Not repeatable ; String]* <br>
The name of the geographic area.
- **`code`** *[Optional ; Not repeatable ; String]* <br>
The code of the geographic area. For countries, it is recommended to use the [ISO 3166](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes) country codes and names.
- **`type`** *[Optional ; Not repeatable ; String]* <br>
The type of geographic area.<br>
```{r, indent=" ", eval=F, echo=T}
my_project = list(
# ... ,
project_desc = list(