---
cap-location: margin
css: [css/layout.css, css/edit-pagedown.css, css/fonts.css]
output:
  pagedown::html_paged:
    toc: true
    self_contained: false
    front_cover: images/cover-stata.png
toc-title: Contents
paged-footnotes: true
lot: false
lof: false
toc: true
---
```{r, eval=TRUE, echo=FALSE}
knitr::opts_chunk$set(
  eval = FALSE, # Stata chunks not evaluated; manually change R chunks
  comment = NA,
  message = FALSE,
  R.options = list(width = 88),
  fig.align = 'center',
  highlight = FALSE
)
source("utils/utilities.r")
```
# Preface {.front-matter .unnumbered}
This guide was commissioned and funded by the Family Planning Team at the Bill & Melinda Gates Foundation. The examples here are directly based on the companion [IPUMS PMA data analysis blog](https://tech.popdata.org/pma-data-hub/), with R examples developed by Matt Gunther and IPUMS PMA documentation by Devon Kristiansen under the direction of Kathryn Grace, PhD, and Elizabeth Heger Boyle, PhD, at IPUMS PMA, University of Minnesota. The Stata version and statistical consulting were provided by Mia Yu and Dale Rhoda at [Biostat Global Consulting](http://www.biostatglobal.com/). The authors are grateful for helpful reviews and comments from Philip Anglewicz, PhD; Linnea Zimmerman, PhD; and Aisha Siewe at Johns Hopkins University. Thanks also to Caitlin Clary, PhD; Mary Kay Trimner; Nina Brooks, PhD; and Finn Roberts for code contributions and review.
#### Suggested Citation {.unnumbered}
Matt Gunther, Mia Yu, Dale Rhoda, and Devon Kristiansen. *IPUMS PMA Longitudinal Analysis Guide for Stata Users* (November 2022). Minneapolis, MN: IPUMS. [pma.ipums.org](https://pma.ipums.org/pma/)
#### Source Code {.unnumbered}
The code provided in this manual is open source ([© MPL 2.0](https://www.mozilla.org/en-US/MPL/2.0/)). This manual was constructed from [R Markdown](https://rmarkdown.rstudio.com/) files with the `r funlink(pagedown)` package for R.^[`r funlink(pagedown)` © Xie, Yihui et al. (MIT)] These files are available on our [GitHub repository](https://github.com/IPUMS-Global-Health/IPUMS-PMA-Longitudinal-Guide), where you will also find `.r` and `.do` files containing the code shown in this manual.
The IPUMS PMA data files referenced in this manual are also available at no cost, but you must register and adhere to terms of use at [pma.ipums.org/register](https://pma.ipums.org/pma/register.shtml). Dataset access is granted only for non-commercial purposes. Users must register an account with IPUMS, request access to data from particular countries, and describe their intended use for the data. Users who have been approved for access to certain countries may submit justification to expand their access to other countries.
[French version of the registration form](https://pma.ipums.org/pma/formulaire_d_inscription.shtml)
#### Revision History {.unnumbered}
Revisions to this manual are listed by date and accompanied by comments [here](https://github.com/IPUMS-Global-Health/IPUMS-PMA-Longitudinal-Guide/commits/main). **Questions and suggested changes are welcome!** Please submit requests to our [Issues](https://github.com/IPUMS-Global-Health/IPUMS-PMA-Longitudinal-Guide/issues) forum on GitHub.
\newpage
#### Hyperlinks {.unnumbered}
Hyperlinks to IPUMS PMA variable documentation, relevant R and Stata documentation, and various other resources are highlighted [in pink](https://pma.ipums.org) throughout this manual. Readers who prefer a printed version should compile the manual from the source files on our GitHub repository, changing the `r funlink(pagedown)` option described [here](https://pagedown.rbind.io/#links). **Warning:** this will add footnotes to the document and may affect pagination.
#### Acronyms {.unnumbered}
- BMGF - [Bill & Melinda Gates Foundation](https://www.gatesfoundation.org/)
- CI - confidence interval
- CMC - century month code
- CONSORT - [Consolidated Standards of Reporting Trials](https://www.consort-statement.org)
- CRAN - [The Comprehensive R Archive Network](https://cran.r-project.org/) (repository for R and R packages)
- CSV - comma-separated values file format
- DEFF - design effect
- DEFT - root design effect (square root of DEFF)
- DRC - Democratic Republic of the Congo
- EA - enumeration area
- FP - family planning
- FP2020 - Family Planning 2020
- FP2030 - [Family Planning 2030](https://fp2030.org/)
- GPS - [global positioning system](https://www.gps.gov/)
- IPUMS - [Integrated Public Use Microdata Series](https://www.ipums.org/)
- ISO - International Organization for Standardization
- IUD - intrauterine device
- LAM - lactational amenorrhea method of contraception
- NA - not available (R notation for a missing data element)
- NIU - not in universe
- PMA - [Performance Monitoring for Action](https://www.pmadata.org/)
- PPS - probability proportional to size
- SAS - [Statistical Analysis System](https://www.sas.com/) (statistical software)
- SPSS - [Statistical Package for the Social Sciences](https://www.ibm.com/spss) (statistical software)
# Introduction
[Performance Monitoring for Action (PMA)](https://www.pmadata.org/) uses innovative mobile technology to support low-cost, rapid-turnaround surveys that monitor key health and development indicators.
PMA surveys collect longitudinal data at the household and health facility levels within each participating country. The data are gathered on mobile phones by female data collectors, known as resident enumerators. The survey collects information from the same women and households over time for regular tracking of progress and for understanding the drivers of contraceptive use dynamics. The data are rapidly validated, aggregated, and prepared into tables and graphs, making results quickly available to stakeholders. Because the survey platform is low-cost, rapid-turnaround, and adaptable to various health data needs, PMA surveys can be integrated into national monitoring and evaluation systems.
The PMA project is implemented by local partner universities and research organizations who train and deploy the cadres of female resident enumerators.
<aside>
PMA has also published a guide to **cross-sectional** analysis in both [English](https://www.pmadata.org/media/1243/download?attachment) and [French](https://www.pmadata.org/media/1244/download?attachment).
</aside>
The purpose of this manual is to provide guidance on the analysis of **harmonized longitudinal data** for a panel of women age 15-49 surveyed by PMA and published in partnership with [IPUMS PMA](https://pma.ipums.org/pma/). IPUMS provides census and survey products from around the world in an integrated format, making it easy to compare data from multiple countries. IPUMS PMA data are available free of charge, subject to terms and conditions: please [register here](https://pma.ipums.org/pma/register.shtml) to request access to the data featured in this guide.^[PMA data for individual countries is also available at no cost from [pmadata.org](https://www.pmadata.org/). Please note that the variable names, value labels, numeric codes, and other metadata featured in this guide have been altered by IPUMS PMA to facilitate comparison across countries.]
This manual provides reproducible coding examples in the statistical software program [Stata](https://www.stata.com/). You can download `.do` files containing all of the code needed to reproduce these examples on our [GitHub page](https://github.com/IPUMS-Global-Health/IPUMS-PMA-Longitudinal-Guide).
**R users:** a companion manual for IPUMS PMA longitudinal analysis is also available with coding examples written in R. Additionally, the [IPUMS PMA data analysis blog](https://tech.popdata.org/pma-data-hub/) includes an online version of each chapter and posts on a range of other topics updated every two weeks.
## IPUMS PMA data in Stata
The first two chapters of this manual introduce new users to [PMA longitudinal data](https://www.pmadata.org/data/survey-methodology) and the [IPUMS PMA website](https://pma.ipums.org/pma/), respectively. After demonstrating how to obtain an IPUMS PMA data extract, the remaining chapters feature extensive data analysis examples written in Stata. The accompanying `.do` files available for download from [our GitHub site](https://github.com/IPUMS-Global-Health/IPUMS-PMA-Longitudinal-Guide) include alternative sets of syntax for users who are running older versions of Stata.
<aside class="hex">
```{r, eval=TRUE, echo=FALSE}
hex("stata")
```
</aside>
To follow along, you'll need to purchase and download the appropriate version of Stata for your computer's operating system at [stata.com](https://www.stata.com/). Discounted licenses are available for students and for faculty and staff at participating institutions: learn more [here](https://www.stata.com/order/). Several of the commands used in this manual were updated in **Stata Version 17**. In later chapters we also use several user-written Stata commands, which you can install by typing the following commands at the Stata prompt. You only need to do this once on each computer.
```{stata}
ssc install heatplot, replace
ssc install palettes, replace
ssc install colrspace, replace
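* net install looks in the current -net from- location; if this line fails,
* type -net search grc1leg2- to find where the package is hosted
net install grc1leg2.pkg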
```
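Before moving on, you may want to confirm that these commands are available. The sketch below relies on `which`, which exits with an error when a command cannot be found, wrapped in `capture` so the loop keeps going. It checks one command per package: `heatplot`, `colorpalette` (supplied by `palettes`), and `grc1leg2`. The local macro name `n_found` is our own.

```{stata}
* Report which of the user-written commands Stata can find
local n_found 0
foreach cmd in heatplot colorpalette grc1leg2 {
    capture which `cmd'
    if _rc {
        display as error "`cmd' not found: install it before running later chapters"
    }
    else {
        display as text "`cmd' is installed"
        local ++n_found
    }
}
display as text "`n_found' of 3 commands found"
```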
#### Setup {.unnumbered}
The guide will be most helpful if you reconstruct the data extracts that are used here and run the syntax in the `.do` files that accompany the text. There are several such files, all available from [our GitHub repository](https://github.com/IPUMS-Global-Health/IPUMS-PMA-Longitudinal-Guide).
The **main** program is named `IPUMS_PMA_Longitudinal_Analysis_Guide_For_Stata_Users.do` and it includes all of the syntax that you see in the guide as well as several bonus blocks of code that demonstrate alternative approaches to some of the tasks described here.
For brevity, four blocks of code that recur throughout the guide have been saved as very small `.do` files. Rather than repeat that code over and over, we use Stata's [include](https://www.stata.com/manuals/pinclude.pdf) command to run each file with a single line. In the manual, you will see code like this:
```{stata}
* Include the snippet of code that generates and labels the pop variable.
include gen_pop.do
```
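One practical difference between `include` and `do` is scope. `include` runs a file's commands as if they were typed into the calling `.do` file, so local macros defined in the included file stay visible afterward. `do` runs the file in its own context. The sketch below demonstrates this with a throwaway file named `demo_include.do`, a hypothetical name. The sketch creates the file in the working directory and then erases it, and it should be run from a `.do` file.

```{stata}
* Create a throwaway do-file (hypothetical name) that defines a local macro
tempname fh
file open `fh' using "demo_include.do", write replace
file write `fh' "local greeting hello" _n
file close `fh'

* include runs those lines in this do-file's context,
* so the local macro is still defined when control returns here
include demo_include.do
display "`greeting'"

* Running the file with -do- instead would leave the greeting macro
* empty here, because do runs the file in its own context

* Remove the throwaway file
erase demo_include.do
```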
The four files that are incorporated in that manner are named:
- `gen_pop.do`
- `label_pop_values.do`
- `gen_strata_recode.do`
- `label_chg_fpcurr.do`
The contents of these files are shown below:
#### Save as `gen_pop.do` {.unnumbered}
```{stata}
* Construct a new variable named pop and give it a
* unique value for each PMA population.
gen pop = .
replace pop = 1 if country == 1 // Burkina Faso
replace pop = 2 if country == 2 & geocd == 1 // Kinshasa
replace pop = 3 if country == 2 & geocd == 2 // Kongo Central
replace pop = 4 if country == 7 // Kenya
replace pop = 5 if country == 9 & geong == 4 // Kano
replace pop = 6 if country == 9 & geong == 2 // Lagos
label variable pop "Population"
include label_pop_values.do
```
#### Save as `label_pop_values.do` {.unnumbered}
```{stata}
label define pop ///
1 "Burkina Faso" ///
2 "DRC-Kinshasa" ///
3 "DRC-Kongo Central" ///
4 "Kenya" ///
5 "Nigeria-Kano" ///
6 "Nigeria-Lagos", replace
label values pop pop
```
#### Save as `gen_strata_recode.do` {.unnumbered}
```{stata}
* Make a new variable named strata_recode and set it to strata_1
* everywhere except DRC and set it to geocd in DRC
clonevar strata_recode = strata_1
replace strata_recode = geocd if country == 2
* Copy the value label from strata_1 into a new label named strata_recode
* and update it with the labels from geocd
label copy STRATA_1 strata_recode, replace
label define strata_recode 1 "Kinshasa, DRC" 2 "Kongo Central, DRC", modify
* Use the new value label with the new variable
label values strata_recode strata_recode
```
#### Save as `label_chg_fpcurr.do` {.unnumbered}
```{stata}
label define chg_fpcurr 1 "Changed methods" 2 "Continued method" ///
3 "Continued non-use" 4 "Started using" ///
5 "Stopped using", replace
label values chg_fpcurr chg_fpcurr
label var chg_fpcurr "Phase 1 to 2 Family Planning Change Status"
```
#### Working Directory {.unnumbered}
To run the code successfully, you must create a folder to use as Stata's [working directory](https://www.stata.com/manuals/dcd.pdf). Save the following files in that folder (all are available in our [Stata folder on GitHub](https://github.com/IPUMS-Global-Health/IPUMS-PMA-Longitudinal-Guide/tree/main/stata_users_files)):
- The main `.do` file and the four included `.do` files listed above
- The `pma_sankey_template4.dta` dataset
- `sankey_plot_with_legend.ado`
- `sankey_plot_with_legend.sthlp`
The main `.do` file has one [cd](https://www.stata.com/manuals/dcd.pdf) command at the first line of executable code. **You will need to change the path in that line of the program to match your own working directory.**
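After you've changed the `cd` line, a quick check can save confusion later. The sketch below reports any of the eight files listed above that Stata cannot find in the current working directory. The path in the commented-out `cd` line is a placeholder, and the macro name `n_missing` is our own.

```{stata}
* cd "C:/your/working/directory"   // placeholder path: edit to match your folder
display as text "Working directory: `c(pwd)'"

* Count and report any required file that is not in the working directory
local n_missing 0
foreach f in IPUMS_PMA_Longitudinal_Analysis_Guide_For_Stata_Users.do ///
    gen_pop.do label_pop_values.do gen_strata_recode.do label_chg_fpcurr.do ///
    pma_sankey_template4.dta sankey_plot_with_legend.ado ///
    sankey_plot_with_legend.sthlp {
    capture confirm file "`f'"
    if _rc {
        display as error "Missing: `f'"
        local ++n_missing
    }
}
display as text "`n_missing' of 8 required files not found"
```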
#### Featured Data Extracts {.unnumbered}
In subsequent chapters, we will include instructions for requesting data extracts from IPUMS PMA that are identical to those used in our analysis. These data are available at no cost, but you must register and adhere to terms of use at [pma.ipums.org/register](https://pma.ipums.org/pma/register.shtml).
Each data extract that you request from IPUMS PMA is named with a unique number. For example, your very first extract will be named `pma_00001.dta`. In this guide we reference seven files, but your own file names may vary depending on the number of IPUMS PMA extracts you have requested previously.
- `pma_00001.dta`
- `pma_00002.dta`
- `pma_00003.dta`
- `pma_00004.dta`
- `pma_00005.dta`
- `pma_00006.dta`
- `pma_00007.dta`
Save each data extract in the same working directory containing the Stata files described above.
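The sketch below checks which of the seven extracts are present. It builds each name with `string()` and the zero-padded format `%05.0f`. For each file it finds, it prints a summary with `describe using`, which reads a dataset's description without loading the data. If your extracts have different numbers, you would edit the `forvalues` range or list the names explicitly. That adaptation is our suggestion, not part of the guide's code.

```{stata}
* Build pma_00001.dta ... pma_00007.dta and summarize each file that exists
forvalues i = 1/7 {
    local fname = "pma_" + string(`i', "%05.0f") + ".dta"
    capture confirm file "`fname'"
    if _rc {
        display as error "Not found: `fname'"
    }
    else {
        describe using "`fname'", short
    }
}
```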
\newpage
#### Learning More {.unnumbered}
For a general introduction to analysis of IPUMS PMA data in Stata, visit the [IPUMS PMA Support](https://pma.ipums.org/pma/support.shtml) page, where you'll find links to video tutorials and data exercises written for Stata users. Similar resources are available for users of R, SPSS, and SAS.
Resources to learn about Stata include [the company website](https://www.stata.com/), official [FAQs](https://www.stata.com/support/faqs/), official [blog](https://blog.stata.com/), a vibrant [user forum](https://statalist.org/), official [YouTube channel](https://www.youtube.com/channel/UCVk4G4nEtBS4tLOyHqustDA), and the Stata website's own [list of resources](https://www.stata.com/support/faqs/resources/learning-about-stata/). [Stata Press](https://www.stata-press.com/) publishes excellent books on how to use Stata. Four that we recommend are:
- [A Gentle Introduction to Stata, Sixth Edition](https://www.stata.com/bookstore/gentle-introduction-to-stata/) by Alan C. Acock, ISBN: 978-1-59718-269-0.
- [The Workflow of Data Analysis Using Stata](https://www.stata.com/bookstore/workflow-data-analysis-stata/) by J. Scott Long, ISBN: 978-1-59718-047-4.
- [A Visual Guide to Stata Graphics, Fourth Edition](https://www.stata.com/bookstore/visual-guide-to-stata-graphics/) by Michael N. Mitchell, ISBN: 978-1-59718-365-9.
- [An Introduction to Stata Programming, Second Edition](https://www.stata.com/bookstore/introduction-stata-programming/) by Christopher F. Baum, ISBN: 978-1-59718-150-1.
The [Stata documentation website](https://www.stata.com/features/documentation/) offers dedicated, detailed, and freely available reference manuals for each of the fields referenced in this document:
- [Survey Data Reference Manual](https://www.stata.com/bookstore/survey-data-reference-manual/) is 220 pages, ISBN: 978-1-59718-350-5.
- [Longitudinal/Panel Data Reference Manual](https://www.stata.com/bookstore/longitudinal-panel-data-reference-manual/) is 632 pages, ISBN: 978-1-59718-354-3.
- [Survival Analysis Reference Manual](https://www.stata.com/bookstore/survival-analysis-reference-manual/) is 538 pages, ISBN: 978-1-59718-349-9.
For help with a particular Stata command, simply type "help *command_name*" at the Stata command prompt. A brief explanation will open in a Stata Viewer window along with a link to a more complete explanation in the extensive PDF documentation. And, of course, a quick Google search will usually turn up helpful answers to Stata-related questions.
One excellent independent resource is [The Stata Guide](https://medium.com/the-stata-guide) blog by [Asjad Naqvi](https://asjadnaqvi.github.io/). He shares clear explanations and code to make a variety of very handsome maps and graphics. The posts reside behind a paywall. You may read three of them for free per month, or pay a subscription fee to [medium.com](http://www.medium.com) for unlimited access to his writing (and that of many other authors).
Finally, for survey data analysis, we heartily recommend the text [Applied Survey Data Analysis (2nd ed.)](https://www.routledge.com/Applied-Survey-Data-Analysis/Heeringa-West-Berglund/p/book/9780367736118) by Heeringa, West, and Berglund, ISBN: 978-0-36773-611-8. It is a treasure trove of thoughtful insight, and its [companion website](https://websites.umich.edu/~surveymethod/asda/) shares datasets, code examples, and other resources for conducting analyses using a variety of software packages, including Stata and R.
## PMA Background
Dating back to 2013, the original PMA survey design included high-frequency, **cross-sectional** samples of women and service delivery points collected from eleven countries participating in [Family Planning 2020](http://progress.familyplanning2020.org/) (FP2020) - a global partnership that supports the rights of women and girls to decide for themselves whether, when, and how many children they want to have. These surveys were designed to monitor annual progress towards [FP2020 goals](http://progress.familyplanning2020.org/measurement) via population-level estimates for several [core indicators](http://www.track20.org/pages/data_analysis/core_indicators/overview.php).
Beginning in 2019, PMA surveys were redesigned under a renewed partnership called [Family Planning 2030](https://fp2030.org/) (FP2030). These new surveys have been refocused on reproductive and sexual health indicators, and they feature a **longitudinal panel** of women of childbearing age. This design will allow researchers to measure contraceptive dynamics and changes in women’s fertility intentions over a **three year period** via annual in-person interviews.^[In addition to these three in-person surveys, PMA also conducted telephone interviews with panel members focused on emerging issues related to the COVID-19 pandemic in 2020. These telephone surveys are already available for several countries; the IPUMS PMA blog series on [PMA COVID-19 surveys](https://tech.popdata.org/pma-data-hub/index.html#category:COVID-19) covers this topic in detail.]
Questions on the redesigned survey cover topics like:
* awareness, perception, knowledge, and use of contraceptive methods
* perceived quality and side effects of contraceptive methods among current users
* birth history and fertility intentions
* aspects of health service provision
* domains of empowerment
## Sampling
PMA panel data includes a mixture of **nationally representative** and **sub-nationally representative** samples. The panel study consists of three data collection phases, each spaced one year apart.
As of this writing, IPUMS PMA has released data from the first *two* phases for four countries where Phase 1 data collection began in 2019; IPUMS PMA has released data from only the *first* phase for three countries where Phase 1 data collection began in August or September 2020. Phase 3 data collection and processing is currently underway.
```{r, eval=TRUE, echo=FALSE, results='hide', message=FALSE}
library(kableExtra)
options(knitr.kable.NA = '')
avail <- read_csv("utils/sample_avail.csv", show_col_types = F)
names(avail)[2] <- paste0(
  names(avail)[2],
  footnote_marker_symbol(1)
)
```
```{r, eval=TRUE, echo=FALSE}
avail %>%
  arrange(Sample) %>%
  kable(escape = FALSE, format = "html", table.attr = "style='width:100%;'") %>%
  kable_styling() %>%
  add_header_above(c(" " = 2, "Now Available from IPUMS PMA" = 3)) %>%
  scroll_box(
    width = "100%",
    box_css = paste(
      sep = "; ",
      "margin-bottom: 1em",
      "margin-top: 0em",
      "border: 0px solid #ddd",
      "padding: 5px"
    )
  ) %>%
  footnote(
    symbol = "<em>Each data collection phase is spaced one year apart</em>",
    escape = FALSE
  )
```
<aside>
**Resident enumerators** are women over age 21 living in (or near) each EA who hold at least a high school diploma.
</aside>
PMA uses a multi-stage clustered sample design, with stratification at the urban-rural level or by sub-region. Sample clusters, called [enumeration areas](https://pma.ipums.org/pma-action/variables/EAID#description_section) (EAs), are provided by the national statistics agency in each country.^[[Displaced GPS coordinates](https://tech.popdata.org/pma-data-hub/posts/2021-10-15-nutrition-climate/PMA_displacement.pdf) for the centroid of each EA are available for most samples [by request](https://www.pmadata.org/data/request-access-datasets) from PMA. IPUMS PMA provides shapefiles for PMA countries [here](https://pma.ipums.org/pma/gis_boundary_files.shtml).] These EAs are sampled using a *probability proportional to size* (PPS) method relative to the population distribution in each stratum.
\newpage
At Phase 1, 35 household dwellings were selected at random within each EA. Resident enumerators visited each dwelling and invited one household member to complete a [Household Questionnaire](https://pma.ipums.org/pma/resources/questionnaires/hhf/PMA-Household-Questionnaire-English-2019.10.09.pdf)^[Questionnaires administered in each country may vary from this **Core Household Questionnaire** - [click here](https://pma.ipums.org/pma/enum_materials.shtml) for details.] that includes a census of all household members and visitors who stayed there during the night before the interview. Female household members and visitors aged 15-49 were then invited to complete a subsequent Phase 1 [Female Questionnaire](https://pma.ipums.org/pma/resources/questionnaires/hhf/PMA-Female-Questionnaire-English-2019.10.09.pdf).^[Questionnaires administered in each country may vary from this **Core Female Questionnaire** - [click here](https://pma.ipums.org/pma/enum_materials.shtml) for details.]
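To make the two sampling stages concrete, here is a sketch of the textbook selection probabilities for a design like the one described above. The symbols are our own labels, not PMA variables. The sketch ignores nonresponse and the weight adjustments PMA applies, so treat it as an illustration rather than PMA's weighting method. Let $n_h$ be the number of EAs selected in stratum $h$. Let $M_{hi}$ be the measure of size for EA $i$ (for example, its household count in the sampling frame), with the sum running over all EAs in the stratum. Let $N_{hi}$ be the number of dwellings in EA $i$ from which the 35 are drawn.

$$
\pi_{hi} = \frac{n_h \, M_{hi}}{\sum_{j} M_{hj}}, \qquad
\pi_{k \mid hi} = \frac{35}{N_{hi}}, \qquad
\pi_{hik} = \pi_{hi} \, \pi_{k \mid hi} = \frac{35 \, n_h}{\sum_{j} M_{hj}} \cdot \frac{M_{hi}}{N_{hi}}
$$

When $N_{hi}$ is close to $M_{hi}$, the last ratio is near 1, so every dwelling in a stratum has roughly the same chance of selection. The base design weight is the reciprocal, $1/\pi_{hik}$.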
<aside>
`r stata_link(SAMEDWELLING)` indicates whether a Phase 2 female respondent resided in her Phase 1 dwelling or a new one.
`r stata_link(PANELWOMAN)` indicates whether a Phase 2 household member completed the Phase 1 Female Questionnaire.
</aside>
One year later, resident enumerators visited the same dwellings and administered a Phase 2 Household Questionnaire. A panel member in Phase 2 is any woman still age 15-49 who could be reached for a second Female Questionnaire, either because:
* she still lived there, or
* she had moved elsewhere within the study area,^[The "study area" is the area within which resident enumerators should attempt to find panel women who have moved out of their Phase 1 dwelling. This may extend beyond the woman's original EA, as determined by in-country administrators; see [PMA Phase 2 and Phase 3 Survey Protocol](https://www.pmadata.org/data/survey-methodology) for details.] but at least one member of the Phase 1 household remained and could help resident enumerators locate her new dwelling.^[In cases where no Phase 1 household members remained in the dwelling at Phase 2, women from the household are considered **lost to follow-up**. Chapter 3 covers this topic in detail.]
Additionally, resident enumerators administered the Phase 2 Female Questionnaire to *new* women in sampled households who:
* reached age 15 after Phase 1
* joined the household after Phase 1
* declined the Female Questionnaire at Phase 1, but agreed to complete it at Phase 2
\newpage
When you select the new **Longitudinal** sample option from IPUMS PMA, you'll be able to include responses from every available phase of the study. These samples are available in either "long" format (each woman has a separate row for each phase) or "wide" format (each woman has a single row, with responses from each phase organized in separate columns).
```{r, eval=TRUE, echo=FALSE}
knitr::include_graphics("images/long_radio.png")
```
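If you ever need to switch between the two layouts yourself, Stata's `reshape` command does this. The sketch below uses a tiny hypothetical dataset. The identifier `id` and all values are invented for illustration, and in a real extract you would use the woman-level identifier documented for your sample. Note that `clear` discards any data currently in memory.

```{stata}
* Hypothetical wide data: one row per woman, one column per phase
clear
input id resident_1 resident_2
1 22 22
2 22 21
3 11 .
end

* Wide to long: one row per woman per phase; the suffix becomes "phase"
reshape long resident_, i(id) j(phase)
list, sepby(id) noobs

* Long back to wide: resident_1 and resident_2 are rebuilt from "phase"
reshape wide resident_, i(id) j(phase)
list, noobs
```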
\newpage
<aside>
`r stata_link(CROSS_SECTION)` indicates whether a household member in a longitudinal sample is also included in the cross-sectional sample for a given year (every person in a cross-sectional sample is included in the longitudinal sample).
</aside>
In addition to following up with women in the panel over time, PMA also adjusted sampling so that a cross-sectional sample could be produced concurrently with each data collection phase. These samples mainly overlap with the data you'll obtain for a particular phase in the longitudinal sample, except that replacement households were drawn from each EA where more than 10% of households from the previous phase were no longer there. Conversely, panel members who were located in a new dwelling at Phase 2 will not be represented in the cross-sectional sample drawn from that EA. These adjustments ensure that population-level indicators may be derived from cross-sectional samples in a given year, even if panel members move or are lost to follow-up.
You'll find PMA cross-sectional samples dating back to 2013 if you select the **Cross-sectional** sample option from IPUMS PMA.
```{r, eval=TRUE, echo=FALSE}
knitr::include_graphics("images/cross_radio.png")
knitr::opts_chunk$set(echo = TRUE)
```
## Inclusion Criteria for Analysis
Several chapters in this manual feature code you can use to reproduce key indicators included in the **PMA Longitudinal Brief** for each sample. In many cases, you'll find separate reports available in English and French, and for both national and sub-national summaries. For reference, here are the highest-level population summaries available in English for each sample where Phase 2 IPUMS PMA data is currently available:
* [Burkina Faso](https://www.pmadata.org/sites/default/files/data_product_results/Burkina%20National_Phase%202_Panel_Results%20Brief_English_Final.pdf)
* [DRC - Kinshasa](https://www.pmadata.org/sites/default/files/data_product_results/DRC%20Kinshasa_%20Phase%202%20Panel%20Results%20Brief_English_Final.pdf)
* [DRC - Kongo Central](https://www.pmadata.org/sites/default/files/data_product_results/DRC%20Kongo%20Central_%20Phase%202%20Panel%20Results%20Brief_English_Final.pdf)
* [Kenya](https://www.pmadata.org/sites/default/files/data_product_results/Kenya%20National_Phase%202_Panel%20Results%20Brief_Final.pdf)
* [Nigeria - Kano](https://www.pmadata.org/sites/default/files/data_product_results/Nigeria%20KANO_Phase%202_Panel_Results%20Brief_Final.pdf)
* [Nigeria - Lagos](https://www.pmadata.org/sites/default/files/data_product_results/Nigeria%20LAGOS_Phase%202_Panel_Results%20Brief_Final.pdf)
<aside>
We will demonstrate how to request and download an IPUMS PMA data extract in Chapter 2.
</aside>
Panel data in these reports is limited to the *de facto* population of women who completed the Female Questionnaire in both Phase 1 and Phase 2: women who slept in the household during the night before the interview for the Household Questionnaire. Women who are usual household members but slept elsewhere that night belong only to the *de jure* population. In order to reproduce the findings from PMA reports, we'll remove these *de jure* cases, which are identified in the variable `r stata_link(RESIDENT)`.
<aside>
Variable names in a "wide" extract have a numeric suffix for their data collection phase. `r stata_link(resident_1)` is the Phase 1 version of `r stata_link(resident)`, while `r stata_link(resident_2)` comes from Phase 2.
</aside>
For example, let's consider a **Wide** format data extract containing Phase 1 and Phase 2 respondents to the Female Questionnaire from Burkina Faso. Each woman's residence status at each phase is recorded in `r stata_link(resident_1)` and `r stata_link(resident_2)`, so we can count the women who slept in the household during the night before each Household Questionnaire:^[Stata's `table` command was revised and extended in Stata Version 17, so some of the examples in this guide require Stata 17 or later. The updated command yields nicer-looking output than the older syntax, but in every case there is corresponding syntax that will run in older versions of Stata. The `.do` files that accompany this guide in [its GitHub repository](https://github.com/IPUMS-Global-Health/IPUMS-PMA-Longitudinal-Guide) check which version of Stata is running. In Stata 17 or later, they use the updated `table` command; in older versions, they use the `tabulate` command, which is often abbreviated `tab`.]
```{stata}
use pma_00001, clear
keep if sample_1 == 85409
table ( resident_1 ) () (), nototals missing zerocounts
```
```
-----------------------------------------------------------
| Frequency
-----------------------------------------------+-----------
usual member of household |
visitor, slept in hh last night | 106
usual member, did not sleep in hh last night | 174
usual member, slept in hh last night | 6,510
-----------------------------------------------------------
```
This extract includes 174 women who are not members of the *de facto* population because they did not sleep in the sampled household during the night before the Phase 1 interview.
Let's turn to Phase 2:
```{stata}
table ( resident_2 ) () (), nototals missing zerocounts
```
```
-------------------------------------------------------------------------
| Frequency
-------------------------------------------------------------+-----------
usual member of household |
visitor, slept in hh last night | 74
usual member, did not sleep in hh last night | 230
usual member, slept in hh last night | 5,993
slept in hh last night, no response if usually lives in hh | 1
. | 492
-------------------------------------------------------------------------
```
The extract also includes 230 women who are not members of the *de facto* population because they did not sleep in the sampled household during the night before the Phase 2 interview. Moreover, there are 492 missing values (`.`) in `r stata_link(resident_2)` representing women who were **lost to follow-up** after Phase 1. We will explain **loss to follow-up** in detail in Chapter 3.
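If you only need the number of women lost to follow-up, you can count missing values in any Phase 2 variable. A minimal sketch, assuming (as in the examples above) that `pma_00001.dta` is in your working directory:

```{stata}
use pma_00001, clear
keep if sample_1 == 85409
* A Phase 1 respondent with no Phase 2 record has a missing value (.) in resident_2
count if missing(resident_2)
```

With this extract, the count should match the 492 missing values shown in the table above.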
\newpage
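The *de facto* population is represented in both variables by codes 11 (visitor, slept in the household last night) and 22 (usual member, slept in the household last night). We'll use `keep if` with the `inlist()` function to retain only those cases, and then shorten their value labels for a compact cross-tabulation.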
```{stata}
keep if inlist(resident_1,11,22) & inlist(resident_2,11,22)
label variable resident_1 "Resident type - Phase 1"
label variable resident_2 "Resident type - Phase 2"
label define RESIDENT_1 11 "Visitor" 22 "Usual", modify
label define RESIDENT_2 11 "Visitor" 22 "Usual", modify
table ( resident_1 ) ( resident_2 ) (), nototals missing zerocounts
```
```
----------------------------------------------------
| Resident type - Phase 2
| Visitor Usual
------------------------+---------------------------
Resident type - Phase 1 |
Visitor | 56 39
Usual | 17 5,855
----------------------------------------------------
```
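This `keep` statement also removes the 492 women lost to follow-up: `inlist()` returns 0 when its first argument is missing, so a woman with `resident_2 == .` fails the condition. The sketch below illustrates this with a few hypothetical rows typed in by hand (not PMA data); `defacto_both` is an illustrative variable name. Because the example starts with `clear`, reload your extract before continuing.

```{stata}
* Hypothetical toy data (not from PMA): resident codes for five women
clear
input resident_1 resident_2
11 22
22 11
21 22
22 21
22 .
end
* defacto_both (illustrative name) is 1 only if both phases are coded 11 or 22;
* a missing value (.) matches neither code, so the last row gets 0
gen byte defacto_both = inlist(resident_1, 11, 22) & inlist(resident_2, 11, 22)
list, noobs
```

Only the first two rows would survive a filter like the one above.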
\newpage
Additionally, PMA reports only include women who completed (or partially completed) both Female Questionnaires. This information is reported in `r stata_link(resultfq)`. In our **Wide** extract, this information appears in `r stata_link(resultfq_1)` and `r stata_link(resultfq_2)`: if you select the **Female Respondents** option at checkout, only women who completed (or partially completed) the Phase 1 Female Questionnaire will be included in your extract.
```{r, eval=TRUE, echo=FALSE}
knitr::include_graphics("images/cases1.png")
```
\newpage
We'll further restrict our sample by selecting only cases where `r stata_link(resultfq_2)` shows that the woman also completed the Phase 2 questionnaire. Notice that, in addition to each of the values 1 through 10, there are several **non-response codes** numbered 90 through 99. You'll see similar values repeated across all IPUMS PMA variables, except that they will be left-padded to match the maximum width of a particular variable (e.g. `9999` is used for `r stata_link(INTFQYEAR)`, which represents a 4-digit year for the Female Interview).
```{stata}
use pma_00001, clear
keep if sample_1 == 85409
tab resultfq_2, m
```
```
result of female questionnaire | Freq. Percent Cum.
----------------------------------------+-----------------------------------
completed | 5,491 80.87 80.87
not at home | 78 1.15 82.02
postponed | 22 0.32 82.34
refused | 66 0.97 83.31
partly completed | 12 0.18 83.49
respondent moved | 15 0.22 83.71
incapacitated | 19 0.28 83.99
not interviewed (female questionnaire) | 4 0.06 84.05
not interviewed (household questionnair | 192 2.83 86.88
niu (not in universe) | 399 5.88 92.75
. | 492 7.25 100.00
----------------------------------------+-----------------------------------
Total | 6,790 100.00
```
The numeric codes associated with non-response include:
* `95` Not interviewed (female questionnaire)
* `96` Not interviewed (household questionnaire)
* `97` Don't know
* `98` No response or missing
* `99` NIU (not in universe)
A missing value (`.`) in an IPUMS extract indicates that a particular variable is not provided for a selected sample. In a **Wide** extract, it may also signify that a particular person was not included in the data from a particular phase. Here, a missing value (`.`) appearing in `r stata_link(resultfq_2)` indicates that a Female Respondent from Phase 1 was not found in Phase 2.
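Stata treats the system missing value `.` as larger than any number, which matters when you filter on these codes. The hypothetical values below (typed in by hand, not PMA data) compare three conditions. This ordering is also why a condition such as `cp_1 < 90`, used later in this guide, excludes both non-response codes and missing values. The example starts with `clear`, so reload your extract afterward.

```{stata}
* Hypothetical toy values (not from PMA): 1 and 5 stand in for substantive
* result codes, 96 and 99 for non-response codes, and . for a missing record
clear
input resultfq_2
1
5
96
99
.
end
* ">= 90" also catches the missing value, because . sorts above all numbers
count if resultfq_2 >= 90
* inrange() is false for missing values, so this isolates codes 90-99
count if inrange(resultfq_2, 90, 99)
* missing() flags only the system missing value
count if missing(resultfq_2)
```

The three counts are 3, 2, and 1, respectively.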
\newpage
You can keep only women who completed the Phase 2 Female Questionnaire (`resultfq_2 == 1`) as follows:
```{stata}
use pma_00001, clear
keep if sample_1 == 85409
keep if resultfq_2 == 1
tab resultfq_1 resultfq_2,m
```
```
| result of
| female
| questionna
result of female | ire
questionnaire | completed | Total
----------------------+-----------+----------
completed | 5,487 | 5,487
partly completed | 4 | 4
----------------------+-----------+----------
Total | 5,491 | 5,491
```
Generally, we'll combine these filtering steps into one or two `keep` commands, like so:
```{stata}
use pma_00001, clear
keep if sample_1 == 85409
keep if inlist(resident_1,11,22) & inlist(resident_2,11,22) & resultfq_2 == 1
```
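A quick way to confirm that the filters did what you intended is `assert`, which does nothing when a condition holds for every observation and stops with an error otherwise. A minimal sketch, to run right after the `keep` commands above:

```{stata}
* Each assert stops with an error if any remaining case violates a filter
assert sample_1 == 85409
assert inlist(resident_1, 11, 22) & inlist(resident_2, 11, 22)
assert resultfq_2 == 1
* Report the size of the analytic sample
count
```

With this extract, `count` should report 5,212 women, the total of the `cp_1` by `cp_2` cross-tabulation shown in the next section.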
In subsequent analyses, we'll use each analytic sample to show how PMA generates key indicators for **contraceptive use status** and **family planning intentions and outcomes**. The summary report for each country includes measures disaggregated by demographic variables like:
* `r stata_link(MARSTAT)` - marital status
* `r stata_link(EDUCATT)` and `r stata_link(EDUCATTGEN)` - highest attended level of education^[Levels in `r stata_link(EDUCATT)` may vary by country; `r stata_link(EDUCATTGEN)` recodes country-specific levels into four general categories.]
* `r stata_link(AGE)` - age
* `r stata_link(WEALTHQ)` and `r stata_link(WEALTHT)` - household wealth quintile or tertile^[Households are divided into quintiles/tertiles relative to the distribution of an asset `r stata_link(SCORE, description)` weighted for all sampled households. For sub-nationally-representative samples (DRC and Nigeria), separate wealth distributions are calculated for each sampled region.]
* `r stata_link(URBAN)` and `r stata_link(SUBNATIONAL)` - geographic location^[`r stata_link(SUBNATIONAL)` includes sub-national regions for all sampled countries; country-specific variables are also available on the [household - geography](https://pma.ipums.org/pma-action/variables/group?id=hh_geo) page.]
## Survey Design Elements
Throughout this guide, we'll demonstrate how to incorporate PMA sampling weights and information about its stratified cluster sampling procedure into your analysis. This section describes how to use survey weights, cluster IDs, and sample strata in Stata.
Let's return to the **Wide** data extract described in the previous section, which includes Phase 1 and Phase 2 **Female Respondents** from Burkina Faso. As a reminder, we'll drop women who are not members of the *de facto* population and those who did not complete all or part of the Female Questionnaire in both phases.
```{stata}
use pma_00001, clear
keep if sample_1 == 85409
keep if inlist(resident_1,11,22) & inlist(resident_2,11,22) & resultfq_2 == 1
```
Whether you intend to work with a new **Longitudinal** or **Cross-sectional** data extract, you'll find the same set of sampling weights available for all PMA Family Planning surveys dating back to 2013:
<aside>
A fourth Family Planning survey weight, `r stata_link(POPWT, description)`, is currently available only for **Cross-sectional** data extracts.^[`r stata_link(POPWT)` can be used to estimate population-level *counts* - [click here](https://pma.ipums.org/pma/population_weights.shtml) or view [this video](https://www.youtube.com/watch?v=GnCq26t4zgM) for details.]
</aside>
* `r stata_link(HQWEIGHT, description)` can be used to generate cross-sectional population estimates from questions on the Household Questionnaire.^[`r stata_link(HQWEIGHT)` reflects the [calculated selection probability](https://pma.ipums.org/pma/resources/documentation/weighting_memo.pdf) for a household in an EA, normalized at the population-level. Users intending to estimate population-level indicators for *households* should restrict their sample to one person per household via `r stata_link(LINENO, description)` - see [household weighting guide](https://pma.ipums.org/pma/weightguide.shtml#hh) for details.]
* `r stata_link(FQWEIGHT, description)` can be used to generate cross-sectional population estimates from questions on the Female Questionnaire.^[`r stata_link(FQWEIGHT)` adjusts `r stata_link(HQWEIGHT)` for female non-response within the EA, normalized at the population-level - see [female weighting guide](https://pma.ipums.org/pma/weightguide.shtml#female) for details.]
* `r stata_link(EAWEIGHT, description)` can be used to compare the selection probability of a particular household with that of its EA.
Additionally, PMA created a new weight, `r stata_link(PANELWEIGHT, description)`, which should be used in longitudinal analyses spanning multiple phases because it adjusts for loss to follow-up. `r stata_link(PANELWEIGHT)` is available only for **Longitudinal** data extracts.
PMA sample clusters are identified by the variable `r stata_link(EAID)`, while sample strata are identified by `r stata_link(STRATA)`. We'll demonstrate how to use each of these survey design elements in Stata below.
### Set survey design
In the following example, we'll show how to use survey design information to estimate the proportion of reproductive-age women in Burkina Faso who were using contraception at the time of data collection in both Phase 1 and Phase 2. In a **Cross-sectional** or **Long** format longitudinal extract, you'd find this information in the variable `r stata_link(CP)`. In the **Wide** extract featured here, you'll find it in `r stata_link(cp_1)` for Phase 1 and in `r stata_link(cp_2)` for Phase 2.
```{stata}
table ( cp_1 ) ( cp_2 ) (), nototals missing zerocounts
```
```
--------------------------------------------------------------
| Contraceptive user (Phase 2)
| no yes
-----------------------------+--------------------------------
Contraceptive user (Phase 1) |
no | 2,589 821
yes | 556 1,241
no response or missing | 5 0
--------------------------------------------------------------
```
To estimate a population percentage, we'll need to tell Stata that we are working with a sample survey dataset and specify the IPUMS PMA survey design elements. This is accomplished with the [svyset](https://www.stata.com/manuals/svysvyset.pdf) command.
<aside>
This is a lean `svyset` call. Note that the default variance estimation method is `vce(linearized)` and the default single-unit option is `singleunit(missing)`. Read the `svyset` documentation if you want to consider other settings.
</aside>
We use `r stata_link(eaid_1)` as the cluster ID^[Because women are considered "lost to follow-up" if they moved outside the study area, `r stata_link(eaid_1)` and `r stata_link(eaid_2)` are identical for all panel members: you can use either one to identify sample clusters.] and `r stata_link(strata_1)` as the stratum ID.^[As with `r stata_link(EAID)`, you may use either `r stata_link(STRATA_1)` or `r stata_link(STRATA_2)` if your analysis is restricted to panel members.] `r stata_link(panelweight)` is the survey weight. We also create a binary variable, `cp_both`, indicating which women were using contraception in both phases; women with a non-response code (90 or above) in `cp_1` are set to missing.
```{stata}
gen cp_both = cp_1 == 1 & cp_2 == 1 if cp_1 < 90
label variable cp_both "Contraceptive user (Phases 1 & 2)"
label define cp_both 1 "Yes" 0 "No", replace
label values cp_both cp_both
svyset eaid_1, strata(strata_1) weight(panelweight)
```
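The default options mentioned above can also be written out explicitly. The sketch below declares `panelweight` with the more common `[pweight = ]` syntax; we assume this is equivalent to `weight(panelweight)` for this single-stage design and expect identical estimates, but treat that as an assumption to check against your own output. `svydescribe` then summarizes the strata and PSUs that Stata has recorded; listing `cp_both` makes it take that variable's missing values into account.

```{stata}
* Assumed-equivalent design: overall weight declared as a pweight,
* with the default variance and single-unit options spelled out
svyset eaid_1 [pweight = panelweight], strata(strata_1) ///
    vce(linearized) singleunit(missing)
* Summarize strata and PSUs, accounting for missing values in cp_both
svydescribe cp_both
```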
Now, we can use this survey design information to obtain a population estimate for the proportion of women who used family planning in both phases.
\newpage
```{stata}
svy: proportion cp_both
```
```
(running proportion on estimation sample)
Survey: Proportion estimation
Number of strata = 2 Number of obs = 5,207
Number of PSUs = 167 Population size = 5,215.6413
Design df = 165
--------------------------------------------------------------
| Linearized Logit
| Proportion std. err. [95% conf. interval]
-------------+------------------------------------------------
cp_both |
No | .8122041 .012815 .7855839 .8362084
Yes | .1877959 .012815 .1637916 .2144161
--------------------------------------------------------------
```
This is our first look at Stata’s output for estimating proportions. The top of the output lists the number of strata and PSUs (enumeration areas) in the dataset, along with the number of respondents in the sample and the sum of their weights (under the heading "Population size"). The number of design degrees of freedom (df) is the number of PSUs minus the number of strata: here, 167 - 2 = 165.^[Some survey materials advise analysts to report only results for estimates or tests where the relative standard error (100 x standard error of the estimate / the estimate itself) is no greater than 30% or where there are at least twelve degrees of freedom. See the Centers for Disease Control and Prevention's [NHANES CMS tutorial](https://www.cdc.gov/nchs/tutorials/nhanes-cms/variance/variance.htm).]
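You can check these numbers against Stata's stored results. A minimal sketch, to run right after `svy: proportion cp_both`; `T` and `rse_yes` are illustrative names, and we assume column 2 of `r(table)` holds the "Yes" row, matching the display order. With the output above, both df lines should show 165, and the relative standard error for "Yes" is about 6.8% (100 x .012815 / .1877959), well under the 30% rule of thumb mentioned in the footnote.

```{stata}
* Run right after -svy: proportion cp_both-
* Design df = number of PSUs minus number of strata
display e(N_psu) - e(N_strata)
display e(df_r)
* Relative standard error (%) for the "Yes" row (column 2 of r(table));
* T and rse_yes are illustrative names
matrix T = r(table)
scalar rse_yes = 100 * T[2,2] / T[1,2]
display rse_yes
```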
The lower portion of the table lists the values of the outcome variable (here, their value labels: No and Yes). For each value, it lists the estimated proportion of the population with that outcome, that proportion’s standard error, and a two-sided, survey-adjusted confidence interval for the proportion. Stata’s default confidence interval is the so-called "logit interval", which is one of several possibilities.^[For now we will simply say that the default logit interval is a fine choice for most circumstances (see [Dean & Pagano (2015)](https://doi.org/10.1093/jssam/smv024) for discussion). To request a different kind of confidence interval, read about the options and specify what you want with the `citype()` option of the `svy: proportion` command (e.g., `citype(wilson)` or `citype(exact)`). If you estimate a proportion where either 0% or 100% of sampled respondents have the outcome, then, as of this writing, neither Stata nor R's `r funlink(survey)` package will report a confidence interval. Here at Biostat Global Consulting, we have written programs in both Stata and R that yield meaningful confidence intervals for any proportion. Those programs are made freely available as part of software we have written for the World Health Organization. If you want to learn more about them, write to us at [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected]).]
Describing this output, we might say that "based on this survey sample of 5,207 women from Burkina Faso, we estimate that, if the surveys were free from bias, about 18.8% of women who were eligible to be sampled in the PMA surveys would be self-reported users of contraception in both Phases 1 and 2 (95% CI: 16.4-21.4%)."
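The logit interval is built on the log-odds scale and then transformed back: the standard error is converted with the delta method, se / (p(1 - p)), and multiplied by a t critical value on the design degrees of freedom. The sketch below rebuilds the "Yes" interval from stored results under that description, then requests a Wilson interval with `citype()` for comparison. Run it right after `svy: proportion cp_both`; all scalar names are illustrative. If the description is right, `ci_lo` and `ci_hi` should reproduce .1637916 and .2144161 from the output above.

```{stata}
* Run right after -svy: proportion cp_both-; column 2 of r(table) is "Yes"
* (T, p_yes, se_yes, t_crit, half_w, ci_lo, ci_hi are illustrative names)
matrix T = r(table)
scalar p_yes  = T[1,2]                    // estimated proportion
scalar se_yes = T[2,2]                    // linearized standard error
scalar t_crit = invttail(e(df_r), 0.025)  // two-sided 95% t critical value
* Delta method: standard error on the logit scale is se / (p * (1 - p))
scalar half_w = t_crit * se_yes / (p_yes * (1 - p_yes))
scalar ci_lo  = invlogit(logit(p_yes) - half_w)
scalar ci_hi  = invlogit(logit(p_yes) + half_w)
display ci_lo "  " ci_hi
* For comparison, request a Wilson interval instead of the default logit interval
svy: proportion cp_both, citype(wilson)
```

Re-running the estimate with `citype(wilson)` changes only the interval, not the variance estimate, so the design effects reported by `estat effects` afterward are unaffected.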
### Design Effect
With survey data collected using a complex sample design that employs strata and/or clusters, we sometimes report the **design effect**, an index of the statistical precision penalty that we pay for using that sample design. In Stata, we can see the design effect by issuing the post-estimation command [estat effects](https://www.stata.com/manuals/svysvypostestimation.pdf).
```{stata}
estat effects
```
```
----------------------------------------------------------
| Linearized
| Proportion std. err. DEFF DEFT
-------------+--------------------------------------------
cp_both |
No | .8122041 .012815 5.6052 2.36753
Yes | .1877959 .012815 5.6052 2.36753
----------------------------------------------------------
```
We see that the design effect `DEFF` is 5.6, which we might interpret by saying "The confidence interval for this estimate is as wide as we would expect from a simple random sample of 5,207 / 5.6, or about 929, respondents."
`DEFT` is the square root of `DEFF`, and we might use it in a sentence like this: "Because of the complex sample design and heterogeneity of survey weights, the confidence interval for this estimate is 2.4 times wider than we would expect from a simple random sample of 5,207 respondents."
The figure 929 is sometimes called the **effective sample size**.
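The arithmetic behind these statements is short. A sketch using the `DEFF` value typed in from the output above (`n_eff` is an illustrative name):

```{stata}
* Effective sample size: actual sample size divided by DEFF
scalar n_eff = 5207 / 5.6052
display n_eff
* DEFT is the square root of DEFF
display sqrt(5.6052)
```

The first result is about 928.96, which rounds to 929; the second is about 2.3675, matching `DEFT`.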
Let’s take a moment and estimate proportions from two simple random samples where 18.8% of the respondents have the outcome: one where the sample size is 5,207 and one where the sample size is 929. We can do this by generating an empty dataset with the appropriate number of respondents and a binary variable named `y`.
Here we create `y` for the larger, complex sample:
```{stata}
clear
set obs 5207
gen y = 0
replace y = 1 if _n < 0.188 * 5207
```
\newpage
```{stata}
tab y
```
```
y | Freq. Percent Cum.
------------+-----------------------------------
0 | 4,229 81.22 81.22
1 | 978 18.78 100.00
------------+-----------------------------------
Total | 5,207 100.00
```
```{stata}
svyset _n
svy: proportion y
```
```
Survey: Proportion estimation
Number of strata = 1 Number of obs = 5,207
Number of PSUs = 5,207 Population size = 5,207
Design df = 5,206
--------------------------------------------------------------
| Linearized Logit
| Proportion std. err. [95% conf. interval]
-------------+------------------------------------------------
y |
0 | .8121759 .0054131 .8013328 .8225583
1 | .1878241 .0054131 .1774417 .1986672
--------------------------------------------------------------
```
And here we create `y` for the smaller, simple sample:
```{stata}
clear
set obs 929
gen y = 0
replace y = 1 if _n < 0.188 * 929
tab y
```
```
y | Freq. Percent Cum.
------------+-----------------------------------
0 | 755 81.27 81.27
1 | 174 18.73 100.00
------------+-----------------------------------
Total | 929 100.00
```
\newpage
```{stata}
svyset _n
svy: proportion y
```
```
Survey: Proportion estimation
Number of strata = 1 Number of obs = 929
Number of PSUs = 929 Population size = 929
Design df = 928
--------------------------------------------------------------
| Linearized Logit
| Proportion std. err. [95% conf. interval]
-------------+------------------------------------------------
y |
0 | .8127018 .0128073 .786262 .8365509
1 | .1872982 .0128073 .1634491 .213738
--------------------------------------------------------------
```
Now let’s compare the CI widths. The 95% confidence interval from the simple random sample with N=929 (.1634 to .2137) is nearly identical to the one from the complex sample with N=5,207 (.1638 to .2144). If we divide the width of the N=929 interval by the width of the interval from the simple random sample with N=5,207, we'll see that the ratio is approximately equal to `DEFT`.
```{stata}
di (.213738 - .1634491) / (.1986672 - .1774417)
```
```
2.3692681
```
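Because `DEFT` is, roughly, a ratio of standard errors, dividing the standard errors directly gives an even closer match. You can also check the idea behind the effective sample size by comparing the complex-design interval with the interval from the simple random sample of 929. The values below are typed in from the outputs above; `se_ratio` and `width_ratio` are illustrative names.

```{stata}
* Complex-design SE divided by the SE from the simple random sample with N = 5,207
scalar se_ratio = .012815 / .0054131
display se_ratio
* Complex-design CI width divided by the CI width from the SRS with N = 929
scalar width_ratio = (.2144161 - .1637916) / (.213738 - .1634491)
display width_ratio
```

The first ratio is about 2.367 (compare `DEFT` = 2.368), and the second is about 1.007: the two intervals are almost exactly the same width.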
It can be disheartening to know that the teams did all the work to interview 5,207 respondents, and yet for this estimate the sample has only the statistical precision of a simple random sample of 929 respondents. The penalty arises from both a clustering effect (spatial heterogeneity in the outcome across PSUs) and heterogeneity in the survey weights. In some survey reporting contexts you will be expected to report `DEFF`, `DEFT`, or both; be clear about which one you are reporting. The design effect varies across outcomes, strata, and PMA phases, so if it is of interest, estimate it anew for each analysis. You can learn more about the survey design effect in materials on survey sampling statistics, such as the excellent textbook [Applied Survey Data Analysis](https://websites.umich.edu/~surveymethod/asda/).
### Sample strata for DRC
This syntax and `svyset` command worked well for Burkina Faso, but take note: the variable `r stata_link(strata)` is not available for samples collected from DRC - Kinshasa or DRC - Kongo Central. If your extract includes any DRC sample, you’ll need to amend this variable to include a unique numeric code for each of those regions.
For example, let’s look at a different **Wide** extract containing all of the samples included in this data release. Here, we again include only panel members who completed all or part of the Female Questionnaire in both phases and who slept in the household during the night before the interview. We also drop women with a non-response code (90 or above) for contraceptive use in either phase (`cp_1` or `cp_2`):
```{stata}
use pma_00002, clear
keep if inlist(resident_1,11,22) & inlist(resident_2,11,22) & resultfq_2 == 1 & ///
cp_1 < 90 & cp_2 < 90
```
Notice that `r stata_link(strata_1)` lists the sample strata for all values of `r stata_link(country)` except for DRC, where the variable is missing.
```{stata}
table ( strata_1 ) if country == 2, nototals missing zerocounts
```
```
-------------------
| Frequency
-------+-----------
strata |
. | 3,478
-------------------
```
We can replace those missing values with the numeric codes from the variable `r stata_link(geocd)`, which identifies each DRC province:
```{stata}
tab geocd, nolabel
```
```
province, |
congo dr | Freq. Percent Cum.
------------+-----------------------------------
1 | 1,967 56.56 56.56
2 | 1,511 43.44 100.00
------------+-----------------------------------
Total | 3,478 100.00
```
\newpage
Because these codes are distinct from all other values in `r stata_link(strata_1)`, we can create a new variable, `strata_recode`, that copies `r stata_link(strata_1)` except for DRC cases (`country == 2`), where it takes the numeric code from `r stata_link(geocd)`.
* `strata_recode` - Numeric codes for PMA sample strata (recoded for DRC samples)
```{stata}
clonevar strata_recode = strata_1
replace strata_recode = geocd if country == 2
label copy STRATA_1 strata_recode, replace
label define strata_recode 1 "Kinshasa, DRC" 2 "Kongo Central, DRC", modify
label values strata_recode strata_recode
tab strata_recode, m
```
```
strata | Freq. Percent Cum.
----------------------------------------+-----------------------------------
Kinshasa, DRC | 1,967 11.11 11.11
Kongo Central, DRC | 1,511 8.53 19.64
bungoma - urban, kenya | 153 0.86 20.51
bungoma - rural, kenya | 488 2.76 23.26
kakamega - urban, kenya | 133 0.75 24.02
kakamega - rural, kenya | 438 2.47 26.49
kericho - urban, kenya | 249 1.41 27.90
kericho - rural, kenya | 453 2.56 30.45
kiambu - urban, kenya | 213 1.20 31.66
kiambu - rural, kenya | 311 1.76 33.41
kilifi - urban, kenya | 170 0.96 34.37
kilifi - rural, kenya | 455 2.57 36.94
kitui - urban, kenya | 153 0.86 37.81
kitui - rural, kenya | 585 3.30 41.11
nairobi - urban, kenya | 493 2.78 43.90
nandi - urban, kenya | 260 1.47 45.37
nandi - rural, kenya | 711 4.02 49.38
nyamira - urban, kenya | 143 0.81 50.19
nyamira - rural, kenya | 382 2.16 52.35
siaya - urban, kenya | 130 0.73 53.08
siaya - rural, kenya | 437 2.47 55.55
west pokot - urban, kenya | 104 0.59 56.14
west pokot - rural, kenya | 473 2.67 58.81
lagos, nigeria | 1,088 6.15 64.95
kano - urban | 437 2.47 67.42
kano - rural | 561 3.17 70.59
urban, burkina faso | 3,053 17.24 87.83
rural, burkina faso | 2,154 12.17 100.00
----------------------------------------+-----------------------------------
Total | 17,705 100.00
```
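Before declaring the survey design, it may be worth confirming that every observation now has a stratum code and that each enumeration area falls within a single stratum. A minimal sketch; each `assert` stops with an error if its condition fails. We expect both checks to pass for this extract, but the second assumes that `eaid_1` identifies EAs uniquely across all samples.

```{stata}
* No observation should be left without a stratum code
assert !missing(strata_recode)
* Each EA (PSU) should sit within exactly one stratum:
* within each eaid_1, the smallest and largest strata_recode should agree
bysort eaid_1 (strata_recode): assert strata_recode[1] == strata_recode[_N]
```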
\newpage
Now, we can use `strata_recode` with the `svyset` command to obtain population estimates for each nationally representative or sub-nationally representative sample.
First, we'll create `cp_both` again for this wide dataset.
```{stata}
gen cp_both = cp_1 == 1 & cp_2 == 1 if cp_1 < 90
label variable cp_both "Contraceptive user (Phases 1 & 2)"
label define cp_both 1 "Yes" 0 "No", replace
label values cp_both cp_both
svyset eaid_1, strata(strata_recode) weight(panelweight)
```
For Stata to estimate the proportion for each population, we will use the `over(varname)` option where `varname` needs to be an integer variable - preferably with a value label.
So, we construct a new variable named `pop` and give it a unique value for each PMA population.^[We regenerate this `pop` variable several times throughout this guide, so we have saved the `pop`-related commands in a code snippet named `gen_pop.do`, which can be called via `include gen_pop.do`. Likewise, we apply these value labels several times, so they are saved in a snippet named `label_pop_values.do`, which can be called via `include label_pop_values.do`.]
```{stata}
gen pop = .
replace pop = 1 if country == 1 // Burkina Faso
replace pop = 2 if country == 2 & geocd == 1 // Kinshasa
replace pop = 3 if country == 2 & geocd == 2 // Kongo Central
replace pop = 4 if country == 7 // Kenya
replace pop = 5 if country == 9 & geong == 4 // Kano
replace pop = 6 if country == 9 & geong == 2 // Lagos
label define pop ///
1 "Burkina Faso" ///
2 "DRC-Kinshasa" ///
3 "DRC-Kongo Central" ///
4 "Kenya" ///
5 "Nigeria-Kano" ///
6 "Nigeria-Lagos", replace
label values pop pop
```
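A cross-tabulation against `country` is a quick way to confirm that each `pop` code maps to the intended country and that no observation was left without one. A minimal sketch:

```{stata}
* Each pop code should line up with a single country; -missing- reveals any gaps
tab pop country, missing
* Stop with an error if any observation did not receive a pop code
assert !missing(pop)
```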
\newpage
Finally, we can use the updated survey design information to estimate the proportion of women who were using contraception at both Phase 1 and Phase 2 in every sample (including those from Kinshasa and Kongo Central).
```{stata}
svy : proportion cp_both , over(pop)
```
```
Survey: Proportion estimation
Number of strata = 28 Number of obs = 17,705
Number of PSUs = 665 Population size = 17,691.26
Design df = 637
------------------------------------------------------------------------
| Linearized Logit
| Proportion std. err. [95% conf. interval]
-----------------------+------------------------------------------------
cp_both@pop |
No Burkina Faso | .8122041 .012815 .785736 .8360846
No DRC-Kinshasa | .6802513 .0163794 .647268 .711525
No DRC-Kongo Central | .7318119 .0287314 .6718062 .7843679
No Kenya | .6342298 .0083126 .6177575 .6503939
No Nigeria-Kano | .9463423 .0130503 .9141428 .9669031
No Nigeria-Lagos | .7065456 .0176703 .6706908 .7400099
Yes Burkina Faso | .1877959 .012815 .1639154 .214264
Yes DRC-Kinshasa | .3197487 .0163794 .288475 .352732
Yes DRC-Kongo Central | .2681881 .0287314 .2156321 .3281938
Yes Kenya | .3657702 .0083126 .3496061 .3822425
Yes Nigeria-Kano | .0536577 .0130503 .0330969 .0858572
Yes Nigeria-Lagos | .2934544 .0176703 .2599901 .3293092
------------------------------------------------------------------------
```
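If you only need one population, the `subpop()` option of the `svy` prefix is a common alternative to dropping the other cases: it restricts estimation to a subpopulation while still using the full design for variance estimation. A minimal sketch for Kinshasa (`pop == 2`); we expect the estimates and standard errors to match the DRC-Kinshasa rows above, since `over()` also estimates each population as a subpopulation.

```{stata}
* Estimate for DRC-Kinshasa only (pop == 2), keeping every stratum and PSU
* in the variance calculation rather than dropping other observations
svy, subpop(if pop == 2): proportion cp_both
```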
Now that we've identified variables that describe an IPUMS PMA analytic sample, let's proceed by downloading these and other variables of interest in a data extract from IPUMS PMA. In Chapter 2, we'll see that longitudinal data extracts can be requested in either **Long** or **Wide** format, depending on your needs.
# Longitudinal Data Extracts
```{r, eval=TRUE, echo=FALSE}
knitr::opts_chunk$set(out.width = "85%")
```
Chapter 2 provides a guided tour of the [IPUMS PMA data extract system](https://pma.ipums.org/pma/), which you may use to combine survey data collected from multiple countries and multiple phases of the longitudinal study.
IPUMS PMA also makes it easy to switch between multiple [units of analysis](https://pma.ipums.org/pma-action/variables/group) covered in PMA surveys. In addition to the longitudinal data featured in this guide, you'll find surveys representing:
<aside>
A video tour of the longitudinal extract system is available [here](https://www.youtube.com/embed/VwjYHDvpHk0) on the IPUMS PMA YouTube channel.
</aside>
- [Service Delivery Points (SDPs)](https://tech.popdata.org/pma-data-hub/#category:Service_Delivery_Points)
- [Client Exit Interviews conducted at SDPs](https://tech.popdata.org/pma-data-hub/#category:Client_Exit_Interviews)
- Participants in special surveys covering topics like [COVID-19](https://tech.popdata.org/pma-data-hub/#category:COVID-19), [nutrition](https://tech.popdata.org/pma-data-hub/#category:Nutrition), and maternal & newborn health
To get started with a longitudinal data extract, you'll need to select the **Family Planning** topic under the **Person** unit of analysis.
```{r, eval=TRUE, echo=FALSE, out.width="85%"}
knitr::include_graphics("images/unit.png")
```
## Sample Selection
Once you've selected the **Family Planning** option, you'll next need to choose between cross-sectional and longitudinal samples. Cross-sectional samples are selected by default; these are nationally or sub-nationally representative samples collected each year dating back to 2013.
```{r, eval=TRUE, echo=FALSE}
knitr::include_graphics("images/cross-sectional.png")
```
Longitudinal samples are only available from 2019 onward, and they include all of the available phases for each sampled country (sub-nationally representative samples for DRC and Nigeria are listed separately). You'll only find longitudinal samples for countries where Phase 2 data has been made available; as of this writing, Phase 1 data for Cote d'Ivoire, India, and Uganda can only be found under the Cross-sectional sample menu.
\newpage
Clicking the Longitudinal button reveals options for either **Long** or **Wide** format. You'll find the same samples available in either case.
**Important:** if you decide to change formats after selecting variables, your Data Cart will be emptied and you'll need to begin again from scratch.
```{r, eval=TRUE, echo=FALSE}
knitr::include_graphics("images/wide.png")
```
\newpage
After you've selected one of the available longitudinal formats, choose one or more of the samples listed below. You'll also see several Sample Members options.
```{r, eval=TRUE, echo=FALSE}
knitr::include_graphics("images/cases.png")
```
<aside>
`r stata_link(PANELWOMAN)` indicates whether an individual is a member of the panel study.
`r stata_link(ELIGIBLE)` indicates whether an individual was eligible for the female questionnaire.
</aside>
**Female Respondents** includes only women who completed *all or part* of a Female Questionnaire. **This option selects all members of the panel study.** In addition, it includes women who participated in only one phase; we will demonstrate how to identify and drop these cases below.^[Women who completed all or part of the Female Questionnaire in *more than one phase* of the study are considered **panel members**. Women who completed it only at Phase 1 are included in a longitudinal extract, but they are not **panel members**. Likewise, women who completed it for the first time at Phase 2 are included, but are not **panel members** if they 1) will reach age 50 before Phase 3, or 2) declined the invitation to participate again in Phase 3.]
**Female Respondents and Female Non-respondents** includes all women who were eligible to participate in a Female Questionnaire. Eligible women are those aged 15-49 who were listed on the roster collected in a Household Questionnaire. If an eligible woman declined the Female Questionnaire or was not available, variables associated with that questionnaire will be coded "Not interviewed (female questionnaire)".
\newpage
<aside>
`r stata_link(RESULTFQ)` indicates whether an individual completed the Female Questionnaire.
`r stata_link(RESULTHQ)` indicates whether a member of the individual's household completed the Household Questionnaire.
</aside>
**Female Respondents and Household Members** adds records for all other members of a Female Respondent's household. These household members did not complete the Female Questionnaire, but were listed on the household roster provided by the respondent to a Household Questionnaire. Basic [demographic](https://internal.pma.ipums.org/pma-action/variables/group?id=hh_roster) variables are available for each household member, as are common [wealth](https://internal.pma.ipums.org/pma-action/variables/group?id=hh_wealth), [water](https://internal.pma.ipums.org/pma-action/variables/group?id=water_watersource), [sanitation](https://internal.pma.ipums.org/pma-action/variables/group?id=water_wash), and other variables shared for all members of the same household.
**All Cases** includes all members listed on the household roster from a Household Questionnaire. If the Household Questionnaire was declined or if no respondent was available, any panel member appearing in other phases of the study will be coded "Not interviewed (household questionnaire)" for variables associated with the missing Household Questionnaire.
After you've selected samples and sample members for your extract, click the "Submit Sample Selections" button to return to the main data browsing menu.
## Variable Selection
You can browse IPUMS PMA variables by topic or alphabetically by name, or you can [search](https://pma.ipums.org/pma-action/variables/search) for a particular term in a variable name, label, value labels, or description.
```{r, eval=TRUE, echo=FALSE}
knitr::include_graphics("images/topics.png")
```