A socioeconomic index, also known as a deprivation index, is a single number that measures the socioeconomic status of a predefined area. It combines multiple socioeconomic characteristics, weighted by their relative significance. Such an index allows direct comparisons of socioeconomic status between regions and is useful for identifying patterns and correlations between socioeconomic status and other attributes. This is our attempt at creating a socioeconomic index for Sri Lanka.
We used the datasets from the 2011 national census. This repository contains cleaned versions of these datasets. The table below describes each dataset and gives the name of each variable in the code.
Dataset | Category | Variables | Names in Code |
---|---|---|---|
Household | Cooking Fuel | Firewood, Kerosene, Gas, Electricity, Sawdust / Paddy husk, Other | `coo_firewood`, `coo_kerosene`, `coo_gas`, `coo_electricity`, `coo_sawdust_paddyhusk`, `coo_other` |
Household | Floor Material | Cement, Tile / Granite / Terrazzo, Mud, Wood, Sand, Concrete, Other | `flo_cement`, `flo_tile_granite_terrazzo`, `flo_mud`, `flo_wood`, `flo_sand`, `flo_concrete`, `flo_other` |
Household | Housing | Permanent, Semi-permanent, Improvised, Unclassified | `hou_permanent`, `hou_semipermanent`, `hou_improvised`, `hou_unclassified` |
Household | Lighting | National Grid, Hydro Power, Kerosene, Solar Power, Biogas, Other | `lig_nationalgrid`, `lig_hydro`, `lig_kerosene`, `lig_solar`, `lig_biogas`, `lig_other` |
Household | Roof Material | Tile, Asbestos, Concrete, Zinc / Aluminium sheet, Metal sheet, Cadjan / Palmyrah / Straw, Other | `roo_tile`, `roo_asbestos`, `roo_concrete`, `roo_zinc_aluminium`, `roo_metal`, `roo_cadjan_palmyrah_straw`, `roo_other` |
Household | Structure | Single - 1 story, Single - 2 story, Single - 3+ story, Attached house / Annex, Flat, Condominium, Twin house, Row / Line room, Hut / Shanty | `str_single_1`, `str_single_2`, `str_single_3`, `str_attachedhouse_annex`, `str_flat`, `str_condominium`, `str_twinhouse`, `str_room`, `str_hut_shanty` |
Household | Tenure | Owned, Rent / Lease - government owned, Rent / Lease - private owned, Rent free, Encroached, Other | `ten_owned`, `ten_rent_gov`, `ten_rent_pvt`, `ten_rent_free`, `ten_encroached`, `ten_other` |
Household | Toilet Facilities | Water Seal - connected to sewer, Water Seal - connected to septic tank, Pour flush, Direct pit, Other, No toilet | `toi_waterseal_sewer`, `toi_waterseal_tank`, `toi_pourflush`, `toi_directpit`, `toi_other`, `toi_none` |
Household | Wall Material | Brick, Cement block / Stone, Cabook, Soil brick, Mud, Cadjan / Palmyrah, Plank / Metal sheet, Other | `wal_brick`, `wal_cementblock_stone`, `wal_cabook`, `wal_soilbrick`, `wal_mud`, `wal_cadjan_palmyrah`, `wal_plank_metal`, `wal_other` |
Household | Waste Disposal | Collected by government, Burned, Buried, Composted, Disposed into environment, Other | `was_collect_gov`, `was_burn`, `was_bury`, `was_compost`, `was_dispose_env`, `was_other` |
Household | Water Source | Protected well - within premises, Protected well - outside premises, Unprotected well, Tap - within unit, Tap - outside unit but within premises, Tap - outside premises, Rural water projects, Tube well, Bowser, River / Tank / Stream, Rain water, Bottled water, Other | `wat_well_prot_in`, `wat_well_prot_out`, `wat_well_unprot`, `wat_tap_unit_in`, `wat_tap_prem_in`, `wat_tap_prem_out`, `wat_rural`, `wat_tubewell`, `wat_bowser`, `wat_river_tank_stream`, `wat_rain`, `wat_bottled`, `wat_other` |
Population | Age | 0 - 4, 5 - 9, 10 - 14, 15 - 19, 20 - 24, 25 - 29, 30 - 34, 35 - 39, 40 - 44, 45 - 49, 50 - 54, 55 - 59, 60 - 64, 65 - 69, 70 - 74, 75 - 79, 80 - 84, 85 - 89, 90 - 94, 95 & above | `age_y0_4`, `age_y5_9`, `age_y10_14`, `age_y15_19`, `age_y20_24`, `age_y25_29`, `age_y30_34`, `age_y35_39`, `age_y40_44`, `age_y45_49`, `age_y50_54`, `age_y55_59`, `age_y60_64`, `age_y65_69`, `age_y70_74`, `age_y75_79`, `age_y80_84`, `age_y85_89`, `age_y90_94`, `age_y95_above` |
Population | Education | Primary, Secondary, O Level, A Level, Degree & Above, No Schooling | `edu_primary`, `edu_secondary`, `edu_olevel`, `edu_alevel`, `edu_degree`, `edu_none` |
Population | Employment | Employed, Unemployed, Economically Inactive | `emp_employed`, `emp_unemployed`, `emp_inactive` |
Population | Gender | Male, Female | `gen_male`, `gen_female` |
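Every name in code starts with a three-letter category prefix (`coo_`, `flo_`, `wat_`, `age_`, and so on). As a minimal sketch, assuming the columns are named exactly as listed in the table, that prefix is enough to recover which category each column belongs to, which is what any per-category processing needs. `group_by_category` is a hypothetical helper for illustration, not a function from this repository.

```python
from collections import defaultdict


def group_by_category(columns):
    """Group column names by the prefix before their first underscore.

    Hypothetical helper: assumes every name follows the `<prefix>_<variable>`
    pattern used in the table (e.g. `coo_gas`, `age_y0_4`).
    """
    groups = defaultdict(list)
    for col in columns:
        prefix = col.split("_", 1)[0]  # "age_y0_4" -> "age"
        groups[prefix].append(col)
    return dict(groups)


columns = ["coo_firewood", "coo_gas", "flo_cement", "flo_mud", "age_y0_4", "age_y5_9"]
print(group_by_category(columns))
# {'coo': ['coo_firewood', 'coo_gas'], 'flo': ['flo_cement', 'flo_mud'], 'age': ['age_y0_4', 'age_y5_9']}
```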
Note: The 2011 national census datasets are only available as counts summarized at the Grama Niladhari Division (GND) level. The original categorical variables, surveyed at the household level, have been converted to binary variables and aggregated for each GND. This hides correlations between variables at the household level (for example, whether the same households have both a tile roof and tap water), so our results are suboptimal.
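A small example shows what the aggregation loses. The two GNDs below are made-up data, not census figures: they have identical GND-level counts for `roo_tile` and `wat_tap_unit_in`, yet in one GND the same households have both attributes and in the other the two never coincide. Since only the aggregated counts are published, the two cases are indistinguishable in the released data. The helpers `aggregate` and `joint_count` are hypothetical.

```python
# Two hypothetical GNDs with four households each.
# Each household is (has_tile_roof, has_tap_in_unit) as 0/1 indicators.
gnd_a = [(1, 1), (1, 1), (0, 0), (0, 0)]  # the two attributes always coincide
gnd_b = [(1, 0), (1, 0), (0, 1), (0, 1)]  # the two attributes never coincide


def aggregate(households):
    """Sum each binary indicator over the GND, like the published counts."""
    return {
        "roo_tile": sum(roof for roof, _ in households),
        "wat_tap_unit_in": sum(tap for _, tap in households),
    }


def joint_count(households):
    """Households with both attributes; not recoverable from the aggregates."""
    return sum(1 for roof, tap in households if roof and tap)


print(aggregate(gnd_a), aggregate(gnd_b))      # identical counts
print(joint_count(gnd_a), joint_count(gnd_b))  # 2 vs 0
```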
We employed principal component analysis (PCA) and extracted the first principal component to use as the socioeconomic index. We strongly recommend reading Vyas and Kumaranayake (2006) for a thorough justification of this method as well as an exploration of alternatives.
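Written out, a first-principal-component index of this kind has the following form for GND $g$ and variables $j = 1, \dots, p$. The symbols are ours, chosen for illustration, and $x_{gj}$ is assumed to be the normalized value of variable $j$ in GND $g$:

$$
z_{gj} = \frac{x_{gj} - \bar{x}_j}{s_j}, \qquad I_g = \sum_{j=1}^{p} w_j \, z_{gj}
$$

Here $\bar{x}_j$ and $s_j$ are the mean and standard deviation of variable $j$ across all GNDs, and $\mathbf{w} = (w_1, \dots, w_p)$ is the unit-length eigenvector, with the largest eigenvalue, of the correlation matrix of the standardized variables; these are the weights of the first principal component. An eigenvector is only defined up to sign, so $\mathbf{w}$ may need to be negated so that a higher $I_g$ means higher socioeconomic status.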
This whitepaper describes our process in detail. In short, we followed this procedure:
- Curate the dataset to remove variables that are either redundant or non-indicative of socioeconomic status.
- Normalize the counts within each category for each GND (that is, express each variable as a proportion of its category total).
- Standardize each variable.
- Run PCA on the standardized dataset.
- Extract the weights given by the first principal component.
- Multiply the standardized dataset by these weights.
- Sum the above scores for each GND to get the socioeconomic index.
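The steps above can be sketched in Python with NumPy. This is a minimal illustration, not the repository's code: the function name `socioeconomic_index`, the `categories` argument (lists of column indices, one per category), and the reading of the normalization step as "divide each count by its category total" are all assumptions. PCA is done with a singular value decomposition of the standardized matrix, whose first right singular vector gives the first principal component's weights. Curation is assumed to have happened already; in particular, the sketch divides by zero if a category total is zero or a variable is constant across GNDs.

```python
import numpy as np


def socioeconomic_index(counts, categories):
    """Minimal sketch of a first-principal-component socioeconomic index.

    counts: (n_gnd, n_var) array of raw counts, one row per GND.
    categories: list of column-index lists, one per category (hypothetical
        input format); every column is assumed to belong to exactly one.
    Returns (index, weights). The sign of both is arbitrary.
    """
    x = np.asarray(counts, dtype=float)

    # Normalize (assumed interpretation): proportion of the category total per GND.
    props = np.zeros_like(x)
    for cols in categories:
        totals = x[:, cols].sum(axis=1, keepdims=True)
        props[:, cols] = x[:, cols] / totals

    # Standardize each variable to mean 0 and standard deviation 1.
    z = (props - props.mean(axis=0)) / props.std(axis=0)

    # PCA: the first right singular vector of z is the leading eigenvector of
    # its correlation matrix z.T @ z / n_gnd, i.e. the first component's weights.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    weights = vt[0]

    # Multiply by the weights and sum per GND, in one matrix-vector product.
    index = z @ weights
    return index, weights


# Made-up toy data: columns are roo_tile, roo_cadjan_palmyrah_straw,
# wat_tap_unit_in, wat_well_unprot for five hypothetical GNDs.
counts = np.array([
    [90, 10, 80, 20],
    [70, 30, 60, 40],
    [50, 50, 50, 50],
    [30, 70, 35, 65],
    [10, 90, 20, 80],
])
index, weights = socioeconomic_index(counts, categories=[[0, 1], [2, 3]])
print(np.round(weights, 3))
print(np.round(index, 3))
```

If the weight on a variable that clearly indicates higher status comes out negative, negate `weights` and `index` together.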
We divided the GNDs into seven quantile groups according to their socioeconomic index and plotted these groups as a choropleth map. These are our results using the household and population datasets.
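Splitting the index into seven equal-frequency groups before mapping can be sketched as follows. `quantile_groups` is a hypothetical helper, not the code used to produce the maps; `pandas.qcut(index, 7, labels=False)` is a common alternative that returns 0-based group codes.

```python
import numpy as np


def quantile_groups(index, n_groups=7):
    """Assign each GND to one of n_groups equal-frequency groups (1 = lowest index)."""
    values = np.asarray(index, dtype=float)
    # Interior cut points: the 1/7, 2/7, ..., 6/7 quantiles when n_groups=7.
    edges = np.quantile(values, np.linspace(0, 1, n_groups + 1)[1:-1])
    # side="right" places a value equal to a cut point in the upper group.
    return np.searchsorted(edges, values, side="right") + 1


print(quantile_groups(np.arange(14)))
# [1 1 2 2 3 3 4 4 5 5 6 6 7 7]
```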