This repository serves as the authoritative source for all geography files used by the Social Impact Data Commons.
Data files, data information files, and data creation code area
organized in identical named sets of folders in the /data, /docs, and
/src folders, respectively. For example, the dataset
“va059_geo_ffxct_gis_2022_zip_codes” the following folders are
maintained:
/data/va059_geo_ffxct_gis_2022_zip_codes
/docs/01data/va059_geo_ffxct_gis_2022_zip_codes
/src/01data/va059_geo_ffxct_gis_2022_zip_codes
The individual dataset folders in the /data folder are further
sub-divided into /original, /working, and /distribution.
The files
in a dataset’s distribution folder are the authoritative files for
distribution.
A manifest with a list of the latest distribution files, along with md5 checksums, is available at /data/distribution_file_manifest.csv
Current geography files:
dc_geo_census_cb_2010_census_block_groups.geojson
dc_geo_census_cb_2010_census_tracts.geojson
dc_geo_census_cb_2010_counties.geojson
dc_geo_census_cb_2020_census_block_groups.geojson
dc_geo_census_cb_2020_census_tracts.geojson
dc_geo_census_cb_2020_counties.geojson
md_geo_census_cb_2010_census_block_groups.geojson
md_geo_census_cb_2010_census_tracts.geojson
md_geo_census_cb_2010_counties.geojson
md_geo_census_cb_2020_census_block_groups.geojson
md_geo_census_cb_2020_census_tracts.geojson
md_geo_census_cb_2020_counties.geojson
ncr_geo_census_cb_2010_census_block_groups.geojson
ncr_geo_census_cb_2010_census_tracts.geojson
ncr_geo_census_cb_2010_counties.geojson
ncr_geo_census_cb_2020_census_block_groups.geojson
ncr_geo_census_cb_2020_census_tracts.geojson
ncr_geo_census_cb_2020_counties.geojson
us_geo_census_cb_2010_census_tracts.geojson
us_geo_census_cb_2010_counties.geojson
us_geo_census_cb_2020_census_tracts.geojson
us_geo_census_cb_2020_counties.geojson
va_geo_census_cb_2010_census_block_groups.geojson
va_geo_census_cb_2010_census_tracts.geojson
va_geo_census_cb_2010_counties.geojson
va_geo_census_cb_2020_census_block_groups.geojson
va_geo_census_cb_2020_census_tracts.geojson
va_geo_census_cb_2020_counties.geojson
va_geo_vhd_2020_health_districts.geojson
va013_geo_arl_2021_civic_associations.geojson
va059_geo_ffxct_gis_2022_human_services_regions.geojson
va059_geo_ffxct_gis_2022_planning_districts.geojson
va059_geo_ffxct_gis_2022_supervisor_districts.geojson
va059_geo_ffxct_gis_2022_zip_codes.geojson
- Create new yaml settings file and place in
/src/01_data/00_dataset_yaml_files. For example:
--- dataset_info: dataset_type: "geo" # geo or data data_agency_company_abbrv: "ffxct" data_center_program: "gis" dataset_general_subject: "Supervisor Districts" dataset_sub_subject: "" dataset_region_type: "supervisor district" dataset_geo_coverage: "va059" dataset_geo_level: "" # if dataset_type = "data" dataset_start_year: "2022" dataset_end_year: "" dataset_source_files: - type: "data" format: "geojson" url: "https://services1.arcgis.com/ioennV6PpG5Xodq0/arcgis/rest/services/OpenData_S1/FeatureServer/17/query?outFields=*&where=1%3D1&f=geojson"
- Open /src/01_data/01_set_up_new_datasets.R and add new
new_dataset_setup command, and run. This creates a filename all of
the folders that will go along with this dataset. For example:
# Fairfax County Supervisor Districts Geo new_dataset_setup(dataset_info_yml_file_path = "src/01_data/00_dataset_yaml_files/fairfax_county_supervisor_districts_geo.yml", code_files = "R", overwrite = TRUE)
- Now go to the folder created for this dataset in /src/01_data/ and open the first ingest file (01_ingest_file_1.R). There should now be one file for ingest for each source file you listed in the yaml file.
Running the already created code should download the source file and place it in the /data//original folder.
- Now open the prepared dataset preparation file, 02_prepare.R. This file has the basic steps and some of the needed code already prepopolated.
Once finished, you will have a final dataset in /data//distribution.