Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make census fips codes archiver #507

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Make census fips codes archiver #507

wants to merge 8 commits into from

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented Jan 9, 2025

Overview

working on catalyst-cooperative/pudl#3884

What problem does this address?

  • this adds an archiver which grabs that county level fips codes from the census

What did you change in this PR?

  • i added an archiver module. I used the template in the readme and edited it mostly using patterns from the NREL ATB archiver
  • I was surprised to have to add an xls media type. this was only required for the older files but i'm surprised we've never had to archive xls files. Did I miss something here?
  • I had to do a semi-janky add of the 1990-2000/.. txt file.
  • (i also added census metadata in pudl - i will link make that pr and link to it here)

Testing

How did you make sure this worked? How can a reviewer verify this?

I ran pudl_archiver --datasets censusfips --initialize --summary-file censusfips-summary.json

to make a draft/sandbox archives here:

To-do list

Tasks

Preview Give feedback

@cmgosnell cmgosnell requested a review from e-belfer January 9, 2025 14:31
@cmgosnell cmgosnell self-assigned this Jan 9, 2025
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually suggest archiving the 1990-2000 file, which notes that it was last updated for the 2001 population estimate in the metadata, as year=2001. We're not planning on extracting anything but the last file in PUDL anyways, so we won't have to deal with extraction but it's relatively straightforward to archive and I don't see an argument for not doing it?

src/pudl_archiver/archivers/censusfips.py Outdated Show resolved Hide resolved
src/pudl_archiver/archivers/censusfips.py Outdated Show resolved Hide resolved
src/pudl_archiver/archivers/censusfips.py Outdated Show resolved Hide resolved
src/pudl_archiver/archivers/censusfips.py Outdated Show resolved Hide resolved
@cmgosnell cmgosnell requested a review from e-belfer January 9, 2025 19:56
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A non-blocking comment about the file names, which maybe we want to make consistent with each other and with our other datasets?

src/pudl_archiver/archivers/censuspep.py Show resolved Hide resolved
@cmgosnell
Copy link
Member Author

cmgosnell commented Jan 14, 2025

some questions/notes:

  • if we always want {datasource}_{partition} we should add that to the readme (i can do but want to coordinate doing bc we are about to redo instructions).. or even just make this a must in download_file
  • do all or most of our archives expect that we will have one zipped file per partition? I believe i can use download_and_zip_file for that. but the docs for that and download_file are not consistent so it was a lil confusing to use. i can update the args to make em slightly more clear one i sorta understand em.
  • post making one --initialize archive, do i have to store the doi somewhere? if i remove it from the command on the readme i get this error:
File "/Users/christinagosnell/code/pudl-archiver/src/pudl_archiver/depositors/zenodo/depositor.py", line 467, in get_deposition
    concept_rec_id = concept_doi.split(".")[2]
                     ^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'

Answer: yes in package_data there is a yaml file. we should also add it to the readme docs

  • does the sandbox work at all?
  File "/Users/christinagosnell/code/pudl-archiver/src/pudl_archiver/depositors/zenodo/depositor.py", line 526, in run_request
    raise ZenodoClientError(
pudl_archiver.depositors.zenodo.depositor.ZenodoClientError: ZenodoClientError(status=400, message=CSRF token missing or incorrect., errors=None)
[1]    96686 exit 1     pudl_archiver --datasets censuspep --initialize --summary-file  --sandbox

Answer: yes but i needed the right key! bc i had the old one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

2 participants