Create an overview doc of Ex. 21 formats/layouts #3447

Closed
6 of 7 tasks
katie-lamb opened this issue Mar 6, 2024 · 1 comment
Labels
mozilla_sec_to_eia Mozilla AI for EJ grant to link SEC utility ownership data to EIA operational data

Comments

@katie-lamb
Member

katie-lamb commented Mar 6, 2024

As a first step in the extraction process, we need to scope the different formats that the Ex. 21 documents come in, so that we better understand which layouts to try extracting from first. Create a document with the following:

Tasks


Once we've conducted this overview, we'll have a better idea of how to create a representative sample of documents to begin extracting from, or which layout to start with.

@katie-lamb katie-lamb converted this from a draft issue Mar 6, 2024
@katie-lamb katie-lamb self-assigned this Mar 6, 2024
@katie-lamb katie-lamb added the mozilla_sec_to_eia Mozilla AI for EJ grant to link SEC utility ownership data to EIA operational data label Mar 6, 2024
@katie-lamb
Member Author

I've started a Google Doc to track this. So far the main issue is that Ex. 21 tables sometimes sit inside an HTML table, while other times they are simply part of the HTML body. Even when using pandas read_html to read in from an HTML table, some serious reformatting is needed to get the schema right. This might require categorizing the documents into the different formats/schemas.
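
As a rough illustration of the categorization idea, here is a minimal sketch that sorts a document into one of the two layouts described above using only the Python standard library. The class name `TableCounter`, the function `classify_layout`, the labels `"html_table"` and `"html_body"`, and the `min_rows` threshold are all hypothetical names chosen for this example; they are not part of any existing extraction code, and a real classifier would likely need more categories.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> elements and the <tr> rows nested inside them."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.rows = 0
        self._depth = 0  # number of currently open <table> tags

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "table":
            self.tables += 1
            self._depth += 1
        elif tag == "tr" and self._depth > 0:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            self._depth -= 1


def classify_layout(html, min_rows=2):
    """Return a hypothetical layout label for one Ex. 21 document.

    "html_table": at least one <table> holding min_rows or more rows.
    "html_body":  everything else (subsidiaries listed in plain body text).
    The min_rows cutoff is an assumption, meant to skip tables that are
    used only for page layout.
    """
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    if parser.tables > 0 and parser.rows >= min_rows:
        return "html_table"
    return "html_body"


# Two made-up snippets standing in for real filings.
table_doc = (
    "<html><body><table>"
    "<tr><td>Sub A</td><td>Delaware</td></tr>"
    "<tr><td>Sub B</td><td>Texas</td></tr>"
    "</table></body></html>"
)
body_doc = "<html><body><p>Sub A (Delaware)</p><p>Sub B (Texas)</p></body></html>"

print(classify_layout(table_doc))
print(classify_layout(body_doc))
```

Documents labeled `"html_table"` would be candidates for pandas read_html followed by schema cleanup, while `"html_body"` documents would need a different parsing approach.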

@jdangerx jdangerx moved this from In progress to Done in Catalyst Megaproject Mar 25, 2024