Add resource and field descriptions; purge unused fields #3224

Closed
6 tasks done
Tracked by #3102
zaneselvans opened this issue Jan 9, 2024 · 2 comments · Fixed by #3283
Labels
docs: Documentation for users and contributors.
eia861: Anything having to do with EIA Form 861.
metadata: Anything having to do with the content, formatting, or storage of metadata. Mostly datapackages.

@zaneselvans
Member

zaneselvans commented Jan 9, 2024

While working on Parquet outputs in #3102, we discovered several unused field definitions, fields with no descriptions, and resources with no descriptions.

Tasks

UNUSED_FIELDS = [
    "active",
    "country",
    "credits_or_adjustments",
    "delivery_customers",
    "depreciation_amortization_value",
    "electric_plant",
    "energy_source",
    "environmental_equipment_name",
    "expense",
    "fuel_transportation_mode",
    "future_plant",
    "income",
    "is_total",
    "leased_plant",
    "line_id",
    "month",
    "notes",
    "operator_name",
    "operator_state",
    "operator_utility_id_eia",
    "other",
    "other_total",
    "owner_name",
    "peak_demand_summer_mw",
    "peak_demand_winter_mw",
    "period_nox",
    "period_particulate",
    "period_so2",
    "prime_mover",
    "retail_sales",
    "sales_for_resale",
    "status",
    "storage_capacity_mw",
    "storage_customers",
    "total",
    "total_meters",
    "transmission",
    "unbundled_revenues",
    "utility_attn",
    "utility_pobox",
    "virtual_capacity_mw",
    "virtual_customers",
]

RESOURCES_NO_DESC = [
    "core_eia861__yearly_demand_side_management_misc",
    "core_eia861__yearly_demand_side_management_sales",
    "core_eia861__yearly_distributed_generation_fuel",
    "core_eia861__yearly_distributed_generation_misc",
    "core_eia861__yearly_distributed_generation_tech",
    "core_eia861__yearly_net_metering_customer_fuel_class",
    "core_eia861__yearly_net_metering_misc",
    "core_eia861__yearly_non_net_metering_customer_fuel_class",
    "core_eia861__yearly_non_net_metering_misc",
    "core_eia861__yearly_operational_data_misc",
    "core_eia861__yearly_operational_data_revenue",
    "core_eia861__yearly_reliability",
    "core_eia861__yearly_utility_data_misc",
    "core_eia861__yearly_utility_data_nerc",
    "core_eia861__yearly_utility_data_rto",
]

FIELDS_NO_DESC = [
    "actual_peak_demand_savings_mw",
    "address_2",
    "advanced_metering_infrastructure",
    "alternative_fuel_vehicle_2_activity",
    "alternative_fuel_vehicle_activity",
    "annual_indirect_program_cost",
    "annual_total_cost",
    "attention_line",
    "automated_meter_reading",
    "avg_num_employees",
    "backup_capacity_mw",
    "bundled_activity",
    "business_model",
    "buying_distribution_activity",
    "buying_transmission_activity",
    "caidi_w_major_event_days_minus_loss_of_service_minutes",
    "caidi_w_major_event_days_minutes",
    "caidi_wo_major_event_days_minutes",
    "chlorine_content_ppm",
    "circuits_with_voltage_optimization",
    "consumed_by_facility_mwh",
    "consumed_by_respondent_without_charge_mwh",
    "credits_or_adjustments",
    "critical_peak_pricing",
    "critical_peak_rebate",
    "customer_incentives_cost",
    "customer_incentives_incremental_cost",
    "customer_incentives_incremental_life_cycle_cost",
    "customer_other_costs_incremental_life_cycle_cost",
    "daily_digital_access_customers",
    "delivery_customers",
    "demand_annual_mwh",
    "demand_mwh",
    "direct_load_control_customers",
    "distributed_generation_owned_capacity_mw",
    "distribution_activity",
    "distribution_circuits",
    "energy_displaced_mwh",
    "energy_efficiency_annual_actual_peak_reduction_mw",
    "energy_efficiency_annual_cost",
    "energy_efficiency_annual_effects_mwh",
    "energy_efficiency_annual_incentive_payment",
    "energy_efficiency_incremental_actual_peak_reduction_mw",
    "energy_efficiency_incremental_effects_mwh",
    "energy_savings_estimates_independently_verified",
    "energy_savings_independently_verified",
    "energy_savings_mwh",
    "energy_served_ami_mwh",
    "energy_source",
    "estimated_or_actual_capacity_data",
    "estimated_or_actual_fuel_data",
    "estimated_or_actual_tech_data",
    "exchange_energy_delivered_mwh",
    "exchange_energy_received_mwh",
    "ferc_account_description",
    "fuel_class",
    "fuel_pct",
    "fuel_transportation_mode",
    "fuel_type",
    "furnished_without_charge_mwh",
    "generation_activity",
    "generators_num_less_1_mw",
    "generators_number",
    "green_pricing_revenue",
    "highest_distribution_voltage_kv",
    "home_area_network",
    "inactive_accounts_included",
    "incremental_energy_savings_mwh",
    "incremental_life_cycle_energy_savings_mwh",
    "incremental_life_cycle_peak_reduction_mwh",
    "incremental_peak_reduction_mw",
    "load_management_annual_actual_peak_reduction_mw",
    "load_management_annual_cost",
    "load_management_annual_effects_mwh",
    "load_management_annual_incentive_payment",
    "load_management_annual_potential_peak_reduction_mw",
    "load_management_incremental_actual_peak_reduction_mw",
    "load_management_incremental_effects_mwh",
    "load_management_incremental_potential_peak_reduction_mw",
    "major_program_changes",
    "merge_address",
    "merge_city",
    "merge_company",
    "merge_date",
    "moisture_content_pct",
    "momentary_interruption_definition",
    "nerc_regions_of_operation",
    "net_power_exchanged_mwh",
    "net_wheeled_power_mwh",
    "new_parent",
    "non_amr_ami",
    "operates_generating_plant",
    "other",
    "other_costs",
    "other_costs_incremental_cost",
    "outages_recorded_automatically",
    "peak_demand_summer_mw",
    "peak_demand_winter_mw",
    "plant_type",
    "potential_peak_demand_savings_mw",
    "price_responsive_programs",
    "price_responsiveness_customers",
    "pv_current_flow_type",
    "real_time_pricing",
    "rec_revenue",
    "rec_sales_mwh",
    "reported_as_another_company",
    "respondent_type",
    "retail_marketing_activity",
    "retail_sales",
    "retail_sales_mwh",
    "revenue",
    "revenue_class",
    "revenue_per_kwh",
    "rtos_of_operation",
    "saidi_w_major_event_days_minus_loss_of_service_minutes",
    "saidi_w_major_event_days_minutes",
    "saidi_wo_major_event_days_minutes",
    "saifi_w_major_event_days_customers",
    "saifi_w_major_event_days_minus_loss_of_service_customers",
    "saifi_wo_major_event_days_customers",
    "sales_for_resale",
    "sales_for_resale_mwh",
    "sales_to_ultimate_consumers_mwh",
    "service_type",
    "short_form",
    "sold_to_utility_mwh",
    "standard",
    "status",
    "storage_capacity_mw",
    "storage_customers",
    "summer_peak_demand_mw",
    "tech_class",
    "time_of_use_pricing",
    "time_responsive_programs",
    "time_responsiveness_customers",
    "total_capacity_less_1_mw",
    "total_disposition_mwh",
    "total_energy_losses_mwh",
    "total_meters",
    "total_sources_mwh",
    "transmission",
    "transmission_activity",
    "transmission_by_other_losses_mwh",
    "unbundled_revenues",
    "utc_datetime",
    "utility_attn",
    "utility_owned_capacity_mw",
    "utility_pobox",
    "variable_peak_pricing",
    "virtual_capacity_mw",
    "virtual_customers",
    "water_heater",
    "weighted_average_life_years",
    "wheeled_power_delivered_mwh",
    "wheeled_power_received_mwh",
    "wholesale_marketing_activity",
    "wholesale_power_purchases_mwh",
    "winter_peak_demand_mw",
]
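For illustration, lists like the three above could be produced by a check along the following lines. This is only a minimal sketch: the `field_metadata` and `resource_metadata` dictionaries, their structure, and the `find_metadata_gaps` helper are hypothetical toy stand-ins, not PUDL's actual metadata or code.

```python
# Toy stand-ins for the metadata dictionaries. The names, structure, and
# contents here are assumptions made for illustration only.
field_metadata = {
    "utility_id_eia": {"type": "integer", "description": "EIA utility ID."},
    "retail_sales": {"type": "number"},  # defined, never used, no description
    "fuel_class": {"type": "string"},  # used by a table, but no description
}
resource_metadata = {
    "core_eia861__yearly_net_metering_customer_fuel_class": {
        "schema": {"fields": ["utility_id_eia", "fuel_class"]},
        # no "description" key
    },
}


def find_metadata_gaps(fields, resources):
    """Return (unused fields, fields lacking a description, resources lacking one)."""
    # Every field name that appears in at least one resource schema.
    used = {name for res in resources.values() for name in res["schema"]["fields"]}
    unused_fields = sorted(set(fields) - used)
    fields_no_desc = sorted(
        name for name, meta in fields.items() if not meta.get("description")
    )
    resources_no_desc = sorted(
        name for name, meta in resources.items() if not meta.get("description")
    )
    return unused_fields, fields_no_desc, resources_no_desc


unused_fields, fields_no_desc, resources_no_desc = find_metadata_gaps(
    field_metadata, resource_metadata
)
```

A field can appear in more than one list, as `retail_sales` does in this toy data and as several fields do in the lists above.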
@zaneselvans added the eia861 and metadata labels Jan 9, 2024
@zaneselvans moved this from New to In progress in Catalyst Megaproject Jan 9, 2024
@zaneselvans added the docs label Jan 9, 2024
@zaneselvans added this to the v2024.01 milestone Jan 12, 2024
@aesharpe
Member

@zaneselvans one of the tasks above is "Enable the integration test that verifies all defined fields are being used." Does this mean there is already a function that does this that must be enabled? Or do I need to write one?

@zaneselvans
Member Author

@aesharpe I created this test when I was debugging the PyArrow schema stuff, but it's currently marked xfail, since there were some fields that weren't being used and needed to be purged:

pytest test/unit/metadata_test.py::test_defined_fields_are_used

Below it are two additional xfail tests that check that all fields and resources have descriptions (and report which ones don't, if there are any), but those can be removed once we update the Field and Resource classes to require a description.
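As a rough illustration of what "require a description" could mean, here is a minimal sketch using a standard-library dataclass. The `DescribedField` class and its attributes are hypothetical; the real Field and Resource classes are not shown in this thread and may enforce the requirement differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescribedField:
    """Hypothetical stand-in for a metadata field definition."""

    name: str
    dtype: str
    description: str  # no default value, so omitting it raises TypeError

    def __post_init__(self):
        # Also reject empty or whitespace-only descriptions.
        if not self.description.strip():
            raise ValueError(f"Field {self.name!r} needs a non-empty description.")


ok = DescribedField(
    name="retail_sales_mwh",
    dtype="number",
    description="Toy description used only for this example.",
)
```

Once the class itself rejects a missing description, a separate test that scans for missing descriptions becomes redundant, which is why the two tests could then be dropped.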

aesharpe added a commit that referenced this issue Jan 24, 2024
* Add a description to the RESOURCE_METADATA for tables missing a description.
* Make the description variable necessary for Resource() instances.
* Update unit tests to have description fields for Resource() instances.
* Fix docstring examples so the builds don't fail.
* Tack on the alembic update that should have gone with the previous commit.
aesharpe added a commit that referenced this issue Jan 24, 2024
* Remove the xfail for the test_defined_fields_are_used() function.
* Remove three fields that I missed before that aren't being used.
* Update the alembic file for those three removed fields.
@aesharpe aesharpe linked a pull request Jan 24, 2024 that will close this issue
11 tasks
@zaneselvans zaneselvans moved this from In progress to In review in Catalyst Megaproject Jan 24, 2024
zaneselvans added a commit that referenced this issue Jan 31, 2024
* Fix field descriptions and table-less fields
* Remove fields that are not in any PUDL tables.
* Add descriptions to fields that do not have a description.
* Made field descriptions mandatory.
* Update unit tests to have description fields when making test Resources
* Fix docstring examples so the builds don't fail
* Add resource descriptions and make them mandatory #3224
* Add a description to the RESOURCE_METADATA for tables missing a description.
* Make the description variable necessary for Resource() instances.
* Update unit tests to have description fields for Resource() instances.
* Fix docstring examples so the builds don't fail.
* Tack on the alembic update that should have gone with the previous commit.
* Reinstate unit test that all fields are being used #3224
* Remove the xfail for the test_defined_fields_are_used() function.
* Remove three fields that I missed before that aren't being used.
* Update the alembic file for those three removed fields.
* Update release notes
* Remove XFAIL marks from metadata tests that should pass.
* Remove two unit tests that test whether resources and fields have description fields because now they are required #3283
* Add some small changes to Resource descriptions #3283
* Fix incorrect incremental_life_cycle_peak_reduction_mw units (MWh -> MW)
* Update EIA-861 resource & column maps to use incremental_life_cycle_peak_reduction_mw
* Wipe conflicting alembic migrations and start fresh.
* Fix types and make small tweaks to field descriptions #3283
* In the core_eia861__yearly_net_metering_customer_fuel_class table, combine the energy_displaced_mwh column with the sold_to_utility_mwh column. The former only shows up in years 2007-2009 and upon further inspection seems analogous with the latter. Removed the former from the schema and updated the column map to point the old energy_displaced_mwh columns at the sold_to_utility_mwh column.

---------

Co-authored-by: Zane Selvans <[email protected]>
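The last change in the commit message above, pointing the old energy_displaced_mwh columns at sold_to_utility_mwh, can be pictured with a small sketch. Everything here is an assumption for illustration: the per-year `column_map` dictionary, the raw header names, and the values are toy data, not the actual PUDL column maps.

```python
# Hypothetical per-year column maps: raw spreadsheet header -> schema column.
# In the early year, the raw "energy_displaced" header is mapped to the same
# schema column that later years use for "sold_to_utility".
column_map = {
    2008: {"utility_id": "utility_id_eia", "energy_displaced": "sold_to_utility_mwh"},
    2012: {"utility_id": "utility_id_eia", "sold_to_utility": "sold_to_utility_mwh"},
}


def rename_columns(record, year):
    """Rename a raw record's keys using the column map for the given year."""
    mapping = column_map[year]
    return {mapping.get(key, key): value for key, value in record.items()}


# Toy raw records with made-up values.
raw_2008 = {"utility_id": 123, "energy_displaced": 4.5}
raw_2012 = {"utility_id": 123, "sold_to_utility": 6.0}
rows = [rename_columns(raw_2008, 2008), rename_columns(raw_2012, 2012)]
```

After renaming, both years share one set of column names, so no separate energy_displaced_mwh field needs to stay in the schema.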
@github-project-automation github-project-automation bot moved this from In review to Done in Catalyst Megaproject Jan 31, 2024

2 participants