Remove duplicates in the `mine_id_msha` column #3213

bendnorman · 2024-01-04T16:10:13Z

Based on the comment in `src/pudl/transform/eia923.py`, the `mine_id_msha` column should have unique IDs:

Lines 1043 to 1059 in 618c20c

    
    # If we actually *have* an MSHA ID for a mine, then we have a totally
    # unique identifier for that mine, and we can safely drop duplicates and
    # keep just one copy of that mine, no matter how different all the other
    # fields associated with the mine info are... Here we split out all the
    # coalmine records that have an MSHA ID, remove them from the CMI
    # data frame, drop duplicates, and then bring the unique mine records
    # back into the overall CMI dataframe...
    cmi_with_msha = cmi_df[cmi_df["mine_id_msha"] > 0]
    cmi_with_msha = cmi_with_msha.drop_duplicates(
        subset=[
            "mine_id_msha",
        ]
    )
    cmi_df.drop(cmi_df[cmi_df["mine_id_msha"] > 0].index)
    cmi_df = pd.concat([cmi_df, cmi_with_msha])
    cmi_df = cmi_df.drop_duplicates(subset=coalmine_cols)

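but they aren't, because the result of `cmi_df.drop(...)` is never assigned back to `cmi_df`. The rows with an MSHA ID are therefore still present when the deduplicated records are concatenated back in.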

It's been a while since I've played with this data so I'm not sure what the best option is.
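
One possible option, as a minimal sketch: keep the existing logic and just assign the result of `drop()` back to `cmi_df`. The `dedupe_mines` helper name, the sample data, and the contents of `coalmine_cols` below are made up for illustration and are not taken from PUDL. I haven't checked how this interacts with the rest of the transform.

```python
import pandas as pd


def dedupe_mines(cmi_df: pd.DataFrame, coalmine_cols: list[str]) -> pd.DataFrame:
    """Keep one record per MSHA ID, then drop exact duplicates on coalmine_cols."""
    has_msha = cmi_df["mine_id_msha"] > 0
    # One record per MSHA ID, regardless of how the other fields differ.
    cmi_with_msha = cmi_df[has_msha].drop_duplicates(subset=["mine_id_msha"])
    # DataFrame.drop returns a new frame by default, so assign it back.
    cmi_df = cmi_df.drop(cmi_df[has_msha].index)
    cmi_df = pd.concat([cmi_df, cmi_with_msha])
    return cmi_df.drop_duplicates(subset=coalmine_cols)


# Hypothetical sample data: 0 stands in for "no MSHA ID" in this sketch.
sample = pd.DataFrame(
    {
        "mine_id_msha": [101, 101, 202, 0, 0],
        "mine_name": ["Alpha", "Alpha Mine", "Beta", "Gamma", "Gamma"],
        "state": ["WV", "WV", "KY", "PA", "PA"],
    }
)
coalmine_cols = ["mine_name", "state", "mine_id_msha"]  # assumed for illustration

result = dedupe_mines(sample, coalmine_cols)
with_msha = result[result["mine_id_msha"] > 0]
print(with_msha["mine_id_msha"].is_unique)
```

Without the reassignment, both `101` records would survive here, because their `mine_name` values differ and the final `drop_duplicates(subset=coalmine_cols)` only removes rows that match on every column in `coalmine_cols`.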


bendnorman added this to Catalyst Megaproject Jan 4, 2024

bendnorman converted this from a draft issue Jan 4, 2024

bendnorman mentioned this issue Jan 4, 2024

Remove duplicates in the mine_id_msha column #1991

Closed

bendnorman moved this from New to Icebox in Catalyst Megaproject Jan 4, 2024
