Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

why is the in_rate_base records from the liabilities table all null from 2005-2020? #3233

Closed
cmgosnell opened this issue Jan 11, 2024 · 8 comments · Fixed by #3234
Closed
Assignees
Labels
data-cleaning Tasks related to cleaning & regularizing data during ETL. dbf Data coming from FERC's old Visual FoxPro DBF database file format. ferc1 Anything having to do with FERC Form 1

Comments

@cmgosnell
Copy link
Member

cmgosnell commented Jan 11, 2024

For some reason all of the records labeled in_rate_base from 2005 and 2020 are null:

df=defs.load_asset_value(AssetKey("exploded_income_statement_ferc1"))
tbl = df[
    (df.ending_balance.notnull())
    & (df.tags_in_rate_base == "yes")
]
plt.scatter(
    tbl.report_year,
    tbl.ending_balance,
    alpha=.2
)

Image

removing the tags_in_rate_base=="yes" mask, there is a ton of non-null data in the table..

i will first check if this is a low label issue...

@cmgosnell cmgosnell converted this from a draft issue Jan 11, 2024
@cmgosnell cmgosnell self-assigned this Jan 11, 2024
@cmgosnell
Copy link
Member Author

cmgosnell commented Jan 11, 2024

sigh.... it looks like this problem is coming straight from the dbf sqlite db

@zaneselvans any ideas on checking the source of this?

@zaneselvans
Copy link
Member

Ooooh, your plot is so pretty.

@zaneselvans
Copy link
Member

You could use dbfread directly to try and pull this table out of say the 2010 archive of the FERC Form 1 inputs and see what you find. The f1_bal_sheet_cr table corresponds to F1_11.DBF and looking at the 2010 data, it seems to contain data. The file is about 2.7MB.

@cmgosnell
Copy link
Member Author

hahah 🎨

import mplcyberpunk
plt.style.use("cyberpunk")

and thanks i'll take a peek

@zaneselvans
Copy link
Member

zaneselvans commented Jan 11, 2024

I think I see what's happening:

import dbfread
f1_bal_sheet_cr_dbf = dbfread.DBF(
    "/tmp/UPLOADERS/FORM1/working/F1_11.DBF",
)
f1_bal_sheet_cr_dbf.load()
f1_bal_sheet_cr_df = pd.DataFrame(f1_bal_sheet_cr_dbf.records)
f1_bal_sheet_cr_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23736 entries, 0 to 23735
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   RESPONDENT  23736 non-null  int64 
 1   REPORT_YEA  23736 non-null  int64 
 2   SPPLMNT_NU  23736 non-null  int64 
 3   ROW_NUMBER  23736 non-null  int64 
 4   ROW_SEQ     23736 non-null  int64 
 5   ROW_PRVLG   23736 non-null  object
 6   BEGIN_YR_B  0 non-null      object
 7   END_YR_BAL  0 non-null      object
 8   BEGIN_YR_2  23736 non-null  int64 
 9   END_YR_BA2  23736 non-null  int64 
 10  REPORT_PRD  23736 non-null  int64 
 11  END_QTR_BA  23736 non-null  int64 
 12  PRI_YR_Q4_  23736 non-null  int64 
 13  END_QTR_B2  23736 non-null  int64 
 14  PRI_YR_Q42  23736 non-null  int64 
 15  _NullFlags  23736 non-null  object
dtypes: int64(12), object(4)
memory usage: 2.9+ MB

The starting and ending year balances are indeed NULL. But there's quarterly data, and there's "end of Q4 last year" data.

f1_bal_sheet_cr_df.sample(20)
RESPONDENT REPORT_YEA SPPLMNT_NU ROW_NUMBER ROW_SEQ ROW_PRVLG BEGIN_YR_B END_YR_BAL BEGIN_YR_2 END_YR_BA2 REPORT_PRD END_QTR_BA PRI_YR_Q4_ END_QTR_B2 PRI_YR_Q42 _NullFlags
5407 185 2010 0 29 29 0 0 6 2379398 2265830 0 0
20086 266 2010 0 28 28 0 0 12 2611319 3610632 0 0
16988 11 2010 0 38 38 0 0 9 17889530 14992020 0 0
11405 185 2010 0 35 35 0 0 9 2322518 2265830 0 0
1718 149 2010 0 16 16 0 0 3 4339248092 4302112149 0 0
22139 446 2010 0 48 48 0 0 12 6282862 0 0 0
15758 182 2010 0 49 49 0 0 9 54886 51738 0 0
158 40 2010 0 63 63 0 0 3 3068685 3130431 0 0
11050 111 2010 0 57 57 0 0 6 3058544 3398384 0 0
12784 134 2010 0 11 11 0 0 9 2666784903 2225701346 0 0

@cmgosnell
Copy link
Member Author

yea there is nothing in the ending balance in several years I just checked. There a far amount of END_QTR_BA or end_qtr_bal.

sigh it looks like they reported all of the money as end of quarter balances instead of end of year balances for these years.

@cmgosnell
Copy link
Member Author

I think that means we could integrate the end of quarter data as annual data

@zaneselvans
Copy link
Member

Yeah, I think if you select ONLY the report_prd==12 records.

@zaneselvans zaneselvans added ferc1 Anything having to do with FERC Form 1 data-cleaning Tasks related to cleaning & regularizing data during ETL. dbf Data coming from FERC's old Visual FoxPro DBF database file format. labels Jan 12, 2024
@zaneselvans zaneselvans moved this from Backlog to In review in Catalyst Megaproject Jan 12, 2024
@github-project-automation github-project-automation bot moved this from In review to Done in Catalyst Megaproject Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data-cleaning Tasks related to cleaning & regularizing data during ETL. dbf Data coming from FERC's old Visual FoxPro DBF database file format. ferc1 Anything having to do with FERC Form 1
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants