process_data() throws an error with 2024 data #23

Open
mthomas-ketchbrook opened this issue May 16, 2024 · 2 comments · May be fixed by #32
Labels
wontfix This will not be worked on

Comments


mthomas-ketchbrook commented May 16, 2024

An error is thrown when using the process_data() function against a directory containing the March 2024 downloaded data.

Reproducible Example

library(fcall)
# Download March 2024 data
download_data(year = 2024, month = 3, dest = "data-raw/2024-03")
# Process data
processed_data <- process_data("data-raw/2024-03")

returns the error:

Error in `map2()`:
ℹ️ In index: 28.
ℹ️ With name: RCR7.
Caused by error in `scan()`:
! line 61 did not have 535 elements

Error Details

The problem occurs due to missing rows in the RCR7_Q202403_G20240508.TXT file.
As described in Scenario 3, the RCR7 file expects, for each institution in the data file:

  • a row that contains comma-separated values for variables that belong to the first set of single-occurrence variables
  • a row for each class of the code variable with comma-separated values of multiple-occurrence variables
  • a row that contains comma-separated values of the remaining single-occurrence variables

In particular, there are some institutions that have missing entries for code class 2000 (i.e., some variables do not have a row that corresponds to the "Risk Weight Factor" for that variable).

Our current approach assumes that the RCR7 data published by FCA has a row for each RegCapCode (for each multiple-occurrence variable) for each institution. In fact, the text "THERE IS ONE OCCURENCE FOR EACH RegCapCode VALUE" appears at the bottom of the D_RCR7.TXT file itself.

This missing 2000 code for some variables (for certain institutions) is causing process_data() to fail.
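
One way to confirm which rows are affected is to scan the raw file for rows that start with 1900 and are not immediately followed by a row that starts with 2000. The following is a minimal base R sketch, assuming each code row begins with the code followed by a comma; find_missing_2000() is a hypothetical helper and is not part of {fcall}.

```r
# Hypothetical helper (not part of {fcall}): return the line numbers of rows
# that start with "1900," and are not immediately followed by a "2000," row.
find_missing_2000 <- function(lines) {
  is_1900 <- startsWith(lines, "1900,")
  # For each line, is the *next* line a "2000," row? (FALSE for the last line)
  next_is_2000 <- c(startsWith(lines, "2000,")[-1], FALSE)
  which(is_1900 & !next_is_2000)
}

# Toy input for illustration only; real RCR7 rows carry many more fields.
toy <- c("1800,1,2", "1900,3,4", "2000,5,6", "1800,7,8", "1900,9,10", "99,end")
missing_at <- find_missing_2000(toy)
print(missing_at)

# Against the real file (path taken from the reproducible example above):
# find_missing_2000(readLines("data-raw/2024-03/RCR7_Q202403_G20240508.TXT"))
```

In the toy input, only the second 1900 row lacks a following 2000 row, so its line number is the one reported.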

Possible Workarounds

There are several options for working around this error:

  1. Avoid processing the RCR7 file by removing D_RCR7.TXT and RCR7_Q202403_G20240508.TXT from the directory the data was downloaded into (i.e., the dir argument of process_data()).
  2. Leverage process_metadata_file() and process_data_file() to process the non-RCR7 files you are interested in.
    For example, the code below shows how to process only the RCB data:
RCB_metadata <- fcall::process_metadata_file(file = "data-raw/2024-03/D_RCB.TXT")
RCB_data <- fcall::process_data_file(
  file = "data-raw/2024-03/RCB_Q202403_G20240508.TXT",
  metadata = RCB_metadata,
  dict = RCB__INV_CODE
)

Remember that available dicts are stored as internal {fcall} datasets.

  3. Manually add the missing lines to RCR7_Q202403_G20240508.TXT (this assumes all values for this code are zero).
    You can add 2000,,,,,,,,,,,,,,,,,, below each instance of a row that starts with 1900 that is not followed by a row that starts with 2000.
  4. Replace the RCR7_Q202403_G20240508.TXT file in the directory the data was downloaded into (i.e., the dir argument of process_data()) with the attached file below, which applies the changes described in option 3 above.
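
The manual fix described above (adding an empty 2000 row after each 1900 row that lacks one) could also be scripted. This is a rough base R sketch under the same assumption that code rows start with the code followed by a comma; add_missing_2000() is a hypothetical helper, not an {fcall} function, and the filler row is copied from the text above.

```r
# Hypothetical helper (not part of {fcall}): insert a filler "2000" row after
# every "1900," row that is not already followed by a "2000," row.
add_missing_2000 <- function(lines, filler = "2000,,,,,,,,,,,,,,,,,,") {
  is_1900 <- startsWith(lines, "1900,")
  next_is_2000 <- c(startsWith(lines, "2000,")[-1], FALSE)
  needs_row <- is_1900 & !next_is_2000
  # Repeat each line once, or twice where a filler row must follow it
  idx <- rep(seq_along(lines), times = 1L + needs_row)
  out <- lines[idx]
  # The second copy of each repeated line becomes the filler row
  out[duplicated(idx)] <- filler
  out
}

# Toy input for illustration only; real RCR7 rows carry many more fields.
toy <- c("1800,1", "1900,2", "2000,3", "1800,4", "1900,5", "99,end")
patched <- add_missing_2000(toy, filler = "2000,")
print(patched)

# Against the real file (keep a copy of the original before overwriting):
# path <- "data-raw/2024-03/RCR7_Q202403_G20240508.TXT"
# writeLines(add_missing_2000(readLines(path)), path)
```

Like the manual edit, this assumes all values for the missing code are zero, so the patched file should be treated as a stopgap until FCA publishes corrected data.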

RCR7_Q202403_G20240508.TXT

mthomas-ketchbrook added the wontfix label May 16, 2024
mthomas-ketchbrook pinned this issue May 21, 2024
mthomas-ketchbrook changed the title from "process_data() throws an error with March 2024 data" to "process_data() throws an error with 2024 data" Aug 9, 2024
mthomas-ketchbrook commented

This same error is also present in the June 2024 file (which was posted the week of August 5, 2024). An updated RCR7 file is attached that can be used to replace the RCR7 file returned by fcall::download_data().

Ketchbrook has been in communication with FCA, with the goal of having FCA replace the current .zip files posted on the website with fixed versions. However, it appears that they are still working on a resolution and were not able to fix the issue before the June 2024 release.

RCR7_Q202406_G20240807.TXT

mthomas-ketchbrook commented

This same error is also present in the September 2024 file (which was posted the week of November 11, 2024). An updated RCR7 file is attached that can be used to replace the RCR7 file returned by fcall::download_data().

RCR7_Q202409_G20241107.TXT
