`process_data()` throws an error with 2024 data #23

mthomas-ketchbrook · 2024-05-16T03:15:06Z

process_data() throws an error when run against a directory containing the downloaded March 2024 data.

Reproducible Example

library(fcall)
# Download March 2024 data
download_data(year = 2024, month = 3, dest = "data-raw/2024-03")
# Process data
processed_data <- process_data("data-raw/2024-03")

returns the error:

Error in `map2()`:
ℹ️ In index: 28.
ℹ️ With name: RCR7.
Caused by error in `scan()`:
! line 61 did not have 535 elements

Error Details

The problem occurs due to missing rows in the RCR7_Q202403_G20240508.TXT file.
As described in Scenario 3, the RCR7 file expects, for each institution in the data file:

1. a row that contains comma-separated values for variables that belong to the first set of single-occurrence variables
2. a row for each class of the code variable with comma-separated values of multiple-occurrence variables
3. a row that contains comma-separated values of the remaining single-occurrence variables

In particular, there are some institutions that have missing entries for code class 2000 (i.e., some variables do not have a row that corresponds to the "Risk Weight Factor" for that variable).

Our current approach assumes that the RCR7 data published by FCA will have a row for each RegCapCode (for each multiple-occurrence variable) for each institution. In fact, the text "THERE IS ONE OCCURENCE FOR EACH RegCapCode VALUE" appears at the bottom of the D_RCR7.TXT file itself.

This missing 2000 code for some variables (for certain institutions) is causing process_data() to fail.
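To see which rows are affected before choosing a workaround, the check can be sketched in base R. This is only an illustration: `find_missing_2000()` is a hypothetical helper, not part of {fcall}, and it assumes that each code row begins with the code followed by a comma and that a 2000 row should directly follow each 1900 row, as described in this issue.

```r
# Hypothetical helper (not part of {fcall}): return the line numbers of rows
# that start with "1900," and are NOT followed by a row starting with "2000,".
find_missing_2000 <- function(lines) {
  is_1900 <- startsWith(lines, "1900,")
  # The row that follows each line; the last line has no successor
  next_line <- c(lines[-1], NA_character_)
  next_is_2000 <- !is.na(next_line) & startsWith(next_line, "2000,")
  which(is_1900 & !next_is_2000)
}

# Toy input (made-up values, not real FCA data)
example_lines <- c(
  "1800,1,2",
  "1900,3,4",
  "2000,5,6",
  "1800,7,8",
  "1900,9,10",
  "100,11,12"
)
missing_after <- find_missing_2000(example_lines)
print(missing_after)
#> [1] 5

# Against the real file (path taken from the reproducible example above):
# find_missing_2000(readLines("data-raw/2024-03/RCR7_Q202403_G20240508.TXT"))
```

An empty result would suggest the file already has a 2000 row after every 1900 row.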

Possible Workarounds

There are several ways to work around this error:

1. Avoid processing the RCR7 file by removing D_RCR7.TXT and RCR7_Q202403_G20240508.TXT from the directory the data was downloaded into (i.e., the dir argument of process_data()).
2. Use process_metadata_file() and process_data_file() to process only the non-RCR7 files you are interested in.
For example, the code below shows how to process only the RCB data:

RCB_metadata <- fcall::process_metadata_file(file = "data-raw/2024-03/D_RCB.TXT")
RCB_data <- fcall::process_data_file(
  file = "data-raw/2024-03/RCB_Q202403_G20240508.TXT",
  metadata = RCB_metadata,
  dict = RCB__INV_CODE
)

Remember that available dicts are stored as internal {fcall} datasets.

3. Manually add the missing lines to RCR7_Q202403_G20240508.TXT (this assumes all values for this code are zero).
You can add 2000,,,,,,,,,,,,,,,,,, below each row that starts with 1900 and is not followed by a row that starts with 2000.
4. Replace the RCR7_Q202403_G20240508.TXT file in the directory the data was downloaded into (i.e., the dir argument of process_data()) with the attached file below, which applies the changes described in item 3 above.

RCR7_Q202403_G20240508.TXT
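The manual edit described above (adding a 2000 row after each 1900 row that lacks one) could also be scripted. The following is a minimal base R sketch, not part of {fcall}: `add_missing_2000()` is a hypothetical helper, the default filler row is copied from this issue and assumes all values for the code are zero, and the matching relies on rows beginning with the code followed by a comma.

```r
# Hypothetical helper (not part of {fcall}): insert a filler row after every
# row starting with "1900," that is not followed by a row starting with "2000,".
add_missing_2000 <- function(lines, filler = "2000,,,,,,,,,,,,,,,,,,") {
  is_1900 <- startsWith(lines, "1900,")
  next_line <- c(lines[-1], NA_character_)
  next_is_2000 <- !is.na(next_line) & startsWith(next_line, "2000,")
  needs_row <- is_1900 & !next_is_2000

  # Repeat each flagged line twice, then overwrite the second copy with the filler
  idx <- rep(seq_along(lines), times = 1L + needs_row)
  patched <- lines[idx]
  patched[duplicated(idx)] <- filler
  patched
}

# Toy input (made-up values); a short filler keeps the example readable
example_lines <- c("1900,3,4", "2000,5,6", "1900,9,10", "100,11,12")
patched_lines <- add_missing_2000(example_lines, filler = "2000,,")
print(patched_lines)

# Applying it to the downloaded file, keeping a backup of the original:
# path <- "data-raw/2024-03/RCR7_Q202403_G20240508.TXT"
# file.copy(path, paste0(path, ".bak"))
# writeLines(add_missing_2000(readLines(path)), path)
```

Working on a character vector of lines keeps the helper independent of how the file is read or written, so the same function should apply to later quarters' RCR7 files if they show the same gap.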


mthomas-ketchbrook · 2024-08-09T04:09:06Z

This same error is also present in the June 2024 file (which was posted the week of August 5, 2024). An updated RCR7 file is attached that can be used to replace the RCR7 file returned by fcall::download_data().

Ketchbrook has been in communication with FCA, with the goal of having FCA replace the current .zip files posted on its website with fixed versions. However, FCA appears to still be working on a resolution and was not able to fix the issue before the June 2024 release.

RCR7_Q202406_G20240807.TXT

mthomas-ketchbrook · 2024-11-17T05:08:15Z

This same error is also present in the September 2024 file (which was posted the week of November 11, 2024). An updated RCR7 file is attached that can be used to replace the RCR7 file returned by fcall::download_data().

RCR7_Q202409_G20241107.TXT

mthomas-ketchbrook added the wontfix label May 16, 2024

mthomas-ketchbrook pinned this issue May 21, 2024

mthomas-ketchbrook changed the title ~~process_data() throws an error with March 2024 data~~ process_data() throws an error with 2024 data Aug 9, 2024

mthomas-ketchbrook linked a pull request Dec 13, 2024 that will close this issue

Improve error message when 2024 files are passed to process_data() #32

Open
