An error is thrown when using the process_data() function against a directory containing the March 2024 downloaded data.
Reproducible Example
library(fcall)
# Download March 2024 data
download_data(year = 2024, month = 3, dest = "data-raw/2024-03")
# Process data
processed_data <- process_data("data-raw/2024-03")
returns the error:
Error in `map2()`:
ℹ️ In index: 28.
ℹ️ With name: RCR7.
Caused by error in `scan()`:
! line 61 did not have 535 elements
Error Details
The problem occurs due to missing rows in the RCR7_Q202403_G20240508.TXT file.
As described in Scenario 3, the RCR7 file expects, for each institution in the data file:
- a row that contains comma-separated values for the first set of single-occurrence variables
- a row for each class of the code variable, with comma-separated values of the multiple-occurrence variables
- a row that contains comma-separated values of the remaining single-occurrence variables
In particular, there are some institutions that have missing entries for code class 2000 (i.e., some variables do not have a row that corresponds to the "Risk Weight Factor" for that variable).
Our current approach assumes that the RCR7 data published by FCA will have a row for each RegCapCode (for each multiple-occurrence variable) for each institution. In fact, the text "THERE IS ONE OCCURENCE FOR EACH RegCapCode VALUE" is published at the bottom of the D_RCR7.TXT file itself.
This missing 2000 code for some variables (for certain institutions) is causing process_data() to fail.
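To see which rows are affected, the check described above (a row for code 1900 with no row for code 2000 directly after it) can be sketched in base R. This is only an illustration: `find_missing_2000()` is a hypothetical helper, not part of {fcall}, the toy rows are made up, and it assumes the code is the first comma-separated field on each multiple-occurrence row.

```r
# Hypothetical helper (not part of {fcall}): return the line numbers of rows
# starting with "1900," that are not immediately followed by a "2000," row.
find_missing_2000 <- function(lines) {
  if (length(lines) == 0) return(integer(0))
  # Shift the lines up by one so each row can be compared with the next one
  next_line <- c(lines[-1], NA_character_)
  next_is_2000 <- !is.na(next_line) & startsWith(next_line, "2000,")
  which(startsWith(lines, "1900,") & !next_is_2000)
}

# Toy rows for illustration only (not real RCR7 data)
example <- c(
  "1900,3,4", "2000,5,6",   # complete: 1900 followed by 2000
  "some,other,row",
  "1900,9,10"               # incomplete: no 2000 row follows
)
find_missing_2000(example)  # 4
```

Against the real file, the same helper could be applied to `readLines("data-raw/2024-03/RCR7_Q202403_G20240508.TXT")`, using the path from the reproducible example.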
Possible Workarounds
There are several options for working around this error:
1. Avoid processing the RCR7 file by removing D_RCR7.TXT and RCR7_Q202403_G20240508.TXT from the directory the data was downloaded into (i.e., the dir argument of process_data()).
2. Leverage process_metadata_file() and process_data_file() to process only the non-RCR7 files you are interested in. For example, these two functions can be used to process only the RCB data. Remember that the available dicts are stored as internal {fcall} datasets.
3. Manually add the missing lines to RCR7_Q202403_G20240508.TXT (this assumes all values for this code are zero). You can add 2000,,,,,,,,,,,,,,,,,, below each row that starts with 1900 and is not followed by a row that starts with 2000.
4. Replace the RCR7_Q202403_G20240508.TXT file in the directory the data was downloaded into (i.e., the dir argument of process_data()) with the attached file below, which applies the changes described in option 3 above.
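The manual edit of adding a 2000 row below each unmatched 1900 row could also be scripted. The following is a minimal base R sketch, not part of {fcall}: `add_missing_2000_rows()` is a hypothetical helper, the toy rows are made up, and it assumes the code is the first comma-separated field on each multiple-occurrence row.

```r
# Hypothetical helper (not part of {fcall}): insert a filler "2000" row after
# every row starting with "1900," that is not already followed by a "2000," row.
add_missing_2000_rows <- function(lines, filler = "2000,,,,,,,,,,,,,,,,,,") {
  if (length(lines) == 0) return(lines)
  next_line <- c(lines[-1], NA_character_)
  next_is_2000 <- !is.na(next_line) & startsWith(next_line, "2000,")
  needs_filler <- startsWith(lines, "1900,") & !next_is_2000
  # Duplicate each row that needs a filler, then overwrite the second copy
  copies <- ifelse(needs_filler, 2L, 1L)
  out <- rep(lines, times = copies)
  out[cumsum(copies)[needs_filler]] <- filler
  out
}

# Toy rows for illustration only (not real RCR7 data)
toy <- c("1900,a", "2000,b", "1900,c", "next,institution,row")
patched <- add_missing_2000_rows(toy)
```

To apply it to the downloaded file, keep a backup copy first, then something like `path <- "data-raw/2024-03/RCR7_Q202403_G20240508.TXT"` followed by `writeLines(add_missing_2000_rows(readLines(path)), path)` would rewrite the file in place. As with the manual edit, this treats all values for the missing code as empty.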
This same error is also present in the June 2024 file (which was posted the week of August 5, 2024). An updated RCR7 file is attached that can be used to replace the RCR7 file returned by fcall::download_data().
Ketchbrook has been in communication with FCA, with the goal of having FCA replace the current .zip files posted on the website with fixed versions. However, it appears that FCA is still working on a resolution and was not able to fix the problem before the June 2024 release.
This same error is also present in the September 2024 file (which was posted the week of November 11, 2024). An updated RCR7 file is attached that can be used to replace the RCR7 file returned by fcall::download_data().