WHY
Today, the only available output stream format is .njson (i.e. a file with n lines, each line being a dictionary).
This format has two downsides:
It makes preliminary analysis of the output data harder: .njson files cannot be directly forwarded to non-technical users, and cannot be put into a pandas DataFrame without undergoing preliminary transformations.
Some APIs natively return data in .csv format: in these cases, we have to convert each line to a dictionary, which can cause parsing errors.
HOW
Create a .csv streamer.
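As a rough illustration of what such a streamer would have to produce, here is a minimal sketch that serializes the same records both ways using only the standard library. The function names and the sample records are hypothetical and are not taken from the existing code base.

```python
import csv
import io
import json


def records_to_njson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def records_to_csv(records, fieldnames):
    """Serialize the same records as CSV, with a header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical sample data; the second value contains a comma,
# so the csv module quotes it instead of splitting the field.
records = [
    {"campaign": "spring", "clicks": 12},
    {"campaign": "summer, EU", "clicks": 7},
]

njson_text = records_to_njson(records)
csv_text = records_to_csv(records, fieldnames=["campaign", "clicks"])
```

Relying on the `csv` module for quoting, rather than joining values by hand, is what avoids the kind of parsing errors mentioned above.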
I've started working on this issue and I've noticed that we may encounter a problem with the current software architecture.
Currently, the format of the destination file is imposed: we always get a .njson file by default. Even though there is a Pickle option, it is never used in the code. If we want to introduce a new format like CSV, we must let users decide which format they prefer. It would be intuitive to have an option in the writer command, something like write_gcs --gcs-file-format csv.
However, to do so, we need to change the stream we use (CSVStream vs JSONStream), and this choice must be made in the read() function of the reader. That would force us to add the file format as an option of the reader, something like read_dv360 --dv360-file-format csv, which is less intuitive than a writer option because it mixes reader and writer concerns.
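To make the coupling concrete, here is a toy sketch of the situation. The classes below are simplified stand-ins that I wrote for illustration: they borrow the names JSONStream and CSVStream but are not the project's actual implementations, and ToyReader, ToyWriter and the `file_format` argument are hypothetical.

```python
import csv
import io
import json


class JSONStream:
    """Toy stand-in: yields records as newline-delimited JSON lines."""
    extension = ".njson"

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def readlines(self):
        for record in self.records:
            yield json.dumps(record) + "\n"


class CSVStream:
    """Toy stand-in: yields a header line followed by CSV rows."""
    extension = ".csv"

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def readlines(self):
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=list(self.records[0]), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(self.records)
        yield from buffer.getvalue().splitlines(keepends=True)


STREAMS = {"njson": JSONStream, "csv": CSVStream}


class ToyReader:
    def __init__(self, file_format="njson"):
        # The stream is instantiated inside read(), so the reader
        # has to know the output format up front.
        self.stream_class = STREAMS[file_format]

    def read(self):
        records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
        yield self.stream_class("report", records)


class ToyWriter:
    def write(self, stream):
        # The writer only consumes whatever stream it is handed.
        return stream.name + stream.extension, "".join(stream.readlines())


file_name, body = ToyWriter().write(next(ToyReader("csv").read()))
```

In this sketch the writer has no say in the format: by the time it receives the stream, the choice has already been made in the reader, which is why the option ends up on the reader command.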