Write analysis data to file per collection #18

Open
robobenklein opened this issue May 13, 2021 · 2 comments
Labels
enhancement New feature or request performance

Comments

@robobenklein
Member

The output can then be imported into the db separately after the run, so the run itself needs no direct db connection.

@robobenklein robobenklein self-assigned this May 14, 2021
@robobenklein robobenklein added performance enhancement New feature or request labels May 14, 2021
@robobenklein
Member Author

Example of a compatible file format:

"_key","name","city","state","country","lat","long","vip"
"00M","Thigpen ","Bay Springs","MS","USA",31.95376472,-89.23450472,false
"00R","Livingston Municipal","Livingston","TX","USA",30.68586111,-95.01792778,false
"00V","Meadow Lake","Colorado Springs","CO","USA",38.94574889,-104.5698933,false
"01G","Perry-Warsaw","Perry","NY","USA",42.74134667,-78.05208056,false
"01J","Hilliard Airpark","Hilliard","FL","USA",30.6880125,-81.90594389,false
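
A minimal sketch of how one collection's nodes might be written in the format above, assuming each node is a plain dict. The names `format_field` and `write_collection` are hypothetical and not part of wsyntree; the quoting rules are inferred only from the sample lines (strings quoted, numbers bare, booleans lowercase).

```python
import io

def format_field(value):
    """Render one field: strings quoted, booleans lowercase, numbers bare."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Quote strings and double any embedded quotes (usual CSV escaping).
    return '"' + str(value).replace('"', '""') + '"'

def write_collection(out, fields, nodes):
    """Write a header line, then one line per node dict, to the file-like `out`."""
    out.write(",".join(format_field(f) for f in fields) + "\n")
    for node in nodes:
        out.write(",".join(format_field(node[f]) for f in fields) + "\n")

# Usage: one row taken from the sample above, written to an in-memory buffer.
fields = ["_key", "name", "city", "state", "country", "lat", "long", "vip"]
nodes = [{
    "_key": "00M", "name": "Thigpen ", "city": "Bay Springs", "state": "MS",
    "country": "USA", "lat": 31.95376472, "long": -89.23450472, "vip": False,
}]
buf = io.StringIO()
write_collection(buf, fields, nodes)
```

In a real run `out` would be an open file per collection rather than a `StringIO`.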

@robobenklein
Member Author

We will need to dedup at some point in this process: during writing, during import, or in a step in between, which could be something like an external sort.

An external sorting preprocess step before db import could allow dedup between multiple job results as well, maybe:

wsyntree-collector file-merge [paths to job output dirs] -o [path to combined output dir]
wsyntree-collector file-import [paths to job or merged output dirs]
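
A rough sketch of what the external-sort dedup behind a `file-merge` step could look like, assuming one node per line and that exact duplicate lines are what should be dropped. The function names `write_sorted_runs` and `merge_dedup` are hypothetical; this is not the actual wsyntree-collector implementation.

```python
import heapq
import itertools
import os
import tempfile

def write_sorted_runs(lines, run_size, tmpdir):
    """Split `lines` into sorted chunks of at most `run_size` lines, one temp file each."""
    paths = []
    it = iter(lines)
    while True:
        # Normalize the line ending before sorting so that the order inside
        # each run matches the order heapq.merge sees when reading it back.
        chunk = sorted(line.rstrip("\n") + "\n" for line in itertools.islice(it, run_size))
        if not chunk:
            break
        fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(chunk)
        paths.append(path)
    return paths

def merge_dedup(run_paths, out_path):
    """Merge sorted run files into `out_path`, writing each distinct line once."""
    files = [open(p) for p in run_paths]
    try:
        with open(out_path, "w") as out:
            # heapq.merge streams the runs in sorted order, so duplicates end
            # up adjacent and groupby can collapse them without loading all data.
            for line, _group in itertools.groupby(heapq.merge(*files)):
                out.write(line)
    finally:
        for f in files:
            f.close()
```

Only one chunk is held in memory at a time, which is the point of sorting externally. Deduplicating by `_key` alone instead of by whole line would need a `key=` function passed to both `heapq.merge` and `groupby`, plus a rule for which duplicate wins.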

The flat-file storage format would likely be a single folder containing one text file per collection, with each node (or entry) on its own line, in CSV or a similar escaped style.

Could also get fancy and write intermediary storage using the treetops "hash composes path" style where division amongst files occurs by the _key/_id property.
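
One way the "hash composes path" idea might look, as a sketch only: the leading hex digits of a hash of `_key` pick the directory and file for a node. The function `shard_path`, the choice of SHA-256, and the two-level layout are assumptions for illustration and do not reproduce the exact treetops scheme.

```python
import hashlib
from pathlib import Path

def shard_path(root, collection, key, levels=2, width=2):
    """Return the file that should hold the node with this `key`.

    The first `levels * width` hex digits of the key's hash are split into
    path components, so a given key always maps to the same file.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    # All but the last component are directories; the last names the file.
    return Path(root, collection, *parts[:-1], parts[-1] + ".csv")

# Usage: where would the node with _key "00M" in a collection named "airports" go?
path = shard_path("output", "airports", "00M")
```

Because equal keys always land in the same file, dedup could then be done one small file at a time instead of across the whole collection.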
